Programming the Apache Lifecycle
Geoffrey Young
http://www.modperlcookbook.org/ 2
Overview
• Apache and mod_perl 101
• mod_perl Handler Basics
• Using the Apache Framework
• Advanced mod_perl API Features
Apache's Pre-fork Model
• Apache parent process forks multiple child processes
httpd (parent)
httpd (child)  httpd (child)  httpd (child)  httpd (child)
Roles and Responsibilities
• the httpd parent process handles no actual requests
• all requests are served by the child processes
• requests are handled by any available child process, not necessarily the one that handled the previous request
– remember, HTTP is stateless by default
Nice, Responsible Children
• each httpd child processes one incoming request at a time
• only when the request is over is the child free to serve the next request
• over time httpd child processes are terminated and replaced with fresh children
Request Phases
• Apache breaks down request processing into separate, logical parts called phases
[Diagram: client request → URI-based init → URI translation → file-based init → resource control → MIME setting → fixups → content → logging]
Request Phases
• each request is stepped through the phases until...
– all processing is complete
– somebody throws an "error"
• developers are given the chance to hook into each phase to add custom processing
So What?
• most Apache users don't worry about the request cycle too much...
• ...but they do use modules that plug into it
... for instance
[Diagram: client request → URI-based init → URI translation]
mod_rewrite: RewriteRule /favicon.ico$ /images/favicon.ico
... for instance
[Diagram: client request → URI-based init → URI translation → file-based init → resource control]
mod_auth: AuthUserFile .htpasswd
... for instance
[Diagram: client request → URI-based init → URI translation → file-based init → resource control → MIME setting → fixups → content]
mod_cgi: SetHandler cgi-script
That's great, but...
• breaking down the request into distinct phases has many benefits
– gives each processing point a role that can be easily managed and programmed
– makes Apache more like an application framework rather than a content engine
• but you have to code in C
Enter mod_perl
• mod_perl offers an interface to each phase of the request cycle
• opens up the Apache API to Perl code
• allows you to program targeted parts of the request cycle using Perl
• we like Perl
Apache Request Cycle
[Diagram: client request → URI-based init → URI translation → file-based init → resource control → MIME setting → fixups → content → logging]
mod_perl Interface
client request
URI-based init – PerlPostReadRequestHandler
URI translation – PerlTransHandler
file-based init – PerlHeaderParserHandler
resource control – PerlAccessHandler, PerlAuthenHandler, PerlAuthzHandler
MIME setting – PerlTypeHandler
fixups – PerlFixupHandler
content – PerlHandler
logging – PerlLogHandler
cleanup (after logging) – PerlCleanupHandler
What is mod_perl?
• mod_perl is the Perl interface to the Apache API
• a C extension module, just like mod_cgi or mod_rewrite
• creates a persistent perl environment embedded within Apache
What's the Big Deal?
• mod_perl allows you to interact with and directly alter server behavior
• gives you the ability to "program within Apache's framework instead of around it"
• allows you to intercept basic Apache functions and replace them with your own (sometimes devious) Perl substitutes
• lets you do it in Perl instead of C
What's a handler?
• the term handler refers to processing that occurs during any of the Apache runtime phases
• this includes the request-time phases as well as the other parts of the Apache runtime, such as restarts
• the use of "handler" for all processing hooks is mod_perl specific
Why use handlers?
• gives each processing point a role that can be easily managed and programmed
– process request at the proper phase
– meaningful access to the Apache API
– break up processing into smaller parts
– modularize and encapsulate processing
• makes Apache more like an application framework rather than a content engine
• CPAN
Registry is just a handler
• Apache::Registry is merely an (incredibly clever and amazing) mod_perl handler
• its performance gains are made possible due to what mod_perl really is
• let's take a peek inside...
Apache::Registry
• Client side
– http://localhost/perl-bin/bar.pl
• Server side
– mod_perl intercepts content generation
– searches @INC for Apache/Registry.pm
– calls Apache::Registry::handler(Apache->request)
– inserts wizardry
– returns response to client
Wizardry, you say?
• the wizardry is basically just putting the CGI script into its own package
  package Apache::ROOT::perl_2dbin::foo_2epl;
  sub handler {
    BEGIN { $^W = 1; };
    $^W = 1;
    ... your script here...
  }
  1;
• because the perl interpreter is persistent, the (compiled) package is already in memory when called
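The wrapping trick can be tried outside of Apache. What follows is only a rough sketch under stated assumptions, not Apache::Registry's actual code: `package_for()` and `run_script()` are hypothetical helpers, the name mangling is a simplification that merely reproduces the package name shown on the slide, and the one-line "script" is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: derive a package name from the request URI,
# e.g. "/perl-bin/foo.pl" becomes Apache::ROOT::perl_2dbin::foo_2epl.
sub package_for {
    my ($uri) = @_;
    (my $name = $uri) =~ s/([^A-Za-z0-9_\/])/sprintf("_%02x", ord $1)/eg;
    $name =~ s{/+}{::}g;
    return "Apache::ROOT$name";
}

my %compiled;   # packages already compiled in this (persistent) interpreter

# Hypothetical helper: wrap the script body in "sub handler" once,
# then simply call the compiled handler on every later request.
sub run_script {
    my ($uri, $source) = @_;
    my $package = package_for($uri);
    unless ($compiled{$package}) {
        eval "package $package; sub handler { $source } 1;"
            or die "compile failed for $uri: $@";
        $compiled{$package} = 1;
    }
    return $package->can('handler')->();
}

my $script = 'our $hits; return ++$hits;';   # made-up script body

print package_for('/perl-bin/foo.pl'), "\n";
print run_script('/perl-bin/foo.pl', $script), "\n";   # first call compiles
print run_script('/perl-bin/foo.pl', $script), "\n";   # second call reuses it
```

The two calls return 1 and then 2: the package variable survives between "requests", which is the same persistence that makes the compiled script cheap to call again.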
"The dream is always the same"
• the basic process for mod_perl is the same for the other request phases as for content generation (eg, Registry)
– Apache passes control to mod_perl
– mod_perl passes control to your Perl handler
– your Perl subroutine defines the status
– mod_perl passes status back to Apache
– Apache continues along
Key Questions
• What is the Apache default behavior for the phase?
• What is a typical mod_perl usage of the phase?
• What happens on success?
• What happens on error?
The Client Request
GET /perl-status HTTP/1.1
Accept: text/html, image/png, image/jpeg, image/gif, image/x-xbitmap, */*
Accept-Encoding: deflate, gzip, x-gzip, identity, *;q=0
Accept-Language: en
Cache-Control: no-cache
Connection: Keep-Alive, TE
Host: www.example.com
User-Agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows 2000) Opera 5.12 [en]
URI-based Initialization
• Request URI and headers are known
• Apache request record has been populated
• The first place you can insert processing
• Apache has no default behavior
• All configured handlers are run
URI-based Initialization
[Diagram: client request → PerlPostReadRequestHandler (shown a second time as PerlInitHandler)]
PerlPostReadRequestHandler
• Non-specific hook
• Useful for adding processing that needs to occur on every request
• every request means every request
A sample...
• Object: to protect our name-based virtual hosts from HTTP/1.0 requests Apache can't handle
HTTP/1.0 and Host
• HTTP/1.0 does not require a Host header
• assumes a "one host per IP" configuration
• this limitation "breaks" name-based virtual host servers for browsers that follow HTTP/1.0 to the letter
– most send the Host header, so all is well
A sample...
• Object: to protect our name-based virtual hosts from HTTP/1.0 requests Apache can't handle
• Method: intercept every request prior to content-generation and return an error unless...
– there is a Host header
– the request is an absolute URI
Anatomy of a Handler
• a mod_perl handler is just an ordinary Perl module
• visible through @INC
– including mod_perl extra paths
• contains a package declaration
• has at least one subroutine
– typically the handler() subroutine
package Cookbook::TrapNoHost;

use Apache::Constants qw(DECLINED BAD_REQUEST);
use Apache::URI;

use strict;

sub handler {

  my $r = shift;

  # Valid requests for name based virtual hosting are:
  #   requests with a Host header, or
  #   requests that are absolute URIs.
  unless ($r->headers_in->get('Host') || $r->parsed_uri->hostname) {
    $r->custom_response(BAD_REQUEST,
                        "Oops! Did you mean to omit a Host header?\n");
    return BAD_REQUEST;
  }

  return DECLINED;
}
1;
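The rule the handler enforces can also be exercised without a server. This is a minimal sketch that assumes the raw request text is available as a string; `names_a_host()` is a hypothetical helper and the sample requests are made up, so it only mirrors the logic of the handler rather than the mod_perl API.

```perl
use strict;
use warnings;

# Hypothetical helper: true when a raw request names its host, either
# through a Host header or through an absolute request URI.
sub names_a_host {
    my ($raw_request) = @_;
    my ($request_line, @headers) = split /\r?\n/, $raw_request;
    return 0 unless defined $request_line;
    my (undef, $uri) = split ' ', $request_line;

    return 1 if grep { /^Host:\s*\S/i } @headers;
    return 1 if defined $uri && $uri =~ m!^\w+://[^/]+!;
    return 0;
}

for my $request (
    "GET /index.html HTTP/1.0\r\n\r\n",
    "GET /index.html HTTP/1.0\r\nHost: www.example.com\r\n\r\n",
    "GET http://www.example.com/index.html HTTP/1.0\r\n\r\n",
) {
    my ($line) = split /\r?\n/, $request;
    my $verdict = names_a_host($request) ? 'DECLINED (carry on)' : 'BAD_REQUEST';
    print "$line => $verdict\n";
}
```

Only the first request, which has neither a Host header nor an absolute URI, is the kind the handler turns away.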
Setup
• add TrapNoHost.pm to @INC
  ServerRoot/lib/perl/Cookbook/TrapNoHost.pm
• add to httpd.conf
  PerlModule Cookbook::TrapNoHost
  PerlInitHandler Cookbook::TrapNoHost
• that's it!
Intercept the Request
[Diagram: client request → PerlPostReadRequestHandler → logging]
HTTP/1.1 400 Bad Request
Date: Tue, 04 Jun 2002 01:17:52 GMT
Server: Apache/1.3.25-dev (Unix) mod_perl/1.27_01-dev Perl/v5.8.0
Connection: close
Content-Type: text/html; charset=iso-8859-1

Oops! Did you mean to omit a Host header?
Key Concepts
• The Apache request object, $r
• The Apache::Table class
• the Apache::URI class
• Return values and the Apache::Constants class
Apache Request Object
• passed to handlers or available via Apache->request()
  $r = shift;              # from @_ passed to handler()
  $r = Apache->request();
• provides access to the Apache class, which provides access to request attributes
• singleton-like constructor, always returning the same object
Apache::Table
• the Apache::Table class provides the underlying API for the following request attributes...
– $r->headers_in()
– $r->headers_out()
– $r->err_headers_out()
– $r->dir_config()
– $r->subprocess_env()
– $r->notes()
– Apache::Request::param() (Apache::Request::parms() in old releases)
Apache::Table
• to manipulate Apache::Table objects, you use the provided methods
– get()
– set()
– add()
– unset()
– do()
– merge()
– new()
– clear()
Apache Table Properties
• Apache tables have some nice properties
– case insensitive
– allow for multi-valued keys
• they also have one important limitation
– can contain only simple strings
– use pnotes() for Perl scalars
keeps the Net flowin'...
• both case-insensitivity and multiple values are key to manipulating headers
  $r->headers_out->set('Set-Cookie' => 'name=foo');
  $r->headers_out->add('set-cookie' => 'name=bar');
  my @cookies = $r->headers_out->get('Set-cookie');
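To see why those three calls leave two cookies behind, here is a small stand-in written in plain Perl. It is only a sketch of the two properties named above (case-insensitive keys, multiple values per key); `Sketch::Table` is a hypothetical class and is not how Apache::Table is implemented.

```perl
package Sketch::Table;   # hypothetical stand-in, not the real Apache::Table
use strict;
use warnings;

sub new { return bless {}, shift }

# set() replaces any existing values for the key.
sub set { my ($self, $key, $value) = @_; $self->{lc $key} = [$value]; return }

# add() appends another value under the same key.
sub add { my ($self, $key, $value) = @_; push @{ $self->{lc $key} }, $value; return }

# get() returns every value in list context, the first one in scalar context.
sub get {
    my ($self, $key) = @_;
    my @values = @{ $self->{lc $key} || [] };
    return wantarray ? @values : $values[0];
}

package main;

my $headers_out = Sketch::Table->new;
$headers_out->set('Set-Cookie' => 'name=foo');
$headers_out->add('set-cookie' => 'name=bar');

my @cookies = $headers_out->get('Set-cookie');
print scalar(@cookies), " cookies: @cookies\n";
```

All three spellings of the header name reach the same entry, and add() keeps the first value instead of overwriting it.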
Table Iteration
• Apache::Table objects use a special idiom when you need to operate on every item in a table
  my $input = $apr->param;   # Apache::Table object
  $input->do(sub {
    my ($key, $value) = @_;
    $log->info("input: name = $key, value = $value");
    1;
  });
More Apache::Table Fun
• Tired
  PerlSetVar Sails "jib"
  my $sail = $r->dir_config('Sails');
• Wired
  PerlSetVar Sails "spinnaker"
  PerlAddVar Sails "blooper"
  my @sails = $r->dir_config->get('Sails');
Trickery
• really understanding the Apache::Table class will make you a better mod_perl programmer
• every table can be set
– $r->headers_in()
– $r->headers_out()
– $r->err_headers_out()
– $r->dir_config()
– $r->subprocess_env()
– $r->notes()
– Apache::Request::param()
Trickery
• handle "what if?" cases
  $r->headers_in->set('Set-Cookie' => 'name=foo');
  my $sub = $r->lookup_uri('/scripts/foo.html');
• gratuitous exploitation
  # configure "PerlSetVar Filter On" on-the-fly
  $r->dir_config->set(Filter => 'On');
All About URIs
• when Apache parses the incoming request, it puts parts of the URI into the request record
• for just digging out the request URI you typically want the request attribute $r->uri()
  my $uri = $r->uri;   # $uri is "/index.html"
• sometimes you need all the URI parts, like the scheme or port
Apache::URI
• mod_perl provides the Apache::URI utility class for handling URIs
• allows for manipulating the current URI as well as constructing a new URI
• unfortunately, it has two distinct interfaces that contain very subtle differences
• our code uses those differences to our advantage
it all boils down to...
• the differences between Apache::URI->parse($r) and $r->parsed_uri() are very confusing
• use Apache::URI->parse($r) for creating a self-referential URI that needs to point to the same server
• use $r->parsed_uri() for accessing request attributes
Apache::Constants
• Apache::Constants class provides over 90 runtime constants
  use Apache::Constants qw(DECLINED BAD_REQUEST);
• the most common are:
  OK
  SERVER_ERROR
  REDIRECT
  DECLINED
Return Values
• handlers are expected to return a value
• the return value of the handler defines the status of the request
• Apache defines three "good" return values
OK – all is well
DECLINED – forget about me
DONE – we're finished, start to log
• All other values are "errors" and trigger the ErrorDocument cycle
When handlers turn bad...
• "error" return codes are not always errors
• instead, they indicate a new route for the request
• error codes take effect immediately
– other scheduled handlers are not run
package My::Redirect;
use Apache::Constants qw(REDIRECT);
sub handler {
my $r = shift;
$r->headers_out->set(Location => '/foo');
return REDIRECT;
};
1;
Key Concepts
• The Apache request object, $r
• The Apache::Table class
• the Apache::URI class
• Return values and the Apache::Constants class
URI Translation
• Apache needs to map the URI to a physical file on disk
• Default is to prepend DocumentRoot to the URI
  DocumentRoot /usr/local/apache/htdocs
URI Translation
• Directives like Alias override the default
  DocumentRoot /usr/local/apache/htdocs
  Alias /manual/ /usr/local/apache/manual/
  <Directory /usr/local/apache/manual>
  ...
• Some URIs have no associated file, but Apache tries anyway
  <Location server-status>
  ...
URI Translation
[Diagram: client request → URI-based init → PerlTransHandler]
PerlTransHandler
• Useful for overriding the Apache default
• Allows you to be extremely devious
• There are a few pitfalls of which to be aware
Simple PerlTransHandler
• Object: be rid of those silly favicon.ico requests that end up 404
• Method: translate the incoming URI to a common place if it matches favicon.ico
package Cookbook::Favicon;

use Apache::Constants qw(DECLINED);
use strict;

sub handler {

  my $r = shift;

  $r->uri("/images/favicon.ico")
    if $r->uri =~ m!/favicon\.ico$!;

  return DECLINED;
}
1;
Setup
• add Favicon.pm to @INC
  ServerRoot/lib/perl/Cookbook/Favicon.pm
• add to httpd.conf
  PerlModule Cookbook::Favicon
  PerlTransHandler Cookbook::Favicon
• that's it!
Why Not Use mod_rewrite?
• our Cookbook::Favicon is pretty much the same as
  RewriteRule /favicon.ico$ /images/favicon.ico
• Let's look at a more clever example...
Mischievous Behavior
• Simple URI re-mapping is only the beginning
• Apache has this neat, built-in functionality called proxying
– provided you have mod_proxy installed
• With mod_perl and mod_proxy you can proxy just about anything...
Advanced PerlTransHandler
• Object: create a proxy that uses our local Apache documentation instead of ASF servers
• Method: intercept proxy requests and silently replace calls to http://httpd.apache.org/docs with /usr/local/apache/htdocs/manual
Client Setup
package My::ManualProxy;

use Apache::Constants qw(OK DECLINED);
use strict;

sub handler {

  my $r = shift;

  return DECLINED unless $r->proxyreq;

  # Dots are escaped so that only the two ASF hosts match.
  my ($file) = $r->uri =~ m!^http://(?:www|httpd)\.apache\.org/(.*)!;

  if (defined $file && $file =~ m!^docs/!) {
    $file =~ s!^docs/!manual/!;

    $file = join "/", ($r->document_root, $file);

    if (-f $file) {
      $r->filename($file);   # use local disk
      $r->proxyreq(0);       # fool mod_mime
      return OK;
    }
  }
  return DECLINED;
}
1;
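The URI-to-file mapping at the heart of this handler can be checked on its own. The sketch below assumes nothing beyond core Perl: `local_manual_path()` is a hypothetical helper, the document root is passed in rather than read from `$r`, and the handler's `-f` test is left out so the example does not depend on files being present.

```perl
use strict;
use warnings;

# Hypothetical helper: map a proxied ASF documentation URI to a path
# under the local document root, or return nothing if it does not apply.
sub local_manual_path {
    my ($uri, $document_root) = @_;

    my ($path) = $uri =~ m!^http://(?:www|httpd)\.apache\.org/(.*)!
        or return;
    $path =~ s!^docs/!manual/! or return;

    return join '/', $document_root, $path;
}

for my $uri ('http://httpd.apache.org/docs/mod/directives.html',
             'http://www.example.com/index.html') {
    my $file = local_manual_path($uri, '/usr/local/apache/htdocs');
    print defined $file ? "$uri -> $file\n" : "$uri -> proxied as usual\n";
}
```

In handler terms, a defined result corresponds to setting `$r->filename` and returning OK; anything else corresponds to DECLINED.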
Proxy in Action
GET http://httpd.apache.org/docs/mod/directives.html HTTP/1.1
Accept: text/html, image/png, image/jpeg, image/gif, image/x-xbitmap, */*
Accept-Encoding: deflate, gzip, x-gzip, identity, *;q=0
Accept-Language: en
Connection: Keep-Alive, TE
Host: httpd.apache.org
User-Agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows 2000) Opera 5.12 [en]

HTTP/1.1 200 OK
Last-Modified: Sat, 25 May 2002 22:15:27 GMT
ETag: "240c0-3360-3cf00cff"
Accept-Ranges: bytes
Content-Length: 13152
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html
Winner Takes All
• The first URI translation handler to return OK ends the phase
• Return OK only when you map the file to disk yourself
• Return DECLINED all other times
File-based Initialization
• URI has been mapped to a file
– $r->filename is now known
• We also know to which <Location> the request belongs
• Apache has no default behavior
• All configured handlers are run
File-based Initialization
[Diagram: client request → URI-based init → URI translation → PerlHeaderParserHandler (shown a second time as PerlInitHandler)]
PerlHeaderParserHandler
• Non-specific hook
• Useful for adding processing that needs to occur on every request to a given URI
Sample Usage...
• Parsing out the query string or POST data on each request is a pain
• For the most part, you know you need it on every request to a given <Location>
• Modularize the parsing code
Apache::RequestNotes
• Apache::RequestNotes parses cookies and input parameters
• stores the data in pnotes() for later retrieval
Setup
  Alias /perl-bin /usr/local/apache/perl-bin

  <Location /perl-bin/>
    SetHandler perl-script
    PerlHandler Apache::Registry
    Options +ExecCGI

    PerlInitHandler Apache::RequestNotes
  </Location>
<Location> Processing
[Diagram: client request → URI-based init → URI translation → PerlInitHandler]
  my $input   = $r->pnotes('INPUT');    # Apache::Table reference
  my $uploads = $r->pnotes('UPLOADS');  # Apache::Upload array ref
  my $cookies = $r->pnotes('COOKIES');  # hash reference
Resource Control
• Request is inside a particular <Location> container
• Apache provides three different layers of control to determine who gets access to the resource
• Client access checker
• User ID checker
• User authorization checker
[Diagram: client request → URI-based init → URI translation → file-based init → client access → user ID → user authorization]
client access – PerlAccessHandler
user ID – PerlAuthenHandler
user authorization – PerlAuthzHandler
User Access
• Used to make access decisions based on Client information
– Client IP
– Client User-Agent
– Request URI
• mod_access controls Apache's default
– Allow from localhost
• All configured handlers run
PerlAccessHandler
• Useful for making the same decisions as mod_auth
• do it in Perl
Simple PerlAccessHandler
• Object: get debugging telnet sessions past Basic authentication
• Method: set the Authorization header to a known user if coming from localhost
package My::DefaultLogin;

use Apache::Constants qw(OK);

use MIME::Base64 ();
use Socket qw(sockaddr_in inet_ntoa);

use strict;

sub handler {

  my $r = shift;

  my $c = $r->connection;

  my $local_ip = inet_ntoa((sockaddr_in($c->local_addr))[1]);

  if ($c->remote_ip eq $local_ip) {
    my $user = 'bug';
    my $passwd = 'squashing';

    # Join user and password and set the incoming header.
    # The empty second argument keeps encode() from appending a newline.
    my $credentials = MIME::Base64::encode(join(':', $user, $passwd), '');

    $r->headers_in->set(Authorization => "Basic $credentials");
  }

  return OK;
}
1;
Setup
• add DefaultLogin.pm to @INC
  ServerRoot/lib/perl/My/DefaultLogin.pm
• add to httpd.conf
  PerlModule My::DefaultLogin
  PerlAccessHandler My::DefaultLogin
• that's it!
How to Deny Access
• Each access handler returns OK if the client meets its conditions
• Access handlers return FORBIDDEN to deny access
Reality...
• In the real world, you could accomplish the same thing with the core Satisfy directive
  AuthType Basic
  AuthName "cookbook"
  AuthUserFile .htpasswd
  Require valid-user
  Allow from localhost
  Satisfy any
Resource Control
[Diagram: client request → URI-based init → URI translation → file-based init → client access → user ID]
User Authentication
• Apache default authentication mechanism is mod_auth
• Winner takes all
• uses a password file generated using Apache's htpasswd utility
  geoff:zzpEyL0tbgwwk
User Authentication
• configuration placed in .htaccess file or httpd.conf
  AuthUserFile .htpasswd
  AuthName "cookbook"
  AuthType Basic
  Require valid-user
How Authentication Works
• client requests a document
  GET /perl-status HTTP/1.1
  Accept: text/xml, image/png, image/jpeg, image/gif, text/plain
  Accept-Charset: ISO-8859-1, utf-8;q=0.66, *;q=0.66
  Accept-Encoding: gzip, deflate, compress;q=0.9
  Accept-Language: en-us
  Connection: keep-alive
  Host: www.example.com
  Keep-Alive: 300
  User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US)
• server denies request
  HTTP/1.1 401 Authorization Required
  WWW-Authenticate: Basic realm="my site"
  Keep-Alive: timeout=15, max=100
  Connection: Keep-Alive
  Transfer-Encoding: chunked
  Content-Type: text/html; charset=iso-8859-1
[Diagram: client request → URI-based init → URI translation → file-based init → client access → user ID; the user ID check answers "HTTP/1.1 401 Authorization Required" and the request goes straight to logging]
How Authentication Works
• client sends a new request
  GET /perl-status HTTP/1.1
  Accept: text/xml, image/png, image/jpeg, image/gif, text/plain
  Accept-Charset: ISO-8859-1, utf-8;q=0.66, *;q=0.66
  Accept-Encoding: gzip, deflate, compress;q=0.9
  Accept-Language: en-us
  Authorization: Basic Z2VvZmY6YWZha2VwYXNzd29yZA==
  Connection: keep-alive
  Host: www.example.com
  Keep-Alive: 300
  User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US)
• server sends document
  HTTP/1.1 200 OK
  Keep-Alive: timeout=15, max=99
  Connection: Keep-Alive
  Transfer-Encoding: chunked
  Content-Type: text/html
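The Authorization value in that second request is nothing more than "user:password" in Base64. A short sketch using the core MIME::Base64 module shows the round trip; `basic_auth_header()` and `parse_basic_auth()` are hypothetical helper names, not part of Apache or mod_perl.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical helper: build the value a client sends in Authorization.
sub basic_auth_header {
    my ($user, $password) = @_;
    # The empty second argument suppresses the newline encode_base64 appends.
    return 'Basic ' . encode_base64("$user:$password", '');
}

# Hypothetical helper: recover user and password from that value.
sub parse_basic_auth {
    my ($header) = @_;
    return unless defined $header && $header =~ /^Basic\s+(\S+)\s*$/i;
    return split /:/, decode_base64($1), 2;
}

my ($user, $password) = parse_basic_auth('Basic Z2VvZmY6YWZha2VwYXNzd29yZA==');
print "$user / $password\n";
print basic_auth_header($user, $password), "\n";
```

Because the encoding is trivially reversible, Basic authentication protects nothing on the wire by itself; it only tells the server who the client claims to be.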
Resource Control
[Diagram: client request → URI-based init → URI translation → file-based init → client access → PerlAuthenHandler]
Who Uses Flat Files?
• flat files are limiting, hard to manage, difficult to integrate, and just plain boring
• we can use the Apache API and Perl to replace flat files with our own authentication mechanism
Do it in Perl
• since mod_perl gives us the ability to intercept the request cycle before Apache, we can authenticate using Perl instead
• Apache provides an API, making the job easy
• mod_perl provides access to the Apache API
package My::Authenticate;

use Apache::Constants qw(OK DECLINED AUTH_REQUIRED);
use strict;

sub handler {

  my $r = shift;

  # Let subrequests pass.
  return DECLINED unless $r->is_initial_req;

  # Get the client-supplied credentials.
  my ($status, $password) = $r->get_basic_auth_pw;

  return $status unless $status == OK;

  # Perform some custom user/password validation.
  return OK if authenticate_user($r->user, $password);

  # Whoops, bad credentials.
  $r->note_basic_auth_failure;
  return AUTH_REQUIRED;
}
Configuration
• change
  AuthUserFile .htpasswd
  AuthName "cookbook"
  AuthType Basic
  Require valid-user
• to
  PerlAuthenHandler My::Authenticate
  AuthName "cookbook"
  AuthType Basic
  Require valid-user
The Choice is Yours
• how you decide to authenticate is now up to you
  sub authenticate_user {
    my ($user, $pass) = @_;

    return $user eq $pass;
  }
• are you seeing the possibilities yet?
The Power of CPAN
• over 25 Apache:: shrink-wrapped modules on CPAN for authentication
– SecurID
– Radius
– SMB
– LDAP
– NTLM
To Infinity and Beyond!
• this example only covered Basic authentication via popup box
• the same techniques can be used to authenticate via a login form plus cookies, munged URLs, or hidden fields
• they can be extended to use Digest authentication as well
User Authorization
• We now know the user has supplied a valid password
• now it's up to us to decide if we want the user to have access
User Authorization
• Apache's default behavior varies, depending on the syntax of Require
– Require valid-user
– Require user foo
– Require group bar
– Require file-owner
– Require file-group
• Winner takes all
User Authorization
[Diagram: client request → URI-based init → URI translation → file-based init → client access → user ID → PerlAuthzHandler]
PerlAuthzHandler
• Key is the requires() method
• $r->requires() returns a reference to an array of hashes representing all Require directives
  Require user grier ryan
  Require group admiral

  [ { requirement => 'user grier ryan', method_mask => -1 },
    { requirement => 'group admiral',   method_mask => -1 }, ];
PerlAuthzHandler
• Once you get the Require directive back, you can decide which users meet the authorization requirement
  foreach my $requires (@{$r->requires}) {

    my ($directive, @list) = split " ", $requires->{requirement};

    # We're ok if only valid-user was required.
    return OK if lc($directive) eq 'valid-user';

    # Likewise if the user requirement was specified and
    # we match based on what we already know.
    return OK if lc($directive) eq 'user' && grep { $_ eq $r->user } @list;
  }
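The same loop can be run against plain data to see which users it lets through. This is a sketch under assumptions: `user_is_authorized()` is a hypothetical function, the requirement list is typed in by hand to imitate what `$r->requires` returns for the two directives on the previous slide, and group membership is deliberately not resolved.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the valid-user and user checks to a list
# of { requirement => '...' } hashes. Groups are not looked up here.
sub user_is_authorized {
    my ($user, $requires) = @_;

    for my $entry (@$requires) {
        my ($directive, @list) = split ' ', $entry->{requirement};

        return 1 if lc($directive) eq 'valid-user';
        return 1 if lc($directive) eq 'user' && grep { $_ eq $user } @list;
    }
    return 0;
}

# Hand-written stand-in for: Require user grier ryan / Require group admiral
my $requires = [
    { requirement => 'user grier ryan' },
    { requirement => 'group admiral'   },
];

for my $user (qw(ryan geoff)) {
    my $verdict = user_is_authorized($user, $requires) ? 'OK' : 'AUTH_REQUIRED';
    print "$user: $verdict\n";
}
```

A real handler would still have to decide what a `group` requirement means, for example by consulting its own user database.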
MIME-type Checking
• We now have the physical resource and have decided the user can see it
• time to set the Content-Type header
• Apache's default is mod_mime, which examines the file extension
• mod_mime also decides which content handler will run
– AddHandler server-parsed
• Winner takes all
MIME-type Checking
[Diagram: client request → URI-based init → URI translation → file-based init → resource control → PerlTypeHandler]
PerlTypeHandler
• mod_mime has a stranglehold on the request
– if you set the Content-Type and return OK, mod_mime won't set the content handler
– if you set the Content-Type and return DECLINED, mod_mime will clobber the Content-Type
• Best to just forget about the PerlTypeHandler
Fixups
• The final chance to fiddle with the request before content is written to the client
• non-specific phase
• Apache has no default behavior
• All configured handlers will be run
Fixups
[Diagram: client request → URI-based init → URI translation → file-based init → resource control → MIME setting → PerlFixupHandler]
PerlFixupHandler
• Good place to do anything you might have wanted to do in the PerlTypeHandler
• especially setting $r->handler()
Sample Fixup
• Object: re-implement XBitHack in Perl
• Method: after some basic checks, turn the request over to mod_include using $r->handler()
package Cookbook::XBitHack;

use Apache::Constants qw(OK DECLINED OPT_INCLUDES);
use Apache::File;
use Fcntl qw(S_IXUSR S_IXGRP);
use strict;

sub handler {

  my $r = shift;

  return DECLINED unless (-f $r->finfo &&                     # the file exists
                          $r->content_type eq 'text/html' &&  # and is HTML
                          $r->allow_options & OPT_INCLUDES);  # and we have Options +Includes

  # Find out the user and group execution status.
  my $mode = (stat _)[2];

  # We have to be user executable specifically.
  return DECLINED unless ($mode & S_IXUSR);

  # Set the Last-Modified header if group executable.
  $r->set_last_modified((stat _)[9]) if ($mode & S_IXGRP);

  # Make sure mod_include picks it up.
  $r->handler('server-parsed');

  return OK;
}
1;
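The permission-bit logic is easy to misread, so here it is isolated from the request object. A minimal sketch using only the core Fcntl module; `xbit_action()` is a hypothetical helper and the three modes are made-up examples.

```perl
use strict;
use warnings;
use Fcntl qw(S_IXUSR S_IXGRP);

# Hypothetical helper: classify a file mode the way the handler does.
sub xbit_action {
    my ($mode) = @_;
    return 'skip'                 unless $mode & S_IXUSR;   # not user-executable
    return 'parse, Last-Modified' if     $mode & S_IXGRP;   # group-executable too
    return 'parse';
}

printf "%04o -> %s\n", $_, xbit_action($_) for 0644, 0744, 0754;
```

Only the user-execute bit decides whether the file is handed to mod_include; the group-execute bit merely adds the Last-Modified header.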
Content Generation
• Apache's default content handler is default-handler, which takes care of all HTTP/1.1 events
– byteserving
– cache headers
• one and only one C module gets to handle content-generation
– mod_cgi
– mod_perl
– mod_include
Content Generation
[Diagram: client request → URI-based init → URI translation → file-based init → resource control → MIME setting → fixups → PerlHandler]
Sample PerlHandler
• create a handler that uses CPAN module HTML::Clean to "clean" outgoing documents
• alter the handler to take advantage of more advanced mod_perl features
package TPC::Clean;

use Apache::Constants qw(OK DECLINED);
use Apache::File;
use HTML::Clean;
use strict;

sub handler {

  my $r = shift;

  my $fh = Apache::File->new($r->filename)
    or return DECLINED;

  my $dirty = do {local $/; <$fh>};

  my $h = HTML::Clean->new(\$dirty);
  $h->level(3);
  $h->strip;

  $r->send_http_header('text/html');
  print ${$h->data};

  return OK;
}
1;
Configuration
• add directives to httpd.conf to mirror DocumentRoot
Alias /clean /usr/local/apache/htdocs
<Location /clean>
  SetHandler perl-script
  PerlHandler TPC::Clean
</Location>
http://www.modperlcookbook.org/ 133
Results
• original: 202 bytes
<html>
<body>
<form method="GET" action="/foo">
Text: <input type="text" name="foo"><br>
<input type="submit">
</form>
<strong>hi there </strong>
</body>
</html>
http://www.modperlcookbook.org/ 134
The Choice is Yours!
• Content filtering with Apache::Filter?
• Using proper cache-friendly headers?
http://www.modperlcookbook.org/ 135
Magic
• filtered content generation is impossible with Apache 1.3
– can't send CGI output through mod_include
– reason for output filters in Apache 2.0
• mod_perl has had output filtering for years
– possible due to Perl's TIEHANDLE interface
http://www.modperlcookbook.org/ 136
TIEHANDLE in mod_perl
• mod_perl tie()s STDOUT to the Apache class prior to the content generation phase
• you can tie() STDOUT as well and override mod_perl's default behavior
• very useful with stacked handlers
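The tie() mechanism itself is plain Perl and can be tried outside of Apache. The following is a minimal sketch, assuming a hypothetical Capture class and capture_stdout() helper (neither is part of mod_perl); it only illustrates how a tied STDOUT intercepts print(), not how mod_perl's own Apache class implements it.

```perl
use strict;
use warnings;

# Hypothetical capture class: collects everything printed to a tied handle.
package Capture;

sub TIEHANDLE { my ($class) = @_; my $buffer = ''; return bless \$buffer, $class }
sub PRINT     { my $self = shift; $$self .= join '', @_; return 1 }
sub PRINTF    { my $self = shift; my $fmt = shift; $$self .= sprintf $fmt, @_; return 1 }
sub contents  { my $self = shift; return $$self }

package main;

# Run a code reference with STDOUT tied, and return whatever it printed.
sub capture_stdout {
    my ($code) = @_;
    tie *STDOUT, 'Capture';
    $code->();
    my $captured = (tied *STDOUT)->contents;
    untie *STDOUT;                      # restore normal STDOUT behavior
    return $captured;
}

my $out = capture_stdout(sub {
    print "<strong>hi there </strong>";
    printf "%d bytes", 202;
});
print "captured: $out\n";   # captured: <strong>hi there </strong>202 bytes
```

Because the "content handler" here just calls print(), it needs no changes to have its output redirected, which is what makes the approach attractive for chaining handlers.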
http://www.modperlcookbook.org/ 137
Stacked Handlers
• for each phase of the request, mod_perl will run any registered Perl handlers for that phase
• you can register more than one Perl handler per phase
• whether all handlers are called depends on how Apache itself defines the phase
http://www.modperlcookbook.org/ 138
Stacked Handlers
• some phases run until the handler list is exhausted
PerlLogHandler My::DBLogger
PerlLogHandler My::FileLogger
• some phases run until one handler returns OK
PerlTransHandler My::TranslateHTML
PerlTransHandler My::TranslateText
• all phases terminate on "error"
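The two dispatch rules can be simulated in plain Perl. This is a sketch only: run_all() and run_first() are hypothetical names, the handlers are ordinary code references, and the constants are local stand-ins carrying the Apache 1.3 values for OK and DECLINED. It is not how mod_perl dispatches internally.

```perl
use strict;
use warnings;

# Local stand-ins for Apache::Constants (OK is 0 and DECLINED is -1 in Apache 1.3).
use constant { OK => 0, DECLINED => -1, SERVER_ERROR => 500 };

# "Run until the list is exhausted": every handler runs unless one errors.
sub run_all {
    my ($r, @handlers) = @_;
    for my $h (@handlers) {
        my $status = $h->($r);
        return $status unless $status == OK || $status == DECLINED;
    }
    return OK;
}

# "Run until one handler returns OK": DECLINED passes control to the next one.
sub run_first {
    my ($r, @handlers) = @_;
    for my $h (@handlers) {
        my $status = $h->($r);
        return $status unless $status == DECLINED;
    }
    return DECLINED;
}

my @log;
my $db_logger   = sub { push @log, 'db';   return OK };
my $file_logger = sub { push @log, 'file'; return OK };

run_all({}, $db_logger, $file_logger);
print "run_all:   @log\n";    # run_all:   db file

@log = ();
run_first({}, $db_logger, $file_logger);
print "run_first: @log\n";    # run_first: db
```

In both loops any other status (an "error") ends the phase immediately, matching the last bullet above.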
http://www.modperlcookbook.org/ 139
Stacked Content Handlers
• for the content generation phase, running multiple Perl handlers can be incredibly powerful
• Apache::Filter implements a simple interface for pipelining content handlers
• uses TIEHANDLE in the background
http://www.modperlcookbook.org/ 140
Now for the Fun Part
• modify our handler to work either standalone or as part of a handler chain
• easy using Apache::Filter
http://www.modperlcookbook.org/ 141
Apache::Filter Changes
• change:
  my $fh = Apache::File->new($r->filename) or return DECLINED;
• to:
  my $fh = undef;

  if (lc $r->dir_config('Filter') eq 'on') {
    $r = $r->filter_register;
    ($fh, my $status) = $r->filter_input;
    return $status unless $status == OK;
  }
  else {
    # NOT_FOUND must also be imported from Apache::Constants
    $fh = Apache::File->new($r->filename) or return NOT_FOUND;
  }
http://www.modperlcookbook.org/ 142
Configuration
• change
Alias /clean /usr/local/apache/htdocs
<Location /clean>
  SetHandler perl-script
  PerlHandler My::Clean
</Location>
• to
PerlModule Apache::Filter
Alias /clean /usr/local/apache/htdocs
<Location /clean>
  SetHandler perl-script
  PerlHandler My::Clean
  PerlSetVar Filter On
</Location>
http://www.modperlcookbook.org/ 143
Apache::Filter
• Apache::Filter adds methods to the Apache class
– do not have to use Apache::Filter;
– do have to PerlModule Apache::Filter
• because filtering content is tricky, the interface is quirky
  $r = $r->filter_register;
http://www.modperlcookbook.org/ 144
Apache::Filter
• to get at filtered input, call $r->filter_input()
• returns an open filehandle on the input stream
• if the first filter, the filehandle is for $r->filename()
• all filters use the same API, regardless of their position in the chain
http://www.modperlcookbook.org/ 145
So What?
• new Apache::Filter aware code works the same as the standalone module
• can be used as part of a PerlHandler chain
• can be any part of the chain
http://www.modperlcookbook.org/ 146
Compressing Output
• use Apache::Compress
– available from CPAN
– checks the Accept-Encoding header
– uses Compress::Zlib to compress output
– Apache::Filter aware
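The core idea (compress only when the client says it accepts gzip) can be sketched in plain Perl with Compress::Zlib. maybe_gzip() is a hypothetical helper, not Apache::Compress's actual code, and it deliberately ignores q-values such as "gzip;q=0".

```perl
use strict;
use warnings;
use Compress::Zlib ();   # provides memGzip and memGunzip

# Given an Accept-Encoding header value and a response body, return
# (body, content_encoding). content_encoding is undef when uncompressed.
# Simplification: q-values in the header are not parsed.
sub maybe_gzip {
    my ($accept_encoding, $body) = @_;

    return ($body, undef)
        unless defined $accept_encoding && $accept_encoding =~ /\bgzip\b/i;

    my $gz = Compress::Zlib::memGzip($body);
    return defined $gz ? ($gz, 'gzip') : ($body, undef);
}

my $html = "<html><body>" . ("<p>hi there</p>" x 200) . "</body></html>";
my ($body, $encoding) = maybe_gzip('deflate, gzip, x-gzip, identity', $html);

printf "original: %d bytes, sent: %d bytes, Content-Encoding: %s\n",
    length $html, length $body, defined $encoding ? $encoding : 'none';
```

A real handler would also set the Content-Encoding response header and add Vary: Accept-Encoding so that caches keep the compressed and uncompressed variants apart.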
http://www.modperlcookbook.org/ 147
Configuration
• change
Alias /clean /usr/local/apache/htdocs
<Location /clean>
  SetHandler perl-script
  PerlHandler My::Clean
  PerlSetVar Filter On
</Location>
• to
Alias /clean /usr/local/apache/htdocs
<Location /clean>
  SetHandler perl-script
  PerlHandler My::Clean Apache::Compress
  PerlSetVar Filter On
</Location>
http://www.modperlcookbook.org/ 148
Stacked Power
• http://perl.apache.org/index.html
– straight HTML: 35668 bytes
– My::Clean: 28162 bytes (79%)
– Apache::Compress: 8177 bytes (23%)
– My::Clean + Apache::Compress: 7458 bytes (21%)
http://www.modperlcookbook.org/ 149
Caveats
• using Apache::Filter is actually a bit more complex than this...
• see the recent version of Apache::Clean to get an idea
http://www.modperlcookbook.org/ 150
Cache Headers
• we often think of dynamic content as "could be different on any given access"
• "dynamic" content can also be static with clearly defined factors that can change its meaning
• by properly managing HTTP/1.1 cache headers, we can reduce strain on our servers
http://www.modperlcookbook.org/ 151
Conditional GET Request
• HTTP/1.1 allows for a conditional GET request
• clients are allowed to use cached content based on information about the resource
• information is provided by both the client and the server
http://www.modperlcookbook.org/ 152
GET /manual/index.html HTTP/1.1
Accept: text/html, image/png, image/jpeg, image/gif, image/x-xbitmap, */*
Accept-Charset: windows-1252;q=1.0, utf-8;q=1.0, utf-16;q=1.0, iso-8859-1;q=0.6, *;q=0.1
Accept-Encoding: deflate, gzip, x-gzip, identity, *;q=0
Accept-Language: en
Connection: Keep-Alive, TE
Host: mainsheet.laserlink.net
TE: deflate, gzip, chunked, identity, trailers
User-Agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows XP) Opera 6.03 [en]

HTTP/1.1 200 OK
Last-Modified: Thu, 01 Nov 2001 16:35:27 GMT
ETag: "4c949-2434-3be179cf"
Accept-Ranges: bytes
Content-Length: 9268
Connection: close
Content-Type: text/html
http://www.modperlcookbook.org/ 153
Conditional GET Request
• for static documents, Apache takes care of making our response cache-friendly
• since the file is on disk, Apache can determine when the file was last changed
• with static files, local modification is the only factor
• still too many rules to keep straight
• Apache provides an API to use so we don't have to think too much
http://www.modperlcookbook.org/ 154
Now for the Fun Part
• modify our handler to be "cache friendly"
• send 304 when the document hasn't changed
• properly handle If-* header comparisons
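The decision behind "send 304 or send the document" can be sketched in plain Perl. check_conditions() is a hypothetical, heavily simplified stand-in for the Apache API: it covers only If-None-Match and If-Modified-Since, assumes the date header has already been parsed to epoch seconds, and follows the HTTP rule that If-None-Match wins when both are present. Apache's real check handles more headers and edge cases.

```perl
use strict;
use warnings;

# Local stand-ins for the status values a handler would return.
use constant { OK => 0, HTTP_NOT_MODIFIED => 304 };

# $resource: { mtime => epoch seconds, etag => string }
# $request:  { if_none_match => string or undef,
#              if_modified_since => epoch seconds or undef }
sub check_conditions {
    my ($resource, $request) = @_;

    # An entity tag comparison takes precedence over the date comparison.
    if (defined $request->{if_none_match}) {
        return $request->{if_none_match} eq $resource->{etag}
             ? HTTP_NOT_MODIFIED : OK;
    }

    if (defined $request->{if_modified_since}) {
        return $resource->{mtime} <= $request->{if_modified_since}
             ? HTTP_NOT_MODIFIED : OK;
    }

    return OK;   # unconditional request: send the full response
}

# Values taken from the sample response: the last ETag field is the
# file's modification time in hex, matching its Last-Modified header.
my %resource = (mtime => hex '3be179cf', etag => '"4c949-2434-3be179cf"');

print check_conditions(\%resource, {}), "\n";                                         # 0
print check_conditions(\%resource, { if_modified_since => $resource{mtime} }), "\n";  # 304
print check_conditions(\%resource, { if_none_match => '"stale"' }), "\n";             # 0
```

Returning a status instead of sending anything mirrors how the handler later in these slides returns whatever the Apache check hands back when it is not OK.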
http://www.modperlcookbook.org/ 155
How do you define change?
• when dynamically altering static documents there are a number of factors to consider
– when the file changes on disk
– when the code changes
– when the options to the code change
• all of these affect the "freshness" of the document
http://www.modperlcookbook.org/ 156
Code Changes
• in order to determine when the code itself changes, we need to mark the modification time of the package
• at request time, we call an API to compare the package modification to the If-Modified-Since header
• on reloads, we regenerate the package modification time
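The %INC lookup used for this works in any Perl program, so it can be tried outside of mod_perl. A minimal sketch, assuming a hypothetical package_mtime() helper and using File::Spec purely as a stand-in for "some module that is already loaded":

```perl
use strict;
use warnings;
use File::Spec;   # stand-in for any loaded module; its own API is not used

# Translate a package name into its %INC key (Foo::Bar -> Foo/Bar.pm)
# and return the modification time of the file it was loaded from.
sub package_mtime {
    my ($package) = @_;
    (my $key = $package) =~ s!::!/!g;
    my $file = $INC{"$key.pm"} or return;   # not loaded: no mtime
    return (stat $file)[9];
}

my $mtime = package_mtime('File::Spec');
print "File::Spec loaded from $INC{'File/Spec.pm'}\n";
print "last modified: ", scalar(gmtime $mtime), " GMT\n";
```

Because the lookup runs at compile time of the module, the value is refreshed whenever the module is reloaded, which is the behavior the last bullet describes.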
http://www.modperlcookbook.org/ 157
package My::Clean;

use Apache::Constants qw(OK DECLINED);
use Apache::File;
use HTML::Clean;
use strict;

# Get the package modification time...
(my $package = __PACKAGE__) =~ s!::!/!g;
my $package_mtime = (stat $INC{"$package.pm"})[9];
http://www.modperlcookbook.org/ 158
Configuration Changes
• in order to determine when the options to the code change, we need to mark the modification time of httpd.conf
• at request time, we call an API to compare the configuration modification to the If-Modified-Since header
• on restarts, we regenerate the configuration modification time
http://www.modperlcookbook.org/ 159
package My::Clean;

use Apache::Constants qw(OK DECLINED);
use Apache::File;
use HTML::Clean;
use strict;

# Get the package modification time...
(my $package = __PACKAGE__) =~ s!::!/!g;
my $package_mtime = (stat $INC{"$package.pm"})[9];

# ...and when httpd.conf was last modified
my $conf_mtime = (stat Apache->server_root_relative('conf/httpd.conf'))[9];

# When the server is restarted we need to
# make sure we recognize config file changes and propagate
# them to the client to clear the client cache if necessary.
Apache->server->register_cleanup(sub {
  $conf_mtime = (stat Apache->server_root_relative('conf/httpd.conf'))[9];
});
http://www.modperlcookbook.org/ 160
Resource Changes
• in order to determine when the resource changes, we need to mark the modification time of $r->filename
• at request time, we call an API to compare the resource modification to the If-Modified-Since header
• resource modification is checked on each request
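With three modification times in play (code, configuration, resource), the one that matters for caching is the most recent. A minimal plain-Perl sketch of that "newest wins" rule, assuming a hypothetical latest_mtime() helper and made-up epoch values:

```perl
use strict;
use warnings;

# Return the most recent of the supplied epoch times, ignoring undefined ones.
sub latest_mtime {
    my $latest = 0;
    for my $t (grep { defined } @_) {
        $latest = $t if $t > $latest;
    }
    return $latest;
}

# Illustrative values only: the package file, httpd.conf, and the requested file.
my %mtime = (
    package  => 1_004_000_000,
    conf     => 1_004_500_000,
    resource => 1_004_632_527,
);

my $last_modified = latest_mtime(values %mtime);
print "Last-Modified is based on: ", scalar(gmtime $last_modified), " GMT\n";
```

Touching any one of the three inputs therefore moves the effective modification time forward, which is what invalidates a client's cached copy.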
http://www.modperlcookbook.org/ 161
package My::Clean;

use Apache::Constants qw(OK DECLINED);
use Apache::File;
use HTML::Clean;
use strict;

# Get the package modification time...
(my $package = __PACKAGE__) =~ s!::!/!g;
my $package_mtime = (stat $INC{"$package.pm"})[9];

# ...and when httpd.conf was last modified
my $conf_mtime = (stat Apache->server_root_relative('conf/httpd.conf'))[9];

# When the server is restarted we need to
# make sure we recognize config file changes and propagate
# them to the client to clear the client cache if necessary.
Apache->server->register_cleanup(sub {
  $conf_mtime = (stat Apache->server_root_relative('conf/httpd.conf'))[9];
});

sub handler {
  ...
}
1;
http://www.modperlcookbook.org/ 162
sub handler {

  my $r = shift;

  my $fh = Apache::File->new($r->filename) or return DECLINED;

  my $dirty = do {local $/; <$fh>};

  my $h = HTML::Clean->new(\$dirty);
  $h->level(3);
  $h->strip;

  # the newest of the three times becomes the request's mtime
  $r->update_mtime($package_mtime);
  $r->update_mtime((stat $r->finfo)[9]);
  $r->update_mtime($conf_mtime);

  $r->set_last_modified;
  $r->set_etag;
  $r->set_content_length(length ${$h->data});

  # only send the file if it meets cache criteria
  if ((my $status = $r->meets_conditions) == OK) {
    $r->send_http_header('text/html');
  }
  else {
    return $status;
  }

  print ${$h->data};

  return OK;
}
http://www.modperlcookbook.org/ 163
Logging
[diagram: request cycle through the logging phase: client request -> URI-based init -> URI translation -> file-based init -> resource control -> MIME setting -> fixups -> content -> logging]
http://www.modperlcookbook.org/ 164
Logging
• Apache's default is to use mod_log_config in common format
LogFormat "%h %l %u %t \"%r\" %>s %b" common
CustomLog logs/access_log common
• Most people tweak this to combined
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
• The connection to the client is still open!
• All configured handlers run
http://www.modperlcookbook.org/ 165
Logging
[diagram: request cycle with PerlLogHandler in the logging slot: client request -> URI-based init -> URI translation -> file-based init -> resource control -> MIME setting -> fixups -> content -> PerlLogHandler]
http://www.modperlcookbook.org/ 166
PerlLogHandler
• Useful for logging using interfaces in which Perl shines
– like databases
http://www.modperlcookbook.org/ 167
Logging to a Database
• Logging directly to a database makes life easier if you have an application for which you need lots of reports
• DBI rules
http://www.modperlcookbook.org/ 169
package Cookbook::SiteLog;

use Apache::Constants qw(OK);
use DBI;
use strict;

sub handler {

  my $r = shift;

  # The slide passes only the DSN; username and password are left undef
  # here so the attribute hash lands in connect()'s fourth argument.
  my $dbh = DBI->connect($r->dir_config('DBASE'), undef, undef,
                         {RaiseError => 1, AutoCommit => 1, PrintError => 1})
    or die $DBI::errstr;

  my %columns = (
    status   => $r->status,
    bytes    => $r->bytes_sent,
    language => $r->headers_in->get('Accept-Language'),
  );

  # keys and values return the hash entries in the same order
  my $fields = join ', ', keys %columns;
  my $values = join ', ', ('?') x keys %columns;

  my $sql = qq(
     insert into www.sitelog (hit, servedate, $fields)
     values (hitsequence.nextval, sysdate, $values)
  );

  my $sth = $dbh->prepare($sql);
  $sth->execute(values %columns);

  return OK;
}
1;
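The column-list and placeholder construction can be checked without a database. A minimal sketch, assuming a hypothetical build_insert() helper and illustrative values; it sorts the column names so the generated SQL is predictable, and it leaves out the Oracle sequence and sysdate columns used in the handler.

```perl
use strict;
use warnings;

# Build a parameterized INSERT from column => value pairs.
# Column and table names are interpolated into the SQL, so they must come
# from trusted code, never from the request; only the values are bound.
sub build_insert {
    my ($table, %columns) = @_;

    my @names  = sort keys %columns;           # fixed order keeps SQL and binds aligned
    my $fields = join ', ', @names;
    my $marks  = join ', ', ('?') x @names;    # one placeholder per column

    my $sql = "insert into $table ($fields) values ($marks)";
    return ($sql, @columns{@names});
}

my ($sql, @bind) = build_insert('sitelog',
    status   => 200,
    bytes    => 9268,
    language => 'en',
);

print "$sql\n";     # insert into sitelog (bytes, language, status) values (?, ?, ?)
print "@bind\n";    # 9268 en 200
```

The returned list is shaped for DBI: prepare the SQL once, then pass the bind values to execute().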
http://www.modperlcookbook.org/ 170
Cleanups
[diagram: request cycle through cleanups: client request -> URI-based init -> URI translation -> file-based init -> resource control -> MIME setting -> fixups -> content -> logging -> cleanups]
http://www.modperlcookbook.org/ 171
Cleanups
• Apache doesn't really have a cleanup phase
• It calls a function when the request memory pool is destroyed
• The connection to the client is closed
http://www.modperlcookbook.org/ 172
Cleanups
[diagram: request cycle with PerlCleanupHandler in the cleanup slot: client request -> URI-based init -> URI translation -> file-based init -> resource control -> MIME setting -> fixups -> content -> logging -> PerlCleanupHandler]
http://www.modperlcookbook.org/ 173
PerlCleanupHandler
• Generally used to do any end of request cleanups
– Apache::File::tmpfile() removes its temporary file here
• Also good for logging
– no active browsers
http://www.modperlcookbook.org/ 174
Debugging
• Let's examine a very conceptual debugging cleanup handler
– I actually did use it for a while
http://www.modperlcookbook.org/ 175
package Cookbook::TraceError;

use Apache::Constants qw(OK SERVER_ERROR DECLINED);
use Apache::Log;
use DBI;          # needed for DBI->trace below
use strict;

sub handler {

  my $r = shift;

  # Don't do anything unless the main process errors.
  return DECLINED unless $r->is_initial_req &&
                         $r->status == SERVER_ERROR;

  my $old_loglevel = $r->server->loglevel(Apache::Log::DEBUG);
  my $old_trace    = DBI->trace(2);

  # Start the debugging request.
  my $sub = $r->lookup_uri($r->uri);

  # run() would ordinarily send content to the client, but
  # since we're in cleanup, the connection is already closed.
  $sub->run;

  # Reset things back to their original state -
  # loglevel(N) will persist for the lifetime of the child process.
  DBI->trace($old_trace);
  $r->server->loglevel($old_loglevel);

  return OK;
}
1;
http://www.modperlcookbook.org/ 176
Fine Manuals
• Writing Apache Modules with Perl and C
– http://www.modperl.com/
• mod_perl Developer's Cookbook
– http://www.modperlcookbook.org/
• Practical mod_perl
– http://www.modperlbook.org/
• mod_perl Pocket Reference
– http://www.refcards.com/
• mod_perl Guide
– http://perl.apache.org/guide/
• mod_perl at the ASF
– http://perl.apache.org/
http://www.modperlcookbook.org/ 177
Materials
These slides freely available at
http://www.modperlcookbook.org/~geoff/
http://www.modperlcookbook.org/ 178
Book Signing, Chat, etc.
Thursday, 2 PM at Powell's