
Page 1: 1  Programming the Apache Lifecycle Geoffrey Young geoff@modperlcookbook.org

http://www.modperlcookbook.org/ 1

Programming the Apache Lifecycle

Geoffrey Young

geoff@modperlcookbook.org


Overview

• Apache and mod_perl 101

• mod_perl Handler Basics

• Using the Apache Framework

• Advanced mod_perl API Features


Apache's Pre-fork Model

• Apache parent process forks multiple child processes

httpd (parent)

httpd (child)   httpd (child)   httpd (child)   httpd (child)


Roles and Responsibilities

• httpd parent processes no actual requests

• all requests are served by the child processes

• requests are handled by any available child process, not necessarily the one that handled the previous request

– remember, HTTP is stateless by default


Nice, Responsible Children

• each httpd child processes one incoming request at a time

• only when the request is over is the child free to serve the next request

• over time httpd child processes are terminated and replaced with fresh children


Request Phases

• Apache breaks down request processing into separate, logical parts called phases

client request → URI-based init → URI translation → file-based init → resource control → MIME setting → fixups → content → logging


Request Phases

• Apache breaks down request processing into separate, logical parts called phases

• each request is stepped through the phases until...

– all processing is complete

– somebody throws an "error"

• developers are given the chance to hook into each phase to add custom processing


So What?

• most Apache users don't worry about the request cycle too much...

• ...but they do use modules that plug into it

... for instance

client request → URI-based init → URI translation

mod_rewrite:
RewriteRule /favicon.ico$ /images/favicon.ico

... for instance

client request → URI-based init → URI translation → file-based init → resource control

mod_auth:
AuthUserFile .htpasswd

... for instance

client request → URI-based init → URI translation → file-based init → resource control → MIME setting → fixups → content

mod_cgi:
SetHandler cgi-script


That's great, but...

• breaking down the request into distinct phases has many benefits

– gives each processing point a role that can be easily managed and programmed

– makes Apache more like an application framework rather than a content engine

• but you have to code in C


Enter mod_perl

• mod_perl offers an interface to each phase of the request cycle

• opens up the Apache API to Perl code

• allows you to program targeted parts of the request cycle using Perl

• we like Perl

Apache Request Cycle

client request → URI-based init → URI translation → file-based init → resource control → MIME setting → fixups → content → logging

mod_perl Interface

client request

URI-based init: PerlPostReadRequestHandler

URI translation: PerlTransHandler

file-based init: PerlHeaderParserHandler

resource control: PerlAccessHandler, PerlAuthenHandler, PerlAuthzHandler

MIME setting: PerlTypeHandler

fixups: PerlFixupHandler

content: PerlHandler

logging: PerlLogHandler

PerlCleanupHandler


What is mod_perl?

• mod_perl is the Perl interface to the Apache API

• a C extension module, just like mod_cgi or mod_rewrite

• creates a persistent perl environment embedded within Apache


What's the Big Deal?

• mod_perl allows you to interact with and directly alter server behavior

• gives you the ability to "program within Apache's framework instead of around it"

• allows you to intercept basic Apache functions and replace them with your own (sometimes devious) Perl substitutes

• lets you do it in Perl instead of C


What's a handler?

• the term handler refers to processing that occurs during any of the Apache runtime phases

• this includes the request-time phases as well as the other parts of the Apache runtime, such as restarts

• the use of "handler" for all processing hooks is mod_perl specific


Why use handlers?

• gives each processing point a role that can be easily managed and programmed

– process request at the proper phase

– meaningful access to the Apache API

– break up processing into smaller parts

– modularize and encapsulate processing

• makes Apache more like an application framework rather than a content engine

• CPAN


Registry is just a handler

• Apache::Registry is merely an (incredibly clever and amazing) mod_perl handler

• its performance gains are made possible due to what mod_perl really is

• let's take a peek inside...


Apache::Registry

• Client side

– http://localhost/perl-bin/bar.pl

• Server side

– mod_perl intercepts content generation

– searches @INC for Apache/Registry.pm

– calls Apache::Registry::handler(Apache->request)

– inserts wizardry

– returns response to client

Wizardry, you say?

• the wizardry is basically just putting the CGI script into its own package

package Apache::ROOT::perl_2dbin::foo_2epl;

sub handler {
  BEGIN { $^W = 1; };
  $^W = 1;
  ... your script here...
}
1;

• because the perl interpreter is persistent the (compiled) package is already in memory when called
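
The wrapping step can be sketched in plain Perl, outside Apache. This is a simplified approximation and not Apache::Registry's actual code: package_for() and run_script() are hypothetical names, the escaping rule is only inferred from the package name shown on the slide, and the script source is passed in as a string instead of being read from disk.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a URI path into a package name shaped like the
# slide's Apache::ROOT::perl_2dbin::foo_2epl example.
sub package_for {
    my ($uri) = @_;
    (my $name = $uri) =~ s{([^A-Za-z0-9_/])}{sprintf "_%02x", ord $1}ge;
    $name =~ s{/+}{::}g;
    return "Apache::ROOT$name";
}

my %compiled;    # package name => 1 once the script has been compiled

# Wrap the script source in a package with a handler() sub, compile it
# once, and reuse the compiled handler on every later call.
sub run_script {
    my ($uri, $source) = @_;
    my $package = package_for($uri);
    unless ($compiled{$package}) {
        eval "package $package; sub handler { $source }; 1" or die $@;
        $compiled{$package} = 1;
    }
    return $package->can('handler')->();
}

my $source = 'our $count; return ++$count;';    # stand-in for a CGI script

print package_for('/perl-bin/foo.pl'), "\n";
print run_script('/perl-bin/foo.pl', $source), "\n";
print run_script('/perl-bin/foo.pl', $source), "\n";
```

The second call returns a higher count than the first because the package, including its globals, stays in memory between calls. That persistence is what saves the compile step, and it is also why scripts that rely on fresh globals can misbehave under a persistent interpreter.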


"The dream is always the same"

• the basic process for mod_perl is the same for the other request phases as for content generation (e.g., Registry)

– Apache passes control to mod_perl

– mod_perl passes control to your Perl handler

– your Perl subroutine defines the status

– mod_perl passes status back to Apache

– Apache continues along

Apache Request Cycle

client request → URI-based init → URI translation → file-based init → resource control → MIME setting → fixups → content → logging


Key Questions

• What is the Apache default behavior for the phase?

• What is a typical mod_perl usage of the phase?

• What happens on success?

• What happens on error?

The Client Request

client request

GET /perl-status HTTP/1.1
Accept: text/html, image/png, image/jpeg, image/gif, image/x-xbitmap, */*
Accept-Encoding: deflate, gzip, x-gzip, identity, *;q=0
Accept-Language: en
Cache-Control: no-cache
Connection: Keep-Alive, TE
Host: www.example.com
User-Agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows 2000) Opera 5.12 [en]

URI-based Initialization

client request → URI-based init


URI-based Initialization

• Request URI and headers are known

• Apache request record has been populated

• The first place you can insert processing

• Apache has no default behavior

• All configured handlers are run

URI-based Initialization

client request → PerlPostReadRequestHandler

URI-based Initialization

client request → PerlInitHandler


PerlPostReadRequestHandler

• Non-specific hook

• Useful for adding processing that needs to occur on every request

• every request means every request


A sample...

• Object: to protect our name-based virtual hosts from HTTP/1.0 requests Apache can't handle


HTTP/1.0 and Host

• HTTP/1.0 does not require a Host header

• assumes a "one host per IP" configuration

• this limitation "breaks" name-based virtual host servers for browsers that follow HTTP/1.0 to the letter

– most send the Host header, so all is well


A sample...

• Object: to protect our name-based virtual hosts from HTTP/1.0 requests Apache can't handle

• Method: intercept every request prior to content-generation and return an error unless...

– there is a Host header

– the request is an absolute URI


Anatomy of a Handler

• a mod_perl handler is just an ordinary Perl module

• visible through @INC

– including mod_perl extra paths

• contains a package declaration

• has at least one subroutine

– typically the handler() subroutine

package Cookbook::TrapNoHost;

use Apache::Constants qw(DECLINED BAD_REQUEST);
use Apache::URI;

use strict;

sub handler {

  my $r = shift;

  # Valid requests for name based virtual hosting are:
  #   requests with a Host header, or
  #   requests that are absolute URIs.

  unless ($r->headers_in->get('Host') || $r->parsed_uri->hostname) {

    $r->custom_response(BAD_REQUEST,
                        "Oops! Did you mean to omit a Host header?\n");

    return BAD_REQUEST;
  }

  return DECLINED;
}
1;


Setup

• add TrapNoHost.pm to @INC

ServerRoot/lib/perl/Cookbook/TrapNoHost.pm

• add to httpd.conf

PerlModule Cookbook::TrapNoHost
PerlInitHandler Cookbook::TrapNoHost

• that's it!

Apache Request Cycle

client request → URI-based init → URI translation → file-based init → resource control → MIME setting → fixups → content → logging

Intercept the Request

client request → PerlPostReadRequestHandler

HTTP/1.1 400 Bad Request
Date: Tue, 04 Jun 2002 01:17:52 GMT
Server: Apache/1.3.25-dev (Unix) mod_perl/1.27_01-dev Perl/v5.8.0
Connection: close
Content-Type: text/html; charset=iso-8859-1

Oops! Did you mean to omit a Host header?

Intercept the Request

client request → PerlPostReadRequestHandler → logging


Key Concepts

• The Apache request object, $r

• The Apache::Table class

• The Apache::URI class

• Return values and the Apache::Constants class


Apache Request Object

• passed to handlers or available via Apache->request()

$r = shift; # from @_ passed to handler()

$r = Apache->request();

• provides access to the Apache class, which provides access to request attributes

• singleton-like constructor, always returning the same object


Apache::Table

• the Apache::Table class provides the underlying API for the following request attributes...

– $r->headers_in()

– $r->headers_out()

– $r->err_headers_out()

– $r->dir_config()

– $r->subprocess_env()

– $r->notes()

– Apache::Request::param()

• Apache::Request::parms() in old releases


Apache::Table

• to manipulate Apache::Table objects, you use the provided methods

– get()

– set()

– add()

– unset()

– do()

– merge()

– new()

– clear()


Apache Table Properties

• Apache tables have some nice properties

– case insensitive

– allow for multi-valued keys

• they also have one important limitation

– can contain only simple strings

– use pnotes() for Perl scalars


keeps the Net flowin'...

• both case-insensitivity and multiple values are key to manipulating headers

$r->headers_out->set('Set-Cookie' => 'name=foo');

$r->headers_out->add('set-cookie' => 'name=bar');

my @cookies = $r->headers_out->get('Set-cookie');
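
The set/add/get behavior can be tried without a running server using a small stand-in class. TinyTable below is hypothetical and is not the real Apache::Table: it only mimics case-insensitive keys and multiple values per key, and it ignores details such as insertion order across keys.

```perl
package TinyTable;    # hypothetical stand-in, NOT the real Apache::Table

use strict;
use warnings;

sub new { return bless {}, shift }

# set() replaces every value stored under the key.
sub set {
    my ($self, $key, $value) = @_;
    $self->{ lc $key } = [$value];
    return;
}

# add() appends another value under the same key.
sub add {
    my ($self, $key, $value) = @_;
    push @{ $self->{ lc $key } }, $value;
    return;
}

# get() returns all values in list context, the first one in scalar context.
sub get {
    my ($self, $key) = @_;
    my @values = @{ $self->{ lc $key } || [] };
    return wantarray ? @values : $values[0];
}

package main;

my $headers_out = TinyTable->new;
$headers_out->set('Set-Cookie' => 'name=foo');
$headers_out->add('set-cookie' => 'name=bar');

my @cookies = $headers_out->get('Set-cookie');
print scalar(@cookies), " cookies: @cookies\n";
```

All three spellings of the header name reach the same entry, and add() leaves the value stored by set() in place, which is what lets a response carry more than one Set-Cookie header.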


Table Iteration

• Apache::Table objects use a special idiom when you need to operate on every item in a table

my $input = $apr->param; # Apache::Table object

$input->do(sub {

my ($key, $value) = @_;

$log->info("input: name = $key, value = $value");

1;

});
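
The trailing 1; in the callback matters if, as assumed here, iteration stops as soon as the callback returns a false value. The sketch below imitates that contract with a hypothetical table_do() function over a plain list of key/value pairs; it is not the Apache::Table implementation.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a table's do() method: call $callback for each
# key/value pair, stopping as soon as the callback returns a false value.
sub table_do {
    my ($pairs, $callback) = @_;
    for my $pair (@$pairs) {
        $callback->(@$pair) or last;
    }
    return;
}

my @input = ([name => 'geoff'], [sail => 'jib'], [sail => 'spinnaker']);

my @seen;
table_do(\@input, sub {
    my ($key, $value) = @_;
    push @seen, "$key=$value";
    return $key ne 'sail';    # false on the first "sail" entry ends the loop
});

print "@seen\n";
```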


More Apache::Table Fun

• Tired

PerlSetVar Sails "jib"

my $sail = $r->dir_config('Sails');

• Wired

PerlSetVar Sails "spinnaker"
PerlAddVar Sails "blooper"

my @sails = $r->dir_config->get('Sails');


Trickery

• really understanding the Apache::Table class will make you a better mod_perl programmer

• every table can be set

– $r->headers_in()

– $r->headers_out()

– $r->err_headers_out()

– $r->dir_config()

– $r->subprocess_env()

– $r->notes()

– Apache::Request::param()


Trickery

• handle "what if?" cases

$r->headers_in->set('Cookie' => 'name=foo');

my $sub = $r->lookup_uri('/scripts/foo.html');

• gratuitous exploitation

# configure "PerlSetVar Filter On" on-the-fly

$r->dir_config->set(Filter => 'On');


All About URIs

• when Apache parses the incoming request, it puts parts of the URI into the request record

• for just digging out the request URI you typically want the request attribute $r->uri()

my $uri = $r->uri; # $uri is "/index.html"

• sometimes you need all the URI parts, like the scheme or port


Apache::URI

• mod_perl provides the Apache::URI utility class for handling URIs

• allows for manipulating the current URI as well as constructing a new URI

• unfortunately, it has two distinct interfaces that contain very subtle differences

• our code uses those differences to our advantage


it all boils down to...

• the differences between Apache::URI->parse($r) and $r->parsed_uri() are very confusing

• use Apache::URI->parse($r) for creating a self-referential URI that needs to point to the same server

• use $r->parsed_uri() for accessing request attributes


Apache::Constants

• Apache::Constants class provides over 90 runtime constants

use Apache::Constants qw(DECLINED BAD_REQUEST);

• the most common are:

OK

SERVER_ERROR

REDIRECT

DECLINED


Return Values

• handlers are expected to return a value

• the return value of the handler defines the status of the request

• Apache defines three "good" return values

OK – all is well

DECLINED – forget about me

DONE – we're finished, start to log

• All other values are "errors" and trigger the ErrorDocument cycle


When handlers turn bad...

• "error" return codes are not always errors

• instead, they indicate a new route for the request

• error codes take effect immediately

– other scheduled handlers are not run
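
The short-circuit can be modeled in a few lines of plain Perl. run_phase() is a hypothetical helper written for illustration, not mod_perl or Apache code; the constants are defined locally with the same numeric values Apache uses, so the sketch runs without Apache::Constants.

```perl
use strict;
use warnings;

# Numeric values match Apache's constants; the helper itself is hypothetical.
use constant { OK => 0, DECLINED => -1, DONE => -2, FORBIDDEN => 403 };

# Run every handler for a phase until one returns something other than
# OK or DECLINED; that status ends the phase immediately.
sub run_phase {
    my ($request, @handlers) = @_;
    for my $handler (@handlers) {
        my $status = $handler->($request);
        return $status unless $status == OK || $status == DECLINED;
    }
    return OK;
}

my @seen;
my @handlers = (
    sub { push @seen, 'first';  return OK },
    sub { push @seen, 'second'; return FORBIDDEN },
    sub { push @seen, 'third';  return OK },    # never reached
);

my $status = run_phase({}, @handlers);
print "status=$status ran=@seen\n";
```

The third handler never runs: once the second returns FORBIDDEN, that status is handed straight back to the caller, which in a real server would start the ErrorDocument cycle.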


package My::Redirect;

use Apache::Constants qw(REDIRECT);

sub handler {

my $r = shift;

$r->headers_out->set(Location => '/foo');

return REDIRECT;

}

1;


Key Concepts

• The Apache request object, $r

• The Apache::Table class

• The Apache::URI class

• Return values and the Apache::Constants class

URI Translation

client request → URI-based init → URI translation


URI Translation

• Apache needs to map the URI to a physical file on disk

• Default is to prepend DocumentRoot to the URI

DocumentRoot /usr/local/apache/htdocs


URI Translation

• Directives like Alias override the default

DocumentRoot /usr/local/apache/htdocs
Alias /manual/ /usr/local/apache/manual/
<Directory /usr/local/apache/manual>
  ...

• Some URIs have no associated file, but Apache tries anyway

<Location /server-status>
  ...

URI Translation

client request → URI-based init → PerlTransHandler


PerlTransHandler

• Useful for overriding the Apache default

• Allows you to be extremely devious

• There are a few pitfalls of which to be aware


Simple PerlTransHandler

• Object: be rid of those silly favicon.ico requests that end up 404

• Method: translate the incoming URI to a common place if it matches favicon.ico

package Cookbook::Favicon;

use Apache::Constants qw(DECLINED);
use strict;

sub handler {

  my $r = shift;

  # Point any request for favicon.ico at one shared copy.
  $r->uri("/images/favicon.ico") if $r->uri =~ m!/favicon\.ico$!;

  # DECLINED lets Apache finish the translation using the new URI.
  return DECLINED;
}
1;


Setup

• add Favicon.pm to @INC

ServerRoot/lib/perl/Cookbook/Favicon.pm

• add to httpd.conf

PerlModule Cookbook::Favicon
PerlTransHandler Cookbook::Favicon

• that's it!


Why Not Use mod_rewrite?

• our Cookbook::Favicon is pretty much the same as

RewriteRule /favicon.ico$ /images/favicon.ico

• Let's look at a more clever example...


Mischievous Behavior

• Simple URI re-mapping is only the beginning

• Apache has this neat, built-in functionality called proxying

– provided you have mod_proxy installed

• With mod_perl and mod_proxy you can proxy just about anything...


Advanced PerlTransHandler

• Object: create a proxy that uses our local Apache documentation instead of ASF servers

• Method: intercept proxy requests and silently replace calls to http://httpd.apache.org/docs with /usr/local/apache/htdocs/manual


Client Setup

package My::ManualProxy;

use Apache::Constants qw(OK DECLINED);
use strict;

sub handler {

  my $r = shift;

  # Only proxy requests are of interest.
  return DECLINED unless $r->proxyreq;

  # Dots are escaped so that only the two ASF hostnames match.
  my (undef, $file) = $r->uri =~ m!^http://(www|httpd)\.apache\.org/(.*)!;

  if (defined $file && $file =~ m!^docs/!) {

    $file =~ s!^docs/!manual/!;

    $file = join "/", ($r->document_root, $file);

    if (-f $file) {

      $r->filename($file);  # use local disk

      $r->proxyreq(0);      # fool mod_mime

      return OK;
    }
  }

  return DECLINED;
}
1;
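
The URL-to-file mapping at the heart of this handler can be checked on its own. local_manual_path() below is a hypothetical helper that covers only the string manipulation; it leaves out the $r->proxyreq test and the -f check on disk that the handler performs.

```perl
use strict;
use warnings;

# Hypothetical helper: map a proxied ASF documentation URL to a path under
# the given document root, or return undef when the URL is not one of ours.
sub local_manual_path {
    my ($uri, $document_root) = @_;

    my ($file) = $uri =~ m!^http://(?:www|httpd)\.apache\.org/(.*)!
        or return;

    $file =~ s!^docs/!manual/! or return;

    return join '/', $document_root, $file;
}

my $root = '/usr/local/apache/htdocs';
for my $uri ('http://httpd.apache.org/docs/mod/directives.html',
             'http://www.example.com/docs/index.html') {
    my $path = local_manual_path($uri, $root);
    print "$uri => ", (defined $path ? $path : '(not handled)'), "\n";
}
```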

Proxy in Action

GET http://httpd.apache.org/docs/mod/directives.html HTTP/1.1
Accept: text/html, image/png, image/jpeg, image/gif, image/x-xbitmap, */*
Accept-Encoding: deflate, gzip, x-gzip, identity, *;q=0
Accept-Language: en
Connection: Keep-Alive, TE
Host: httpd.apache.org
User-Agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows 2000) Opera 5.12 [en]

HTTP/1.1 200 OK
Last-Modified: Sat, 25 May 2002 22:15:27 GMT
ETag: "240c0-3360-3cf00cff"
Accept-Ranges: bytes
Content-Length: 13152
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html


Winner Takes All

• The first URI translation handler to return OK ends the phase

• Return OK only when you map the file to disk yourself

• Return DECLINED all other times
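
A "winner takes all" phase can be modeled the same way as any other dispatcher loop. run_first() is a hypothetical helper, not Apache code, and the request here is just a hash; the point is only that the first handler to return something other than DECLINED ends the phase.

```perl
use strict;
use warnings;

use constant { OK => 0, DECLINED => -1 };    # same values as Apache's constants

# "Winner takes all": stop at the first handler that does not return DECLINED.
sub run_first {
    my ($request, @handlers) = @_;
    for my $handler (@handlers) {
        my $status = $handler->($request);
        return $status unless $status == DECLINED;
    }
    return DECLINED;    # nobody claimed the request
}

my %request = (uri => '/favicon.ico');

my @trans_handlers = (
    sub { return DECLINED },                                # not interested
    sub { $_[0]{filename} = '/tmp/one.ico'; return OK },    # maps the file itself
    sub { $_[0]{filename} = '/tmp/two.ico'; return OK },    # never runs
);

my $status = run_first(\%request, @trans_handlers);
print "status=$status filename=$request{filename}\n";
```

This is why a translation handler that only adjusts the URI, like Cookbook::Favicon, returns DECLINED: returning OK would stop the handlers after it, including the server's default mapping, from running.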

File-based Initialization

client request → URI-based init → URI translation → file-based init


File-based Initialization

• URI has been mapped to a file

– $r->filename is now known

• We also know to which <Location> the request belongs

• Apache has no default behavior

• All configured handlers are run

File-based Initialization

client request → URI-based init → URI translation → PerlHeaderParserHandler

File-based Initialization

client request → URI-based init → URI translation → PerlInitHandler


PerlHeaderParserHandler

• Non-specific hook

• Useful for adding processing that needs to occur on every request to a given URI


Sample Usage...

• Parsing out the query string or POST data on each request is a pain

• For the most part, you know you need it on every request to a given <Location>

• Modularize the parsing code


Apache::RequestNotes

• Apache::RequestNotes parses cookies and input parameters

• stores the data in pnotes() for later retrieval

Setup

Alias /perl-bin /usr/local/apache/perl-bin

<Location /perl-bin/>
  SetHandler perl-script
  PerlHandler Apache::Registry
  Options +ExecCGI

  PerlInitHandler Apache::RequestNotes
</Location>

<Location> Processing

client request → URI-based init → URI translation → PerlInitHandler

my $input = $r->pnotes('INPUT'); # Apache::Table reference

my $uploads = $r->pnotes('UPLOADS'); # Apache::Upload array ref

my $cookies = $r->pnotes('COOKIES'); # hash reference

Resource Control

client request → URI-based init → URI translation → file-based init → resource control


Resource Control

• Request is inside a particular <Location> container

• Apache provides three different layers of control to determine who gets access to the resource

• Client access checker

• User ID checker

• User authorization checker

Resource Control

client request → URI-based init → URI translation → file-based init → Client Access → User ID → User Authorization

Resource Control

client request → URI-based init → URI translation → file-based init → PerlAccessHandler → PerlAuthenHandler → PerlAuthzHandler


User Access

• Used to make access decisions based on Client information

– Client IP

– Client User-Agent

– Request URI

• mod_access controls Apache's default

– Allow from localhost

• All configured handlers run


PerlAccessHandler

• Useful for making the same kinds of decisions as mod_access

• do it in Perl


Simple PerlAccessHandler

• Object: get debugging telnet sessions past Basic authentication

• Method: set the Authorization header to a known user if coming from localhost

package My::DefaultLogin;

use Apache::Constants qw(OK);

use MIME::Base64 ();
use Socket qw(sockaddr_in inet_ntoa);

use strict;

sub handler {

  my $r = shift;

  my $c = $r->connection;

  my $local_ip = inet_ntoa((sockaddr_in($c->local_addr))[1]);

  if ($c->remote_ip eq $local_ip) {
    my $user = 'bug';
    my $passwd = 'squashing';

    # Join user and password and set the incoming header.
    # The empty second argument keeps a trailing newline out of the header.
    my $credentials = MIME::Base64::encode(join(':', $user, $passwd), '');

    $r->headers_in->set(Authorization => "Basic $credentials");
  }

  return OK;
}
1;


Setup

• add DefaultLogin.pm to @INC

ServerRoot/lib/perl/My/DefaultLogin.pm

• add to httpd.conf

PerlModule My::DefaultLogin
PerlAccessHandler My::DefaultLogin

• that's it!


How to Deny Access

• Each access handler returns OK if the client meets its conditions

• Access handlers return FORBIDDEN to deny access


Reality...

• In the real world, you could accomplish the same thing with the core Satisfy directive

AuthType Basic
AuthName "cookbook"
AuthUserFile .htpasswd
Require valid-user
Allow from localhost
Satisfy any

Resource Control

client request → URI-based init → URI translation → file-based init → Client Access → User ID


User Authentication

• Apache default authentication mechanism is mod_auth

• Winner takes all

• uses a password file generated using Apache's htpasswd utility

geoff:zzpEyL0tbgwwk


User Authentication

• configuration placed in .htaccess file or httpd.conf

AuthUserFile .htpasswd
AuthName "cookbook"
AuthType Basic
Require valid-user


Page 98: 1  Programming the Apache Lifecycle Geoffrey Young geoff@modperlcookbook.org

http://www.modperlcookbook.org/ 98

How Authentication Works

• client requests a document

  GET /perl-status HTTP/1.1
  Accept: text/xml, image/png, image/jpeg, image/gif, text/plain
  Accept-Charset: ISO-8859-1, utf-8;q=0.66, *;q=0.66
  Accept-Encoding: gzip, deflate, compress;q=0.9
  Accept-Language: en-us
  Connection: keep-alive
  Host: www.example.com
  Keep-Alive: 300
  User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US)

• server denies request

  HTTP/1.1 401 Authorization Required
  WWW-Authenticate: Basic realm="my site"
  Keep-Alive: timeout=15, max=100
  Connection: Keep-Alive
  Transfer-Encoding: chunked
  Content-Type: text/html; charset=iso-8859-1

How Authentication Works

[Diagram, built up over three slides: the first request passes through client request → URI-based init → URI translation → file-based init → Client Access → User ID; the User ID phase answers "HTTP/1.1 401 Authorization Required"; the logging phase then runs]

How Authentication Works

• client sends a new request

  GET /perl-status HTTP/1.1
  Accept: text/xml, image/png, image/jpeg, image/gif, text/plain
  Accept-Charset: ISO-8859-1, utf-8;q=0.66, *;q=0.66
  Accept-Encoding: gzip, deflate, compress;q=0.9
  Accept-Language: en-us
  Authorization: Basic Z2VvZmY6YWZha2VwYXNzd29yZA==
  Connection: keep-alive
  Host: www.example.com
  Keep-Alive: 300
  User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US)

• server sends document

  HTTP/1.1 200 OK
  Keep-Alive: timeout=15, max=99
  Connection: Keep-Alive
  Transfer-Encoding: chunked
  Content-Type: text/html
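The Authorization header in the second request is just the base64 encoding of "user:password". The following is a minimal sketch, using only the core MIME::Base64 module and the header value from the request above, of what the server recovers after decoding. The helper name decode_basic_credentials is made up for this illustration and is not part of the Apache or mod_perl API.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);

# Hypothetical helper (not an Apache or mod_perl call): split a
# "Basic <base64>" header value into user and password.
sub decode_basic_credentials {
    my ($header) = @_;

    my ($scheme, $encoded) = split ' ', $header, 2;
    return unless defined $encoded && lc $scheme eq 'basic';

    # The decoded string is "user:password"; only the first colon separates.
    return split /:/, decode_base64($encoded), 2;
}

my ($user, $password) =
    decode_basic_credentials('Basic Z2VvZmY6YWZha2VwYXNzd29yZA==');

# prints "user=geoff password=afakepassword"
print "user=$user password=$password\n";
```

The password is encoded, not encrypted, which is why Basic authentication offers no confidentiality on its own.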

Resource Control

[Diagram: request phases: client request → URI-based init → URI translation → file-based init → Client Access → PerlAuthenHandler (in place of User ID)]

Who Uses Flat Files?

• flat files are limiting, hard to manage, difficult to integrate, and just plain boring

• we can use the Apache API and Perl to replace flat files with our own authentication mechanism

Do it in Perl

• since mod_perl gives us the ability to intercept the request cycle before Apache, we can authenticate using Perl instead

• Apache provides an API, making the job easy

• mod_perl provides access to the Apache API

package My::Authenticate;

use Apache::Constants qw(OK DECLINED AUTH_REQUIRED);
use strict;

sub handler {

  my $r = shift;

  # Let subrequests pass.
  return DECLINED unless $r->is_initial_req;

  # Get the client-supplied credentials.
  my ($status, $password) = $r->get_basic_auth_pw;

  return $status unless $status == OK;

  # Perform some custom user/password validation.
  return OK if authenticate_user($r->user, $password);

  # Whoops, bad credentials.
  $r->note_basic_auth_failure;
  return AUTH_REQUIRED;
}

Configuration

• change
  AuthUserFile .htpasswd
  AuthName "cookbook"
  AuthType Basic
  Require valid-user

• to
  PerlAuthenHandler My::Authenticate
  AuthName "cookbook"
  AuthType Basic
  Require valid-user

The Choice is Yours

• how you decide to authenticate is now up to you

sub authenticate_user {
  my ($user, $pass) = @_;

  return $user eq $pass;
}

• are you seeing the possibilities yet?

The Power of CPAN

• over 25 shrink-wrapped Apache:: modules on CPAN for authentication

– SecurID

– Radius

– SMB

– LDAP

– NTLM

To Infinity and Beyond!

• this example only covered Basic authentication via popup box

• the same techniques can be used to authenticate via a login form plus cookies, munged URLs, or hidden fields

• they can be extended to use Digest authentication as well

User Authorization

[Diagram: request phases: client request → URI-based init → URI translation → file-based init → Client Access → User ID → User Authorization]

User Authorization

• We now know the user has supplied a valid password

• now it's up to us to decide if we want the user to have access

User Authorization

• Apache's default behavior varies, depending on the syntax of Require

– Require valid-user

– Require user foo

– Require group bar

– Require file-owner

– Require file-group

• Winner takes all

User Authorization

[Diagram: request phases: client request → URI-based init → URI translation → file-based init → Client Access → User ID → PerlAuthzHandler (in place of User Authorization)]

PerlAuthzHandler

• Key is the requires() method

• $r->requires() returns a reference to an array of hashes representing all Require directives

  Require user grier ryan
  Require group admiral

  [ { requirement => 'user grier ryan',
      method      => -1 },
    { requirement => 'group admiral',
      method      => -1 },
  ];

PerlAuthzHandler

• Once you get the Require directive back, you can decide which users meet the authorization requirement

foreach my $requires (@{$r->requires}) {

  my ($directive, @list) = split " ", $requires->{requirement};

  # We're ok if only valid-user was required.
  return OK if lc($directive) eq 'valid-user';

  # Likewise if the user requirement was specified and
  # we match based on what we already know.
  return OK if lc($directive) eq 'user' && grep { $_ eq $r->user } @list;
}
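To watch the loop make its decision outside of Apache, here is a small sketch that runs the same logic over the sample data structure from the requires() slide. The function check_requires and the test user geoff are assumptions made for illustration; 1 and 0 stand in for "would return OK" and "fell through the loop".

```perl
use strict;
use warnings;

# Hypothetical stand-in for the handler's loop: returns 1 where the
# handler would return OK, 0 where it would fall through.
sub check_requires {
    my ($user, $requires) = @_;

    foreach my $req (@$requires) {
        my ($directive, @list) = split " ", $req->{requirement};

        # valid-user: any authenticated user will do.
        return 1 if lc($directive) eq 'valid-user';

        # user: the authenticated user must be named explicitly.
        return 1 if lc($directive) eq 'user' && grep { $_ eq $user } @list;
    }

    return 0;
}

# Same shape as the data returned by $r->requires in the example above.
my $requires = [
    { requirement => 'user grier ryan', method => -1 },
    { requirement => 'group admiral',   method => -1 },
];

for my $user (qw(grier ryan geoff)) {
    printf "%-5s => %s\n", $user,
        check_requires($user, $requires) ? 'allowed' : 'not matched';
}
```

The group requirement is never matched here, just as in the loop above. A complete handler would presumably need its own group lookup and an explicit return value for users who match nothing.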

MIME-type Checking

[Diagram: request phases: client request → URI-based init → URI translation → file-based init → resource control → MIME setting]

MIME-type Checking

• We now have the physical resource and have decided the user can see it

• time to set the Content-Type header

• Apache's default is mod_mime, which examines the file extension

• mod_mime also decides which content handler will run

– AddHandler server-parsed

• Winner takes all

MIME-type Checking

[Diagram: request phases: client request → URI-based init → URI translation → file-based init → resource control → PerlTypeHandler (in place of MIME setting)]

PerlTypeHandler

• mod_mime has a stranglehold on the request

– if you set the Content-Type and return OK, mod_mime won't set the content handler

– if you set the Content-Type and return DECLINED, mod_mime will clobber the Content-Type

• Best to just forget about the PerlTypeHandler

Fixups

[Diagram: request phases: client request → URI-based init → URI translation → file-based init → resource control → MIME setting → fixups]

Fixups

• The final chance to fiddle with the request before content is written to the client

• non-specific phase

• Apache has no default behavior

• All configured handlers will be run

Fixups

[Diagram: request phases: client request → URI-based init → URI translation → file-based init → resource control → MIME setting → PerlFixupHandler (in place of fixups)]

PerlFixupHandler

• Good place to do anything you might have wanted to do in the PerlTypeHandler

• especially setting $r->handler()

Sample Fixup

• Object: re-implement XBitHack in Perl

• Method: after some basic checks, turn the request over to mod_include using $r->handler()

package Cookbook::XBitHack;

use Apache::Constants qw(OK DECLINED OPT_INCLUDES);
use Apache::File;
use Fcntl qw(S_IXUSR S_IXGRP);
use strict;

sub handler {

  my $r = shift;

  return DECLINED unless (
     -f $r->finfo                      &&  # the file exists
     $r->content_type eq 'text/html'   &&  # and is HTML
     $r->allow_options & OPT_INCLUDES);    # and we have Options +Includes

  # Find out the user and group execution status.
  my $mode = (stat _)[2];

  # We have to be user executable specifically.
  return DECLINED unless ($mode & S_IXUSR);

  # Set the Last-Modified header if group executable.
  $r->set_last_modified((stat _)[9]) if ($mode & S_IXGRP);

  # Make sure mod_include picks it up.
  $r->handler('server-parsed');

  return OK;
}
1;
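The heart of the handler is two bit tests on the file mode. A minimal sketch of just that decision, runnable without Apache, might look like the following. The helper xbit_status and its return labels are invented for this example; the labels borrow the off/on/full vocabulary of the XBitHack directive.

```perl
use strict;
use warnings;
use Fcntl qw(S_IXUSR S_IXGRP);

# Hypothetical helper: classify a file mode the way the handler does.
#   'off'  - not user-executable, the handler would decline
#   'on'   - parse for includes
#   'full' - parse for includes and also send Last-Modified
sub xbit_status {
    my ($mode) = @_;

    return 'off' unless $mode & S_IXUSR;
    return ($mode & S_IXGRP) ? 'full' : 'on';
}

# prints 0644 => off, 0744 => on, 0754 => full
printf "%04o => %s\n", $_, xbit_status($_) for 0644, 0744, 0754;
```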

Content Generation

[Diagram: request phases: client request → URI-based init → URI translation → file-based init → resource control → MIME setting → fixups → content]

Content Generation

• Apache's default content handler is default-handler, which takes care of all HTTP/1.1 events

– byteserving

– cache headers

• one and only one C module gets to handle content-generation

– mod_cgi

– mod_perl

– mod_include

Content Generation

[Diagram: request phases: client request → URI-based init → URI translation → file-based init → resource control → MIME setting → fixups → PerlHandler (in place of content)]

Sample PerlHandler

• create a handler that uses CPAN module HTML::Clean to "clean" outgoing documents

• alter the handler to take advantage of more advanced mod_perl features

package TPC::Clean;

use Apache::Constants qw(OK DECLINED);
use Apache::File;
use HTML::Clean;
use strict;

sub handler {

  my $r = shift;

  my $fh = Apache::File->new($r->filename)
    or return DECLINED;

  my $dirty = do {local $/; <$fh>};

  my $h = HTML::Clean->new(\$dirty);
  $h->level(3);
  $h->strip;

  $r->send_http_header('text/html');
  print ${$h->data};

  return OK;
}
1;

Configuration

• add directives to httpd.conf to mirror DocumentRoot

  Alias /clean /usr/local/apache/htdocs

  <Location /clean>
    SetHandler perl-script
    PerlHandler TPC::Clean
  </Location>

Results

• original: 202 bytes

<html>
<body>
<form method="GET" action="/foo">
Text: <input type="text" name="foo"><br>
<input type="submit">
</form>
<strong>hi there </strong>
</body>
</html>

The Choice is Yours!

• Content filtering with Apache::Filter?

• Using proper cache-friendly headers?

Magic

• filtered content generation is impossible with Apache 1.3

– can't send CGI output through mod_include (SSI)

– reason for output filters in Apache 2.0

• mod_perl has had output filtering for years

– possible due to Perl's TIEHANDLE interface

TIEHANDLE in mod_perl

• mod_perl tie()s STDOUT to the Apache class prior to the content generation phase

• you can tie() STDOUT as well and override mod_perl's default behavior

• very useful with stacked handlers

Stacked Handlers

• for each phase of the request, mod_perl will run any registered Perl handlers for that phase

• you can register more than one Perl handler per phase

• whether all handlers are called depends on the syntax of the phase itself in Apache

Stacked Handlers

• some phases run until the handler list is exhausted

  PerlLogHandler My::DBLogger
  PerlLogHandler My::FileLogger

• some phases run until one handler returns OK

  PerlTransHandler My::TranslateHTML
  PerlTransHandler My::TranslateText

• all phases terminate on "error"
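The two dispatch styles can be modeled in a few lines of plain Perl. This is only a sketch of the behavior described above, not mod_perl's internals: run_all and run_first are invented names, the handlers are stand-in closures, and OK and DECLINED are defined locally (with the numeric values Apache 1.3 uses) so the code runs without mod_perl.

```perl
use strict;
use warnings;

# Numeric values as in Apache 1.3's httpd.h; defined locally so the
# sketch runs without mod_perl.
use constant OK       => 0;
use constant DECLINED => -1;

# "Run all" phases (logging, fixups): OK and DECLINED both continue,
# anything else ends the phase with that status.
sub run_all {
    my @handlers = @_;
    for my $h (@handlers) {
        my $status = $h->();
        return $status unless $status == OK || $status == DECLINED;
    }
    return OK;
}

# "Run first" phases (URI translation): the first handler that does
# not decline ends the phase.
sub run_first {
    my @handlers = @_;
    for my $h (@handlers) {
        my $status = $h->();
        return $status unless $status == DECLINED;
    }
    return DECLINED;
}

my @called;
my $html = sub { push @called, 'TranslateHTML'; OK };
my $text = sub { push @called, 'TranslateText'; OK };

run_first($html, $text);
print "run_first called: @called\n";    # TranslateHTML only

@called = ();
run_all($html, $text);
print "run_all called:   @called\n";    # both handlers
```

In both styles a status other than OK or DECLINED (an "error") stops the phase immediately.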

Stacked Content Handlers

• for the content generation phase, running multiple Perl handlers can be incredibly powerful

• Apache::Filter implements a simple interface for pipelining content handlers

• uses TIEHANDLE in the background

Now for the Fun Part

• modify our handler to work either standalone or as part of a handler chain

• easy using Apache::Filter

Apache::Filter Changes

• change:
    my $fh = Apache::File->new($r->filename)
      or return DECLINED;

• to:
    my $fh = undef;

    if (lc $r->dir_config('Filter') eq 'on') {
      $r = $r->filter_register;

      ($fh, my $status) = $r->filter_input;
      return $status unless $status == OK;
    }
    else {
      # NOT_FOUND must be added to the Apache::Constants import list.
      $fh = Apache::File->new($r->filename)
        or return NOT_FOUND;
    }

Configuration

• change
  Alias /clean /usr/local/apache/htdocs

  <Location /clean>
    SetHandler perl-script
    PerlHandler My::Clean
  </Location>

• to
  PerlModule Apache::Filter

  Alias /clean /usr/local/apache/htdocs

  <Location /clean>
    SetHandler perl-script
    PerlHandler My::Clean
    PerlSetVar Filter On
  </Location>

Apache::Filter

• Apache::Filter adds methods to the Apache class

– do not have to use Apache::Filter;

– do have to PerlModule Apache::Filter

• because filtering content is tricky, the interface is quirky
  $r = $r->filter_register;

Apache::Filter

• to get at filtered input, call $r->filter_input()

• returns an open filehandle on the input stream

• for the first filter in the chain, the filehandle is opened on $r->filename()

• all filters use the same API, regardless of their position in the chain

So What?

• new Apache::Filter aware code works the same as the standalone module

• can be used as part of a PerlHandler chain

• can be any part of the chain

Compressing Output

• use Apache::Compress

– available from CPAN

– checks the Accept-Encoding header

– uses Compress::Zlib to compress output

– Apache::Filter aware

Configuration

• change
  Alias /clean /usr/local/apache/htdocs

  <Location /clean>
    SetHandler perl-script
    PerlHandler My::Clean
    PerlSetVar Filter On
  </Location>

• to
  Alias /clean /usr/local/apache/htdocs

  <Location /clean>
    SetHandler perl-script
    PerlHandler My::Clean Apache::Compress
    PerlSetVar Filter On
  </Location>

Stacked Power

• http://perl.apache.org/index.html

– straight HTML: 35668 bytes

– My::Clean: 28162 bytes (79%)

– Apache::Compress: 8177 bytes (23%)

– My::Clean + Apache::Compress: 7458 bytes (21%)
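The percentages are each result's size relative to the straight HTML. As a quick check of the arithmetic, here is a short sketch using the byte counts from the slide; the helper name percent_of_original is made up for this example.

```perl
use strict;
use warnings;

my $original = 35668;    # straight HTML, in bytes

my @results = (
    [ 'My::Clean',                    28162 ],
    [ 'Apache::Compress',              8177 ],
    [ 'My::Clean + Apache::Compress',  7458 ],
);

# Hypothetical helper: size as a whole-number percentage of the original.
sub percent_of_original {
    my ($bytes) = @_;
    return sprintf '%.0f', 100 * $bytes / $original;
}

# prints 79%, 23% and 21% for the three rows
printf "%-30s %5d bytes (%s%%)\n", $_->[0], $_->[1],
    percent_of_original($_->[1]) for @results;
```

Cleaning alone trims roughly a fifth of the page; compression does most of the work, and stacking the two handlers gains a couple of points more.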

Caveats

• using Apache::Filter is actually a bit more complex than this...

• see the recent version of Apache::Clean to get an idea

Cache Headers

• we often think of dynamic content as "could be different on any given access"

• "dynamic" content can also be static with clearly defined factors that can change its meaning

• by properly managing HTTP/1.1 cache headers, we can reduce strain on our servers

Conditional GET Request

• HTTP/1.1 allows for a conditional GET request

• clients are allowed to use cached content based on information about the resource

• information is provided by both the client and the server

GET /manual/index.html HTTP/1.1
Accept: text/html, image/png, image/jpeg, image/gif, image/x-xbitmap, */*
Accept-Charset: windows-1252;q=1.0, utf-8;q=1.0, utf-16;q=1.0, iso-8859-1;q=0.6, *;q=0.1
Accept-Encoding: deflate, gzip, x-gzip, identity, *;q=0
Accept-Language: en
Connection: Keep-Alive, TE
Host: mainsheet.laserlink.net
TE: deflate, gzip, chunked, identity, trailers
User-Agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows XP) Opera 6.03 [en]

HTTP/1.1 200 OK
Last-Modified: Thu, 01 Nov 2001 16:35:27 GMT
ETag: "4c949-2434-3be179cf"
Accept-Ranges: bytes
Content-Length: 9268
Connection: close
Content-Type: text/html

Conditional GET Request

• for static documents, Apache takes care of making our response cache-friendly

• since the file is on disk, Apache can determine when the file was last changed

• with static files, local modification is the only factor

• still too many rules to keep straight

• Apache provides an API to use so we don't have to think too much

Now for the Fun Part

• modify our handler to be "cache friendly"

• send 304 when the document hasn't changed

• properly handle If-* header comparisons

How do you define change?

• when dynamically altering static documents there are a number of factors to consider

– when the file changes on disk

– when the code changes

– when the options to the code change

• all of these affect the "freshness" of the document

Code Changes

• in order to determine when the code itself changes, we need to mark the modification time of the package

• at request time, we call an API to compare the package modification to the If-Modified-Since header

• on reloads, we regenerate the package modification time

package My::Clean;

use Apache::Constants qw(OK DECLINED);
use Apache::File;
use HTML::Clean;
use strict;

# Get the package modification time...
(my $package = __PACKAGE__) =~ s!::!/!g;
my $package_mtime = (stat $INC{"$package.pm"})[9];

Configuration Changes

• in order to determine when the options to the code change, we need to mark the modification time of httpd.conf

• at request time, we call an API to compare the configuration modification to the If-Modified-Since header

• on restarts, we regenerate the configuration modification time

package My::Clean;

use Apache::Constants qw(OK DECLINED);
use Apache::File;
use HTML::Clean;
use strict;

# Get the package modification time...
(my $package = __PACKAGE__) =~ s!::!/!g;
my $package_mtime = (stat $INC{"$package.pm"})[9];

# ...and when httpd.conf was last modified
my $conf_mtime = (stat Apache->server_root_relative('conf/httpd.conf'))[9];

# When the server is restarted we need to
# make sure we recognize config file changes and propagate
# them to the client to clear the client cache if necessary.
Apache->server->register_cleanup(sub {
  $conf_mtime = (stat Apache->server_root_relative('conf/httpd.conf'))[9];
});

Resource Changes

• in order to determine when the resource changes, we need to mark the modification time of $r->filename

• at request time, we call an API to compare the resource modification to the If-Modified-Since header

• resource modification is checked on each request

package My::Clean;

use Apache::Constants qw(OK DECLINED);
use Apache::File;
use HTML::Clean;
use strict;

# Get the package modification time...
(my $package = __PACKAGE__) =~ s!::!/!g;
my $package_mtime = (stat $INC{"$package.pm"})[9];

# ...and when httpd.conf was last modified
my $conf_mtime = (stat Apache->server_root_relative('conf/httpd.conf'))[9];

# When the server is restarted we need to
# make sure we recognize config file changes and propagate
# them to the client to clear the client cache if necessary.
Apache->server->register_cleanup(sub {
  $conf_mtime = (stat Apache->server_root_relative('conf/httpd.conf'))[9];
});

sub handler {

  ...

}

1;

sub handler {

  my $r = shift;

  my $fh = Apache::File->new($r->filename)
    or return DECLINED;

  my $dirty = do {local $/; <$fh>};

  my $h = HTML::Clean->new(\$dirty);
  $h->level(3);
  $h->strip;

  $r->update_mtime($package_mtime);
  $r->update_mtime((stat $r->finfo)[9]);
  $r->update_mtime($conf_mtime);

  $r->set_last_modified;
  $r->set_etag;
  $r->set_content_length(length ${$h->data});

  # only send the file if it meets cache criteria
  if ((my $status = $r->meets_conditions) == OK) {
    $r->send_http_header('text/html');
  }
  else {
    return $status;
  }

  print ${$h->data};
  return OK;
}
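The three update_mtime() calls boil down to "the newest timestamp wins", and the If-Modified-Since check compares that timestamp with the client's copy. The sketch below models only that part in plain Perl. The helper names and the epoch values are invented for illustration; in the real handler $r->meets_conditions performs this comparison along with the ETag checks.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper mirroring what repeated update_mtime() calls
# achieve: the newest of all contributing timestamps wins.
sub effective_mtime {
    my @mtimes = @_;
    return max @mtimes;
}

# Hypothetical helper: true when a client copy validated at $since
# (epoch seconds, from If-Modified-Since) is still fresh.
sub not_modified {
    my ($since, @mtimes) = @_;
    return effective_mtime(@mtimes) <= $since ? 1 : 0;
}

# Made-up epoch timestamps for the three factors.
my %mtime = (
    code => 1_000_000_100,    # the package file
    file => 1_000_000_300,    # the document on disk
    conf => 1_000_000_200,    # httpd.conf
);

my $since  = 1_000_000_250;   # when the client cached its copy
my $status = not_modified($since, values %mtime) ? 304 : 200;

print "effective mtime: ", effective_mtime(values %mtime), "\n";
print "status: $status\n";    # 200, because the file changed after $since
```

A change to any one of the three factors is enough to invalidate the client's cached copy.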

Logging

[Diagram: request phases: client request → URI-based init → URI translation → file-based init → resource control → MIME setting → fixups → content → logging]

Logging

• Apache's default is to use mod_log_config in common format
  LogFormat "%h %l %u %t \"%r\" %>s %b" common
  CustomLog logs/access_log common

• Most people tweak this to combined
  LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

• The connection to the client is still open!

• All configured handlers run

Logging

[Diagram: request phases: client request → URI-based init → URI translation → file-based init → resource control → MIME setting → fixups → content → PerlLogHandler (in place of logging)]

PerlLogHandler

• Useful for logging using interfaces in which Perl shines

– like databases

Logging to a Database

• Logging directly to a database makes life easier if you have an application for which you need lots of reports

• DBI rules

package Cookbook::SiteLog;

use Apache::Constants qw(OK);
use DBI;
use strict;

sub handler {
  my $r = shift;

  # DBI->connect() takes (dsn, user, password, \%attr), so the attribute
  # hash must be the fourth argument. No credentials are passed here.
  my $dbh = DBI->connect($r->dir_config('DBASE'), undef, undef,
    {RaiseError => 1, AutoCommit => 1, PrintError => 1}) or die $DBI::errstr;

  my %columns = (
    status   => $r->status,
    bytes    => $r->bytes_sent,
    language => $r->headers_in->get('Accept-Language'),
  );

  # keys() and values() walk the hash in the same order, so the
  # column names and the bind values line up.
  my $fields = join ', ', keys %columns;
  my $values = join ', ', ('?') x values %columns;

  my $sql = qq(
     insert into www.sitelog (hit, servedate, $fields)
     values (hitsequence.nextval, sysdate, $values)
  );

  my $sth = $dbh->prepare($sql);
  $sth->execute(values %columns);

  return OK;
}
1;
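The handler derives both the column list and the "?" placeholders from the %columns hash, so adding a column means adding a single hash entry. Here is a minimal sketch of that technique with no database involved. The helper build_insert, the table name, and the sample values are assumptions made for illustration, and the keys are sorted only to make the output reproducible.

```perl
use strict;
use warnings;

# Hypothetical helper: build the column list and the matching list of
# "?" placeholders. Keys are sorted so the output is reproducible.
sub build_insert {
    my ($table, $columns) = @_;

    my @names        = sort keys %$columns;
    my $fields       = join ', ', @names;
    my $placeholders = join ', ', ('?') x @names;

    my $sql  = "insert into $table ($fields) values ($placeholders)";
    my @bind = @{$columns}{@names};

    return ($sql, @bind);
}

# Made-up values standing in for $r->status, $r->bytes_sent and the
# Accept-Language header.
my ($sql, @bind) = build_insert('sitelog', {
    status   => 200,
    bytes    => 9268,
    language => 'en-us',
});

print "$sql\n";
print "bind: @bind\n";
```

Binding the values through placeholders, rather than interpolating them into the SQL, keeps request data such as Accept-Language from being treated as SQL.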

Cleanups

[Diagram: request phases: client request → URI-based init → URI translation → file-based init → resource control → MIME setting → fixups → content → logging → cleanups]

Cleanups

• Apache doesn't really have a cleanup phase

• It calls a function when the request memory pool is destroyed

• The connection to the client is closed

Cleanups

[Diagram: request phases: client request → URI-based init → URI translation → file-based init → resource control → MIME setting → fixups → content → logging → PerlCleanupHandler (in place of cleanups)]

PerlCleanupHandler

• Generally used to do any end of request cleanups

– Apache::File::tmpfile() removes its temporary file here

• Also good for logging

– no active browsers

Debugging

• Let's examine a very conceptual debugging cleanup handler

– I actually did use it for a while

package Cookbook::TraceError;

use Apache::Constants qw(OK SERVER_ERROR DECLINED);
use Apache::Log;
use DBI;
use strict;

sub handler {
  my $r = shift;

  # Don't do anything unless the main process errors.
  return DECLINED unless $r->is_initial_req &&
                         $r->status == SERVER_ERROR;

  my $old_loglevel = $r->server->loglevel(Apache::Log::DEBUG);
  my $old_trace    = DBI->trace(2);

  # Start the debugging request.
  my $sub = $r->lookup_uri($r->uri);

  # run() would ordinarily send content to the client, but
  # since we're in cleanup, the connection is already closed.
  $sub->run;

  # Reset things back to their original state -
  # loglevel(N) will persist for the lifetime of the child process.
  DBI->trace($old_trace);
  $r->server->loglevel($old_loglevel);

  return OK;
}
1;

Fine Manuals

• Writing Apache Modules with Perl and C

– http://www.modperl.com/

• mod_perl Developer's Cookbook

– http://www.modperlcookbook.org/

• Practical mod_perl

– http://www.modperlbook.org/

• mod_perl Pocket Reference

– http://www.refcards.com/

• mod_perl Guide

– http://perl.apache.org/guide/

• mod_perl at the ASF

– http://perl.apache.org/

Materials

These slides freely available at

http://www.modperlcookbook.org/~geoff/

Book Signing, Chat, etc.

Thursday, 2 PM at Powell's