###########################################

You must rewrite the 

performance.pod
=head2 Preload Perl modules - Real Numbers

It's a crap!!!

First define the goal of testing and probably using GTop to make the
process easier.

###########################################

debug.pod:

Collect all these notes about stack traces generation (here in this file)

the stacktrace looks right.  it would be more useful to see the line
number, which you can see if you follow this tip for building mod_perl 
from the SUPPORT doc:

=item CORE DUMPS

If you get a core dump, please send a backtrace if possible.
Before you try, build mod_perl with perl Makefile.PL PERL_DEBUG=1
which will:
 -add `-g' to EXTRA_CFLAGS
 -turn on PERL_TRACE
 -set PERL_DESTRUCT_LEVEL=2 (additional checks during Perl cleanup)
 -link against libperld if it exists



###########################################

Add a new section: 

Transitioning from Apache::Registry to Apache handlers

That should be pretty easy to convert, since you already have your main
program logic off in a separate module.

> Should I just use Apache::Request instead of CGI.pm? (I do use the
> cookie & checkbox/pulldown functionality often). 

Use Apache::Request and Apache::Cookie (both in the libapreq
distribution).  I don't use the sticky forms stuff in CGI.pm, so I don't
know if there's a handy replacement.  Check CPAN or just roll your own.
Maybe you could make a subclass of Apache::Request that adds somemethods
for this.

> If there are any tutorials out there, I'd love some links. 

It really isn't difficult enough to merit a tutorial, in my opinion.  It's
just ordinary perl module stuff. 



###########################################

eagle book appendix B lists all the Makefile.PL options (we better
include them all)

###########################################

> > META: Does it matter where we declare Apache::Peek, i.e. before/after
> > Apache::Status?
> > 
> > No, it doesn't.
> 
> I asked this because when you talked about Apache::Status you made it
> very clear that it *had to be first* so I kind of felt that since you'd
> been explicit about that you ought to have said that the order of
> Apache::Peek didn't matter -- since both Apache::Status and Apache::DBI
> have order dependencies I think you should say which one's don't have  
> order dependencies.


###########################################

A reader suggested:

> It might be helpful to replicate some information about the the Apache Life
> Cycle ( Eagle book pg 56 ), before talking about server startup file.

config.pod

###########################################

debug: =head3 Safe Resource Locking

You must review the section and correct this issue:

If the file is reopened in the next script invocation, the previous fh
will be closed and unlocked, but only from within the same process. 

Regarding leakages, if you use open IN, ... probably there is no
leakage because it's the same handler. In case of gensym or IO::File
use you should check this issue.

All the above is tentative and should be validated.

###########################################

I think that the Syg::Signal is being duplicated in two places!

###########################################

From: "Joseph R. Junkin" <jjunkin@datacrawler.com>
To: Stas Bekman <sbekman@stason.org>
Subject: Re: [summary+rfc] When One Machine is not Enough...


I doubt it will help, but you are free to look at a talk I gave:

Analysis of the Open Source Application Platform
http://www.datacrawler.com/talk/tech/platform/

It addresses the scalability issues you have already covered.


###########################################

die() issue:

merge the 
porting.html#die_and_mod_perl
snippets.html#Redirecting_Errors_to_the_Client

add notes from Matt's email: 
Warning: $SIG{__DIE__} considered dangerous
regarding of eval {} if $@ try/catch use

###########################################

===>

===>

===>

===>

===>

===>

===>

===>

===>

===>

===>

===>

===>

===>

===>

===>


===>

===>

###########################################

Check whether this one is documented somewhere else too:

install: What Compiler Should Be Used to Build mod_perl?

I'm not sure, but Ged says it is.

###########################################

CREDITS! CREDITS! CREDITS! put all the credits in place, especially
Ged!!!

###########################################

Mention the Apache::Watchdog::RunAway in the debug.pod where you have
a META to fill.

###########################################

There are lots of META tags in the Guide, it's a time to go and fill
in the missing details


###########################################




> Are you familiar with Apache::Resource and setrlimit? If so what do the
> soft and hard limits mean? Is soft is the one that can be temperarely
> be lower that the current usage of something? And the hard limit is the
> one that if reached it triggers the killoff?

>From BSD's man page for setrlimit:

     A resource limit is specified as a soft limit and a hard limit.  When a
     soft limit is exceeded a process may receive a signal (for example, if
     the cpu time or file size is exceeded), but it will be allowed to contin-
     ue execution until it reaches the hard limit (or modifies its resource
     limit).  The rlimit structure is used to specify the hard and soft limits
     on a resource,
     ...
     The system refuses to extend the data or stack space when the limits
     would be exceeded in the normal way: a brk(2) call fails if the data
     space limit is reached.  When the stack limit is reached, the process re-
     ceives a segmentation fault (SIGSEGV); if this signal is not caught by a
     handler using the signal stack, this signal will kill the process.

     A file I/O operation that would create a file larger that the process'
     soft limit will cause the write to fail and a signal SIGXFSZ to be gener-
     ated; this normally terminates the process, but may be caught.  When the
     soft cpu time limit is exceeded, a signal SIGXCPU is sent to the offend-
     ing process.

###########################################

control.pod: 

> > META: Stas: I would not have renamed apachectl to httpd_perl if I had
> > META: Stas: renamed httpd to httpd_perl.  This will be confusing for a
> > META: Stas: beginner.  How about 'httpd_perl_ctl start' etc?
> >
> > Neither symbolic links nor httpd_perl/docs themselves are visiable from
> > PATH. Do you want to change the symlink name?
>
> I don't know if I've misunderstood you, or if I've failed to make my
> point well.  It wasn't about PATH.  My thought was that if our reader
> creates a binary called `/usr/whatever/httpd_perl' and also renames a
> script from `/etc/init.d/apachectl' to `/etc/init.d/httpd_perl' he has
> *two* files called httpd_perl and that could confuse him, at a time
> when he's already likely to be confused about many other things.  The
> symlinks are OK because they all have a `S87...' (or whatever) prefix.
















###########################################


===>

===>

===>

===>

===>

===>

===>

===>


===>

===>

===>


===>

===>

###########################################

Jeffrey W. Baker said:

How about mod_backhand?  http://www.backhand.org/.  It is capable of not
only buffering, but also load balancing and failover.

If all you want is buffering, it would be very easy to write a small
program that accepted http requests and forwarded them to another
daemon, then buffered the response.  Perhaps this would be a good
weekend project for one of us.

I've been tinkering with the idea of writing an httpd.  As much as I
like Apache, there are many things I don't like about it, and a ton of
functionality that I won't ever need.  All I really want is a web server
that:

1) Is multithreaded or otherwise highly parallel.
2) Has a request stage interface like Apache's
3) Uses async I/O

Number 3 would releive us of this squid hack that we are all using. 
That doesn't seem too hard.


###########################################

Server split issue:

(Greg Stark)

I've learned the hard way that a proxy does not completely replace the need to
put images and other other static components on a separate server. There are
two reasons that you really really want to be serving images from another
server (possibly running on the same machine of course).

1) Netscape/IE won't intermix slow dynamic requests with fast static requests
   on the same keep-alive connection

2) static images won't be delayed when the proxy gets bogged down waiting on
   the backend dynamic server.

Both of these result in a very slow user experience if the dynamic content
server gets at all slow -- even out of proportion to the slowdown. 

Eg, if the dynamic content generation becomes slow enough to cause a 2s
backlog of connections for dynamic content, then a proxy will not protect the
static images from that delay. Netscape or IE may queue those requests after
another dynamic content request, and even if they don't the proxy server will
eventually have every slot taken up waiting on the dynamic server. 

So *every* image on the page will have another 2s latency, instead of just a
2s latency for the entire page. This is worst in Netscape of course course
where the page can't draw until all the images sizes are known.

This doesn't mean having a proxy is a bad idea. But it doesn't replace putting
your images on pics.mydomain.foo even if that resolves to the same address and
run a separate apache instance for them.

===>


> > I think if you can avoid hitting a mod_perl server for the images,
> > you've won more than half the battle, especially on a graphically
> > intensive site.
> 
> I've learned the hard way that a proxy does not completely replace the need
to
> put images and other other static components on a separate server. There are
> two reasons that you really really want to be serving images from another
> server (possibly running on the same machine of course).

I agree that it is correct to serve images from a lightweight server
but I don't quite understand how these points relate.  A proxy should
avoid the need to hit the backend server for static content if the
cache copy is current unless the user hits the reload button and
the browser sends the request with 'pragma: no-cache'.

> 1) Netscape/IE won't intermix slow dynamic requests with fast static requests
>    on the same keep-alive connection

I thought they just opened several connections in parallel without regard
for the type of content.

> 2) static images won't be delayed when the proxy gets bogged down waiting on
>    the backend dynamic server.

Is this under NT where mod_perl is single threaded?  Serving a new request
should not have any relationship to delays handling other requests on
unix unless you have hit your child process limit.

> Eg, if the dynamic content generation becomes slow enough to cause a 2s
> backlog of connections for dynamic content, then a proxy will not protect the
> static images from that delay. Netscape or IE may queue those requests after
> another dynamic content request, and even if they don't the proxy server will
> eventually have every slot taken up waiting on the dynamic server. 

A proxy that already has the cached image should deliver it with no
delay, and a request back to the same server should be serviced
immediately anyway.

> So *every* image on the page will have another 2s latency, instead of just a
> 2s latency for the entire page. This is worst in Netscape of course course
> where the page can't draw until all the images sizes are known.

Putting the sizes in the IMG SRC tag is a good idea anyway.

> This doesn't mean having a proxy is a bad idea. But it doesn't replace
putting
> your images on pics.mydomain.foo even if that resolves to the same address
and
> run a separate apache instance for them.

This is a good idea because it is easy to move to a different machine
if the load makes it necessary.  However, a simple approach is to
use a non-mod_perl apache as a non-caching proxy front end for the
dynamic content and let it deliver the static pages directly.  A
short stack of RewriteRules can arrange this if you use the 
[L] or [PT] flags on the matches you want the front end to serve
and the [P] flag on the matches to proxy.

===>

> I agree that it is correct to serve images from a lightweight server
> but I don't quite understand how these points relate.  A proxy should
> avoid the need to hit the backend server for static content if the
> cache copy is current unless the user hits the reload button and
> the browser sends the request with 'pragma: no-cache'.

I'll try to expand a bit on the details:

> > 1) Netscape/IE won't intermix slow dynamic requests with fast static
requests
> >    on the same keep-alive connection
> 
> I thought they just opened several connections in parallel without regard
> for the type of content.

Right, that's the problem. If the two types of content are coming from the
same proxy server (as far as NS/IE is concerned) then they will intermix the
requests and the slow page could hold up several images queued behind it. I
actually suspect IE5 is cleverer about this, but you still know more than it
does.

By putting them on different hostnames the browser will open a second set of
parallel connections to that server and keep the two types of requests
separate.

> > 2) static images won't be delayed when the proxy gets bogged down waiting
on
> >    the backend dynamic server.

Picture the following situation: The dynamic server normally generates pages
in about 500ms or about 2/s; the mod_perl server runs 10 processes so it can
handle 20 connections per second. The mod_proxy runs 200 processes and it
handles static requests very quickly, so it can handle some huge number of
static requests, but it can still only handle 20 proxied requests per second.

Now something happens to your mod_perl server and it starts taking 2s to
generate pages. The proxy server continues to get up to 20 requests per second
for proxied pages, for each request it tries to connect to the mod_perl
server. The mod_perl server can now only handle 5 requests per second though.
So the proxy server processes quickly end up waiting in the backlog queue. 

Now *all* the mod_proxy processes are in "R" state and handling proxied
requests. The result is that the static images -- which under normal
conditions are handled quicly -- become delayed until a proxy process is
available to handle the request. Eventually the backlog queue will fill up and
the proxy server will hand out errors.

> This is a good idea because it is easy to move to a different machine
> if the load makes it necessary.  However, a simple approach is to
> use a non-mod_perl apache as a non-caching proxy front end for the
> dynamic content and let it deliver the static pages directly.  A
> short stack of RewriteRules can arrange this if you use the 
> [L] or [PT] flags on the matches you want the front end to serve
> and the [P] flag on the matches to proxy.

That's what I thought. I'm trying to help others avoid my mistake :)

Use a separate hostname for your pictures, it's a pain on the html authors but
it's worth it in the long run.

=====>

> > > 1) Netscape/IE won't intermix slow dynamic requests with fast static
requests
> > >    on the same keep-alive connection
> > 
> > I thought they just opened several connections in parallel without regard
> > for the type of content.
> 
> Right, that's the problem. If the two types of content are coming from the
> same proxy server (as far as NS/IE is concerned) then they will intermix the
> requests and the slow page could hold up several images queued behind it. I
> actually suspect IE5 is cleverer about this, but you still know more than it
> does.

They have a maximum number of connections they will open at once
but I don't think there is any concept of queueing involved. 

> > > 2) static images won't be delayed when the proxy gets bogged down waiting
on
> > >    the backend dynamic server.
> 
> Picture the following situation: The dynamic server normally generates pages
> in about 500ms or about 2/s; the mod_perl server runs 10 processes so it can
> handle 20 connections per second. The mod_proxy runs 200 processes and it
> handles static requests very quickly, so it can handle some huge number of
> static requests, but it can still only handle 20 proxied requests per second.
> 
> Now something happens to your mod_perl server and it starts taking 2s to
> generate pages.


The 'something happens' is the part I don't understand.  On a unix
server, nothing one httpd process does should affect another
one's ability to serve up a static file quickly, mod_perl or
not.  (Well, almost anyway). 

> The proxy server continues to get up to 20 requests per second
> for proxied pages, for each request it tries to connect to the mod_perl
> server. The mod_perl server can now only handle 5 requests per second though.
> So the proxy server processes quickly end up waiting in the backlog queue. 

If you are using squid or a caching proxy, those static requests
would not be passed to the backend most of the time anyway. 

> Now *all* the mod_proxy processes are in "R" state and handling proxied
> requests. The result is that the static images -- which under normal
> conditions are handled quicly -- become delayed until a proxy process is
> available to handle the request. Eventually the backlog queue will fill up
and
> the proxy server will hand out errors.

But only if it doesn't cache or know how to serve static content itself.

> Use a separate hostname for your pictures, it's a pain on the html authors
but
> it's worth it in the long run.

That depends on what happens in the long run. If your domain name or
vhost changes, all of those non-relative links will have to be
fixed again.

==>

> Welcome to the real world however where "something" can and does happen.
> Developers accidentally put untuned SQL code in a new page that takes too
long
> to run. Database backups slow down normal processing. Disks crash slowing
down
> the RAID array (if you're lucky). Developers include dependencies on services
> like mail directly in the web server instead of handling mail asynchronously
> and mail servers slow down for no reason at all. etc.

Of course.  I have single httpd processes screw up all the time.  They
don't affect the speed of other httpd processes unless they consume
all of the machine's resources or lock something in common.  I suppose
if you have a small limit on the number of backend programs you
could get to a point where they are all busy doing something wrong. 

> > If you are using squid or a caching proxy, those static requests
> > would not be passed to the backend most of the time anyway. 
> 
> Please reread the analysis more carefully. I explained that. That is
> precisely the scenario I'm describing faults in.

I read it, but just wasn't convinced.  I'd like to understand this
better, though.  What did you do to show that there is a difference
when netscape accesses different hostnames for fast static content
as opposed to the same one where a cache responds quickly but
dynamic content is slow?  I thought Netscape would open 6 or so
separate connections regardless and would only wait if all 6
were used.  That is, it should not make anything wait unless you
have dynamically-generated images (or redirects) tying up the
other connections besides the one supplying the main html.  Do
you have some reason to think it will open fewer connections 
if they are all to the same host? 


###########################################


Makefile.PL: build.pl gets called twice when running from CPAN
shell. Because it does 'make' and 'make install' and both call
'manifypods'. Need to add a source change control, so if manifypods
was called once for make, it won't be called for 'make install'


###########################################


###########################################

When you do use locking, be very very careful if you use Apache::DBI or
similar persistent connections. MySQL threads keep tables locked until
the thread ends (connection is closed) or the tables are unlocked. If your
session die()'s while tables are locked, they will stay neatly locked as
your connection won't be closed either .... This was a nasty one I bumped
in to ...

###########################################

scenario: "One Light and One Heavy Server where ALL htmls are
Perl-Generated" introduced a lot of info duplication in its tricks
section! remove/modify/merge it.

###########################################

What's needed in order to sucessfully debug segfaulting modules under gdb:

Apache::DB/ httpd -X -D DEBUG

> if you set OPTMIZE => '-g', in the Makefile.PL and start httpd under gdb,
> it's easy to debug.
###########################################


###########################################


###########################################


###########################################
(merge of status.pod and debug.pod)

I think of merging the Apache::Status and the Debug sections since
both are very relative and Apache::Status allows you to debug the code
in some extend.

 META: I think to move here the code to trap errors, to produce nice
 messages to user: when the error occurs because of the user mistake,
 or something went wrong on the server side. We user errors, add some
 code that deployes a CGI's stickiness of variables, so you display
 just the erroneous fields (give an code snippet from User Subscribe
 from singlesheaven). Notice that error handling is an art, if you are
 really concerned for users to be loyal and stay with your service.

like Doug said:

A virtuous Apache module must let at least two people know when a
problem has occurred: you, the module's author, and the remote user.
You can communicate errors and other exception conditions to yourself
by writing out entries to the server log.  For alerting the user when
a problem has occurred, you can take advantage of the simple but
flexible Apache ErrorDocument system, use I<CGI::Carp>, or roll your
own error handler.

#####################################################################

Important for both book and guide: The strategy chapter talks about
performance improve among other things. The performance chapter
doesn't mention it (refer to it). But this is a very important part of
it.

#####################################################################

Include very important performance improve notes from:

http://www.apache.org/docs/misc/perf-tuning.html
http://www.apache.org/docs/misc/perf.html

#####################################################################

#####################################################################

add benchmarks with keep-alive and without them!


#####################################################################

Notice this package in debug section.

Devel::Symdump - dump symbol names or the symbol table

Apache::Symdump - shows 

Apache::Status uses it to show the process' internals



#####################################################################


#####################################################################


#####################################################################

#####################################################################

#####################################################################

Describe the BackLog (performance...)

On that note you might want to set the BackLog parameter (I forget the precise
name), it depends on whether you want users to wait indefinitely or just get
an error.




#####################################################################

> > What is the best way to have a Location directive apply to an entire
> > site except for a single directory?
> 
> Set the site-wide handler in a <Location "/"> and override the handler
> for the "register" dir by setting the default handler in <Location
> "/register">.  Unfortuntaly, I don't know the name of the default
> handler.

   SetHandler default-handler



#####################################################################

META: add a section about setting and passing environment variables:
It includes and merges (PerlSetVar, SetVar and Pass*), %ENV, (creating
your own directives?), subprocess

Notes:


* I'd suggest using $r->subprocess_env() instead.
I guess %ENV will work in many situations, but it might bite you later
when you can't figure out why a particular env variable isn't getting set
in certain situations (speaking from experience).


* I was going to suggest that too.  %ENV controls the environment
of the currently running Perl process, but child processes come from
the "subprocess env", which only the call above sets.




#####################################################################


#######################################################################

Add a MOD_PERL_TRACE=all example...

An email:

> > > Any suggestions?  How might I debug this?
> > 
> > hmm, can you put a warn() trace in your sub SiteMap, I wonder if it's
> > called the first time, but util.pm is not reloaded when Apache restarts
> > itself on startup.  
> > any difference if you turn Off PerlFreshRestart?
> > is mod_perl configured as a dso or static?
> > 
> > -Doug
> 
> mod_perl is static (my initial message included commands I used to build
> mod_perl/apache).
> 
> PerlFreshRestart Off  has no effect.
> 
> It does look like it's failing to load on the second pass, though, since I
> get one response from the "warn" you suggested:
> 
>       # bin/httpd -X
>       util.pm: MSELproxy::util about to bootstrap MSELproxy::util ...
>       [Fri Oct  1 00:43:05 1999] null: ...saw SiteMap...
>       Syntax error on line 14 of /usr/local/apache/conf/perl.conf:
>       Invalid command 'SiteMap', perhaps mis-spelled or defined by a
>       module not included in the server configuration



... more evidence ...  output of 
# MOD_PERL_TRACE=all bin/httpd -X

perl_parse args: '/dev/null' ...allocating perl interpreter...ok
constructing perl interpreter...ok
ok
running perl interpreter...ok
mod_perl: 0 END blocks encountered during server startup
perl_cmd_require: conf/perl-startup.pl
attempting to require `conf/perl-startup.pl'
loading perl module 'Apache::Constants::Exports'...ok
loading perl module 'Apache'...ok
loading perl module 'MSELproxy::util'...[Fri Oct  1 00:54:26 1999]
        util.pm: MSELproxy::util about to bootstrap MSELproxy::util ...
ok
loading perl module 'Apache'...ok
loading perl module 'MSELproxy::AccessManager'...ok
loading perl module 'Apache'...ok
loading perl module 'MSELproxy::OCLC'...ok
loading perl module 'Apache'...ok
loading perl module 'MSELproxy::RLG'...ok
blessing cmd_parms=(0xbfffdb2c)
[Fri Oct  1 00:54:26 1999] null: ...saw SiteMap...              <---
[root@pembroke apache]# loading perl module 'Apache'...ok
perl_startup: perl aleady running...ok
loading perl module 'Apache'...ok
cmd_cleanup: SvREFCNT($MSELproxy::util::$obj) == 1
cmd_cleanup: SvREFCNT($MSELproxy::util::$obj) == 1
loading perl module 'Apache'...ok
perl_cmd_require: conf/perl-startup.pl
attempting to require `conf/perl-startup.pl'
loading perl module 'Apache'...ok
loading perl module 'MSELproxy::util'...ok
loading perl module 'Apache'...ok
loading perl module 'MSELproxy::AccessManager'...ok
loading perl module 'Apache'...ok
loading perl module 'MSELproxy::OCLC'...ok
loading perl module 'Apache'...ok
loading perl module 'MSELproxy::RLG'...ok
Syntax error on line 14 of /usr/local/apache/conf/perl.conf:
Invalid command 'SiteMap', perhaps mis-spelled or defined by a module not
included in the server configuration

#######################################################################


###################################

###################################


###################################

This is a stuff to be integrated into the DB section, all mostly by jwb:

Date: Thu, 14 Oct 1999 17:21:18 -0700
From: Jeffrey Baker <jwb@cp.net>
To: modperl@apache.org, dbi-users@isc.org
Cc: sbekman@iil.intel.com
Subject: More on web application performance with DBI

Hi all,

I have converted my text-only guide to web application performance
using mod_perl and DBI into HTML.  The guide now lives alongside my
DBI examples page at http://www.saturn5.com/~jwb/dbi-performance.html
.

I have also conducted a silly benchmark to see how all of these
optimization affect performance.  Please remember that it is dangerous
to extrapolate the results of a benchmark, especially one as
rudimentary as this.  With that said please consider the following
data.

Environment:
DB Server: Oracle 8.0.6, Sun Ultra2, 2 CPUs, 2GB RAM, Sun A1000 disks
App Server: Linux, PII 350, 128MB RAM, Apache 1.3.6, mod_perl 1.19
Benchmark Client: ApacheBench on same machine as application server

Each benchmark consisted of a single request selecting one row from
the database with a randomly selected primary key.  The benchmark was
run through 1000 requests with 10 simultaneous clients.  The results
were recorded using each level of optimization from my tutorial.

Zero optimization: 41.67 requests/second
Stage 1 (persistent connections): 140.17 requests/second
Stage 2 (bound parameters): 139.20 requests/second
Stage 3 (persistent statement handles): 251.13 requests/second

It is interesting that the Stage 2 optimization didn't gain anything
over Stage 1.  I think this is because of the relative simplicity of
my query, the small size of the test database (1000 rows), and the
lack of other clients connecting to the database at the same time.  In
a real application, the cache thrashing that is caused by dynamic SQL
statements would probably be detrimental to performance.  In any case
Stage 2 paves the way for Stage 3, which certainly does increase the
request rate!

So, check it out at http://www.saturn5.com/~jwb/dbi-performance.html

Date: Wed, 23 Feb 2000 23:18:23 -0800
From: Jeffrey W. Baker <jwbaker@acm.org>
To: modperl@apache.org
Subject: Re: Database connection pooling... (beyond Apache::DBI)

Greg Stark wrote:
> 
> Sean Chittenden <sean@serverninjas.com> writes:
> 
> >       Howdy.  We're all probably pretty familiar with Apache::DBI and
> > the fact that it opens a database connection per apache process.  Sounds
> > groovy and works well with only one or two servers.  Everything is gravy
> > until you get a cluster of servers, ie 20-30 machines, each with 300+
> > processes.
> 
> 300+ perl processes per machine? No way. The only way that would make _any_
> sense is if your perl code is extremely i/o dependent and your perl code is
> extremely light. Even then you're way better off having the i/o operations
> queued quickly and processed asynchronously.

This conversation happens on an approximately biweekly schedule,
either on modperl or dbi-users, or some other list I have the
misfortune of frequenting.  Please allow me to expand upon this
subject a bit.

I have not yet gotten a satisfactory answer from anyone who starts
these threads regarding why they want connection pooling.  I suspect
that people think it is needed because everyone else (Netscape,
Microsoft, Bea) is doing it.  There is a particular kind of
application where pooled connections are useful, and there are
particular situations where it is a waste.  Every project I have ever
done falls into the latter category, and I can only think of a few
cases that fall under the former.

Connection pooling is a system where your application server threads
or processes, which number n on a single machine, share a pool of
database connections which number fewer than n.  This is done to
minimize the number of database connections which are open at once,
which in turn is supposed to reduce the load on the database server.
This is effective when database activity is a small fraction of the
total load on the application server.  For example, if your
application server mostly performs matrix manipulation, and only
occassionally hits the database, it would make sense for it to
relinquish the connection when it is not in use.

The downside to connection pooling is that it imposes some overhead.
The connections must be managed properly, and the scheme should be
transparent to the programmer whose code is using it.  So when a piece
of code requests a database connection, the connection manager needs
to decide which one to return.  It may have to wait for one to free
up, or it may have to open one based on some low-water-mark hueristic.
It may also need to decide that a connection consumer has died or gone
away, possibly taking the connection with it.  So you can see that
opening a pooled connection is more computationally expensive than
opening a dedicated connection.

This pooling overhead is a total waste of time when the majority of
what your application is doing is database-related.  If your program
will issue 100 queries and performa transaction during the course of
fulfilling a request, pooled connections will not make sense.  The
reason is that Apache already provides a mechanism for killing off
database connections in this scenario.  If a process or thread is
sitting about idle, Apache will come along an terminate it, freeing
the database connection in the process.  For database-bound or
transactional programs, the one-to-one mapping of processes to
database connections is ideal.

Pooling is also less attractive because modern databases can handle
many connections.  Oracle with MTS will run fine with just as many
connections as you care to open.  The application designer should
study how many connections he realistically plans to open.  If your
application is bound by database performance, it makes sense to cap
the number of clients, so you would not allow you applications to open
too many connections.  If your application is transactional, you don't
have any choice but to give each processes its own dedicated
connection.  If your application is compute-bound, then your database
is lightly loaded and you won't mind opening a lot of connections.

The summary is that if your application is database-bound, or is
processing transactions, you don't need or even want connection
pooling.



###################################

=> Security

It's a good idea to protect your various monitors like perl-status and
alike by password. The less information you provide for intruders, the
harder their break in task would be!!! (One of the biggest helps you
can provide for these bad guys is showing them all the scripts you use
if some of them are in public domain, while they can find out most of
them by browsing your site. The moment they know the name of the
script, they can grab the source of the script from the web (where the
script has come from) and learn the source and probably find a few or
even many security breaches. Security but obscurity doesn't really
works against a determined intruder but it definitely helps to wave
away some of the less determined malicious fellas.

e.g:

<Location /sys-monitor>
  SetHandler perl-script
  PerlHandler Apache::VMonitor
  AuthUserFile /home/httpd/perl/.htpasswd
  AuthGroupFile /dev/null
  AuthName "SH Admin"
  AuthType Basic
  <Limit GET POST>
    require user foo bar
  </Limit>
</Location>

And the passwd file:
  /home/httpd/perl/.htpasswd:
  foo:1SA3h/d27mCp
  bar:WbWQhZM3m4kl

###################################


> THere's nothing wrong with Ralf's guide per se, but I think
> you should mention in your Adding a proxy server section that
> mod_rewrite might be necessary if dynamic content is intermixed
> with static content.
  
That sounds reasonable indeed. I'll add it. Don't understand me wrong -   
I'm not against adding more things, I'm again duplication, which creates a
mess. So once you made it clear, we need that - I'll certainly add it.

Would you add something about using mod_rewrite to handle my scenario
to the guide?

Perhaps what you're looking for resembles this:

RewriteRule ^/(images|static)/ - [S=1]
RewriteRule (.+) http://backend$1 [P,L]

John D Groenveld wrote:
> 
> I've been using mod_proxy
> to proxypass my static content away from my /modperl
> directories. Now, I'd like to make my root
> dynamic and thus pass everything except /images and
> /static.
> I've looked at the guide and tuning  docs, as well
> as the mod_proxy docs, but I must be missing
> something.


###################################


###################################


###################################


###################################


###################################




###################################

###################################


###################################

Just a snippet to try...

try this (in the mod_perl-x.xx directory):

% make start_httpd
% strace -o strace.out -p `cat t/logs/httpd.pid` &
% make run_tests
% grep open stace.out | grep .htaccess > send_to_modperl_list
% make kill_httpd

and send us that file.  I have the feeling there's a .htaccess in your
tree that the process can't read.

###################################

Apache::RegistryNG is just waiting for more people to bang on it.  so, if
you make your module a sub-class of Apache::RegistryNG, that will help
things move forward a bit :)

###################################

At the strategy sections put (first work on it):

=head1 REDUCING THE NUMBER OF LARGE PROCESSES

Unfortunately, simply reducing the size of each HTTPD process is not
enough on a very busy site.  You also need to reduce the quantity of
these processes.  This reduces memory consumption even more, and
results in fewer processes fighting for the attention of the CPU.  If
you can reduce the quantity of processes to fit into RAM, your
response time is increased even more.

The idea of the techniques outlined below is to offload the normal   
document delivery (such as static HTML and GIF files) from the
mod_perl HTTPD, and let it only handle the mod_perl requests.  This 
way, your large mod_perl HTTPD processes are not tied up delivering
simple content when a smaller process could perform the same job more
efficiently.

In the techniques below where there are two HTTPD configurations, the 
same httpd executable can be used for both configurations; there is no
need to build HTTPD both with and without mod_perl compiled into it.
With Apache 1.3 this can be done with the DSO configuration -- just  
configure one httpd invocation to dynamically load mod_perl and the 
other not to do so.  
 
These approaches work best when most of the requests are for static
content rather than mod_perl programs.  Log file analysis become a bit
of a challenge when you have multiple servers running on the same
host, since you must log to different files.

=head2 TWO MACHINES

The simplest way is to put all static content on one machine, and all
mod_perl programs on another.  The only trick is to make sure all
links are properly coded to refer to the proper host.  The static
content will be served up by lots of small HTTPD processes (configured
I<not> to use mod_perl), and the relatively few mod_perl requests
can be handled by the smaller number of large HTTPD processes on the
other machine.

The drawback is that you must maintain two machines, and this can get
expensive.  For extremely large projects, this is the best way to go.

=head2 TWO IP ADDRESSES

Similar to above, but one HTTPD runs bound to one IP address, while
the other runs bound to another IP address.  The only difference is 
that one machine runs both servers.  Total memory usage is reduced 
because the majority of files are served by the smaller HTTPD
processes, so there are fewer large mod_perl HTTPD processes sitting
around.

This is accomplished using the F<httpd.conf> directive C<BindAddress> 
to make each HTTPD respond only to one IP address on this host.  One
will have mod_perl enabled, and the other will not.

=head2 USING ProxyPass WITH TWO SERVERS

To overcome the limitation of the alternate port above, you can use
dual Apache HTTPD servers with just slight difference in
configuration.  Essentially, you set up two servers just as you would
with the two port on same IP address method above.  However, in your
primary HTTPD configuration you add a line like this:

 ProxyPass /programs http://localhost:8042/programs

Where your mod_perl enabled HTTPD is running on port 8042, and has
only the directory F<programs> within its DocumentRoot.  This assumes 
that you have included the mod_proxy module in your server when it was
built.

Now, when you access http://www.domain.com/programs/printenv it will
internally be passed through to your HTTPD running on port 8042 as the
URL http://localhost:8042/programs/printenv and the result relayed
back transparently.  To the client, it all seems as if it is just one
server running.  This can also be used on the dual-host version to
hide the second server from view if desired.

=begin html
<P>
A complete configuration example of this technique is provided by
two HTTPD configuration files.
<A HREF="httpd.conf.txt">httpd.conf</A> is for the main server for all
regular pages, and <A HREF="httpd%2bperl.conf.txt">httpd+perl.conf</A> is
for the mod_perl programs accessed in the <CODE>/programs</CODE> URL. 


The directory structure assumes that F</var/www/documents> is the
C<DocumentRoot> directory, and the the mod_perl programs are in
F</var/www/programs> and F</var/www/rprograms>.  I start them as
follows:

 daemon httpd
 daemon httpd -f conf/httpd+perl.conf

=head2 SQUID ACCELERATOR

Another approach to reducing the number of large HTTPD processes on
one machine is to use an accelerator such as Squid (which can be found
at http://squid.nlanr.net/Squid/ on the web) between the clients and
your large mod_perl HTTPD processes.  The idea here is that squid will
handle the static objects from its cache while the HTTPD processes 
will handle mostly just the mod_perl requests once the cache is
primed.  This reduces the number of HTTPD processes and thus reduces
the amount of memory used.

To set this up, just install the current version of Squid (at this  
writing, this is version 1.1.22) and use the RunAccel script to start
it.  You will need to reconfigure your HTTPD to use an alternate port,
such as 8042, rather than its default port 80.  To do this, you can
either change the F<httpd.conf> line C<Port> or add a C<Listen>   
directive to match the port specified in the F<squid.conf> file.  
Your URLs do not need to change.  The benefit of using the C<Listen>  
directive is that redirected URLs will still use the default port 80
rather than your alternate port, which might reveal your real server 
location to the outside world and bypass the accelerator.

In the F<squid.conf> file, you will probably want to add C<programs>
and C<perl> to the C<cache_stoplist> parameter so that these are
always passed through to the HTTPD server under the assumption that
they always produce different results.

This is very similar to the two port, ProxyPass version above, but the
Squid cache may be more flexible to fine tune for dynamic documents
that do not change on every view.  The Squid proxy server also seems
to be more stable and robust than the Apache 1.2.4 proxy module.

One drawback to using this accelerator is that the logfiles will   
always report access from IP address 127.0.0.1, which is the local
host loopback address.  Also, any access permissions or other user
tracking that requires the remote IP address will always see the local
address.  The following code uses a feature of recent mod_perl
versions (tested with mod_perl 1.16 and Apache 1.3.3) to trick Apache
into logging the real client address and giving that information to
mod_perl programs for their purposes.


First, in your F<startup.perl> file add the following code:

 use Apache::Constants qw(OK);

 sub My::SquidRemoteAddr ($) {
   my $r = shift;

   if (my ($ip) = $r->header_in('X-Forwarded-For') =~ /([^,\s]+)$/) {
     $r->connection->remote_ip($ip);
   }

   return OK;
 }

Next, add this to your F<httpd.conf> file:

 PerlPostReadRequestHandler My::SquidRemoteAddr

This will cause every request to have its C<remote_ip> address
overridden by the value set in the C<X-Forwarded-For> header added by
Squid.  Note that if you have multiple proxies between the client and
the server, you want the IP address of the last machine before your
accelerator.  This will be the right-most address in the
X-Forwarded-For header (assuming the other proxies append their
addresses to this same header, like Squid does.)
   
If you use apache with mod_proxy at your frontend, you can use Ask
Bjrn Hansen's mod_proxy_add_forward module from
ftp://ftp.netcetera.dk/pub/apache/ to make it insert the
C<X-Forwarded-For> header.
  

###################################

###################################



###################################


###################################

config.pod:

use Eric's presentation:

http://conferences.oreilly.com/cd/apache/presentations/echolet/contents.html

###################################

mod_perl Humour.

* mod_perl for embedded devices:

Q: mod_perl for my Palm Pilot dumps core when built as a DSO, and
the Palm lacks the memory to build statically, what should I do?

A: you should get another Palm Pilot to act as a reverse proxy

by Eric Cholet.


#################################################

#################################################

DBI tips to improve performance:

Need to work on the snippets below:


What if the user_id has something that needs to be quoted?  I speak
of the general case.  User data should not get anywhere *near* an SQL
line... it should always be inserted via placeholders or very very
careful consideration to quoting.


Ahh, I see. I basically do the latter, with $dbh->quote. The contents of
$Session are entirely system-generated. The user gives a ticket through
the URL, yes, but that is parsed and validated and checked for presence in
the DB before you even get to code that works like I had described.

I agree - but you should always be aware of the issues with using
placeholders for the database engine that you use. Sybase in
particular has a deficient implementation, which tends to run out of
space and creates locking contention. Using stored procs instead is a
lot better (although it doesn't solve the quoting problems).

OTOH, Oracle caches compiled SQL, and using placeholders means it's not
caching SQL with specific data in it. The values can get bound into the
compiled SQL just as easily, and it speeds things up by a noticable amount
(factor of ~3 in my tests)

If we are on this topic, I have a few questions. I've just read the DBI
manpage, there is a prepare_cached() call. It's useless in mod_cgi if used
only once with the same params across the script. If I use Apache::DBI,
and replace all prepare statements (which include placeholders) with
prepare_cached(). Does it mean that like with modules preloading , the
prepare will be called only once per unique statements thru the whole life
of the child?

Otherwise a usage of placeholders is useless, if you do only one execute() 
call per unique prepare() statement. The only benefit is of DBI taking
handle of quoting the values for you. 

I don't remember someone mentioned prepare_cached() ever. What's the
verdict?



Simply adding the "_cached" to "prepare()" in one of my utilities
increased the performance eight fold (Oracle non-mod_perl environment).

I don't know the fine points of if it is possible to share cached
prepares across children (can you even fork with db connections?), but
if your code is doing the same query(ies) over and over, definitly give
it a try. 

Not necessarily; it depends on your database. Oracle does caching which
persists until it needs the space for something else; if you're finding
information about customers, it's much more efficinet for there to be one
entry in the library cache like this:

        select * from customers where customer_id = :p1

than it is for there to be lots of them like:

        select * from customers where customer_id = 123
        select * from customers where customer_id = 465
        select * from customers where customer_id = 789

since Oracle has to parse, compile and cache each one separatley. 

I don't know if other databases do this kind of caching. 

Ok, this makes sense. I just read the MySQL manual - with all grief, it
doesn't cache :(

So, I still think to use prepare_cached() to cache on the DBI behalf, but
it's said to work thru the life of $dbh and since my $dbh is my() lexicall
variable, I don't understand whether I get this benefit or not? I know
that Apache::DBI maintains a pool of connections, does it preserver the
cache of prepare statements as well (I mean does it preserve the whole 
$dbh object )? If it does, I get a speedup at least a speedup for the
whole life of a single connection. I think that the speedup is even better
than the one you have been talking about, since if Oracle caches the
prepare statement, DBI still reachs out for Oracle, if it's local cache we
get a little more save ups. 

Anyone deployes the scenario I have tried to present here? Seems like a
good candidate for a performance chapter of the guide if it really
makes speed better...

The statement cursors will be cached per $dbh, which Apache::DBI
caches, so there is an extreme performance boost... as your 
application runs caching all its cursors, database queries will 
become execution speed, no query parsing will be involved anymore.

On Oracle, the performance improvement I saw was 100% by using
prepare_cached functionality.

If you have just a small number of web servers, the caching difference
between Oracle & MySQL will be small on the db end.  Its when you have
a lot of DBI handles that things might get inefficient.  But I'm 
sure you are running a proxy front end, right Stas? :)

Be warned: there are some pitfalls associated with prepare_cached().
It actually gives you a reference to the *same* cached statement
handle, not just a similar copy.  So you can't do this:

my $sth1 = $dbh->prepare_cached('select name from table where id=?');
my $sth2 = $dbh->prepare_cached('select name from table where id=?');

$sth1 & $sth2 are now the same object!  If you try to use them
independently, they'll stomp all over each other.

That said, prepare_cached() can be a huge win when using a slow
database like Oracle.  For mysql, it doesn't seem to help much, since
mysql is so darn fast at preparing its statements.

Sometimes you have to be careful about that, yes.  For instance, I was
repeatedly executing a statement to insert data into a varchar column.  The
first value to insert just happened to be a number, so DBD::mysql thought that
it was a numeric column, and subsequent insertions failed using that same
statement handle.

I'm not sure what the correct solution should have been in that case, but I
reverted back to calling $dbh->quote($val) and putting it directly into the
SQL.  My opinion is that mysql should do a better job of figuring out which
fields are actually numeric and which are strings - i.e. get the info from the
database, not from the format of the data I'm passing it.



Actually, I'm a big fan of placeholders.  I think they make the
programming task a lot easier, since you don't have to worry about
quoting data values.  They can also be quite nice when you've got
values in a nice data structure and you want to pass them all to the
database - just put them in the bound-vars list, and forget about
constructing some big SQL string.

I believe mysql just emulates true placeholders by doing the quoting,
etc. behind the scenes.  So it's probably not much faster to use
placeholders than direct embedded values.  But I think placeholders
are cleaner, generally, and more fun.

In my experience, prepare_cached() is just a judgment call.  It hasn't
seemed to be a big performance win for mysql, so sometimes I use it,
sometimes I don't.  I always use it with Oracle, though.

prepare_cached is implemented by the database handle (and really the
database itself).  For example, in Oracle it speeds things up.  In MySQL,
it is exactly the same as prepare() because DBD::mysql does not implement
it because MySQL itself has no mechanism for doing this.

As I said in a previous message, prepare_cached() don't cache anything
under MySQL.  However, you can implement your own statement handle caching
scheme pretty easily by either subclassing DBI or writing a DB
access module of your own (my preferred method).

my $db = MyDB->new;

my $sql = 'SELECT 1';
my $sth = $db->get_sth($sql);

$sth->execute or die $dbh->errstr;
my ($numone) = $sth->fetchrow_array;
$sth->finish or die $dbh->errstr;  # This is doubly necessary with this
caching scheme!

sub get_sth
{
    my $self = shift;
    my $sql = shift;

    return $self->{sth_cache}->{$sql} if exists $self->{sth_cache}->{$sql};

    $self->{sth_cache}->{$sql} = $self->{dbh}->prepare($sql) 
	or die $self->{dbh}->errstr;

    return $self->{sth_cache}->{$sql};
}

I've used that in a few situations and it appears to speed things up a
bit.

For mod_perl, we would probably want to make $self->{sth_cache} global.


You know, I just benchmarked this on a machine running PostgreSQL and it
didn't actually speed things up (or slow it down).  However, I suspect
that under mod_perl if this were something that were globally shared
inside a child process it might make a difference.  Plus it also depends
on the database used.

(Contributors: Randal L. Schwartz, Steve Willer, Michael Peppler, Mark
Cogan, Eric Hammond, Russell D. Weiss, Joshua Chamas, Ken Williams, Peter Grimes)

#################################################

As a quick side note, I actually found that it's faster to write the logs
directly into a .gz, and read them out of the .gz, through pipes.  It takes
longer (significantly, by my experience) to read 100 megs from the drive
than it does to compress or uncompress 5 megs of data.

#################################################


#################################################

#################################################

performance.pod - extend on Apache::TimeIt package

#################################################

Add a new section - contributing to the guide - with incentives and
guidelines of contributions (diff against pod...)

#################################################


#################################################

#################################################

#################################################

security.pod : add Apache:Auth* modules

#################################################


#################################################

examples of Apache::Session::DBI code:

use strict;
use DBI;
use Apache::Session::DBI;
use CGI;
use CGI::Carp qw(fatalsToBrowser);

# Recommendation from mod_perl_traps:
use Carp ();
local $SIG{__WARN__} = \&Carp::cluck;

[...]

# Initiate a session ID
my $session = ();
my $opts = {  autocommit => 0, 
              lifetime   => 3600 };     # 3600 is one hour

# Read in the cookie if this is an old session
my $r = Apache->request;
my $no_cookie = '';
my $cookie = $r->header_in('Cookie');
{
    # eliminate logging from Apache::Session::DBI's use of `warn'
    local $^W = 0;      

    if (defined($cookie) && $cookie ne '') {
        
        $cookie =~ s/SESSION_ID=(\w*)/$1/;
        $session = Apache::Session::DBI->open($cookie, $opts);
        $no_cookie = 'Y' unless defined($session);
    }

    # Could have been obsolete - get a new one
    $session = Apache::Session::DBI->new($opts) unless defined($session);

}

# Might be a new session, so let's give them a cookie back
if (! defined($cookie) || $no_cookie) {
    local $^W = 0;

    my $session_cookie = "SESSION_ID=$session->{'_ID'}";
    $r->header_out("Set-Cookie" => $session_cookie);
}


#################################################


#################################################


##########################################################################
                 

#################################################################


##################################################################


########################################################################


########################################################################


########################################################################





