Introduction:

One of the common tips for increasing Apache performance is to turn off the per-directory configuration files (aka .htaccess files) and merge them all into your main Apache server configuration file (httpd.conf).

Jeremy raised an interesting question about when the performance loss caused by using many .htaccess files is offset by the ease of maintenance. He argues – and I agree – that it makes sense to keep the configuration local, inside .htaccess files, because these are easier to maintain, despite the performance loss.

It’s fairly logical that the multiple .htaccess file route will be slower – for every directory in the request URI, the webserver has to look for an .htaccess file and merge the rules found in each one. So we pay for a filesystem seek’n’read for every subdirectory.
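To make the per-directory lookup concrete, here is a minimal sketch of which files get checked for a single request. It is a simplified model, not Apache’s actual directory walk: it starts at the docroot and ignores anything above it, and the docroot, URI and function name are hypothetical values chosen for illustration.

```python
import posixpath


def htaccess_candidates(docroot, uri_path, access_file_name=".htaccess"):
    """List the per-directory config files looked up for one request.

    Simplified model (an assumption, not Apache's real code): start at the
    docroot and treat the last URI segment as a file name.
    """
    # directory segments of the URI, without the trailing file name
    segments = [s for s in uri_path.split("/") if s][:-1]

    current = docroot
    candidates = [posixpath.join(current, access_file_name)]
    for segment in segments:
        current = posixpath.join(current, segment)
        candidates.append(posixpath.join(current, access_file_name))
    return candidates


# hypothetical docroot and request path
paths = htaccess_candidates("/srv/htdocs", "/a/b/c/page.html")
for p in paths:
    print(p)
# /srv/htdocs/.htaccess
# /srv/htdocs/a/.htaccess
# /srv/htdocs/a/b/.htaccess
# /srv/htdocs/a/b/c/.htaccess
```

Each extra directory level adds one more lookup, which is why deeper trees should cost more when overrides are allowed.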

However, is this a major issue? How much of a performance hit is there? Let’s find out…

Set-up:

Ok. Let’s make two docroots each with the same structure and files.

1) htdocs_access – the .htaccess version. This has one .htaccess file in the leaf directory.

2) htdocs_config – the httpd.conf version. This has the same rule as the above, but the rule is in the server-wide httpd.conf file and htaccess support is turned OFF (AllowOverride None).

Next, we need to get the .htaccess/httpd.conf files to do something (mainly so we can see whether Apache has merged them in). So, we’ll make a number of files in the deepest directory (the leaf node), and give half of them the extension .foo and the other half .bar. We’ll then tell Apache to process the .bar files with PHP and serve the .foo files as text. All files will have the same content:

[php]

[/php]

Here’s the (python) code I used to generate this structure:

[python]
#!/usr/bin/env python

import os

# where we'll place the generated structure
staging = '/Users/simon/server'
htdocs_access = os.path.join(staging, 'htdocs_access')
htdocs_config = os.path.join(staging, 'htdocs_config')

# how deep to go!
dir_depth = 10

# how many files in the leaf node of the dir.
num_files = 50

# what content to put in the files
content = ""

# the actual htaccess file
htaccess = """
AddHandler application/x-httpd-php .bar
"""

# make directory structure: 0/1/2/.../9 under each docroot
subdir = ''
for dirnum in range(0, dir_depth):
    subdir = os.path.join(subdir, str(dirnum))

    hta = os.path.join(htdocs_access, subdir)
    htc = os.path.join(htdocs_config, subdir)
    os.makedirs(hta)
    os.makedirs(htc)

# make the files in the leaf directory (hta/htc now point at it)
for filenum in range(0, num_files):
    # assign the file types: half .foo, and half .bar
    if filenum % 2 == 0:
        filename = '%d.foo' % filenum
    else:
        filename = '%d.bar' % filenum

    with open(os.path.join(hta, filename), 'w') as f:
        f.write(content)

    with open(os.path.join(htc, filename), 'w') as f:
        f.write(content)

# now, add the .htaccess file inside the leaf htdocs_access dir
with open(os.path.join(hta, '.htaccess'), 'w') as f:
    f.write(htaccess)

# and we'll place it in the root of the htdocs_config dir as
# httpd.conf to remind ourselves to add it to the httpd.conf file
with open(os.path.join(htdocs_config, 'httpd.conf'), 'w') as f:
    f.write(htaccess)
[/python]

Here’s what we end up with:

[code]
0/
  1/
    2/
      3/
        4/
          5/
            6/
              7/
                8/
                  9/
                    0.foo
                    1.bar
                    10.foo
                    11.bar
                    (…etc…)
                    6.foo
                    7.bar
                    8.foo
                    9.bar
[/code]

htdocs_access has a .htaccess file in 9/, and htdocs_config doesn’t.

Server Configuration:

Here are the two httpd.conf files for the configurations:

htdocs_config httpd.conf:

[code]
### Section 1: Global Environment
ServerRoot "/usr/local/apache2"
PidFile logs/httpd.pid
Timeout 300
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 15
DirectoryIndex index.html
AccessFileName .htaccess
HostnameLookups Off

# fixes crashes on OSX??
AcceptMutex fcntl

StartServers 5
MinSpareServers 5
MaxSpareServers 5
MaxClients 100
MaxRequestsPerChild 10

### Section 2: ‘Main’ server configuration
User nobody
Group #-1

DocumentRoot "/Users/simon/server/htdocs_config"

LoadModule php5_module modules/libphp5.so

Listen 8111


Options Indexes FollowSymLinks
AllowOverride None
AddHandler application/x-httpd-php .bar

[/code]

htdocs_access httpd.conf:

[code]
### Section 1: Global Environment
ServerRoot "/usr/local/apache2"
PidFile logs/httpd.pid
Timeout 300
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 15
DirectoryIndex index.html
AccessFileName .htaccess
HostnameLookups Off

# fixes crashes on OSX??
AcceptMutex fcntl

StartServers 5
MinSpareServers 5
MaxSpareServers 5
MaxClients 100
MaxRequestsPerChild 10

### Section 2: ‘Main’ server configuration
User nobody
Group #-1

DocumentRoot "/Users/simon/server/htdocs_access"

LoadModule php5_module modules/libphp5.so

Listen 8111


Options Indexes FollowSymLinks
AllowOverride All

[/code]

Results:

Benchmarking was done with “ab”, the Apache Benchmark program, which was set to request one page 1,000 times at a concurrency level of 10. Each configuration was benchmarked five times in random order (to minimise the effect of any background processes, etc.).
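For anyone wanting to repeat this, here is a rough sketch of how the runs could be set up. The -n and -c flags are ab’s real options for request count and concurrency; the URL (in particular which file was requested) and the helper names are my assumptions, and since both configurations listen on port 8111, Apache would need restarting with the other httpd.conf between runs.

```python
import random


def ab_command(url, requests=1000, concurrency=10):
    """Build the ab command line for one benchmark run."""
    return ["ab", "-n", str(requests), "-c", str(concurrency), url]


def run_order(configs, repeats=5, seed=None):
    """Return each config name `repeats` times, shuffled into random order."""
    runs = [name for name in configs for _ in range(repeats)]
    random.Random(seed).shuffle(runs)
    return runs


# assumed URL: the post does not say which leaf file was requested
url = "http://localhost:8111/0/1/2/3/4/5/6/7/8/9/1.bar"

print(" ".join(ab_command(url)))
for name in run_order(["htdocs_config", "htdocs_access"]):
    print("start Apache with the %s httpd.conf, then run ab" % name)
```

The command list could be handed to subprocess.run() to actually launch ab; that part is left out here so the sketch runs without a server.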

           htdocs_config                           htdocs_access
Test       Time Taken (s)   Requests per Second    Time Taken (s)   Requests per Second
1          12.683213        78.84                  13.21618         75.66
2          12.854491        77.79                  13.574916        73.67
3          11.777676        84.91                  13.163296        75.97
4          13.668398        73.16                  12.26475         81.53
5          13.76753         76.47                  13.264527        75.39
AVERAGE    12.9             78.23                  13.1             76.4

So – we’re looking at around 2.3% more requests per second when .htaccess files are disabled. This is really quite trivial, and only worth worrying about when your server is under really heavy load.
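The arithmetic behind that percentage can be checked with a few lines of Python, using the requests-per-second figures from the table. Taking the htdocs_config average as the baseline is my assumption; using the htdocs_access average instead also rounds to 2.3%.

```python
# requests per second for the five runs, copied from the results table
config_rps = [78.84, 77.79, 84.91, 73.16, 76.47]
access_rps = [75.66, 73.67, 75.97, 81.53, 75.39]


def mean(values):
    return sum(values) / len(values)


config_avg = mean(config_rps)
access_avg = mean(access_rps)

# relative difference, with the httpd.conf-only setup as the baseline
gain_pct = (config_avg - access_avg) / config_avg * 100

print("%.2f %.2f %.1f%%" % (config_avg, access_avg, gain_pct))
# 78.23 76.44 2.3%
```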

Issues:

There are a number of areas where this could be improved:

  • Try different directory depths: the more deeply nested the directory is, the slower it should be under the .htaccess scenario. In contrast, if there are only 2 or 3 levels then it should be faster.
  • Have multiple .htaccess files in the intermediate nodes to see how Apache handles the merging of these files. Here we’ve just used one .htaccess file, and we should probably see further slowdowns if Apache has to merge some complicated rule sets.
  • Access different files – I just requested one file repeatedly, so we might be getting a lot of interference from any caching systems (hard drive, RAM, PHP caches etc.) that I forgot about. Additionally, requesting multiple URIs is a more realistic test case for a webserver.
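As a starting point for the first and last of these, here is a small sketch that lists the URL of every generated leaf file for a given depth, following the naming used by the generator script (0/1/2/… directories, even-numbered .foo files and odd-numbered .bar files). The base URL and the function name are assumptions for illustration; ab only takes a single URL per run, so each URL would need its own run or a different load tool.

```python
def leaf_urls(base_url, dir_depth=10, num_files=50):
    """Return the URLs of all files in the leaf directory of the test tree."""
    dirs = "/".join(str(n) for n in range(dir_depth))
    urls = []
    for filenum in range(num_files):
        # same split as the generator: even numbers are .foo, odd are .bar
        ext = "foo" if filenum % 2 == 0 else "bar"
        urls.append("%s/%s/%d.%s" % (base_url.rstrip("/"), dirs, filenum, ext))
    return urls


# assumed base URL, matching the Listen 8111 line in the configs
urls = leaf_urls("http://localhost:8111")
shallow = leaf_urls("http://localhost:8111", dir_depth=3, num_files=2)

print(urls[0])
print(shallow[1])
```

Varying dir_depth here only changes the URLs; the directory tree itself would need regenerating with the same depth.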