Mysterious failure of Debian maxlifetime session cleanup, solved

Abstract a/k/a spoiler alert:  
Invalid directives in a php.ini file cause the silent failure of the cron job/shell script that Debian systems use to garbage-collect expired PHP session data files.

One of the things I do at my job is maintain a couple of LAMP servers for our office’s use. We recently upgraded them from a long-outdated Ubuntu server edition to Debian 7 and PHP 5.4.4. Among the duties of these boxes is running some archaic and crudely written PHP code, so I expected some compatibility nightmares. I was able to resolve most of the errors, warnings, and weird behaviors one way or another, but the session garbage collection would not work, and I spent too long trying to figure out why. I don’t generally blog about technical stuff, but this might save someone some pain in the future — possibly even myself, when I forget the lesson learned here.

If you’re familiar with Debian and progeny (like Ubuntu), then you know Debian likes to store PHP session data files in /var/lib/php5 and put permissions on it such that if you let PHP try to do garbage collection, it fails with an error like this:

session_start() [function.session-start]: ps_files_cleanup_dir: opendir(/var/lib/php5) failed: Permission denied (13)

And if your php.ini settings for session.gc_probability and session.gc_divisor are the defaults, this error will only occur 1% of the time, making it fun to debug. But you did your Google diligence and RTFM, and learned that out of the box, Debian sets up a cron job that once per 30 minutes removes session data files whose modification times are older than session.gc_maxlifetime/60 minutes old. The Debian way seems to be to disable PHP garbage collection by setting session.gc_probability to 0 and letting cron do its thing, as
we can see in /etc/cron.d/php5:

09,39 * * * * root [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -ignore_readdir_race -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete

Fine. The Debian dudes know what they are doing, so I went with it. Before long, however, I started wondering why none of my session files were getting deleted ever. I am not a black belt in system administration, but I really scoured the logs and tried to figure it out, and Google was no help. So I took a closer look at that cron entry. What’s it doing? In essence, it is using the output of /usr/lib/php5/maxlifetime to determine how many minutes old a session data file has to be to be eligible for deletion. OK, then let’s have a look at /usr/lib/php5/maxlifetime. Not being fluent in shell scripting, I had to squint and scratch my head a bit to follow this.

#!/bin/sh -e
if which php5 >/dev/null 2>&1; then
for sapi in apache2 apache2filter cgi fpm; do
if [ -e /etc/php5/${sapi}/php.ini ]; then
cur=$(php5 -c /etc/php5/${sapi}/php.ini -d “error_reporting=’~E_ALL'” \
-r ‘print ini_get(“session.gc_maxlifetime”);’)
[ -z “$cur” ] && cur=0
[ “$cur” -gt “$max” ] && max=$cur
for ini in /etc/php5/*/php.ini /etc/php5/conf.d/*.ini; do
cur=$(sed -n -e ‘s/^[[:space:]]*session.gc_maxlifetime[[:space:]]*=[[:space:]]*\([0-9]\+\).*$/\1/p’ $ini 2>/dev/null || true);
[ -z “$cur” ] && cur=0
[ “$cur” -gt “$max” ] && max=$cur
echo $(($max/60))
exit 0

If you are new to shell scripting and would like to understand this, pay attention. Otherwise, skip ahead. We start by setting max=1440 — that means it will be our default if we don’t overwrite it in the code that follows. That first if condition tests whether there is a php5 binary installed someplace on this system. If so, then we iterate through the standard places where php.ini files are found, and use that php5 to load that configuration file, print out its session.gc_maxlifetime setting, and assign it to the variable $cur. Then we check to make sure we actually got a non-empty $cur before comparing it to $max. By the way, it appears that

[ -z "$cur" ] && cur=0

is a popular shell scripting idiom that is faster to type than the more verbose if construct. The stuff to the right of && only executes if the test on the left of it is true. In this case, the -z tests whether $cur is a zero-length string. This is kind of counter-intuitive to the uninitiated, because one might think the && really means and rather than if. It does mean and, but only if the expression to its left evaluates to true.

If $cur is greater than $max, we assign it to $max, and the result of this iterative processing is that the highest $cur we ever see will become our $max, as long as it’s greater than the initial 1440.

OK, and what’s with the next big else block? If we don’t find a php5 executable, we use sed — the Stream EDitor — to try to parse out the session.gc_maxlifetime value from the .ini file, and otherwise do the same thing as the preceding if.

Well, you may wonder, if they think sed is guaranteed to work, why not just do it that way in the first place? Frankly, I’m not sure. It turns out, as we’ll see, the use of php5 eventually alerted me to a misconfiguration that would not have been picked up had we used sed

So, what is the output of this damn thing, which the cron job is relying on? I ran it from the command line and found that it printed exactly nothing. So I made a copy, which I loaded into a text editor, and started applying my primitive debugging techniques, i.e., using echo statements just to figure out what was getting run and to see the values of things like $max and $cur. Pretty soon I determined that the script was choking on

cur=$(php5 -c /etc/php5/${sapi}/php.ini -d "error_reporting='~E_ALL'"
-r 'print ini_get("session.gc_maxlifetime");')

Then I started looking at the exit status. When I ran /usr/lib/php5/maxlifetime; echo $? from the command line, behold, our exit status is 127. Another consultation with Mr. Google reveals that 127 means “command not found.” What the hell is the command that isn’t being found? I was baffled.

Going back to the very top of this shell script, I wondered what the -e switch means at the shebang line. Back to Google to learn that it means “if not interactive, exit immediately if any untested command fails.” OK, then let’s remove the -e and see what happens:

10: [: Illegal number:
Fatal error: Directive 'allow_call_time_pass_reference' is no longer available in PHP in Unknown on line 0

Maybe I am a little slow, but I had to think on this for a while. Who said anything about allow_call_time_pass_reference? Look again at the first thing that php5 command is doing: loading the configuration file indicated by the -c switch. It is puking because there is an invalid configuration directive in php.ini! The above output is telling us that php5 is complaining about that, and the shell script in turn is complaining because it expects a number in the comparison at line 10.

Why is allow_call_time_pass_reference even there in the first place? Because it’s my habit to recycle php.ini across PHP upgrades so I don’t have to hand-edit all my settings again. I fixed that setting and tried again, then got another complaint about register_long_arrays, another configuration setting that is no longer valid in PHP 5.4. Both of these really are things I can do without, so good riddance. The lesson is that if you are going to keep an old php.ini, you should test it after an upgrade with a php5 command that does what this does: load the config, set error reporting to the max, and see what happens.

But getting back to the cron job:

09,39 * * * * root [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -ignore_readdir_race -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete

Now it’s clear that -cmin was getting an empty string, which will make find choke. But cron doesn’t email me the error output because 2>/dev/null sends it into the void.

The thing that made this adventure so much fun is that the cron job was failing because the shell script was failing because the php5 command therein was failing because of an obsolete php.ini setting that had no relevance to sessions — several levels of indirection — and error suppression was making it all the harder to track down.

A final weird note: I never did figure out why I was getting exit status 127. I tried to recreate the scenario while writing this entry, and couldn’t get it to happen again; instead, I got exit status 1. Something was different. That’s one mystery I can live without solving.