Dealing with Huge Backlogs


Deciding when to deliver a deferred queue file

How should a mail system deal with a large backlog of messages that could not be delivered? A simplistic program would open each deferred queue file, read its list of left-over destinations, and try to deliver the message to each one. Sendmail took this approach prior to version 8.something. Every hour (or at whatever interval the sysadmin has configured), a new Sendmail process starts a pass over the queue. When the backlog is large enough, each new Sendmail process starts before the previous one has finished, and the machine slowly but steadily fills up with Sendmail processes.

When the backlog runs into the millions of messages, repeatedly opening each queue file and trying to deliver to each left-over recipient takes too much time. The situation can be improved by keeping some status information in quickly-accessible memory, so that the mail system knows when a deferred queue file is ready to be tried again.

Memory is a finite resource. A naive program keeps information in memory for every deferred queue file. This approach is bound to fail when the number of deferred queue files grows sufficiently large. When that happens, the mail system becomes wedged, keeps crashing due to lack of memory, and stays wedged until someone throws away enough messages that the queue fits in memory again.

Postfix backlog management

Postfix uses a limited in-memory dead site list, together with carefully chosen queue file time stamps, in order to implement exponential backoff for delayed mail. This, combined with the code that prevents thundering herd effects, reduces the number of unnecessary attempts to reach dead hosts.
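The idea of a size-limited dead site list can be illustrated with a small Python sketch. This is not the Postfix implementation: the class name DeadSiteList, the max_entries and ttl parameters, and the evict-the-oldest policy are all assumptions made for illustration. The point is only that memory use stays bounded no matter how many destinations fail.

```python
from collections import OrderedDict


class DeadSiteList:
    """Hypothetical bounded cache of destinations that recently failed."""

    def __init__(self, max_entries, ttl):
        self.max_entries = max_entries  # assumed hard limit on memory use
        self.ttl = ttl                  # assumed seconds a site stays "dead"
        self._sites = OrderedDict()     # site -> expiry time, oldest first

    def mark_dead(self, site, now):
        # Re-insert so the site becomes the newest entry.
        self._sites.pop(site, None)
        self._sites[site] = now + self.ttl
        # Assumed policy: forget the oldest entries once the list is full.
        while len(self._sites) > self.max_entries:
            self._sites.popitem(last=False)

    def is_dead(self, site, now):
        expires = self._sites.get(site)
        if expires is None:
            return False
        if expires <= now:
            # The entry has expired; the site may be tried again.
            del self._sites[site]
            return False
        return True
```

A delivery loop would consult is_dead() before contacting a destination and call mark_dead() after a failed connection. A site that falls off the list is simply tried again, so forgetting an entry costs one extra attempt rather than correctness.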

When a message cannot be delivered on the first attempt, the queue manager sets the queue file's time stamp a configurable amount of time into the future. The queue manager ignores queue files whose time stamps lie in the future.
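The following Python sketch illustrates the mechanism. It assumes that the time stamp in question is the file's modification time and that deferred files sit in a single directory; the function names defer and ready_files are hypothetical. The benefit is that the scan needs only file metadata to decide what to skip, without opening any queue file.

```python
import os
import time


def defer(path, delay_seconds, now=None):
    """Push a queue file's time stamp delay_seconds into the future."""
    now = time.time() if now is None else now
    retry_at = now + delay_seconds
    # Assumption: the time stamp is stored as the file's access/modification time.
    os.utime(path, (retry_at, retry_at))


def ready_files(queue_dir, now=None):
    """Return the names of queue files whose time stamp is not in the future."""
    now = time.time() if now is None else now
    ready = []
    for entry in os.scandir(queue_dir):
        if entry.is_file() and entry.stat().st_mtime <= now:
            ready.append(entry.name)
    return sorted(ready)
```

For example, after defer(path, 1000) the file is left out of ready_files() results until roughly 1000 seconds have passed.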

Whenever a repeat delivery attempt fails, the queue file time stamp is moved into the future by an amount of time equal to the age of the message. Because the age of the message doubles with each such postponement, the time between delivery attempts doubles as well. This strategy implements exponential backoff.
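A short Python sketch shows why this rule doubles the interval. It assumes that every attempt happens exactly when the time stamp is reached and that every attempt fails; the function names and the first delay of 1000 seconds are illustrative, not Postfix defaults.

```python
def next_time_stamp(arrival_time, now):
    """New queue file time stamp after a failed repeat attempt at time now."""
    age = now - arrival_time
    return now + age


def attempt_schedule(first_delay, attempts, arrival_time=0):
    """Times of the retry attempts, assuming each one fails on schedule."""
    # After the first failure the delay is the configurable amount.
    schedule = [arrival_time + first_delay]
    for _ in range(attempts - 1):
        now = schedule[-1]
        schedule.append(next_time_stamp(arrival_time, now))
    return schedule


print(attempt_schedule(first_delay=1000, attempts=5))
# prints [1000, 2000, 4000, 8000, 16000]
```

The gaps between consecutive attempts are 1000, 2000, 4000 and 8000 seconds: each gap equals the age of the message at the time of the failure, so it doubles every time.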
