Sendmail milter written in PERL
From Software By Jeff
Sendmail (http://sendmail.org) has a tremendously helpful capability to add mail filter programs to it. Mail filters can be used for a variety of things including investigation to reduce SPAM or UBE, anti-virus checking, and the one we needed, custom mail handling. There are some other full-featured programs that can run as mail filters, such as procmail (http://procmail.org), but they've got their own specific uses.
We had a specific need to take mail as it was received and parse it for insertion into databases. We first sought a way to do this through Java, but the one sendmail filter program we found, jilter (http://jilter.sf.net), was not documented well enough to allow immediate use. We'll probably follow up on this route, because if we can utilize a service we might reduce the invocation overhead of our current PERL solution. More on that in a moment.
No other site had a quick and dirty discussion of how to make a simple milter. There's milter.org (http://www.milter.org), which seems to be related directly to sendmail, that provides a C API to write milters with, or Sendmail::Milter (http://sendmail-milter.sourceforge.net/) which is a PERL wrapper for the milter.org API. Looking at these made it seem more complex than it should need to be, at least for our limited needs.
Nowhere could we find a simple discussion of what the milter environment needs to be, how to configure the sendmail server to use the milter, or what the milter program needs to do to handle the mail, including what format the mail arrives in.
After some thought, we realized we had a PERL milter in place. Our mail server already handles a number of mail lists using the mailman (http://gnu.org/software/mailman) software. This software comes with a mail filter that gets installed to handle mail sent to the lists' domains. We took a peek at the milter and its configuration and in the interest of open source, seek to divulge its inner operations here. Not the operations of the mailman milter, exactly, but of a PERL milter derived from the mailman milter.
Writing the Milter
After reviewing the mailman milter, the needs of the script seemed almost laughably simple. Here's a stripped-down example of the milter.
#!/bin/perl
# Of course, use your correct PERL installation

# Space to remember the parameters
$sender = undef;
$server = undef;
@to = ();

# Space for the lines of the e-mail headers and body
@body = ();

# Read the arguments from the server
# [-r sender] server to [to...]
while ($#ARGV >= 0) {
    if ($ARGV[0] eq "-r") {
        $sender = $ARGV[1];
        shift @ARGV;
    } elsif (!defined($server)) {
        $server = $ARGV[0];
    } else {
        push(@to, $ARGV[0]);
    }
    shift @ARGV;
}

# Read the message (headers and body) from standard input
while ($line = <STDIN>) {
    chomp($line);
    push(@body, $line);
}

# Do whatever for each recipient
for $addressee (@to) {
    # handle the message for $addressee here
}
Save the script, mark it executable, and provide it access permissions in accordance with your security needs. Poof, that's it.
Fundamentally, the script takes a small list of parameters: at minimum, the hostname of the system and a list of recipients. Optionally, a "-r" parameter may be followed by the sender's e-mail address. When sendmail calls the milter with the settings we borrowed from the mailman configuration, the "-r sender" pair is always sent as the first two parameters, followed by the server name, followed by the list of recipients.
In our experimentation, however, it seems that the script is called separately for each recipient, even if multiple recipients at the same domain are included in the same e-mail message. This may be a configuration issue with our sendmail server, but the script, as modified, should handle multiple recipient addresses.
Note the recipients are the part of the address before the "@" symbol, since the part after the "@" symbol is passed as the server parameter.
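The argument handling described above can also be pulled into a small subroutine, which makes it easier to try out without sendmail. This is only a sketch under our own assumptions: the name parse_milter_args is made up for illustration, and rebuilding the full address as user@server is our choice, not something sendmail requires.

```perl
use strict;
use warnings;

# Parse the argument vector described above: [-r sender] server to [to...]
# Returns the sender (or undef), the server, and a list of full addresses.
# parse_milter_args is a hypothetical name used only for this sketch.
sub parse_milter_args {
    my @args = @_;
    my ($sender, $server, @recipients);
    while (@args) {
        my $arg = shift @args;
        if ($arg eq '-r' && !defined $server) {
            $sender = shift @args;                # value following -r
        } elsif (!defined $server) {
            $server = $arg;                       # first bare argument
        } else {
            push @recipients, "$arg\@$server";    # rebuild user@server
        }
    }
    return ($sender, $server, @recipients);
}

# Example with made-up values; in the real script pass @ARGV instead.
my ($sender, $server, @to) =
    parse_milter_args('-r', 'alice@example.org', 'domain.tld', 'joe', 'sue');
print "sender=$sender server=$server to=@to\n";
```

Unlike the stripped-down script, this version only treats "-r" as the sender flag when it appears before the server name, which matches the order we observed.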
After the mail message is streamed in from standard input, we iterate through the list of recipients, doing whatever we need to in that last for(){} loop.
As written, the script will take all of the messages and discard them, as nothing is done with them. This is a very expensive way to dump messages, so we hope something more essential is done in the final loop.
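As one illustration of "something more essential," the final loop could append each message to a per-recipient file. This is a minimal sketch, not what our production script does: the subroutine name save_message, the /var/spool/mymilter directory, and the .mbox file naming are all hypothetical.

```perl
use strict;
use warnings;

# Append one message to a per-recipient file under $dir.
# $dir and the file naming are hypothetical; adapt them to your setup.
sub save_message {
    my ($dir, $addressee, @lines) = @_;
    # Replace anything unusual so the name is safe to use as a file name
    (my $safe = $addressee) =~ s/[^A-Za-z0-9._-]/_/g;
    my $path = "$dir/$safe.mbox";
    open(my $fh, '>>', $path) or die "cannot open $path: $!";
    print {$fh} "$_\n" for @lines;
    print {$fh} "\n";                  # blank line between messages
    close($fh) or die "cannot close $path: $!";
    return $path;
}

# In the milter, the final loop might then read:
#   for my $addressee (@to) {
#       save_message('/var/spool/mymilter', $addressee, @body);
#   }
```

The user the script runs as needs write access to whatever directory is chosen.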
Mail Format
The body of the message is read from standard input. The format of the message, we found through experimentation, is exactly that of the mail files written by sendmail when a message is passed by all filters and is added to a user (or alias) mailbox. That format is as follows:
From sender@address DATE
header: text

BODY
Where the first line is a specific format that starts with the literal word "From" followed by the fully-qualified e-mail address of the sender, and finally the date, formatted such as Mon Jan  1 00:00:00 1900 (note the day of the month is two characters wide, padded with a leading space, so that our "Jan  1" actually has two spaces between the "n" and "1," which probably doesn't translate well on the web page).
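A script that wants the envelope sender and date from that first line can split it with a single pattern. The sketch below assumes the line has exactly the shape described above; parse_from_line is a name we made up for the example.

```perl
use strict;
use warnings;

# Split the leading "From " line into the envelope sender and the date text.
# Returns an empty list if the line does not have that shape.
sub parse_from_line {
    my ($line) = @_;
    return unless $line =~ /^From (\S+)\s+(.+)$/;
    return ($1, $2);
}

# Example using the sample values from the text
my ($from, $date) = parse_from_line('From sender@address Mon Jan  1 00:00:00 1900');
print "from=$from date=$date\n";
```

The date is returned as plain text, including the padded day; turning it into a timestamp is left to the application.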
The header: lines include things such as Date:, From:, To:, and Subject: with their appropriately formatted information following the colon. This formatting is textual, and is really up to the sender, although there are some specifics outlined in RFC 822 (http://www.ietf.org/rfc/rfc0822.txt) that we'll not cover here.
The headers may be multi-line, each following line belonging to the previous header: until either the next header: line or the blank line signalling the end of the header block is encountered.
There is, as mentioned, a blank line between the headers and the body of the message.
The BODY is a plain-text representation of the message. The message may be anything sent in plain-text, including HTML pages, or encoded binaries.
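The format described above, including multi-line headers, can be handled by a short routine. This sketch relies on the RFC 822 folding rule that a continuation line begins with a space or tab. The name split_message and the choice to lower-case header names are ours, and a repeated header keeps only its last value here.

```perl
use strict;
use warnings;

# Split message lines into a header hash and a body array.
# Continuation lines (leading space or tab, per RFC 822 folding) are
# appended to the header before them. Header names are lower-cased;
# a repeated header keeps only its last value in this sketch.
sub split_message {
    my @lines = @_;
    my (%header, @body, $current);
    my $in_body = 0;
    for my $line (@lines) {
        if ($in_body) {
            push @body, $line;
        } elsif ($line eq '') {
            $in_body = 1;                      # blank line ends the headers
        } elsif ($line =~ /^[ \t]+(.*)$/ && defined $current) {
            $header{$current} .= " $1";        # folded continuation
        } elsif ($line =~ /^([^\s:]+):\s*(.*)$/) {
            $current = lc $1;
            $header{$current} = $2;
        }
        # anything else (such as the leading "From " line) is skipped
    }
    return (\%header, \@body);
}

my ($hdr, $body) = split_message(
    'From sender@address Mon Jan  1 00:00:00 1900',
    'Subject: a long',
    ' subject line',
    'To: joe@domain.tld',
    '',
    'Hello.',
);
print "$hdr->{subject}\n";    # prints: a long subject line
```

In the milter itself, the lines would come from standard input (already chomped) rather than from a literal list.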
Separate Header and Body Processing
The processing as we have it in our simple example doesn't consider the headers separately from the body. That's simple enough to do with something like the following:
$bodyReached = undef;
$date = "";
$to = "";
$cc = "";
$subject = "";
while ($line = <STDIN>) {
    chomp($line);
    if (!defined($bodyReached)) {
        if (length($line) > 0) {
            # only the headers we care about;
            # strip the header name and any whitespace after the colon
            if ($line =~ m/^Date:/i) {
                $line =~ s/^Date:\s*//i;
                $date = $line;
            } elsif ($line =~ m/^To:/i) {
                $line =~ s/^To:\s*//i;
                $to = $line;
            } elsif ($line =~ m/^Cc:/i) {
                $line =~ s/^Cc:\s*//i;
                $cc = $line;
            } elsif ($line =~ m/^Subject:/i) {
                $line =~ s/^Subject:\s*//i;
                $subject = $line;
            }
        } else {
            # the blank line marks the end of the headers
            $bodyReached = $line;
        }
    } else {
        push(@body, $line);
    }
}
Replace the simple while(){} loop in the first example with this one, and the loop will pull out the Date, To, Cc, and Subject headers, ignoring the others. Note that it does not pick up the continuation lines of multi-line headers. I'll leave it to the reader to refine the regexp used, or the headers grabbed, to fit your application.
Configuring Sendmail
The sendmail configuration took a little more digesting. Sadly, there is no easy-to-find discussion on configuring sendmail to use a milter. We re-configured our sendmail step-by-step from the Mailman installation documentation to find the bits necessary.
This discussion will outline the steps necessary to make the filter available for use in sendmail. The simple discussion outlines how to send all mail for a host or domain to the filter. The filter will handle the mail, and sendmail will forget about it entirely when the filter is done. If your filter doesn't do something with the mail, it'll be lost. Well, with our discussion, anyway. This met our needs, so our investigation left when this was complete. We haven't taken the time to investigate any signalling to sendmail to let it continue processing the mail.
Simply, configure DNS to have an MX record pointing to the server hosting this sendmail installation. Configure the sendmail to have the mail filter script we created above enabled, and add a couple lines to configure the DNS managed host name to use the filter.
M4 Configuration
The sendmail installed needs to have the milter handling enabled. Our instructions include this. The configuration file needs to have a couple definitions in it to enable each filter.
FEATURE(`mailertable',`dbm /etc/mail/mailertable')dnl
Mmilter, P=/etc/mail/script, F=rDFMhlqSu, U=user:group, S=EnvFromL, R=EnvToL/HdrToL, A=script $h $u dnl
The mailertable line enables the sendmail mailer table, defining the database file it will use to store the hash of items. More on that in a moment.
The M line contains the definition of the mail filter that we'll use when associating e-mail to our milter. It deserves a little bit of discussion, as it, too, was hard to digest. Taking the line apart, we're going to pay attention to the name following the M and to the P, U, and A parameters:
Mmilter, P=/etc/mail/script, F=rDFMhlqSu, U=user:group, S=EnvFromL, R=EnvToL/HdrToL, A=script $h $u
The milter butted up against the defining M is the name you wish to associate with the mail filter. The name milter is actually a bad one, used here only for our abstraction. A better name would be descriptive of the purpose of the milter, perhaps even, as we do, named after the domain using the filter.
The P parameter defines the program to be run, so /etc/mail/script is the actual path and script name of the script modeled after our example PERL script. Again, script is a bad name, so choose more appropriately. Note that this must be "approved" by your installation of sendmail, which is usually simply done by adding a symlink to the script in the /var/adm/sm.bin folder as in ln -s /etc/mail/script /var/adm/sm.bin/script as root.
The U parameter specifies the user and group, user:group, as which the script will run. The script is bound to whatever permissions that user has. By default, it will use the user/group that sendmail runs under, so if you want to dump the U parameter, remember that. The user and group need to exist, or the call will fail.
The A parameter defines the argument vector passed to the script. We found that script should match the script name used in the P parameter.
The F, S, and R parameters we left as Mailman gave them to us. The flags identified in the F parameters, as well as the other parameters above, are discussed in the sendmail THE WHOLE SCOOP ON THE CONFIGURATION FILE (http://www.sendmail.org/~ca/email/doc8.10/op-sh-5.html#sh-5.4) document, but we found it lacking in some respects. For example, while the S and R parameters are discussed, we could find no discussion of the parameters we were given with the Mailman script, although they seemed to work. Of course, we didn't experiment by changing the flags or parameters other than those we've discussed here.
Rebuild the sendmail.cf file. Starting or restarting sendmail will enable the milter for use.
Mailertable Configuration
Create or edit the /etc/mail/mailertable database (as configured in the FEATURE discussed above) to include the directive to use your milter.
host milter:server
For host, put the host name to which mail is addressed and for which you wish your filter to be used. This should be the same hostname that is expected in the e-mail address, after the @ symbol. This should also be configured in the DNS as an MX entry.
The milter should be the same name as used above, abutted to the M parameter in the M4 configuration file.
The server parameter is what ends up delivered to the script as the server argument, the first one after the optional "-r sender" pair. If our configuration were as shown above, the literal word "server" would be sent as the third parameter to our script. That is, "-r from@address server recipient" would be sent. Use this as a key for your script, if you'd like, or whatever makes sense. In our configuration, we used the actual DNS host name (e.g., mail.domain.tld) for the host part of the line, and the domain name (e.g., domain.tld) as the server part of the line.
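If one script serves several mailertable entries, the server argument can act as the key mentioned above. The following is a rough sketch of that idea only: the domain names, the returned strings, and the names %handler_for and dispatch are all hypothetical placeholders.

```perl
use strict;
use warnings;

# Map the server argument from the mailertable to a handler.
# The keys and handler bodies here are hypothetical placeholders.
my %handler_for = (
    'domain.tld' => sub { my ($to, $body) = @_; return "tickets:$to" },
    'lists.tld'  => sub { my ($to, $body) = @_; return "archive:$to" },
);

# Look up the handler for $server and run it; returns undef for an
# unknown server so the caller can decide what to do with the message.
sub dispatch {
    my ($server, $to, $body) = @_;
    my $handler = $handler_for{lc $server} or return;
    return $handler->($to, $body);
}

# Example with made-up values and an empty message body
print dispatch('domain.tld', 'joe', []), "\n";
```

A real handler would do the database insertion or other processing instead of returning a string.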
Relay Domains Configuration
Create or edit the /etc/mail/relaydomains file to include the domains for which your filter will be used. Very simply add a line with the domain (host) for which your filter will be used.
host
This tells sendmail that it will be allowed to relay mail for the domain. Ordinarily this may be dangerous, as you could errantly enable open relaying through your system. Since we're dumping all of the mail to the hosts listed into our milter, there's not any danger, unless your milter just turns around and sends the mail out somewhere.