spugspam 1 November 28, 2005
spugspam - a confirmation-based mail filter
spugspam [options]

Description
spugspam is a flexible sender-confirmation mail filter. It is designed to fight spammers who use fake e-mail addresses by requiring an unrecognized sender to confirm, by replying to a confirmation message, that they did in fact send the message. It is designed to be used with procmail as part of a more general mail-filtering pipeline.

In its standard mode of operation, spugspam reads an e-mail message from standard input, evaluates it, marks it with a special header (X-SpugSpam-State) and writes it back out to standard output. spugspam is not a delivery agent, nor does it preempt the delivery of messages: it is a mail filter designed for use with other mail tools, notably procmail.

As of version 1.1, spugspam supports the Sender Policy Framework (SPF). This is a DNS convention that allows domains to identify the hosts that are permitted to send mail for them. SPF is an important feature for a confirm-response filter because it prevents you from issuing confirmation requests to senders who, according to their SPF records, could not legitimately have sent the message.

Configuration Files
spugspam stores all of its configuration and housekeeping files in a single directory tree which is, by default, $(HOME)/.spugspam. This can be overridden with the --root-dir option. This directory contains the following files and subdirectories:

Template Files
"Template files" can contain variables which are expanded when the files are used. A variable is specified as a Python formatting sequence: "%(var-name)s". Since variable expansion is actually implemented using Python formatting strings, the full power of these forms is available.
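
As a minimal sketch of what this expansion amounts to, the snippet below applies Python's "%" operator with a dictionary of values. The variable names from, to and sig are borrowed from the confmsg example later in this page; the values and the expand_template helper are hypothetical and are not part of spugspam. The sketch is written for a current Python 3 interpreter.

```python
def expand_template(template, variables):
    """Expand %(name)s sequences using a mapping of variable names to values."""
    return template % variables


template = "From: %(from)s\nTo: %(to)s\nSubject: Confirmation request %(sig)s\n"
values = {
    "from": "me@example.com",        # hypothetical values, for illustration only
    "to": "stranger@example.org",
    "sig": "abc123",
}
print(expand_template(template, values))

# Other format forms are available too, e.g. left-justifying in a 10-column field:
print("[%(sig)-10s]" % values)
```

If expansion really is a plain "%" operation, a literal percent sign in a template would presumably have to be written as "%%".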
"Command files" are similar to template files, only they are used to invoke a single command and they use a stripped-down shell syntax. The contents of a command file are a sequence of words separated by whitespace. Words can be any of:
Note that single and double quotes are ignored anywhere but the beginning of a word, so the text this-is-"a command" would expand to two words: 'this-is-"a' and 'command"', not 'this-is-a command'.

Like template files, command files can also include Python formatting sequences. They are expanded in non-quoted words and in double-quoted strings, but not in single-quoted strings. In general, single-quoted and double-quoted strings behave very much like their shell counterparts: double-quoted strings support the full set of C-style escape characters (including hex and octal character representations), while single-quoted strings support escaping only "'" and "\".

So, putting it all together, we might have a deliver script that looks something like this:
/usr/bin/weird-deliver-program "--deliver-to=%(localDestAddr)s"
'--extra-text=here is some extra text'
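
The word-splitting rules above can be approximated in a few lines of Python. This is only a sketch under stated assumptions, not spugspam's actual parser: the function names are hypothetical, escapes in double-quoted strings are decoded before variables are expanded (the order is a guess), and text that directly follows a closing quote is treated as a new word.

```python
import re

# Single-letter C escapes; any other escaped character stands for itself.
_SIMPLE = {"n": "\n", "t": "\t", "r": "\r", "a": "\a", "b": "\b", "f": "\f", "v": "\v"}
_ESCAPE = re.compile(r"\\(x[0-9a-fA-F]{2}|[0-7]{1,3}|.)", re.DOTALL)


def _decode_c_escapes(text):
    """Decode \\n-style, \\xHH and octal escapes inside a double-quoted string."""
    def repl(match):
        body = match.group(1)
        if body[0] == "x" and len(body) == 3:
            return chr(int(body[1:], 16))
        if body[0] in "01234567":
            return chr(int(body, 8))
        return _SIMPLE.get(body, body)
    return _ESCAPE.sub(repl, text)


def split_command(text, variables):
    """Split command-file text into words, expanding %(name)s where allowed."""
    words = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch in "'\"":
            # Quoted word: runs to the matching unescaped closing quote.
            j = i + 1
            while j < n and text[j] != ch:
                j += 2 if text[j] == "\\" else 1
            body = text[i + 1:j]
            if ch == '"':
                words.append(_decode_c_escapes(body) % variables)
            else:
                # Single quotes: no expansion, only \' and \\ are escapes.
                words.append(re.sub(r"\\(['\\])", r"\1", body))
            i = j + 1
        else:
            # Bare word: runs to the next whitespace; embedded quotes are literal.
            j = i
            while j < n and not text[j].isspace():
                j += 1
            words.append(text[i:j] % variables)
            i = j
    return words


DELIVER = r"""/usr/bin/weird-deliver-program "--deliver-to=%(localDestAddr)s"
'--extra-text=here is some extra text'
"""
print(split_command(DELIVER, {"localDestAddr": "user@example.com"}))
print(split_command('this-is-"a command"', {}))
```

The second call reproduces the this-is-"a command" case described above: the quote in the middle of the first word is kept as an ordinary character.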

The Rules File
The rules file contains a set of rules. A rule consists of a sequence of entries identifying matching criteria followed by an "action" entry identifying actions to take on the message if all of the criteria are met.

Entries have an appearance similar to rfc822 headers. They must each be on their own line and they each consist of a tag followed by a colon and then a value which is a regular expression to be matched against a line in the message. All regular expressions are matched case-insensitively, and they begin matching at the beginning of the line - to match elsewhere in the line, start the expression with ".*".

You may include comments in the file by starting a line with a '#', but a comment must be on its own line and must be the only thing on that line - you cannot mix comments and rule lines.

The following entry headers are supported:
A rule consists of a sequence of one or more header and body
entries followed by a single action entry.
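
To make the matching semantics concrete, here is a rough Python sketch of how such a file could be parsed and evaluated. It is not spugspam's implementation. It assumes that "header" patterns are tried against each header line and "body" patterns against each body line, that the first matching rule wins, and that blank lines are skipped; parse_rules and first_action are hypothetical names.

```python
import re


def parse_rules(text):
    """Parse rules text into a list of (criteria, action) pairs."""
    rules, criteria = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comment lines (and, by assumption, blank lines) are skipped
        tag, _, value = line.partition(":")
        tag, value = tag.strip().lower(), value.strip()
        if tag == "action":
            rules.append((criteria, value))  # an action entry closes the rule
            criteria = []
        else:
            criteria.append((tag, re.compile(value, re.IGNORECASE)))
    return rules


def first_action(rules, message):
    """Return the action of the first rule whose criteria all match, else None."""
    head, _, body = message.partition("\n\n")
    lines = {"header": head.splitlines(), "body": body.splitlines()}
    for criteria, action in rules:
        # re.match anchors at the start of the line, as the text describes.
        if all(any(rx.match(l) for l in lines.get(tag, [])) for tag, rx in criteria):
            return action
    return None


SAMPLE_RULES = """\
# allow all messages from the spugspam list
header: list-id:.*spugspam
action: allow
# ignore cousin Fred's silly chain letters
header: from:.*cuzinfred@hotmail.com
body: .*send this message to (3|three) people
action: deny
"""

message = (
    "From: Fred <cuzinfred@hotmail.com>\n"
    "Subject: fwd: lucky!\n"
    "\n"
    "Please send this message to three people today.\n"
)
print(first_action(parse_rules(SAMPLE_RULES), message))
```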
The following rule file might be used to allow you to receive mailings from the "spugspam" and "SpamAssassin" mailing lists, and to block chain letters from a particularly annoying relative:
# allow all messages from the spugspam list
header: list-id:.*spugspam
action: allow
# allow all messages from the SpamAssassin list
header: list-id:.*spamassassin
action: allow
# ignore cousin Fred's silly chain letters
header: from:.*cuzinfred@hotmail.com
body: .*send this message to (3|three) people
action: deny
# all messages from me with a special body signature are control
# messages - evaluate commands in their body
header: from:.*myaddress@myhost.com
body: password=z3cr3t
action: control-command

The Received Patterns File
SPF checking uses the "Received" headers to determine where the message was sent from. "Received" is a header added by MTAs to record the routing of the message. Unfortunately, its form is not entirely consistent across different MTAs. Fortunately, you only need to be concerned with the MTAs that you have control over: those between you and the outside world.

Received headers are parsed from top to bottom in a message. Generally there are one or more received headers that you want to ignore (the "local" headers) followed by one header that you're very interested in (the "remote" header, which carries the address of the foreign host that actually is sending the e-mail). Local headers are inserted by your delivery agent and any MTAs that are either within your domain or allowed to forward to your domain. Remote headers are inserted by the outermost trusted MTA, and these contain the address information that you want to do SPF verification on.

The recvdpat file has two keywords to accommodate these two kinds of headers: "local" and "remote". Each is specified on its own line and is followed by a colon and the regular expression that you wish to match. Trailing and leading whitespace is ignored.
The local keyword may also include an integer indicating the maximum number of repetitions of the header to match, or an asterisk indicating that an unlimited number of repetitions may be matched. If neither is specified, the pattern will match at most one repetition. It is always possible for a pattern to be ignored if it does not match: there is no way to indicate that a pattern must match some minimum number of headers.

You can have as many "remote" keywords as you like. They must follow all "local" lines, and the program will stop processing received headers at the first matching "remote" line. If no remote rules are specified, two common styles of received header are used as defaults:
remote: from \S+ \((?P<ip>\d+(\.\d+){3})\) \(HELO (?P<host>\S+)\)
remote: from (?P<host>\S+) \(\[(?P<ip>\d+(\.\d+){3})\]\)
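
To see what the two default patterns capture, the following Python sketch compiles them unchanged and applies them to made-up header values. The parse_remote helper is hypothetical, and anchoring the match at the start of the header value is an assumption, not something this page states for received patterns.

```python
import re

# The two default "remote" patterns quoted above, compiled as-is.
DEFAULT_REMOTE = [
    re.compile(r"from \S+ \((?P<ip>\d+(\.\d+){3})\) \(HELO (?P<host>\S+)\)"),
    re.compile(r"from (?P<host>\S+) \(\[(?P<ip>\d+(\.\d+){3})\]\)"),
]


def parse_remote(received_value):
    """Return (ip, host) from a Received header value, or None if nothing matches."""
    for pattern in DEFAULT_REMOTE:
        match = pattern.match(received_value)  # assumption: anchored at the start
        if match:
            return match.group("ip"), match.group("host")
    return None


# Made-up header values, shaped to fit each default pattern in turn.
print(parse_remote("from unknown (192.0.2.7) (HELO mail.example.org) by mx.example.net"))
print(parse_remote("from mail.example.org ([192.0.2.7]) by mx.example.net"))
print(parse_remote("by mx.example.net with local delivery"))
```

The named groups ip and host are what make the values available by name after a match.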
As an example, let's say that your mail is received by myhost.com, and you also have a forwarding account on friend.com. Your recvdpat file might look something like this:
# match for your MTA
local: \(delivered to myaccount by foomailer\)
# optional match for messages received from friend.com
local: from .*\.friend\.com \(192\.168\..*\) \(HELO friend\.com\)
# match for external hosts - first match will be used
remote: from .* \((?P<ip>[^)]+)\) \(HELO (?P<host>[^)]+)\)
remote: from (?P<host>\S+) \(\[(?P<ip>\d+(\.\d+){3})\]\)
There are two special groups in the remote expression, (?P<ip>[^)]+) and (?P<host>[^)]+). These match the IP address and host name (actually the HELO/EHLO domain name) of the remote host. Both must be present in any remote pattern that you supply, because their values are used as arguments to the SPF check. (They need not match such lame expressions, though: (?P<ip>\d+(\.\d+){3}) might be more appropriate for an IP address.)

If friend.com had a number of internal relay hosts, we might want to change the second rule to look more like this:
# optional match for messages received from friend.com - relayed by any
# number of hosts within friend.com.
local*: from .*\.friend\.com \(192\.168\..*\) \(HELO friend\.com\)
As stated earlier, the "local*" usage allows an unlimited number of matches. If we wanted to limit this to, say, 3 matches, we could have used "local3" instead.

It is completely legal for a message to contain only local received headers:
in this case it will be assumed that the message originated locally and no SPF
check will be performed.
Certain configuration options to spugspam are specified in the config file (named config). If it is present, config is parsed and executed by the python interpreter, so you have the full power of the programming language in it. That said, it only supports two variables at this time, so doing anything fancy with it probably doesn't make very much sense. The variables supported in config are:
Example config file:
# enable SPF (which is the default anyway)
spf_enabled = True
# ten second timeout
spf_timeout = 10

Message States
After analyzing a message, spugspam brands the message with a message state. The message state is stored in the X-SpugSpam-State header, followed by an md5 signature created from the rest of the message and the user's inner key:
From: "Test Dude" <test1@bogus.com>
To: mmuller@enduden.com
Subject: test message
X-SpugSpam-State: allow,whitelist:vKsIrl5UACqA9Xr5giyuqA==

This is a test message

The md5 signature is very important because spugspam reads this state header looking for information conveyed from a previous spugspam instance - without the signature, a spammer could simply add an X-SpugSpam-State: allow header and get a free pass through the system.

Other programs on the mail pipeline (e.g. procmail) should scan for the state header and make a decision as to what to do with the message based on it. In general, you want to deliver messages with an "allow" state, ignore messages with a "deny" or "unrecognized" state, and possibly do special stuff with the others.

Some of the states are followed by a comma and a substate: the substate gives more information as to how the message state was determined. This is the set of all message states:

Control Messages
In addition to command line options, spugspam supports "control messages" as a management technique. These are messages which spugspam recognizes as containing control information.

There are two kinds of control messages, a control request and a control command. A control command is a message containing commands to be executed. A control request performs no actions of its own, but it replaces the body of the input message with a body containing a special signature (actually, a signed timestamp) and instructions listing all of the available commands. The recipient edits this, inserting actual commands to be executed, and sends back a reply which is a control command.

The rationale for splitting control messages into this request/command set is:
There are two ways of causing a message to be treated as a control message:

Installation
First of all, you'll need a fairly recent version of the Python interpreter. spugspam was developed on Python 2.2, and has been tested on Python 2.3 and 2.4.

Edit the spugspam script so that the first line contains an acceptable method of bootstrapping your local Python interpreter. Now copy the script to some place on your $PATH (/usr/local/bin is the recommended location).

You should be able to run the tester script at this point and see "0 tests failed" at the end of it (no guarantees here, as tester is a shell script and I am not certain of its portability).

Create a ".spugspam" directory under your home directory. (You can put it somewhere else if you want to; if you do, be sure to use the "--root-dir" option to identify this location wherever you run spugspam.) Create all of the required files identified in Configuration Files in your .spugspam directory. Examples follow.

The innerkey file need only consist of a string of data that is not easily guessed. If you are particularly paranoid, and have /dev/urandom on your system, you might want to just do this:
(umask 077; head -c 64 </dev/urandom >~/.spugspam/innerkey)
Alternatively, creating the file with a unique passphrase in your favorite editor should work just as well. Be sure that your innerkey file is not world readable, so that nobody can get your inner key and use it to trick their way through the system. In fact, you might want to make your entire .spugspam directory unreadable to other users, as it is likely to contain some of your e-mail at various points in time.

The "confmsg" file is just a template for your confirmation message. An example follows:
From: %(from)s
To: %(to)s
Subject: Confirmation request %(sig)s

Hi, this is a one-time confirmation message to verify that you are the
sender of the message below. If you are, please reply with the following
text in the subject or body of the message:
%(sig)s
In most cases, just hitting "reply" should work.
Thank you,

The deliver file usually just invokes your mail delivery agent. Be warned that if you use procmail as a delivery agent, and are running spugspam from procmail, you will get a nasty hang if spugspam is invoked recursively - your best bet is probably to specify an alternate .procmailrc file. Given this, your deliver file would look something like this:
procmail /home/myname/.procmailrc-inner

The sender file should normally just invoke your mail transfer agent. If you are using sendmail, it would look something like this:
/usr/sbin/sendmail -t

And that's all. If you have special needs, you can set up the other files as well.

If you are using spugspam with procmail, you will want to invoke it from your .procmailrc file and filter its messages afterwards. In the simplest case, your .procmailrc file should look something like this:
:0fw
| spugspam
:0
* x-spugspam-state: allow
mymailfile
:0
/dev/null

This example assumes that you trust spugspam implicitly and only want to receive messages that it allows, with everything else going into the bit-bucket. In reality, you probably want to just use spugspam to mark messages for a week or two so you can see the results.

If you are using spugspam with other mail filters that modify the message (e.g. SpamAssassin), you may want to run spugspam twice:
# only do the basic checks of the rules and white/black lists
:0fw
| spugspam --check-lists
# anything that spugspam recognizes gets to go right on through
:0
* x-spugspam-state: allow
mymailfile
# run it through another spam filter
:0fw
| spamc
# filter out stuff that other spam filter marked as spam
:0
* x-spam-state: SPAM
/dev/null
# --force-accept-state makes it read the state information from the first
# pass even though the message may have been modified.
:0fw
| spugspam --force-accept-state

Author
Michael A. Muller
Portions contributed by Sam Lantinga.
Report bugs to mmuller@enduden.com