
JunkEmail Education Project

Procmail Scripts

(updated August 2003) A number of complete scripts are shown below, with notes on the circumstances in which each applies. The environment variables required at the beginning of each script are not shown (see the [starting up] file for details).

The first two scripts are all you need to institute something like the JunkEmail Education Project.

And the following is the procmail application of SpamAssassin:


	# SA4 - catch severe spam garbage
	# (the pattern matches three or more stars in X-Spam-Level)
		:0
		* ^X-Spam-Level:.*\*\*\*\**
		/dev/null
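
How many asterisks the condition above requires is easy to misread. As a rough check, the sketch below runs the same pattern through grep -E, which is only a stand-in for procmail's own egrep-style matcher; the function name and the sample header lines are made up for illustration.

```shell
#!/bin/sh
# Pattern copied from the recipe: three literal stars, then zero or more stars.
pattern='^X-Spam-Level:.*\*\*\*\**'

# is_severe HEADER-LINE: succeed when the header line matches the pattern.
is_severe() {
	printf '%s\n' "$1" | grep -E -q -- "$pattern"
}

for level in '**' '***' '*****'; do
	if is_severe "X-Spam-Level: $level"; then
		echo "$level -> /dev/null"
	else
		echo "$level -> kept"
	fi
done
```

In other words, the recipe fires at a level of three or more. Add one more `\*` to the front of the star sequence for each extra point you want to require.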

The next few may also be of interest.

Auto Respond to Junkmail

(updated June 15 2002) This script auto-responds to junkmail. It handles incoming e-mail, refusing (actually 'deleting') any e-mail over a certain size limit and any e-mail not addressed directly to you, unless the sender is on your list of friends. It refuses "Cc" also. It sends you a Bcc of the reply. All other e-mail is delivered to the spooler:


	FLAGGED="  
	 We will not accept large e-mails or unsolicited promotional
	 messages. To accept such e-mail costs us money; to delete 
	 such e-mail costs us time. If this was not your intention, 
	 request access directions."

	# Accept small stuff _and_ explicitly addressed.
		:0
		* < 8000
		* ^To:.*account
		$DEFAULT
	# defeat loops and accept the BCC copies
		:0
		* ^X-Loop
		$DEFAULT
	# extract the From header
		BOGUS_FROM = `formail -x"From:"`
	# send nasty reply for either BCC stuff or oversized
	# a negated set of conditions - list friends here
		:0hw
		* ! ^From.*Friendly
		* ! ^From.*folks
		* ! ^From.*Nemo
		* ! ^From.*Franklyn
		* ! ^From.*fruitcake
		* ! ^From.*fklein
		| (formail -Y -rt \
		-I "Precedence: junk" \
		-I "From: JunkEmail Education Project <me at domain>" \
		-I "X-Loop: spam" \
		-I "X-Bogus-From: ${BOGUS_FROM}" \
		-I "Bcc: me at localhost" ; \
		echo -e " ${FLAGGED}\n"; \
			) | $SENDMAIL -oi -t
	# large pieces from legit people get delivered.
		:0:
		$DEFAULT

Substitute your account name where needed above. The reply (the "To:" header the email is sent to) is determined by formail (the -r flag) and might as likely use the "Reply-to:" header. You will also notice that only "To:" is accepted, not "Cc:". You could add the passing of CC copies as follows ..


		* ^(To|Cc):.*account
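
The grouping in a condition like this matters, because `|` has the lowest precedence in egrep-style expressions. The sketch below uses grep -E -i as a stand-in for procmail's matcher (an approximation, not procmail itself), with "account" as the placeholder name; the function name and header lines are made up.

```shell
#!/bin/sh
# matches PATTERN HEADER-LINE: succeed when the header line matches.
matches() {
	printf '%s\n' "$2" | grep -E -i -q -- "$1"
}

ungrouped='^To:|Cc:.*account'     # reads as: (^To:) OR (Cc:.*account)
grouped='^(To|Cc):.*account'      # reads as: To: or Cc:, then the name

stranger='To: somebody-else@example.com'

matches "$ungrouped" "$stranger" && echo "ungrouped: accepted (any To: header matches)"
matches "$grouped" "$stranger"   || echo "grouped: rejected"
matches "$grouped" 'Cc: account@example.com' && echo "grouped: Cc copy accepted"
```

Without the parentheses, every message that has a To: header at all would pass the test, whoever it is addressed to.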

You could also use the procmail-supplied regexes "TO_" or "TO", although I find them a bit too general; that is, they seem to allow e-mail which I do not consider "directly addressed to me." "TO_" is used to find addresses (like ^TO_foo at foo.bar), and "TO" is used to find words (like ^TOfoo). (See "man procmailrc".)

$FLAGGED is a text for the reply. $BOGUS_FROM is found by formail. To use the variables within the formail script, they need to be written as ${VARIABLE}, that is, with the curly braces.

Seriously: learn how to escape end-of-line markers and where to place the semicolons, because the piped reply in the script above depends on both.
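
The reply in the script is built by a parenthesized group of commands whose combined output goes down one pipe. Each command inside the group ends with a semicolon, and every line except the last ends with a backslash so that the whole group is read as a single line. The sketch below shows the same shape in plain shell, with printf standing in for formail and cat standing in for $SENDMAIL (both substitutions are assumptions, made so the example runs anywhere).

```shell
#!/bin/sh
FLAGGED="We will not accept large e-mails."

# build_reply: header lines first, then a blank line, then the body text.
# The backslashes join the lines; the semicolons separate the commands.
build_reply() {
	( printf '%s\n' "X-Loop: spam" \
	  "Precedence: junk" ; \
	  printf '\n' ; \
	  printf ' %s\n' "${FLAGGED}" ; \
	) | cat
}

build_reply
```

printf is used here rather than "echo -e" because not every /bin/sh understands the -e option.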

Piped Junkmail Reply

(updated June 15 2002) Even if your ISP has some sort of spam filtering installed, some e-mail will get through. Some of this will be actual spam, but most will likely be undesired junkmail -- perfectly legitimate offers for toner or jobs, which you did not ask for. The script below can be used to reply (manually) to these people -- in effect saying, "Remove me from your e-mail list."

There are objections to responding to spam, but this process does not return the e-mail; it just replies with a short message. It is safe if you use your head: inspect the e-mail before responding to it.

As written, the script includes a facility for saving the headers of any e-mail you bounce in a dated folder. These headers are of use to people who are the victims of forged return addresses. Save them for a while, then delete them.

The procmail script below operates on a local machine, that is, you want to run this script when you are actually looking through arrived e-mails. We thus assume (1) you have Linux (or other Unix) running on your local machine, (2) you fetch your e-mail and look at it locally, and (3) you are using Pine or some other e-mail reader which allows piping commands. The directions below are for Pine.

Pine needs to be configured to allow local pipes. Set this up in the configuration file (.pinerc) as "[X] enable-unix-pipe-cmd".

You need to touch a file called "badpeople" in your home directory, that is, it has to pre-exist.

The procmail "rc" file will be called ".spamrc" -- different from the normal .procmailrc file, which is used for all incoming mail.

The shell script to run procmail against the ".spamrc" file will be called "spam" - and should be set as executable. The "spam" script goes as follows..


	#!/bin/sh
	/usr/bin/procmail -f-  $HOME/.spamrc
	echo "JunkEmail warning issued"
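
The preparation steps (an existing "badpeople" file, an executable "spam" script) can be sketched as a small shell function. The function name and the idea of passing the directory as an argument are ours, chosen so that the sketch can be tried on a scratch directory first; in real use the directory would be your home directory.

```shell
#!/bin/sh
# prepare_spam_reply DIR: make sure DIR/badpeople exists and DIR/spam is executable.
prepare_spam_reply() {
	dir=$1
	# The text says the file has to pre-exist; create it empty if missing.
	[ -e "$dir/badpeople" ] || : > "$dir/badpeople"
	# Pine runs the script by name, so it must carry the execute bit.
	if [ -f "$dir/spam" ]; then
		chmod u+x "$dir/spam"
	else
		echo "no spam script in $dir yet" >&2
		return 1
	fi
}

# Try it on a scratch directory holding a copy of the "spam" script.
scratch=$(mktemp -d)
printf '%s\n' '#!/bin/sh' '/usr/bin/procmail -f- $HOME/.spamrc' \
	'echo "JunkEmail warning issued"' > "$scratch/spam"
prepare_spam_reply "$scratch" && ls -l "$scratch"
```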

The "spam" script gets invoked as follows: You are using Pine, and run across an e-mail promoting some concert. You have no interest in concerts, and you want to get off their mailing list. Although you could start a long dialogue with these people, it is easier to just put them on notice, so when you see this e-mail you type..


	| 

.. and Pine asks you what process to run. You type ..


     ./spam

After a short delay, Pine returns with ..


        JunkEmail warning issued

.. a left arrow will get you back to the e-mail -- which you can now delete. The shell script "spam" has invoked procmail, using the ".spamrc" rc file, and piped the offending e-mail through. It has taken the From: address, added it to the file "badpeople", and written a return e-mail with a nasty message and a BCC copy to you. The next time (in the same session) Pine won't even ask; the "./spam" will already show on the command line.

The .spamrc file goes as follows..


	# the messages which will be used
	REFUSED=" -Lines  -Words  -Characters (text deleted)
	 JunkEmail Daemon automatic reply: E-mail refused."
	FLAGGED="
	 Your address has been flagged as a source
	 of junk e-mail and unsolicited messages.
	 Headers have been retained for follow-up
	 complaints to your ISP and legal action."

	# drop loops 
		:0
		* ^X-Loop
		spam
	# extract To header 
		BOGUS= `formail -x"To:"`
	# get size of the email
		SIZE = `wc`
	# save the full header
		:0hc:
		header.`date +%y-%m-%d`
	# check if From: is in badpeople file 
		:0 Whc:badpeople.lock 
		| formail -rD 8192 $HOME/badpeople
	# do this if _was not_ in badpeople file yet (first time)
		:0efh
		| (formail -Y -r \
		-I "Precedence: junk" \
		-I "From: JunkEmail Education Project <account at domain>" \
		-I "X-Loop: Warning" \
		-I "X-Bogus_To: ${BOGUS}" \
		-I "Bcc: account at localhost"; \
		echo -e " ${SIZE}\n ${REFUSED}\n ${FLAGGED}\n";\
			) | $SENDMAIL -oi -t
	# do this if name _was_ in badpeople already (short note)
		:0 Efh        
		| (formail -Y -r \
		-I "Precedence: junk" \
		-I "From: JunkEmail Education Project <account at domain>" \
		-I "X-Loop: Refused" \
		-I "X-Bogus_To: ${BOGUS}" \
		-I "Bcc: account at localhost"; \
		echo -e " ${SIZE}\n ${REFUSED}\n"; \
			) | $SENDMAIL -oi -t
	# dump whatever	is left over
		:0
		/dev/null

How it works

Change the "account at domain" to whatever is appropriate. The BCC can be addressed to "your_account_name" at "localhost" -- which is perfectly legitimate for local delivery by sendmail. Or you could send it elsewhere also.

The full headers are saved to a dated file (mail-folder). Do this if you promise help in the 'FLAGGED' message ("headers are retained for legal action").

We hope for some psychological secondary effects of the message. Perhaps the prospect of legal action (available in some states) will deter some from their evil ways.

The remainder of the script is based in principle on the commonly used "vacation" scripts. It thus writes names to a database (a file) and checks against this file with every e-mail you pipe through. If the address does not exist in the 'badpeople' file it will be added, and a long reply will be sent. If the address does exist, only a short reply is sent.
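
The first-time versus repeat decision can be sketched in plain shell. This is only an illustration of the logic: it keeps a plain text file with one address per line, whereas formail keeps its own cache format, and the function name is hypothetical.

```shell
#!/bin/sh
# seen_before ADDRESS FILE
# Succeeds if ADDRESS is already listed in FILE. Otherwise records it and fails,
# which mirrors "not found yet: add it and send the long reply".
seen_before() {
	if grep -q -x -F -- "$1" "$2" 2>/dev/null; then
		return 0
	fi
	printf '%s\n' "$1" >> "$2"
	return 1
}

list=$(mktemp)
for sender in promo@example.com promo@example.com other@example.net; do
	if seen_before "$sender" "$list"; then
		echo "$sender: short reply"
	else
		echo "$sender: long reply, now listed"
	fi
done
```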

The 'badpeople' file is a binary list of email addresses. When filled to capacity, formail starts overwriting at the start of the file, so that the list of addresses gets rotated, and the oldest addresses get removed.

You could dispense with the 'badpeople' file and use only one message. You could also set this up to delete any e-mail with return addresses which already exist in the 'badpeople' file, or automate the sending of the second message. (The 'badpeople' file cannot be edited with a text editor, BTW.)
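
Although the file cannot be edited, it can be looked at. The sketch below assumes the entries are separated by NUL bytes, which is how formail's cache is commonly described but is not something stated in this text; check a copy of your own file before relying on it. The function name is hypothetical.

```shell
#!/bin/sh
# list_cache FILE: print the cached entries one per line, skipping empty slots.
list_cache() {
	tr '\000' '\n' < "$1" | sed '/^$/d'
}

# A stand-in cache built by hand for the demonstration (not written by formail).
sample=$(mktemp)
printf 'promo@example.com\000other@example.net\000\000' > "$sample"
list_cache "$sample"
```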

There are a number of choices for the section "drop loops". You can send any looping e-mail with a header of "^X-Loop" to /dev/null, never to be seen again. I use the X-Loop: header to send BCC copies directly to an email-folder 'spam'. Periodically the 'spam' folder can be inspected and cleared. I also add another header, "X-Bogus_To: ${BOGUS}" which tracks who the original e-mail was sent to.

Last, I dump whatever is left over, because (for some reason) I was getting the left-over body sent to the spooler. You don't want that.

Procmail Listserv

(updated April 22 2002) This script operates like a listserv, but it requires a website set up so that all incoming e-mail, addressed to any name whatsoever, gets sent to the account you are using for the website. Any e-mail sent to a particular name at the website is picked up here and resent to a list of names, yourself included.


	# -- a listserv 
		:0f
		* ^To:.*bowling
		* !^X-Loop: bowling
		| formail -I "X-Loop: bowling" \
		-I "Reply-To: bowling at domain" \
		-I "Cc: "
	# -- add an introduction
		:0afbwi
		| echo "-- $FROM writes:"; cat -
	# -- send to list
		:0awi
		! `cat $HOME/bowlinglist`

Neato? There is a limit to the list of names, no more than will fit on a command line of the shell you are using. I am certain you cannot list hundreds of names, but at least enough for a bowling team. The list 'bowlinglist' needs to be kept as a file in your directory, or in a subdirectory, and is simply a text file with one e-mail address per line.
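
Whether a given list still fits can be estimated before sending. The sketch below compares the size of the list file with the system's argument limit as reported by getconf ARG_MAX. Treating half of that limit as the safe ceiling is our own rule of thumb, the function name is hypothetical, and procmail's own line buffer (the LINEBUF setting, see "man procmailrc") may well cut in sooner.

```shell
#!/bin/sh
# list_fits FILE: succeed if the addresses in FILE should fit on one command line.
# Assumption: stay below half of ARG_MAX to leave room for the environment.
list_fits() {
	limit=$(getconf ARG_MAX 2>/dev/null) || limit=4096
	bytes=$(wc -c < "$1" | tr -d ' ')
	count=$(grep -c . "$1" || true)
	echo "$count addresses, $bytes bytes, limit $limit"
	[ "$bytes" -lt $((limit / 2)) ]
}

list=$(mktemp)
printf '%s\n' alice@example.com bob@example.com carol@example.com > "$list"
list_fits "$list" && echo "list fits"
```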

There might be occasion to use something like this to broadcast e-mail, so that the recipients cannot respond and have additional e-mail delivered again to every member of the list. For this you would try to hide the "From" and "Reply-To" headers by rewriting them with formail.


	# -- a broadcast listserv
		:0fc
		* ^To:.*broadcast
		* ! ^X-Loop: broadcast
		| formail -I "X-Loop: broadcast" \
		-I "Reply-To: me at domain" \
		-I "From: me at domain" \
		-I "To: me at domain" \
		-I "Cc: "
	# -- send to list
		:0awi
		! `cat $HOME/broadcastlist`

Anyone with the savvy to look through the expanded headers could figure out how to get through anyway. Just don't have them on the list.

Dealing with Bounces

I administer a website which sends a million e-mails per year from a database of e-mail addresses. Hundreds of bounces of various sorts get returned, amounting to about 3000 per month. These are mailer-daemon auto-generated replies indicating everything from "mailbox full", "no such user", and "no such domain" to DNS failures and "could not connect". In each case we have to judge if the account really has disappeared -- in which case the address should be deleted from our database -- or if it is only a temporary condition.

E-mail connections are spotty, machines fail for no particular reason, connections disappear temporarily, and mailboxes fill up. To receive a single "fatal error" return code is most of the time no indication at all of what is really happening. Better than half the bounces carry the wrong error message. The best test, in our humble opinion, is to only delete addresses which bounce consistently for a long period. Currently we have set the "long period" to 20 to 30 consecutive bounces.

To save going through 3000 e-mails and somehow tracking what they all mean, the following script is used. We use the procmail supplied "FROM_MAILER" regex because "FROM_DAEMON" includes a match against "Precedence: Junk" which happens to be included as a header in many of our own auto-generated e-mails.


	# extract addresses of daemon
		:0
		* ^FROM_MAILER
		{
			:0cw:bb.lock
			| tr A-Z a-z | $HOME/bb
	# and if last test passed:
			:0ai:
			bounced.`date +%y-%m-%d`
		}
	# save the rest to spooler 
		:0
		$DEFAULT

In general, unless you know that these scripts always work, I suggest first saving a copy of all incoming e-mail. In Pine the whole 'saved' folder can be deleted with two keys, and a new folder will be created automatically by procmail when needed.

The FROM_MAILER is part of procmail and is a large regular expression which will catch almost any e-mail sent from Daemons, Mailers, and Postmasters.

If a Mailer Daemon e-mail shows up, it is entirely translated to lower case ('tr A-Z a-z'), and then sent (piped) to a Perl script ('bb') which extracts e-mail addresses. There will be many e-mail addresses in the e-mail, one or more of which will refer to the bounce address; others will belong to the responding site, the administrator, postmaster, mailer daemon, and the e-mail-ID. The Perl script extracts all phrases which look like e-mail addresses, and then discards any which we don't need or want, like postmaster, daemon, etc.

For each unique address found in an e-mail, the script will write the date to a file with the same name as the e-mail address, located in a separate directory. The e-mail is then filed in a mail-folder named "bounced.{date}". Without the separate bounced files by date, the "bounced" folder just gets too large. Simple? Looks even better when you consider how the e-mail-address files are handled. But first the Perl script. It is called "bb"; I forget why.


	#!/usr/bin/perl
	# $HOME/bb, find email addresses in emails, 
	# writes them to $HOME/bad/ with a date 
	# set up as a pipe for procmail or use with Pine
	 $now = `date`;

	# find included email-like addresses
	 while ($line = <STDIN>) {
		if ($line =~ /\b([\w_\-\.]+\@[\w_\-\.]+)\b/) {
	# exclude email addresses not needed
		(next) if ( $1 =~ /\d{8,}/ );       
		(next) if ( $1 =~ /postmaster/i );    
		(next) if ( $1 =~ /DAEMON/i );      
		(next) if ( $1 =~ /nobody/i );
		(next) if ( $1 =~ /localhost/i );
		push(@names,$1); 
		}
	}
	 close (STDIN);
	# reduce the list to unique addresses
	 foreach $idx (0 .. $#names) { 
		$uniq{$names[$idx]}=1; 
	}
	 undef @names;
	 foreach $item (keys %uniq) { 
		push (@names,$item);
	}
	# append the date to a file named after each address
	# (Perl takes the home directory from %ENV, not from $HOME)
	 foreach $address (0 .. $#names) {      
		open (RECORD, ">>$ENV{HOME}/bad/$names[$address]") or next;
		print (RECORD $now);
		close RECORD;
	}

notes on the Perl script

The e-mail address match selects phrases like 'words at words.word' which is not entirely accurate, but good enough to catch most e-mails. The list of '(next)' items just drops the search whenever those e-mail addresses look like something we don't need - postmasters, daemons, local domains, and any string of 8 or more numbers (which will be an e-mail ID number, and which ought to still catch Compuserve addresses).
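
For comparison, roughly the same filtering can be sketched as a shell pipeline. It is not a drop-in replacement for the Perl script: grep -o is a common extension rather than a POSIX option, and unlike the Perl loop, which looks at one match per line, this picks up every address on a line. The function name and the sample text are made up.

```shell
#!/bin/sh
# extract_addresses: read a message on stdin, print the unique addresses
# that survive the same exclusions the Perl script applies.
extract_addresses() {
	tr 'A-Z' 'a-z' |
	grep -o -E '[[:alnum:]_.-]+@[[:alnum:]_.-]+' |
	sed 's/[.]*$//' |
	grep -v -E '[0-9]{8,}|postmaster|daemon|nobody|localhost' |
	sort -u
}

extract_addresses <<'EOF'
From: MAILER-DAEMON@mail.example.net
To: postmaster@example.org
Message-Id: <200308111234567@mail.example.net>
The following address failed: <Gone.User@Example.com>
EOF
```

With the sample text, only the failed recipient's address survives; the daemon, the postmaster, and the numeric message ID are all dropped.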

The e-mail addresses are reduced to a unique list, and the date is written to a file with an e-mail-address name. Each date is under 60 bytes, so that after 20 hits or so the file exceeds 1000 bytes. File size is an index of the number of bounces.

The bounce record

Inspect the subdirectory 'bad/' with Lynx once a month. Lynx presents a 'ls -l' list, so that dates and sizes can be seen. Obviously any small files with earlier dates represent singular or temporary e-mail failures. These can be deleted ('ry' with Lynx). Similarly, any e-mail addresses which recognizably have nothing to do with the database, or stray addresses resulting from badly formatted auto-responders, can be deleted.

The files which have accumulated a considerable number of hits are candidates for removal from the database (however that is done is up to you), and subsequent removal from the 'bad' directory.
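
Picking out the candidates can be sketched in shell. Since the Perl script appends one date line per bounce, counting lines gives the number of hits directly, without estimating from the file size. The function name and the threshold argument are ours; note that this counts all recorded bounces, not only consecutive ones.

```shell
#!/bin/sh
# bounce_candidates DIR MIN: print "count address" for every file in DIR
# that holds at least MIN date lines, most-bounced first.
bounce_candidates() {
	for f in "$1"/*; do
		[ -f "$f" ] || continue
		n=$(wc -l < "$f" | tr -d ' ')
		if [ "$n" -ge "$2" ]; then
			echo "$n ${f##*/}"
		fi
	done | sort -rn
}

# Demonstration on a scratch directory standing in for $HOME/bad.
bad=$(mktemp -d)
i=0
while [ "$i" -lt 22 ]; do date >> "$bad/gone@example.com"; i=$((i + 1)); done
date >> "$bad/once@example.net"
bounce_candidates "$bad" 20
```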

If an address cannot be found in the database, the bounced.{date} mailfolder can be inspected to make sense of things. After updating the database the older bounced.{date} mail-folder files can also be deleted.
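
Clearing out the older folders can also be scripted. The sketch below only lists bounced.{date} files older than a given number of days, using find's -mtime test; the function name and the 60-day figure are assumptions, and the deleting is left to you once the list looks right.

```shell
#!/bin/sh
# old_bounce_folders DIR DAYS: list bounced.* files in DIR last modified
# more than DAYS days ago.
old_bounce_folders() {
	find "$1" -type f -name 'bounced.*' -mtime +"$2" | sort
}

# Demonstration on a scratch mail directory with one backdated folder.
mail=$(mktemp -d)
: > "$mail/bounced.03-08-11"
: > "$mail/bounced.00-01-01"
touch -t 200001010000 "$mail/bounced.00-01-01"
old_bounce_folders "$mail" 60
```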

Additional information

Some links to Procmail FAQs and SED



URL:http://jnocook.net/procmail.htm