
JunkEmail Education Project

Introduction

".. many people simply hate spam beyond belief .."
-- era eriksson

(updated November 30 2003)

.. Things change ...

In 2003 Microsoft started defaulting Outlook to multipart/mixed email -- a way to guarantee the delivery of HTML email with whatever images or other attachments. The spammers did the same, and during this year they have also learned to reduce the size of their emails (drastically) by sending emails with embedded links, rather than including images as they did previously.

As a result the spam is easier to identify and I started to depend on Spam Assassin to postprocess the bulk of the email by trashing anything over Spam Assassin level 4 (explained below).

With the addition, at the network level, of some utilities from OSDL, spam dropped drastically in November of 2003 from 700 pieces per day. The OSDL network utilities include a neural network, which lags changes in spam design by only a day or so. By mid December 2003, 90 percent of incoming e-mail is being rejected at the network level, that is, it is never delivered. Two or three pieces per day fall through. [In 2006 I am receiving only one piece per week.]

So the methods of blocking spam have changed, and the methods of dealing with JunkMail have simplified. Otherwise, except for some minor edits, I have let the rest of these pages stand as they were earlier this year. They still serve as an introduction to email, procmail, regex, and sed. I hope you will find them of some use. I have added the script for use of Spam Assassin.


(updated March 29 2003) These pages describe how to gain some control over unsolicited junk e-mail through the use of procmail as a means of filtering out almost all spam, and automatically or selectively returning warnings to people who need to have their hands slapped, and doing so professionally and mostly anonymously. I have adopted this mechanism because the alternative is to silently delete these e-mails as they come in, and nobody learns anything from silence.

The process described here is an implementation of the set of Procmail utilities available on Unix systems. The methods described here are also unlike any other filtering I have seen, in that e-mail is rejected by format instead of content, with only a few exceptions. I reject all large e-mail, all HTML format e-mail, all multipart e-mail (attachments or HTML), all e-mail with included HTML tags, all unaddressed e-mail, and all e-mail whose CC or To lines list more than 6 names.
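To make the idea of rejecting by format concrete, here is a minimal sketch in Python rather than procmail. It is not one of the scripts from the [Procmail Scripts] page. The size limit, the address me@example.org, and the short list of HTML tags are all assumptions made for illustration; only the limit of 6 names comes from the text above.

```python
import re
from email import message_from_string
from email.utils import getaddresses

MAX_BYTES = 40_000             # assumed limit; the text gives no figure for "large"
MAX_RECIPIENTS = 6             # from the text: more than 6 names is rejected
MY_ADDRESS = "me@example.org"  # hypothetical address of the recipient

# A few common tags only; an assumed, deliberately short list.
HTML_TAG = re.compile(r"</?(html|body|font|img|table|a)\b", re.IGNORECASE)


def rejection_reason(raw):
    """Return a short reason if the message fails a format test, else None."""
    if len(raw.encode("utf-8", errors="replace")) > MAX_BYTES:
        return "too large"
    msg = message_from_string(raw)
    if msg.is_multipart():
        return "multipart"
    if msg.get_content_type() == "text/html":
        return "html format"
    # Not multipart at this point, so the payload is a plain string.
    if HTML_TAG.search(msg.get_payload()):
        return "html tags in body"
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    if len(recipients) > MAX_RECIPIENTS:
        return "too many recipients"
    if MY_ADDRESS not in [addr.lower() for _, addr in recipients]:
        return "unaddressed"
    return None
```

Note that none of the tests looks at what the message says, apart from the check for HTML tags; the rest read only the size and the headers.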

If this sounds drastic, consider that only one or two pieces of trash make their way through the filters out of 900 pieces received per month. In some months of tracking and inspecting incoming mail I have only seldom seen a piece which should not have been rejected. Additionally, senders are notified of how to get through to me if it is required.

This page is an overview. The philosophical underpinnings are expounded on the [Reasons] page. How to set up procmail, an explanation of e-mail headers, and a brief review of regular expressions are provided on the [Getting Started] page. Basic procmail scripts which implement a JunkEmail Education Project are on the [Procmail Scripts] page.

I need not tell you about spam. If you use e-mail you probably get lots of unsolicited mail. Some of this will be from genuine spammers, who either attempt to promote some product or service, or send the spam in order to verify the existence of your e-mail address. You should be especially aware of this last, since you will end up on more spammer lists if you follow through with the offer for "removal" from their lists.

The main concern on these pages, however, is junkmail - mail with real return addresses, but mail which you didn't ask for (and must take time to inspect), and oversized mail (which ties up your modem and phone line).

E-mail is cheap to send in bulk, but is not cheap to receive by the individuals at the other end. It is a condition completely different from snail mail, even from telephone usage. E-mail is transported from machine to machine, often involving a half dozen to a dozen connections. Machines have to translate readable addresses to IP addresses by making additional connections to yet other machines.

Sending unwanted large e-mails is criminal. It results in hogging common internet resources with unasked-for, unneeded, and unwanted traffic. Similarly the ecological impact of Microsoft's insistence on defaulting to HTML format for their Outlook e-mail program is considerable. Recent inspections of received e-mail indicate that Outlook generates e-mail which is on average 5 times larger than plain-text e-mail (that is, four fifths of each such message is overhead). Since Outlook is in very wide use in the USA, the (Microsoft generated) HTML tags could account for up to 80 percent of the e-mail in this country, and probably 99 percent of spam.

As the final step every recipient has to make an individual phone or DSL connection to receive the spam. And then you will need to inspect the e-mail, and delete it. Who pays for the phone connections, who pays for your inspection time? You do.

See the [CAUCE] website for more details.

With procmail you can do something about this. With procmail you can write scripts to limit the size of incoming e-mails, you can reject blind mail (e-mail sent to you without a To: line containing your address), you can send return warnings, you can form blacklists, but you can also bypass the filters you design.

Additionally, with procmail you can handle bounces from mailing lists, and you can institute listservs and create broadcast lists. Each of these scripts is listed and explained on the [Procmail Scripts] page.

And procmail will do better in curbing the distribution of Microsoft Outlook e-mail viruses than any other means, with the exception of discontinuing the use of Outlook. See a separate [Outlook] page for details on current worms and e-mail viruses.

The design of this project is not to educate spammers, because they already know what they're doing is wrong. What I am proposing is that there are ways to educate the people who do not understand this, and who spend your time and money without your permission. That is the idea of the JunkEmail Education Project.

The main focus of these pages will be to give you a means of educating your friends, relatives, acquaintances, people in the same field, confused newbies, and rank idiots. It will be an education on netiquette, on the undesirability of junk e-mail, on the time, money, and ecological waste their e-mails represent, and on attitude.

Before introducing procmail, a few other methods and means:

Open Relay

If you simply need to limit spam, I would suggest that you convince your ISP (or whoever controls the domain where you receive e-mail) to start refusing e-mail from "open relay" and "dialup" IP addresses. The largest reduction in spam, for us, came from refusing e-mail from these blocks of openly available IP addresses; in our case this accounted for 95 percent of all the spam.
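Lists of open relay and dialup addresses are commonly published as DNS blocklists, which a mail server consults by looking up a name built from the connecting IP address. The sketch below shows only how such a name is formed. The zone dnsbl.example.org is a placeholder, not a real list, and whether your ISP's chosen list works this way is an assumption you should check.

```python
import ipaddress


def dnsbl_query_name(ip, zone="dnsbl.example.org"):
    """Build the DNS name a blocklist lookup would use for an IPv4 address.

    The octets are reversed and the list's zone is appended. The zone
    given here is a hypothetical placeholder.
    """
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone
```

A mail server would then resolve that name; by the usual convention an answer means the address is listed, and no answer means it is not.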

See the [open link database] website for lists.

Too bad that you will no longer receive e-mail from friends who don't use their ISP's MTA first (although these would be very few). But the purveyor of internet connections should take some responsibility for the use of the IP addresses they offer for sale, and if they don't - then refuse to communicate with them.

Spam Assassin

Also suggest that your administrator check out a very smart e-mail preprocessing utility which filters by content, found at http://spamassassin.org/.

Spam Assassin inspects incoming e-mail: it does a quick analysis of the content, checking for a number of items which seem to be common to spam, and uses a scoring system to flag "probable spam."

Every incoming e-mail will include headers written by Spam Assassin. A paragraph of information is attached to the incoming e-mail if the level of spam indicators exceeds a certain count.

What you do with the tagged e-mail is up to you.

What I currently do with this e-mail is to automatically delete it at the remote location if the "Spam Level" exceeds "4".
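As an illustration of the level test, here is a minimal Python sketch. It assumes the level arrives in an X-Spam-Level header as a row of asterisks, one per point, which is how Spam Assassin installations commonly report it; check the headers of your own mail before relying on this. The function names are made up for the example.

```python
from email import message_from_string

SPAM_LEVEL_LIMIT = 4  # from the text: delete anything above level 4


def spam_level(raw):
    """Count the asterisks in the X-Spam-Level header (0 if it is absent)."""
    msg = message_from_string(raw)
    return msg.get("X-Spam-Level", "").count("*")


def should_delete(raw):
    """True when the tagged level exceeds the limit."""
    return spam_level(raw) > SPAM_LEVEL_LIMIT
```

A message tagged with five asterisks would be deleted; one tagged with four or fewer would be passed on.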

What I do with e-mail which gets past the Spam Assassin Level 4 filter, and gets delivered locally, is to pipe the offending e-mail to a separate procmail script, which sends a warning that the sender's address has been flagged.

You will ask, "Is doing this safe? Doesn't this just verify to the spammer that your e-mail address is valid?"

Well, first of all, I do not respond to obvious spam, for every return address (the "From:" header) I have seen in the last year has been fake.

So the answer is: it is safe to respond to the "From" label. If this is a genuine spammer, the return address will be invalid, and the spammer's test for validity of your e-mail address will be located in the body of the e-mail, not in the header. If you are sending a warning to a genuine spammer, your reply will go to a fake address, will bounce, and that will be the end of it.

If the spam address is real but forged, your reply will likely bounce because the unfortunate owner of the forged address will already have received 300,000 angry e-mails from other spam victims, and will have exceeded his allowed e-mail quota.

This leaves: The address is real, and the address is the source of the junkmail (which might possibly not have been meant as spam). In this event you are putting the originator on notice. Depending on what state you live in, you may also be able to follow up your reply with legal action, or suggest in your reply that you might do so.

The distinction you need to make, at any rate, is between completely bogus unsolicited e-mail, and bulky e-mail or broadcast e-mails sent by stupid friends and acquaintances. Depending on how your reply reads you could easily confuse, anger, or insult people. Be clear what your note means, even if you are terse.

The unsolicited 'spam' which used to pervade the world of fax machines has completely stopped since Federal legislation set a penalty for unsolicited faxes. The cost distribution for faxes is very similar to what happens with e-mail: the receiver pays almost all of the cost. But telephone connections are also easier to trace.

Although legislation is pending for similar prohibitions on unsolicited e-mail, it is doubtful whether it could be enforced. The source of an e-mail can easily be totally obfuscated. Until then, there is procmail. And afterwards too, for use with those of your friends who will remain forever clueless.

A last note on spam: If you seriously want to trace spammers, and pursue complaints and possible legal action, see the excellent page [dealing with junk mail] by John Rivard of JCR Design and Consulting.

What procmail does

Procmail is a Unix program package, and you will either need Unix account access to your remote mail locations, or have your local machine running Unix (that would be Linux, most likely), or both.

The procmail package is actually two programs: procmail and formail.

Procmail intercepts e-mail destined for your inbox and can redirect it to just about anywhere: it can file it in your mail directory (the mail-folders), send it to the bit bucket (/dev/null), send it to another program, or send it back to the MTA, sendmail, for delivery elsewhere. Procmail can also send e-mail directly to your inbox after inspecting it. And of course procmail can duplicate the e-mails.

Procmail does this by inspecting the headers of the e-mail (or by inspecting the body). The inspection is simply a 'regular expression' match, a fairly consistent means of matching a header line from the e-mail with a set of words or characters. Procmail can also chain the matches, or be set up in if-else fashion.
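The matching described above can be pictured with a small Python sketch. This is not procmail syntax, only the same idea: an ordered list of regular expressions tried against the headers, where the first match decides the destination and unmatched mail falls through to the inbox. The rules, the destination labels, and the List-Id test are invented for illustration.

```python
import re

# (pattern, destination) pairs, tried in order; all of them are examples.
RULES = [
    (r"^List-Id:", "folder:lists"),
    (r"^Content-Type:.*(text/html|multipart/)", "/dev/null"),
]
DEFAULT_DESTINATION = "inbox"


def route(headers):
    """Return the destination of the first rule whose pattern matches a header line."""
    for pattern, destination in RULES:
        # MULTILINE lets ^ anchor at the start of every header line.
        if re.search(pattern, headers, re.IGNORECASE | re.MULTILINE):
            return destination
    return DEFAULT_DESTINATION
```

Because the rules are tried in order, mailing-list traffic is filed before the format test can discard it, which is the if-else behavior in miniature. The sketch does not handle header lines folded across several lines.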

The companion program, formail, allows rewriting e-mail headers (and the body too). With formail you can generate replies, extract headers, replace the content, insert headers, and more.
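To show what generating a reply from the headers amounts to, here is a rough Python analogue. It is not formail and does not reproduce formail's options or its exact choice of headers; the function name and the warning text are hypothetical.

```python
from email import message_from_string
from email.message import EmailMessage


def build_warning_reply(raw, warning_text):
    """Build a reply addressed to the sender of the original message."""
    original = message_from_string(raw)
    reply = EmailMessage()
    # Prefer Reply-To when the sender supplied one, otherwise use From.
    reply["To"] = original.get("Reply-To") or original.get("From", "")
    subject = original.get("Subject", "")
    if not subject.lower().startswith("re:"):
        subject = "Re: " + subject
    reply["Subject"] = subject
    # Reference the original so mail readers can thread the reply.
    if original.get("Message-ID"):
        reply["In-Reply-To"] = original["Message-ID"]
    reply.set_content(warning_text)
    return reply
```

The original body is deliberately left out of the reply, so an oversized message is not sent back across the wire.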

What You Need

These pages assume that you have an e-mail account somewhere on the internet, maybe a number of them, and that you somehow get your e-mail to come to you on a local machine, where you sort through it.

You can install procmail scripts at the remote locations (if you have account access, and it is a Unix box), so that filtering can be done without burdening your telephone line -- this is how the greatest savings are to be had. For example, the really big e-mails can be thrown away before they are downloaded to you. But you can also run procmail scripts on your local machine (assuming it also is a Unix box).

If, on the other hand, you are stuck with Windows and Mac boxes, there is less you can do locally, but if you sort through enough on-line spam control HOW-TOs and FAQs you will probably find programs (including many e-mail readers) which will do nearly the same thing in filtering e-mail.




ISP: Counterpoint Networking,
Website Provider: Outflux.net, www.Outflux.net
URL:http://jnocook.net/junkmail/index.htm

printing and copyright notice