Filtering incoming emails

Published: 2023-01-15

In this article I'll explain what I learned while implementing a filtering system for incoming emails. The level of yak shaving I had to go through to reach the final goal has no precedent in all of human history.

I don't use webmail or the IMAP protocol; I stick to classic POP3S (which means downloading emails from the server and keeping a local copy).

It all started with an innocent goal: I wanted to separate mailing list emails from the rest. I can split this story into three parts:

By researching and manually implementing, with separate tools (each with its own set of quirks), all the steps that modern software such as Thunderbird performs, I realized how much work mail clients actually do.

The architecture of a modern email exchange is the following (courtesy of Wikipedia):

MUA → MSA → MTA → (the internet) → MTA → MDA →→ MRA →→ MUA

If I read that list correctly, my current idea of an "email client" should be close to the following (from left to right):

In comparison, applications like Mozilla Thunderbird do all the above seamlessly.

§ Step 1: handle incoming email classification

After researching a bit and configuring fetchmail, I decided I was not happy (the configuration looks convoluted and is scattered across multiple files), so I looked further and finally settled on maildrop.

Luckily, my POP3 retrieval tool (mpop) supports different delivery mechanisms. Instead of dropping the email in the mailbox, I can configure mpop to hand it to an MDA and let that do the rest.

- delivery maildir ~/.local/mail/city17.xyz/catchall/Inbox
+ delivery mda /usr/bin/maildrop ~/.config/maildrop/mailfilter

A maildrop filter file is relatively easy to grok (man 5 maildroprc and man 5 mailfilter). I got a general idea with some copypasta (here and here), then proceeded to create my filters with the help of the good documentation.

Filtering messages from mailing lists is also easy because each mailing list service sets specific headers. Here are a few (src):

Here's an example for the Guix support mailing list:

# Relevant headers
# List-Id: <help-guix.gnu.org>
# List-Post: <mailto:help-guix@gnu.org>

if (/^List-Post: .*help-guix@gnu\.org/:h)
{
    log "MATCHED!"
}
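To get a feel for what a header-only match means, the same check can be approximated in plain shell. This is only a rough sketch and not how maildrop works internally: the function name is_list_message and the sample message are made up for illustration, and folded (multi-line) headers are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the headers of the message on stdin
# contain a List-Post header that mentions the given address.
# Assumption: headers are not folded across several lines.
is_list_message() {
    local address="$1"
    awk -v addr="$address" '
        /^$/ { exit }    # first blank line: the headers are over
        tolower($0) ~ /^list-post:/ && index(tolower($0), tolower(addr)) { found = 1; exit }
        END { exit !found }
    '
}

# Usage with a made-up sample message
printf 'From: jane@example.org\nList-Post: <mailto:help-guix@gnu.org>\n\nHello\n' \
    | is_list_message 'help-guix@gnu.org' && echo "MATCHED!"
```

Because reading stops at the first blank line, a List-Post line quoted in the body of a message does not count as a match.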

Now, where should maildrop save the message? After some more research I learned about an ancillary tool, maildirmake (man 1 maildirmake), which creates Maildir-compliant folder structures without requiring you to know the specs:

# Create a Maildir folder with two nested folders
$ maildirmake TestMaildir
$ maildirmake -f Drafts TestMaildir
$ maildirmake -f Drafts.Urgent TestMaildir

# Result:
$ tree -a TestMaildir
TestMaildir
├── cur
├── .Drafts
│   ├── cur
│   ├── maildirfolder
│   ├── new
│   └── tmp
├── .Drafts.Urgent
│   ├── cur
│   ├── maildirfolder
│   ├── new
│   └── tmp
├── new
└── tmp

12 directories, 2 files

The directory cur holds read messages, new holds unread messages, and tmp stores a message while it is being written, before it is moved to new. Note the folder names prefixed with a dot (such as .Drafts).
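The life cycle of a message across tmp, new and cur can be sketched in a few lines of shell. The helper names (make_maildir, deliver, mark_read) and the simplified file naming scheme are assumptions for illustration only; real delivery agents such as maildrop generate more elaborate unique names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the three directories a Maildir needs.
make_maildir() {
    mkdir -p "$1/cur" "$1/new" "$1/tmp"
}

# Hypothetical helper: deliver the message on stdin into a Maildir.
# The file is written under tmp/ first and then renamed into new/,
# so a reader never sees a half-written message.
deliver() {
    local maildir="$1" name
    # Simplified unique name: timestamp, shell PID, random number, host name.
    name="$(date +%s).P$$R$RANDOM.${HOSTNAME:-localhost}"
    cat > "$maildir/tmp/$name"
    mv "$maildir/tmp/$name" "$maildir/new/$name"
    printf '%s\n' "$name"
}

# Hypothetical helper: mark a message as read by moving it from new/
# to cur/ and appending the "seen" flag (":2,S").
mark_read() {
    local maildir="$1" name="$2"
    mv "$maildir/new/$name" "$maildir/cur/$name:2,S"
}

# Usage in a throwaway directory
box="$(mktemp -d)/TestMaildir"
make_maildir "$box"
id="$(printf 'Subject: hello\n\nbody\n' | deliver "$box")"
ls "$box/new"    # the message is unread
mark_read "$box" "$id"
ls "$box/cur"    # the same message, now with the ":2,S" suffix
```

The rename from tmp to new is the important design choice: a move within the same filesystem is atomic, which is why the format needs no locking.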

A visual way to represent the above could be:

Jane's emails (0)
├── Inbox
├── Drafts
│   ├── Urgent 

As messages arrive and are marked as "read", this is what happens:

$ tree -a TestMaildir
TestMaildir
├── cur
│   ├── 1670876132.M796214P691272Q1.localdomain:2,RS
│   ├── 1671353042.M030514P2342530Q4.localdomain:2,S
├── .Drafts
│   ├── cur
│   ├── maildirfolder
│   ├── new
│   └── tmp
├── .Drafts.Urgent
│   ├── cur
│   ├── maildirfolder
│   ├── new
│   └── tmp
├── new
│   ├── 1670876132.M796214P691272Q1.localdomain:2,RS
│   ├── 1671353042.M030514P2342530Q4.localdomain:2,S
└── tmp

12 directories, 6 files

Visual representation:

Jane's emails (4)
├── Inbox (2 unread)
├── Drafts
│   ├── Urgent 
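The counters in such a folder view can be derived from the directory contents alone. Here is a minimal sketch, assuming that every file under new is unread and every file under cur has been read; the helper names count_messages and summary are made up, and a real mail client would also inspect the flags after ":2,".

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many messages sit in one subdirectory
# (new or cur) of a Maildir.
count_messages() {
    local maildir="$1" subdir="$2"
    find "$maildir/$subdir" -type f | wc -l | tr -d ' '
}

# Hypothetical helper: print a one-line summary for a Maildir.
summary() {
    local maildir="$1" unread seen
    unread="$(count_messages "$maildir" new)"
    seen="$(count_messages "$maildir" cur)"
    printf '%s: %d messages (%d unread)\n' \
        "$(basename "$maildir")" "$((unread + seen))" "$unread"
}

# Usage with a throwaway Maildir holding made-up message files
inbox="$(mktemp -d)/Inbox"
mkdir -p "$inbox/cur" "$inbox/new" "$inbox/tmp"
touch "$inbox/new/msg1" "$inbox/new/msg2" "$inbox/cur/msg3:2,S"
summary "$inbox"
```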

Please note that maildirmake does not handle creating maildirs with a dot in the name (example: "personal.com"), so the workaround is two lines of shell (bash, because of the brace expansion):

mkdir -p "$MAILDIR/personal.com"/{cur,tmp,new}
touch "$MAILDIR/personal.com/maildirfolder"

§ Step 2: configure Emacs mu4e to handle maildir folders

With this newly acquired knowledge, we can create folders and tell maildrop the locations to save each mailing list message.

if (/^List-Post: .*help-guix@gnu\.org/:h)
{
    log "MATCHED!"
    
    # save the message in this mail folder, then stop processing immediately
    to "$MAILDIR/.../.lists.help-guix/."
}

The maildrop config file (which, curiously, defaults to ~/.mailfilter) can be checked for typos with:

$ echo | maildrop -V 9 <your_config_file> 2>/dev/null \
    && echo "OK" || echo "Error $?"
OK

Or with a real email, checking if it's saved in the correct place:

cat $MAILDIR/.../1673263026.M771328P839100Q2.localdomain | \
    maildrop -V 1 <your_config_file>

Configuring mu4e to read from all these folders is a breeze, thanks to mu4e-maildir-shortcuts:

(setq mu4e-maildir-shortcuts
      '((:maildir "/.../.lists.help-guix" :key ?i)
      ;; other  folders...
      ))

And it's done! Now every time we download emails, they will be sorted into the correct folder.

§ Step 3: system emails from mbox to Maildir format

Wait, we're not finished yet, just one more little itch to scratch :)

How about those mysterious messages ("You have new mail") that appear on the console from time to time after updating the system? What are these emails? Where are they saved? It turns out they're somewhat interesting, but this is another can of worms for the next article, which dives into how exim4 works.

Obligatory XKCD reference: xkcd.com/1728.

§ Conclusions

Relatively hard stuff to figure out, mostly because there are not many examples around. The documentation, however, is usually great, and with a bit of focus one can work it out.

I'm glad I've learned a bit about tools written 15+ years ago that still do their job well. They are still in use because they work, are stable and are well documented.