OpenBSD Journal

Reduce gray-listing pain by seeding white-list with SPF records.

Contributed by pitrh from the my admin told me not to talk to strangers dept.

Longtime Undeadly contributor sean writes in with tips and tools for improving your spamd(8) experience:

I have been using gray-listing to thwart spam for what feels like a very long time; I started around the release of OpenBSD 3.5. It was an amazing change from the constant storm of spam: just enabling it got rid of about 80% of the spam almost immediately. That improvement didn't come without a cost, though. Some mail services and servers don't work well with it, especially large mailing systems that pass messages around and don't guarantee that the next delivery attempt will come from the same IP address or network. Microsoft Exchange was also known to be 'usually' configured in a way that did not work well with gray-listing.

Over the years I've either tolerated the problem or white-listed IPs where I got particular complaints, though the really hard nuts to crack are the larger organizations with 'big mail' infrastructure like Google Mail, Hotmail, Yahoo, etc. White-listing huge chunks of frequently moving address space really eroded the benefits of gray-listing. For at least two years, I think, we turned gray-listing on and off at a previous employer to 'work around' complaints. On my personal systems I just left it on and didn't care if it blocked legitimate mail (people who really wanted to contact me would know how to get hold of me), and I would disable PF on demand for account setups and password resets. For the more annoying mail systems I'd trawl the ARIN database, white-list every network related to a particular company and call it a day: a rather manual and time-consuming process. It got to the point that I considered automating the temporary disabling of PF (i.e. spamd) for users.

Fast forward to recently, when time is at a premium and the address ranges and outgoing mail hosts of the big services change frequently. I figured I could spend all my time just looking for ranges to white-list, and I didn't have time for that. Then I stumbled upon this thing called SPF (Sender Policy Framework). SPF is a mechanism for a domain to announce, via DNS TXT records, which hosts are allowed to send mail on its behalf. It is typically used to validate incoming mail connections against the sender domain's published SPF records during the delivery conversation (namely SMTP).
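To make the record format concrete, here is a minimal sketch of splitting one SPF TXT string into its terms. The function name parse_spf and the sample record (built from documentation-only address ranges and example domains) are assumptions for illustration and are not taken from the script discussed in this article. Modifiers such as redirect= are simply skipped, and CIDR suffixes on a/mx terms are not split out.

```python
def parse_spf(record):
    """Split one SPF TXT string into (qualifier, mechanism, value) tuples."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return []  # not an SPF version 1 record
    parsed = []
    for term in terms[1:]:
        # Modifiers look like name=value (redirect=, exp=); skipped in this sketch.
        if "=" in term and ":" not in term.split("=", 1)[0]:
            continue
        qualifier = "+"  # a missing qualifier means "pass"
        if term[0] in "+-~?":
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        parsed.append((qualifier, name.lower(), value))
    return parsed


# Hypothetical record, for illustration only.
sample = "v=spf1 ip4:192.0.2.0/24 include:_spf.example.net a mx ~all"
for qualifier, mechanism, value in parse_spf(sample):
    print(qualifier, mechanism, value)
```

Only terms with the "+" qualifier describe hosts that the domain positively authorizes, so those are the ones a white-list would care about.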

The mail daemon I use personally, Qmail (slowly being phased out in favour of OpenSMTPD), doesn't support SPF (or pretty much anything other than base RFC SMTP) particularly well. My way around that was to write a shell script, run once a day, that grabs the SPF records for a well-known set of domains and puts the addresses they list into the white-list.

This worked great for a year or so. About a month ago a few of my friends mentioned they couldn't get email to me from Gmail. Having thought I had solved those particular problems years ago, I manually refreshed the white-list, and it didn't help. I then looked at the SPF record Google was publishing and noticed that instead of listing the networks directly it included other records that have to be resolved recursively, which my dumb and simple script didn't handle (it only expected A, MX and network tokens).

I figured with most of my friends and family using gmail (and similar), I had to support SPF a bit better than high-school parsing of basic records.

To that end I wrote an SPF parser in python which handles the recursive records and does a much better job of understanding the SPF records.

The script is written in Python because Bourne-style scripting just wasn't expressive enough and because I enjoy the language. The one thing I didn't like about Python was the lack of a 'good', simple (and tiny) DNS library. There are definitely a few out there, but none that I would want to ship with a product. My idea of simple is to drop the one or three modules in the same place as my main script and call it a day; the available Python DNS libraries were far more complex than that, which turned me off. To get around that (and because 'computers are fast and memory is cheap') I decided to just shell out to dig, like my original shell script did. It is definitely not as efficient as making the DNS calls natively in the tool, but it isn't expensive enough to be a problem.
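The shell-out-to-dig approach can be sketched roughly as follows. This is not the code from the repository; the function names dig_txt and spf_records are made up for illustration, and the run parameter exists only so the DNS call can be swapped out in tests. It assumes Python 3.7 or later and a dig binary in the PATH.

```python
import shlex
import subprocess


def dig_txt(domain, run=subprocess.run):
    """Return the TXT strings for a domain by shelling out to dig(1)."""
    result = run(["dig", "+short", "TXT", domain],
                 capture_output=True, text=True, check=False)
    records = []
    for line in result.stdout.splitlines():
        # dig prints a long TXT record as several quoted chunks on one line:
        #   "v=spf1 ip4:192.0.2.0/24 " "include:example.net ~all"
        # The chunks are concatenated with nothing added between them.
        try:
            chunks = shlex.split(line)
        except ValueError:
            continue  # unbalanced quotes: not a line we can use
        records.append("".join(chunks))
    return records


def spf_records(domain, run=subprocess.run):
    """Keep only the TXT strings that are SPF version 1 records."""
    return [r for r in dig_txt(domain, run) if r.lower().startswith("v=spf1")]
```

Calling spf_records("example.com") starts one dig process per lookup, which is the inefficiency mentioned above: slow compared with a native resolver library, but cheap enough for a job that runs once a day.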

The best part is that, aside from Python, everything it needs ships by default with OpenBSD, so there are no extra packages or similar nonsense.

You can find a copy of the script here:
https://github.com/nullstream/spf_dump/blob/master/spf_dump.py

It isn't the most elegant or perfect of solutions (I found that some domains resolve their SPF A records through DNS CNAMEs, which I don't try to resolve further), but it does the job and fixed a few other domains and services as well (Twitter's website no longer bothers me about updating my email address with them). Another issue I should fix is putting a depth limit on the recursion. Because I now recurse into every included domain, a circular reference is possible; I've not seen one yet, so I've not added that guard.
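A guard against circular references could look something like the sketch below. It is an assumption about how one might do it, not a patch for spf_dump.py: collect_networks, the lookup callback and the max_depth default of 10 are all hypothetical. The function remembers which domains it has already visited and refuses to descend past a fixed depth; it only follows include terms and ignores redirect= modifiers.

```python
def collect_networks(domain, lookup, max_depth=10, _seen=None, _depth=0):
    """Follow include: chains and return the ip4/ip6 networks found.

    lookup(domain) must return the SPF record as a string, or "" if none.
    """
    seen = set() if _seen is None else _seen
    key = domain.lower().rstrip(".")
    if key in seen or _depth > max_depth:
        return []  # already visited (a loop) or nested too deeply
    seen.add(key)
    networks = []
    for term in lookup(key).split()[1:]:  # skip the leading v=spf1
        # Only pass-qualified terms are considered; -, ~ and ? terms
        # keep their prefix and therefore match nothing below.
        name, _, value = term.lstrip("+").partition(":")
        name = name.lower()
        if name in ("ip4", "ip6"):
            networks.append(value)
        elif name == "include" and value:
            networks.extend(
                collect_networks(value, lookup, max_depth, seen, _depth + 1))
    return networks


# Hypothetical zone data in which two domains include each other.
zone = {
    "a.example": "v=spf1 ip4:192.0.2.0/24 include:b.example ~all",
    "b.example": "v=spf1 ip6:2001:db8::/32 include:a.example -all",
}
print(collect_networks("a.example", lambda d: zone.get(d, "")))
```

The visited set alone is enough to stop a loop; the depth limit is a second line of defence against very long (but not circular) include chains.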

In the github repository I've also put a sample of the script I'm using to call this tool and populate a white-list:
https://github.com/nullstream/spf_dump/blob/master/spamd_whitelist.sh

Gray-listing is just the top-most layer of my spam filtering (I also use rblsmtpd with Qmail, and SpamAssassin called from procmail), and it does the job admirably, passing its results on to the layers below. My backup exchanger runs OpenSMTPD, so every time the white-list is regenerated I sync it to the backup MX (via scp) and populate the PF table there accordingly (the CPU on my backup MX is too slow to run the script itself). That keeps the primary and backup exchangers in sync at that particular layer.
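For the last step, loading the generated list into a PF table, something along these lines would work. This is a sketch and not the sample shell script from the repository: the function names, the table name spf_white and the file path are assumptions. Before building the pfctl command it drops anything that is not an address or network, and refuses a zero-length prefix such as 0.0.0.0/0, which would white-list the whole Internet.

```python
import ipaddress


def whitelist_lines(entries):
    """Validate entries and return sorted, de-duplicated CIDR strings."""
    nets = set()
    for entry in entries:
        try:
            net = ipaddress.ip_network(entry.strip(), strict=False)
        except ValueError:
            continue  # hostnames and other junk are skipped
        if net.prefixlen == 0:
            continue  # never load 0.0.0.0/0 or ::/0
        nets.add(net)
    # Sort IPv4 before IPv6; networks of one family compare natively.
    return [str(n) for n in sorted(nets, key=lambda n: (n.version, n))]


def pfctl_replace_command(table, path):
    """Build the pfctl call that replaces a table's contents from a file."""
    return ["pfctl", "-t", table, "-T", "replace", "-f", path]


# Hypothetical input, table name and path, for illustration only.
for line in whitelist_lines(["192.0.2.0/24", "not-an-ip.example.", "0.0.0.0/0"]):
    print(line)
print(" ".join(pfctl_replace_command("spf_white", "/var/db/spf_white.txt")))
```

Writing the validated lines to a file first also gives you something to scp to a backup MX, where the same pfctl command can load it.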

Regardless I put the code up on github (this was an excuse to try that service out) in case anyone else wants to use or improve upon this odd use of SPF. It would be great to get text versions (i.e. PF friendly) of the more reputable RBL lists though that is either cost prohibitive or not available. If you are not using Peter's block list (http://www.bsdly.net/~peter/traplist.shtml, also see this earlier story)... you definitely should. Another good list is the OpenBL list (previously known as sshbl), http://www.openbl.org/. Originally the server they hosted their list on wasn't too reliable but it has been getting better.


Comments
  1. By Reyk Floeter (89.182.14.224) on

    Thanks for the article and work!

    I have a question: could you please add a license to the scripts? No license defaults to non-free, even on GitHub :-/

    Reyk

    Comments
    1. By Sean Cody (sean) on I don't work here.

      > Thanks for the article and work!
      >
      > I have a question: could you please add a license to the scripts? No license defaults to non-free, even on GitHub :-/
      >

      I'm rather ignorant in the ways of licensing my own work so thanks for pointing that out. I'll slap on a BSD license since 'public domain' probably doesn't mean what I think it does. :)

  2. By Anonymous Coward (78.192.150.96) on

    There is some non IP that show up in the output:
    mail.liveworld.com.

    Comments
    1. By Sean Cody (sean) on I don't work here.

      > There is some non IP that show up in the output:
      > mail.liveworld.com.
      

      Run with the debug option, it should tell you what records are not resolving. Odds are I missed something (more likely than not), or the SPF record for a particular domain was written by hand (and wrong).

      In the python script:
      Replace line 68 with: for i in parse(sys.argv[1],True):

  3. Comments
    1. By Sean Cody (sean) on I don't work here.

      > This is nice, for my part, I have been using some shell script with a recursive function to do the job. It resolves less info, but if a server is not within an ip4 statement, it will probably pass greylisting anyway.
      > spf_to_pf.txt
      >
      > Though, one has to wonder what happens if a 0.0.0.0/0 slips into the records by mistake...
      >
      Your link 404s for me.

      Yeah SPF records are certainly not authoritative or in many cases written correctly, but it is better than nothing. Like the possibility for infinite recursion, handling bogons and special addresses it is a matter of a few lines of code (then there would be 999 more problems left to resolve). :)

      I've had so many perfectly respectable mail servers completely fail at gray-listing I don't make the assumption anything will probably pass. Of course their problem is only marginally mine so I'm happy to continue gray-listing and white-list the annoying servers.

      Comments
      1. By Renaud Allard (renaud) on

        > > This is nice, for my part, I have been using some shell script with a recursive function to do the job. It resolves less info, but if a server is not within an ip4 statement, it will probably pass greylisting anyway.
        > > spf_to_pf.txt
        > >
        > > Though, one has to wonder what happens if a 0.0.0.0/0 slips into the records by mistake...
        > >
        > Your link 404s for me.
        >
        > Yeah SPF records are certainly not authoritative or in many cases written correctly, but it is better than nothing. Like the possibility for infinite recursion, handling bogons and special addresses it is a matter of a few lines of code (then there would be 999 more problems left to resolve). :)
        >
        > I've had so many perfectly respectable mail servers completely fail at gray-listing I don't make the assumption anything will probably pass. Of course their problem is only marginally mine so I'm happy to continue gray-listing and white-list the annoying servers.
        >

        Thanks, the 404 is a mistake in my DNS settings for ipv4. Now corrected.

        I just mentioned the 0.0.0.0/0 case for informational purposes. It's something like BGP peering: you have to trust some parties at some point...

  4. By Constantine A. Murenin (cnst) C++@Cns.SU on

    I've actually started a similar thread on an SPF mailing list back in February 2013: http://comments.gmane.org/gmane.mail.spam.spf.discuss/25200

    They did offer a solution: run an SPF check for 0.0.0.0, and print out all evaluated IPs as the check goes; script at http://spidey2.bmsi.com/pyspf/spf.py does just that, so, you might want to check it out, although it sounds like your script might already be enough in itself.

Credits

Copyright © - Daniel Hartmeier. All rights reserved. Articles and comments are copyright their respective authors, submission implies license to publish on this web site. Contents of the archive prior to as well as images and HTML templates were copied from the fabulous original deadly.org with Jose's and Jim's kind permission. This journal runs as CGI with httpd(8) on OpenBSD, the source code is BSD licensed. undeadly \Un*dead"ly\, a. Not subject to death; immortal. [Obs.]