Choosing a Spam Filter » Linux Magazine (2023)

Spam filters have different modes of operation. Understanding how they work can help you choose which one to use.

These days, the choice of spam filters comes down to Bogofilter and SpamAssassin. Other choices, like DSPAM, are no longer in development. Although a few other choices (e.g., SpamBayes) are available, when an email reader offers a plugin, it is almost always for either Bogofilter or SpamAssassin. However, what is less often discussed is which filter is the best to use in which circumstances.

Instead, most users simply nod solemnly when they read that both involve “Bayesian filtering.” Most of us – including many who use the phrase – have no idea what Bayesian filtering is, but it sounds scientific and reassures us that either choice is acceptable.

In fact, learning that Bogofilter and SpamAssassin are “Bayesian” is useless for choosing between them. To call them Bayesian means nothing more than that their structure is based on the 18th-century work of Thomas Bayes in statistics and probability. More specifically, both apply Bayes’ work by collecting words and assigning each word a probability that it indicates spam. The more suspect words an email contains, the greater the chance that it is spam. However, making an informed choice between spam filters requires considerably more detail.
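For readers who want the formula behind the word, Bayes’ theorem gives the probability that a message is spam (S) rather than ham (H), given that it contains a word w. The notation is ours, not the article’s:

```latex
P(S \mid w) = \frac{P(w \mid S)\,P(S)}{P(w \mid S)\,P(S) + P(w \mid H)\,P(H)}
```

For example, if w appears in 20% of spam and 1% of ham, and spam and ham are assumed to be equally common, then P(S | w) = (0.2 × 0.5) / (0.2 × 0.5 + 0.01 × 0.5) ≈ 0.95. The numbers are illustrative only; each filter estimates these terms, and combines the per-word values, in its own way.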

Bogofilter

Bogofilter has its roots in “A Plan for Spam,” a 2002 essay by English developer Paul Graham. After trying to develop filters based on the identifying characteristics of spam, Graham concluded that beyond a certain point, the more rules he added, the more false positives he obtained – that is, the more email messages that were incorrectly identified as spam.

Graham’s solution was to parse his samples of spam and non-spam into tokens, or individual words, and use Bayesian tools to assign each token a probability that it indicates spam, biasing the calculation slightly in favor of non-spam to minimize false positives. By examining the 15 most telling tokens in the header and body of each new email message, meaning those whose probabilities were farthest from a neutral 0.5, he calculated the probability that the message was spam. If that probability was greater than 0.9, the message was considered spam.
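Graham’s scheme is compact enough to sketch. The Python below is modeled on the formulas in “A Plan for Spam”: ham counts are doubled to bias against false positives, tokens seen fewer than five times are ignored, probabilities are clamped to the range 0.01 to 0.99, unseen tokens get 0.4, and the 15 most interesting tokens are combined. The tokenizer and function names are our own simplifying assumptions, not Bogofilter’s code.

```python
import math
import re
from collections import Counter

# Sketch of the scoring described in Paul Graham's "A Plan for Spam".
# The tokenizer is a simplifying assumption; the constants follow the essay.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")


def tokenize(text):
    """Split text into lowercase tokens of letters, digits, $, ' and -."""
    return TOKEN_RE.findall(text.lower())


def train(spam_messages, ham_messages):
    """Return a dict mapping tokens to spam probabilities (both corpora must be non-empty)."""
    bad = Counter(t for m in spam_messages for t in tokenize(m))
    good = Counter(t for m in ham_messages for t in tokenize(m))
    nbad, ngood = len(spam_messages), len(ham_messages)
    probs = {}
    for token in set(bad) | set(good):
        g = 2 * good[token]   # double ham counts to bias against false positives
        b = bad[token]
        if g + b < 5:         # too rare to be trusted
            continue
        b_rate = min(1.0, b / nbad)
        g_rate = min(1.0, g / ngood)
        # Clamp so no single token is ever treated as certain.
        probs[token] = max(0.01, min(0.99, b_rate / (b_rate + g_rate)))
    return probs


def spam_probability(message, probs, n_interesting=15):
    """Combine the probabilities of the most interesting tokens in a message."""
    # Unseen tokens get 0.4; each distinct token is counted once (a simplification).
    token_probs = [probs.get(t, 0.4) for t in set(tokenize(message))]
    # "Interesting" means farthest from a neutral 0.5.
    token_probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    top = token_probs[:n_interesting]
    spam_part = math.prod(top)
    ham_part = math.prod(1.0 - p for p in top)
    return spam_part / (spam_part + ham_part)


def is_spam(message, probs, threshold=0.9):
    """Graham's rule: a message is spam if its probability exceeds 0.9."""
    return spam_probability(message, probs) > threshold
```

Because the combined probability multiplies the individual values, a handful of strongly spammy tokens quickly pushes the result toward 1, while a few strongly innocent ones pull it toward 0.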

According to Graham, the advantage of this statistical approach is that it refers to something real – the probability of being spam – and works with both neutral and spam-indicating words.

However, he also recognized that the more personalized the filter was, the more accurate it would be. For this reason, he also included the possibility of using white lists to indicate non-spam, or “ham,” and black lists to indicate spam.

After reading Graham’s essay, Eric S. Raymond founded the Bogofilter project. Today, Bogofilter is maintained by other developers, who have refined Graham’s calculations based on Gary Robinson’s suggestions. The modern refinements include recognizing MIME types, treating each hostname and IP address as a single token (rather than dividing it into separate words), and ignoring dates and Message-IDs as irrelevant. However, the basic approach remains the one Graham advocated.
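Robinson’s best-known suggestions are two formulas: a smoothed token probability f(w) that pulls rarely seen tokens toward an assumed value, and a way of combining token probabilities with Fisher’s chi-square method instead of Graham’s product. The sketch below follows those published formulas. The strength s and assumed probability x are illustrative values, not Bogofilter’s shipped defaults, and Bogofilter’s own code may add further details, so treat this as an illustration rather than its implementation.

```python
import math


def robinson_f(spam_count, ham_count, n_spam, n_ham, s=1.0, x=0.5):
    """Robinson's smoothed probability f(w) that a token indicates spam.

    spam_count / ham_count: how many spam / ham messages contain the token.
    n_spam / n_ham: total training messages of each kind (both must be > 0).
    s (strength of the prior) and x (assumed probability for an unseen
    token) are illustrative values, not Bogofilter's shipped defaults.
    """
    n = spam_count + ham_count
    if n == 0:
        return x
    b = spam_count / n_spam
    g = ham_count / n_ham
    p = b / (b + g)
    return (s * x + n * p) / (s + n)


def chi2q(x2, dof):
    """Upper-tail probability of a chi-square statistic with an even dof."""
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_indicator(fs):
    """Robinson's Fisher-based indicator: near 1 means spam, near 0 means ham."""
    n = len(fs)
    if n == 0:
        return 0.5
    # Evidence for spam grows as the (1 - f) values shrink, and vice versa.
    spamminess = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - f) for f in fs), 2 * n)
    hamminess = 1.0 - chi2q(-2.0 * sum(math.log(f) for f in fs), 2 * n)
    return (1.0 + spamminess - hamminess) / 2.0
```

With s greater than 0 and x strictly between 0 and 1, f(w) never reaches exactly 0 or 1, so the logarithms above are always defined.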

The mathematically inclined can learn more about how Bogofilter calculates the probability that an email is spam by reading Graham’s essay, Robinson’s writing on the subject, and the filter’s man page. However, the most important point for the average user is that Bogofilter relies on statistical probability, supplemented by each user’s list of spam and ham. Advocates of this approach emphasize its simplicity, as well as the smaller number of false positives it produces once it is trained – that is, once the white and black lists are built. These lists are stored in the .bogofilter directory in your home directory.
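Training and classification are both driven from the command line: bogofilter reads a message on standard input, -s registers it as spam, and -n registers it as ham. When classifying, its exit status is 0 for spam, 1 for ham, 2 for unsure, and 3 for an error. The wrapper below is a minimal sketch assuming bogofilter is on the PATH and uses its default wordlist directory; the function names are our own.

```python
import subprocess

# Exit statuses bogofilter uses when classifying a message.
VERDICTS = {0: "spam", 1: "ham", 2: "unsure"}


def verdict_from_status(status):
    """Translate a bogofilter exit status into a label; 3 or anything else is an error."""
    if status in VERDICTS:
        return VERDICTS[status]
    raise RuntimeError(f"bogofilter reported an error (exit status {status})")


def classify(raw_message: bytes) -> str:
    """Classify one raw email message by piping it to bogofilter."""
    result = subprocess.run(["bogofilter"], input=raw_message)
    return verdict_from_status(result.returncode)


def train(raw_message: bytes, is_spam: bool) -> None:
    """Add one message to the wordlist as spam (-s) or ham (-n)."""
    flag = "-s" if is_spam else "-n"
    result = subprocess.run(["bogofilter", flag], input=raw_message)
    if result.returncode == 3:  # 3 signals an I/O or other error
        raise RuntimeError("bogofilter could not update its wordlist")
```

A mail client plugin would call train() each time you mark a message as spam or ham, which is how the personalized lists described above accumulate.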

SpamAssassin

SpamAssassin takes a different approach from Bogofilter. SpamAssassin’s main approach is to identify the characteristics of spam and then run tests to locate them. Many tests, although not all, rely heavily on regular expressions to catch variations of words and phrases.
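SpamAssassin’s own rules are lines in .cf files, for example a body or header rule holding a Perl-style regular expression, paired with a score. The Python sketch below only illustrates why regular expressions suit the job: one hypothetical pattern catches several disguised spellings of the same phrase.

```python
import re

# Hypothetical rule: one pattern that matches "free money" and common
# disguises such as "FR33 M0NEY", "F.R.E.E money", or extra spaces.
# [\W_]* allows any run of punctuation, spaces, or underscores between letters.
FREE_MONEY = re.compile(
    r"\bf[\W_]*r[\W_]*[e3][\W_]*[e3][\W_]*m[o0]n[e3]y\b",
    re.IGNORECASE,
)

for text in ["Claim your FR33 M0NEY today", "F.R.E.E money inside", "carefree money management"]:
    print(text, "->", bool(FREE_MONEY.search(text)))
# Claim your FR33 M0NEY today -> True
# F.R.E.E money inside -> True
# carefree money management -> False
```

The leading \b keeps the rule from firing inside a longer word such as “carefree,” a small example of the care needed to keep pattern-based tests from producing false positives.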

You can view the rule files used by SpamAssassin in /usr/share/spamassassin. More than 50 are listed in my current installation of Debian Stable. As their number suggests, they are a varied lot, including tests for common indicators of spam in headers, in message bodies, and in HTML code, as well as tests for recognizing offers of antivirus software, drugs, and pornography. The English version also includes some basic tests for French, German, and Italian. In addition, the files include a Bayesian probability test similar to Bogofilter’s, as well as white and black lists for individual customization.

Figure 1: SpamAssassin includes more than 50 tests to detect spam.

Additionally, /etc/spamassassin includes a test developed for Debian that looks for spam involving anacron, cron, and debconf, as well as plugins installed with each recent version.

Figure 2: Debian adds its own SpamAssassin tests to the already comprehensive list.

Each test assigns an email message a positive or negative value, which is added to the results of the other tests to determine whether the email is ham or spam. Unlike Bogofilter’s probabilities, these values do not represent anything precise, although since many users probably have no understanding of Bayesian analysis, much the same could be said for Bogofilter.
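The scoring model can be sketched in a few lines: every rule that matches adds its score, and the message is treated as spam once the total reaches a threshold. SpamAssassin’s default threshold, the required_score setting, is 5.0; the rule names, patterns, and scores below are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    pattern: re.Pattern
    score: float  # positive pushes toward spam, negative toward ham


# Hypothetical rules and scores, for illustration only.
RULES = [
    Rule("LOCAL_FREE_MONEY", re.compile(r"\bfree\s+money\b", re.IGNORECASE), 2.5),
    Rule("LOCAL_URGENT", re.compile(r"\b(?:act now|urgent)\b", re.IGNORECASE), 1.5),
    Rule("LOCAL_ALL_CAPS_SUBJECT", re.compile(r"^Subject: [A-Z0-9 !]+$", re.MULTILINE), 1.6),
    Rule("LOCAL_KNOWN_LIST", re.compile(r"^List-Id: .*debian", re.MULTILINE | re.IGNORECASE), -2.0),
]

REQUIRED_SCORE = 5.0  # SpamAssassin's default threshold (required_score)


def score_message(raw_message):
    """Run every rule and add up the scores of the ones that match."""
    hits = [rule for rule in RULES if rule.pattern.search(raw_message)]
    return sum(rule.score for rule in hits), [rule.name for rule in hits]


def is_spam(raw_message):
    """Spam once the summed score reaches the threshold."""
    total, _ = score_message(raw_message)
    return total >= REQUIRED_SCORE
```

Note how a single negative rule can offset a positive one: a mailing-list message that happens to mention “free money” stays well under the threshold.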

With all these tests, SpamAssassin exemplifies the basic security principle of “defense in depth.” Unlike Bogofilter, it does not rely on one or two approaches but on a wide variety of defenses. A piece of spam might slip past a single SpamAssassin test, but it is unlikely to slip past all of them.

Context is Everything

Both Bogofilter and SpamAssassin are available as plugins for major email readers and generally require little customization. Both also have high success rates. However, because black and white lists greatly improve each filter’s accuracy, be wary of the various comparisons online. Your own results are likely to be very different from those posted, especially before you have trained the filter to suit your personal email.

In fact, the filters are so different in their approaches and so dependent on how they are trained that deciding in any objective sense which one is more effective is almost impossible. To some extent, your choice may depend on whether you prefer Bogofilter’s single, all-encompassing approach or SpamAssassin’s defense in depth.

Even more importantly, your choice will depend on context. To start, if the speed of filtering matters, Bogofilter is much faster than SpamAssassin for the simple reason that it runs fewer tests. If you ordinarily receive several hundred email messages in the first download of the day, SpamAssassin runs so many tests that you might be unable to access your email for five minutes – a delay that you might consider worse than manually deleting spam.

By contrast, in my experience, Bogofilter requires several days of training before it reaches full effectiveness. Stopping to train Bogofilter in the middle of other tasks can be a nuisance, especially because it seems to need several examples before it recognizes posts from a mailing list as ham. SpamAssassin, on the other hand, is comprehensive enough that it generally identifies spam more accurately without training. If you prefer to minimize training, SpamAssassin is probably the filter you want.

Another consideration is how many false positives you get once your filter of choice has been trained. In my experience, a trained Bogofilter produces fewer false positives. Just as Graham observed, adding more rules beyond a certain point, as SpamAssassin does, seems to increase false positives.

Still another consideration is that SpamAssassin is reactive. It adds tests in response to the latest tactics used by spammers but appears to be slower to discard tests that are no longer needed – if it does so at all. Similarly, if new spamming tactics appear, you might temporarily have less effective filtering until a new software release is made. However, because Bogofilter relies on probability rather than on spam characteristics, it might not have the same problems – at least not to the same extent.

As you can see, the decision of which filter to use has no absolute answer. However, once you understand how both filters work, you can at least make a more informed choice to accommodate your preferences and your needs. If nothing else, you can choose the lesser of two evils.

FAQs

What algorithm is best suited for email spam filtering?

Several machine learning algorithms have been used in spam email filtering, but the Naive Bayes algorithm is particularly popular in commercial and open-source spam filters [2]. This is because of its simplicity: it is easy to implement, needs only a short training time, and evaluates messages quickly.

How do I create a custom spam filter?

In Gmail, click the cog in the top-right corner to open your settings. Under "Filters and Blocked Addresses," you can customize the Gmail spam filter for your email.

Where do I find spam filters?

Click the Settings gear icon on the top right in Gmail. Choose “Settings” in the menu and then select “Filters & Blocked Addresses” along the top.

What do email spam filters look for?

Internet Service Providers (ISPs) use spam filters to detect unwanted and unsolicited emails. Once they identify a spam message, they block it from reaching the target user's inbox. Spam filters are an industry response to the increasing rate of scams perpetrated by internet fraudsters.

How do I mass block spam emails?

  1. Report the email as spam.
  2. Block spam email addresses.
  3. Change your email privacy settings.
  4. Unsubscribe from unwanted newsletters or mailing lists.
  5. Use a secondary email address.
  6. Use a third-party email filter.
  7. Delete suspicious emails.
  8. Protect your device against malicious spam.

How can I send mass emails without being flagged as spam?

Technical Settings
  1. Correctly format headers. ...
  2. Ensure the email underwent a checkout. ...
  3. Use a special header. ...
  4. Include an Unsubscribe Link. ...
  5. Avoid using spam-like words. ...
  6. Don't use suspicious links and attachments. ...
  7. Check the design. ...
  8. Provide a plain-text version.

Is it better to block or mark as spam?

If you find a spam email in your regular inbox, don't just delete the message; mark it as spam. Marking a suspicious email as spam sends it to the spam folder, and if you receive any more emails from that address, the spam filter will know not to let them into your inbox.

How do I create a custom filter?

Filter for a specific number or a number range
  1. Click a cell in the range or table that you want to filter.
  2. On the Data tab, click Filter.
  3. Click the arrow. ...
  4. Under Filter, click Choose One, and then enter your filter criteria.
  5. In the box next to the pop-up menu, enter the number that you want to use.

What is a spam filter an example of?

A spam filter is a program used to detect unsolicited, unwanted and virus-infected emails and prevent those messages from getting to a user's inbox. Like other types of filtering programs, a spam filter looks for specific criteria on which to base its judgments.

What is the easiest way to bypass spam filters?

How to avoid spam filters when sending emails: a checklist
  1. Set up an SPF (Sender Policy Framework) record on your domain.
  2. Turn on DKIM (DomainKeys Identified Mail) signing for your messages.
  3. Set up a DMARC (Domain-based Message Authentication, Reporting, and Conformance) record on your domain.
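A DMARC policy is published as a DNS TXT record at _dmarc.<your domain>: a semicolon-separated list of tag=value pairs. The hypothetical helper below (not part of any mail library) shows the record's shape by parsing one. It is deliberately simplified, checking only that v=DMARC1 comes first and that any p= policy is one of the three defined values; RFC 7489 has further rules.

```python
def parse_dmarc(record: str) -> dict:
    """Split a DMARC TXT record into a tag -> value dict (simplified checks only)."""
    tags = {}
    for part in record.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing semicolon
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"malformed tag: {part!r}")
        tags[key.strip().lower()] = value.strip()
    # The version tag must be the first tag in the record.
    if next(iter(tags), None) != "v" or tags["v"] != "DMARC1":
        raise ValueError("record must start with v=DMARC1")
    if "p" in tags and tags["p"] not in {"none", "quarantine", "reject"}:
        raise ValueError(f"unknown policy: {tags['p']!r}")
    return tags
```

For example, parse_dmarc("v=DMARC1; p=quarantine; rua=mailto:dmarc-reports@example.com") returns the policy "quarantine" under "p" and the placeholder report address under "rua".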

What is a third-party spam filter?

Third-party spam filters analyze your mail before it is delivered to your server, filtering out the junk before it reaches your local server and, from there, your mailbox.

Do spam filters stop spam?

There are certain phrases that may trigger spam filters, which are typically the types of phrases that email spammers use. Avoiding these spam triggers can help keep your emails out of the spam folder, but it can also help keep you from looking like a spammer when you're dealing with customers.

Do spam filters catch every phishing email?

Not always. Phishing emails can be blocked by your spam filter, since they typically fail certain technical checks. But unless the person configuring the filter really knows what they're doing, there's a good chance some of these emails will make it through.

Why is Gmail's spam filter so good?

Simply put, to protect users at scale, we rely on machine learning powered by user feedback to catch spam and help us identify patterns in large data sets—making it easier to adapt quickly to ever-changing spam tactics. Gmail employs a number of AI-driven filters that determine what gets marked as spam.

Why am I suddenly getting lots of spam emails?

Spammers buy email addresses from special providers in bulk to add them to their mailing lists. If you've noted a sudden increase in the number of spam emails landing in your account, there's a high chance that your address was part of a list recently sold to one or more scammers.

Why am I suddenly getting so much spam in Gmail?

If you start receiving an increased amount of spam, with junk mail filters enabled, then there might be a problem with the mailbox that your spam emails are usually moved to. You should check that the target mailbox or mail folder isn't full or disabled.

Why do blocked emails still come through?

Blocking someone stops their email from reaching your mailbox. If email from a blocked sender still appears in your Inbox, the sender might be changing their email address. In that case, create an Inbox rule that looks for words common to those messages and moves them to the Deleted Items folder.

Is it better to unsubscribe from spam or just delete it?

Rule #1: If it is a legitimate company, use the unsubscribe option. Make sure the link points to a domain associated with the purported sender. Legit companies or their marketing vendor proxy will usually honor the request. Rule #2: If it is a shady company, do not unsubscribe, just delete.

Why would you use a custom filter?

Custom filters allow you to define matching logic that cannot be accomplished using the system-provided message filters. For example, you might create a custom filter that hashes a particular message element and then examines the value to determine whether the filter should return true or false.
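The answer above does not name the product it describes, so the sketch below is generic Python: a hypothetical custom filter that normalizes a message's Subject header, hashes it with SHA-256, and returns True when the digest appears in a blocklist. The blocklist, function names, and normalization rule are all assumptions for illustration.

```python
import hashlib
from email import message_from_string


def subject_digest(subject):
    """SHA-256 of a Subject line after lowercasing and collapsing whitespace."""
    normalized = " ".join(subject.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical blocklist: digests of Subject lines already known to be spam.
BLOCKED_SUBJECT_DIGESTS = {subject_digest("You have won a prize")}


def custom_filter(raw_message):
    """Return True if the message's Subject hashes to a blocked digest."""
    message = message_from_string(raw_message)
    subject = message.get("Subject", "")
    return subject_digest(str(subject)) in BLOCKED_SUBJECT_DIGESTS
```

Comparing digests rather than raw text lets such a filter share a blocklist without exposing the original strings; normalizing first means trivial changes in case or spacing still match.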

What is the difference between auto filter and custom filter?

AutoFilter allows filtering data with a maximum of two criteria, and those conditions are specified directly in the Custom AutoFilter dialog box. Using Advanced Filter, you can find rows that meet multiple criteria in multiple columns, and the advanced criteria need to be entered in a separate range on your worksheet.

What is meant by custom filter?

Custom filter is a module that allows you to create your own filters based on regular expressions. When you need some input filter that is not available from drupal.org/project/modules, and you don't want to write your own module, you can create your own filter with this module.

Is there a totally free spam blocker?

Hiya Caller ID & Block is a spam blocker app for Android phones that identifies the calls the user should take and blocks the numbers and texts that should be avoided. Hiya is easy to use, and it is free.

How do I get rid of spam emails without unsubscribing?

How to Unsubscribe From Emails Without Unsubscribe Link
  1. Use a reputable email cleaner, such as Clean Email.
  2. Email the sender and ask them to remove you from the list.
  3. Filter messages from companies in your inbox.
  4. Block the sender.
  5. Mark the email as spam, report spam, or report phishing.

What steps can you take to avoid spam?

5 Simple Ways You Can Fight Spam and Protect Yourself
  • Never give out or post your email address publicly. ...
  • Think before you click. ...
  • Do not reply to spam messages. ...
  • Download spam filtering tools and anti-virus software. ...
  • Avoid using your personal or business email address.

What can trigger spam filters?

Common spam filter triggers include:
  1. Poor grammar and spelling.
  2. Asking a reader to perform a suspicious action.
  3. Asking for personal information.
  4. Including too many links.
  5. Including suspicious or irrelevant attachments.
  6. A spammy subject line.
  7. Having an anonymous/unfamiliar sender name.

How do I beat the Gmail spam filter?

How to pass through the Gmail spam filter?
  1. Ask users to add you to their contact list. ...
  2. Get permission from your customers. ...
  3. Don't use a purchased email list. ...
  4. Make it easy to unsubscribe. ...
  5. Don't send emails to unengaged subscribers. ...
  6. Use spam checkers. ...
  7. Verify the email content. ...
  8. Set up email authentication.

Can spam filters open emails?

Sometimes, recipient spam filters may click through links in emails to verify that they are safe for the recipient to click on. This is good, because it means the spam filters are doing their job and protecting recipients.

Does clicking unsubscribe in spam emails cause you to be spammed more?

Unsubscribing from junk emails may seem like a simple way to clean your inbox, but doing so could actually make the spam problem worse. By clicking on a fake link in a spam email, you might be confirming to the spammer that your email address is correct, active, and checked on a regular basis.

Is Gmail good at blocking spam?

Google says it uses machine learning models to detect and filter out new threats, and that it blocks more than 99.9 percent of spam, phishing, and malware from reaching Gmail users.

What machine learning algorithms can be used to identify spam messages?

TfidfVectorizer + Naive Bayes Algorithm

Naive Bayes is a simple, probabilistic, traditional machine learning algorithm. It has long been popular for problems such as spam detection.
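TfidfVectorizer is scikit-learn's class for turning text into TF-IDF weighted features, commonly paired with a Naive Bayes classifier. To stay dependency-free, the sketch below swaps TF-IDF for plain word counts and implements multinomial Naive Bayes with add-one (Laplace) smoothing by hand. The class name is our own and is not scikit-learn's API.

```python
import math
import re
from collections import Counter


def words(text):
    """Lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class CountNaiveBayes:
    """Multinomial Naive Bayes over raw word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(words(text))
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_scores(self, text):
        """Log prior plus summed log likelihoods for each class."""
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)
            for w in words(text):
                if w in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            scores[c] = score
        return scores

    def predict(self, text):
        """Return the class with the highest log score."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get)
```

Working with sums of logarithms instead of products of probabilities avoids floating-point underflow on long messages.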

Which algorithms does Google use to filter spam?

More sophisticated programs, such as Bayesian filters and other heuristic filters, identify spam messages by recognizing suspicious word patterns or word frequency. They do this by learning the user's preferences based on the emails marked as spam.

Which algorithm is used for email communication?

SMTP (simple mail transfer protocol) is a transportation protocol used to transfer e-mail messages over the Internet. All e-mail servers use the SMTP to send e-mails from one e-mail server to another. SMTP is also used to send e-mail messages from e-mail clients to e-mail servers.

How do spam filters use classification algorithms?

Adaptive Spam Filtering Technique

Algorithms classify incoming mail into various groups, compare each group with a defined set of reference groups, and use the resulting comparison scores to separate spam from non-spam email.

What are two strategies for controlling spam?

Here are five simple ways to fight spam and to protect yourself online:
  • Never give out or post your email address publicly. ...
  • Think before you click. ...
  • Do not reply to spam messages. ...
  • Download spam filtering tools and anti-virus software. ...
  • Avoid using your personal or business email address.

Is a spam filter AI?

AI spam filters scan each incoming message and label any objectionable content. Their learning capabilities let them recognize the warning signs of malware. If a message containing malware is found in your inbox, it is immediately flagged, and you're alerted not to touch it.

What are the common types of spamming?

Four Common Types of Spam and Tips to Identify Them
  • Phishing. Phishing is the most common form of spam. ...
  • Vishing. Vishing is similar to phishing, except it happens over the phone. ...
  • Baiting. Baiting, similar to phishing, involves offering something enticing in exchange for your login information or private data. ...
  • Quid Pro Quo.

Is Gmail good at filtering spam?

Gmail also uses TensorFlow, an in-house machine learning framework, alongside other AI to train new spam filters. The introduction of this technology means Google can block an additional 100 million spam messages every day.

Does SSL affect email?

An SSL/TLS certificate will secure your email communications, and an S/MIME certificate will make sure that all emails remain in an encrypted format.

What makes a good email sequence?

Here are a few tips for writing a great email sequence: Think about what your goal is. Personalize it well, so the customer doesn't feel the messages come from an automated system. Include a clear CTA (call to action) that leads the reader to the next step.

Which is the most common protocol used for receiving emails?

SMTP and Email

There are three common protocols used to deliver email over the Internet: the Simple Mail Transfer Protocol (SMTP), the Post Office Protocol (POP), and the Internet Message Access Protocol (IMAP). All three use TCP, and the last two are used for accessing electronic mailboxes.

How do you choose the best classification algorithm?

5 Simple Steps to Choose the Best Machine Learning Algorithm That Fits Your AI Project Needs
  1. Understand Your Project Goal. ...
  2. Analyze Your Data by Size, Processing, and Annotation Required. ...
  3. Evaluate the Speed and Training Time. ...
  4. Find Out the Linearity of Your Data. ...
  5. Decide on the Number of Features and Parameters.

How do you choose a classification algorithm?

Do you know how to choose the right machine learning algorithm among 7 different types?
  1. Categorize the problem. ...
  2. Understand Your Data. ...
     • Analyze the data. ...
     • Process the data. ...
     • Transform the data. ...
  3. Find the available algorithms. ...
  4. Implement machine learning algorithms. ...
  5. Optimize hyperparameters.

Article information

Author: Geoffrey Lueilwitz

Last Updated: 01/22/2023
