Spamhalter has not been working well; its accuracy has degraded over
the past couple of years. There was a long thread about this back in
2019.
I followed all of the hints from that thread, but Spamhalter still
misses many Spam emails.
Looking at "Explain Spam Classification", I see that almost all of the
words used to classify the email are HTML and CSS markup rather than
message text: words like "style", "arial", "margin", "sans-serif",
"font-family", "text-align" and so on.
So I train an email as Spam and those words get into the
classification for Spam messages. Then on the next email, I train it
as not Spam and those same words are removed from the classification,
so the email after that is not considered Spam.
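To illustrate the effect described above, here is a minimal Python sketch. It is not Spamhalter's code, and I do not know how Spamhalter actually tokenizes messages; the `tokens` and `visible_tokens` helpers and the two sample messages are made up for this example. It only shows that when markup is tokenized along with the text, a Spam and a not-Spam message share the styling words, and that those shared words disappear if only the visible text is tokenized.

```python
from html.parser import HTMLParser
import re


class _TextOnly(HTMLParser):
    """Collect text nodes, skipping the bodies of <style> and <script>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def tokens(text):
    """Hypothetical tokenizer: lowercase words of 3+ letters, hyphens allowed."""
    return set(re.findall(r"[a-z][a-z-]{2,}", text.lower()))


def visible_tokens(html):
    """Tokenize only the text a reader would see, ignoring tags and attributes."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return tokens(" ".join(parser.parts))


# Made-up sample messages with identical styling and different content.
SPAM = '<p style="font-family: arial; margin: 0">Cheap pills today</p>'
HAM = '<p style="font-family: arial; margin: 0">Minutes from the meeting</p>'

# Words that appear in both messages when the raw HTML is tokenized.
shared_raw = tokens(SPAM) & tokens(HAM)
# Words that appear in both messages when only visible text is tokenized.
shared_visible = visible_tokens(SPAM) & visible_tokens(HAM)

print(sorted(shared_raw))
print(sorted(shared_visible))
```

In this toy case the raw tokenization shares only the styling words between the two messages, which is the kind of token that gets pushed back and forth by alternating Spam and not-Spam training.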
Has anyone noticed this?
My current Spamhalter training strategy and settings:
(*) Train on classification errors only (smaller database)
( ) Train always (larger database, self-trained) <- no need if you run standalone or on a small LAN.
Spam level (%): 50
Not-spam boost: 1
SpamHalter has been running flawlessly here since version 1.0 with
these settings.
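For anyone unsure what "train on classification errors only" means in practice, a rough Python sketch of the general idea follows. This is a toy, not Spamhalter's implementation: the `classify` and `train` functions, the word-set "database" and the sample messages are all invented for illustration, and the Spam level and boost settings are not modelled at all.

```python
def train_on_error(messages, classify, train):
    """Train only when the classifier's verdict disagrees with the true label.

    messages: iterable of (text, is_spam) pairs.
    Returns the number of training calls that were made.
    """
    trained = 0
    for text, is_spam in messages:
        if classify(text) != is_spam:
            train(text, is_spam)
            trained += 1
    return trained


# Toy "database": a set of words currently treated as Spam indicators.
spam_words = set()


def classify(text):
    """Toy classifier: Spam if any word is in the spam_words set."""
    return any(word in spam_words for word in text.lower().split())


def train(text, is_spam):
    """Toy trainer: add words when Spam, remove them when not Spam."""
    words = set(text.lower().split())
    if is_spam:
        spam_words.update(words)
    else:
        spam_words.difference_update(words)


mail = [
    ("cheap pills", True),       # misclassified at first, so it is trained
    ("cheap pills now", True),   # already caught, so no training happens
    ("meeting notes", False),    # correctly passed, so no training happens
]

count = train_on_error(mail, classify, train)
print(count, sorted(spam_words))
```

Only the first message triggers training here, which is why this strategy keeps the database smaller than training on every message.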
I will be sure to drag the spam messages into the Junk folder.