• Spamhalter getting overwhelmed by HTML meta tags

    From Marco Old@21:1/5 to All on Fri Aug 6 16:25:19 2021
    Spamhalter has not been working well; it has degraded over the past
    couple of years. There was a long thread about this problem in 2019.

    I followed all of the hints from that thread, but Spamhalter still
    misses many Spam emails.

    In looking at the "Explain Spam Classification", I see that almost all
    of the words used to classify the email are HTML meta tags. Words
    like "style", "arial", "margin", "sans-serif", "font-family",
    "text-align" and so on.
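    The symptom is what you would expect if the filter tokenizes the raw
    HTML source instead of only the text a reader sees. The Python sketch
    below is a hypothetical illustration, not Spamhalter's actual
    tokenizer (its internals are not described in this thread):
    naive_tokens and text_tokens are made-up names, the token pattern is
    an assumption, and the text-only variant uses the standard library's
    html.parser to skip tags, attributes, and <style>/<script> content.

```python
import re
from html.parser import HTMLParser

# Assumed token pattern: lowercase letters/digits, hyphens allowed after the first character.
TOKEN_RE = re.compile(r"[a-z0-9][a-z0-9-]*")


def naive_tokens(raw_html: str) -> list[str]:
    """Tokenize the raw HTML source, markup and inline CSS included."""
    return TOKEN_RE.findall(raw_html.lower())


class _VisibleText(HTMLParser):
    """Collect text content only, skipping tags, attributes and <style>/<script> bodies."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip = False

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip = True

    def handle_endtag(self, tag):
        if tag in ("style", "script"):
            self._skip = False

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def text_tokens(raw_html: str) -> list[str]:
    """Tokenize only the visible text of an HTML body."""
    parser = _VisibleText()
    parser.feed(raw_html)
    parser.close()
    return TOKEN_RE.findall(" ".join(parser.parts).lower())


sample = ('<p style="font-family: arial, sans-serif; margin: 0; '
          'text-align: center">Claim your free prize now</p>')
print(naive_tokens(sample))
print(text_tokens(sample))  # ['claim', 'your', 'free', 'prize', 'now']
```

    On the sample, the naive version yields style, font-family, arial,
    sans-serif, margin and text-align alongside the real words, while the
    text-only version keeps just "claim your free prize now".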

    So I train an email as Spam and those words get into the
    classification for Spam messages; then, on the next email, I train
    the email as not Spam and those words are removed from the
    classification. Then the next email is not considered Spam.
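    The flip-flop described above matches how token-counting
    (Bayesian-style) filters generally behave. In the minimal sketch
    below, which assumes a simplified Paul Graham-style ratio rather than
    Spamhalter's real formula, a formatting token seen once in a Spam
    message and once in a not-Spam message ends up at 0.5, i.e. no
    evidence either way, so it stops influencing the verdict. TokenStats
    and spamminess are hypothetical names.

```python
from collections import Counter


class TokenStats:
    """Hypothetical per-token counts for a spam / not-spam filter (not Spamhalter's code)."""

    def __init__(self) -> None:
        self.spam: Counter = Counter()
        self.ham: Counter = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def train(self, tokens: list[str], is_spam: bool) -> None:
        """Record one training message; each token is counted once per message."""
        unique = set(tokens)
        if is_spam:
            self.spam.update(unique)
            self.n_spam += 1
        else:
            self.ham.update(unique)
            self.n_ham += 1

    def spamminess(self, token: str) -> float:
        """Simplified Graham-style estimate in [0, 1]; 0.5 means no evidence either way."""
        s = self.spam[token] / self.n_spam if self.n_spam else 0.0
        h = self.ham[token] / self.n_ham if self.n_ham else 0.0
        if s + h == 0:
            return 0.5  # token never seen in training
        return s / (s + h)


formatting = ["style", "arial", "margin", "sans-serif", "font-family", "text-align"]
stats = TokenStats()
stats.train(formatting + ["prize"], is_spam=True)
after_spam = stats.spamminess("arial")   # 1.0: only ever seen in Spam
stats.train(formatting + ["meeting"], is_spam=False)
after_ham = stats.spamminess("arial")    # 0.5: seen equally in both classes
print(after_spam, after_ham, stats.spamminess("prize"))
```

    Here "arial" goes from 1.0 after the Spam training to 0.5 after the
    not-Spam training, while "prize" stays at 1.0.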

    Has anyone noticed this?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Steve Hayes@21:1/5 to All on Sat Aug 7 09:43:54 2021
    On Fri, 06 Aug 2021 16:25:19 -0700, Marco Old <notme@silXogicX.com>
    wrote:

    > Spamhalter has not been working well; it has degraded over the past
    > couple of years. There was a long thread about this problem in 2019.
    >
    > I followed all of the hints from that thread, but Spamhalter still
    > misses many Spam emails.
    >
    > In looking at the "Explain Spam Classification", I see that almost all
    > of the words used to classify the email are HTML meta tags. Words
    > like "style", "arial", "margin", "sans-serif", "font-family",
    > "text-align" and so on.
    >
    > So I train an email as Spam and those words get into the
    > classification for Spam messages; then, on the next email, I train
    > the email as not Spam and those words are removed from the
    > classification. Then the next email is not considered Spam.
    >
    > Has anyone noticed this?

    That's probably the reason why most HTML e-mail ends up in my "Junk"
    queue, and as most of it is junk, I don't bother to fish it out.



    --
    Steve Hayes from Tshwane, South Africa
    Web: http://www.khanya.org.za/stevesig.htm
    Blog: http://khanya.wordpress.com
    E-mail - see web page, or parse: shayes at dunelm full stop org full stop uk

  • From Euler German@21:1/5 to All on Sat Aug 7 15:27:46 2021
    On article <5ogrgg99d7l9nucvu8u1968iepbc1ob35r@4ax.com>, Marco Old
    wrote (at least in part):

    > So I train an email as Spam and those words get into the
    > classification for Spam messages; then, on the next email, I train
    > the email as not Spam and those words are removed from the
    > classification. Then the next email is not considered Spam.

    Maybe you're not "training" SpamHalter correctly. There's a big
    difference between MOVING one or more misclassified messages to the
    Suspicious or junk mail folder, versus selecting them and picking
    Spamhalter classification > Train message(s) as Spam from the menu.
    The same applies the other way around: MOVING message(s) from the
    Suspicious or junk mail folder to any other folder is much more
    effective than Train message(s) as Not-Spam. There's a technical
    explanation for each method, but in a nutshell, that's how it works.

    OTOH, if that's not your case, you may benefit from SpamHalter's
    database cleanup, which removes deprecated data from the corpus. Pick
    it from Tools > Spam and content controls > Spamhalter... > Cleanup...

    My current Spamhalter training strategy and settings:

    (*) Train on classification errors only (smaller database)
    ( ) Train always (larger database, self-trained) <- no need if you
        run standalone or on a small LAN.

    Spam level (%): 50
    Not-spam boost: 1

    SpamHalter has been running flawlessly here since version 1.0 with
    these settings.
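    For readers wondering what these settings mean mechanically, here is
    a toy Python model. It is an assumption-laden sketch, not
    Spamhalter's code: the naive-Bayes combination of token
    probabilities is a generic technique swapped in for illustration,
    "Spam level" is treated as a percentage threshold, and "Not-spam
    boost" is ASSUMED to multiply not-Spam evidence (a boost of 1 then
    changes nothing). ToyFilter, process and errors_only are
    hypothetical names.

```python
import math
from collections import Counter


class ToyFilter:
    """Hypothetical token filter illustrating the settings above; NOT Spamhalter's algorithm."""

    def __init__(self, spam_level: float = 50.0, not_spam_boost: float = 1.0) -> None:
        self.spam_level = spam_level          # verdict threshold, in percent
        self.not_spam_boost = not_spam_boost  # ASSUMPTION: multiplies not-spam evidence
        self.spam: Counter = Counter()
        self.ham: Counter = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def _token_prob(self, token: str) -> float:
        s = self.spam[token] / max(self.n_spam, 1)
        h = self.not_spam_boost * self.ham[token] / max(self.n_ham, 1)
        if s + h == 0:
            return 0.5  # unseen token: neutral
        # Clamp so one token cannot force a certain verdict on its own.
        return min(0.99, max(0.01, s / (s + h)))

    def score(self, tokens: list[str]) -> float:
        """Naive-Bayes combination of token probabilities, as a 0-100 spam score."""
        probs = [self._token_prob(t) for t in set(tokens)]
        # diff = log(prod(1 - p)) - log(prod(p)); score = 100 / (1 + e^diff)
        diff = sum(math.log(1 - p) - math.log(p) for p in probs)
        if diff >= 0:  # rearranged so math.exp never overflows
            e = math.exp(-diff)
            return 100.0 * e / (1.0 + e)
        return 100.0 / (1.0 + math.exp(diff))

    def is_spam(self, tokens: list[str]) -> bool:
        return self.score(tokens) > self.spam_level

    def train(self, tokens: list[str], is_spam: bool) -> None:
        unique = set(tokens)
        if is_spam:
            self.spam.update(unique)
            self.n_spam += 1
        else:
            self.ham.update(unique)
            self.n_ham += 1

    def process(self, tokens: list[str], actually_spam: bool, errors_only: bool = True) -> bool:
        """Classify a message, then train on it: always, or only if the verdict was wrong."""
        verdict = self.is_spam(tokens)
        if not errors_only or verdict != actually_spam:
            self.train(tokens, actually_spam)
        return verdict


mail = [
    (["free", "prize", "arial"], True),
    (["meeting", "agenda", "arial"], False),
    (["free", "prize", "style"], True),
    (["agenda", "notes"], False),
]
toe = ToyFilter(spam_level=50, not_spam_boost=1)
verdicts = [toe.process(tokens, label) for tokens, label in mail]
always = ToyFilter(spam_level=50, not_spam_boost=1)
for tokens, label in mail:
    always.process(tokens, label, errors_only=False)
print(verdicts, toe.n_spam + toe.n_ham, always.n_spam + always.n_ham)
```

    With these four made-up messages, training on errors only stores 2
    of them while training always stores all 4, which is the smaller-
    versus larger-database trade-off named in the option labels.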

    --
    Kind regards,
    Euler German

    Please, reply preferably to the list.
    Reply-To: partially ROT13, invalid=com
    Due to spam I'm filtering-out GoogleGroups. Sorry. :(

  • From Marco Old@21:1/5 to rstrezna.hfrarg@znvyahyy.invalid on Wed Aug 25 15:05:48 2021
    Euler,

    Thanks for the hints. I already had the training strategy set that
    way, but Spam Level and Not-spam boost were still at their defaults.

    I changed them to your recommended values, and we will see
    what happens.

    I will be sure to drag the spam messages into the Junk folder.

    Marco

    On Sat, 7 Aug 2021 15:27:46 -0300, Euler German <rstrezna.hfrarg@znvyahyy.invalid> wrote:

    > My current Spamhalter training strategy and settings:
    >
    > (*) Train on classification errors only (smaller database)
    > ( ) Train always (larger database, self-trained) <- no need if you
    >     run standalone or on a small LAN.
    >
    > Spam level (%): 50
    > Not-spam boost: 1
    >
    > SpamHalter has been running flawlessly here since version 1.0 with
    > these settings.

  • From Euler German@21:1/5 to All on Thu Aug 26 09:32:04 2021
    On article <0ffdig5vmtk4ct0bmn9eads9bt6sjj7gq3@4ax.com>, Marco Old
    wrote (at least in part):

    > I will be sure to drag the spam messages into the Junk folder.

    You may also use Quick Actions for this (I'm a keyboard guy). Look at
    Folder > Quick actions > Define quick actions...

    --
    Kind regards,
    Euler German

  • From Marco Old@21:1/5 to All on Sun Oct 24 13:44:29 2021
    Update:

    Helped by the residents of this group, I've got Spamhalter working
    much better now.

    I cleared out all of the previous cached data, clicked on the

    (o) Train on classification errors only

    set "Spam Level %" to 50

    and set "Not-spam boost" to 1

    as recommended in other posts.

    Then I made sure to ONLY drag spam emails into the spam folder and
    NEVER use the right-click menu item "Train Message(s) as Spam".

    After a few weeks of dragging spam emails, now Spamhalter is working
    very well. Almost 100% accuracy in detecting Spam and not-Spam.

    Thanks to all.
