• [slrn] experiment: can bayesian filtering score usenet posts?

    From Tavis Ormandy@21:1/5 to All on Mon Dec 20 03:32:55 2021
    The problem with training spam filters over NNTP is that the protocol is designed around offering headers and bodies separately.

    Sure, in theory you could just download everything at once, but then you
    lose all the performance benefits of the protocol. If you could score
    on the XOVER (overview) headers alone, you would keep all the protocol
    benefits, but is that enough data?
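    For reference, each XOVER (OVER in RFC 3977) line is a tab-separated
    record: the article number, then Subject, From, Date, Message-ID,
    References, byte count and line count, sometimes followed by extra
    fields such as Xref. The Python sketch below is not part of the macro;
    the function names and the choice of which fields are worth tokenizing
    are assumptions. It splits one overview line and turns it into a
    headers-only pseudo-message that a mail-oriented filter like bogofilter
    can read.

```python
from __future__ import annotations

# Field order after the article number in an overview line (RFC 2980 XOVER / RFC 3977 OVER).
OVERVIEW_FIELDS = ["Subject", "From", "Date", "Message-ID", "References", "Bytes", "Lines"]

# Assumption: only these textual fields carry useful tokens for the filter.
TOKEN_FIELDS = ["Subject", "From", "Message-ID", "References"]


def parse_overview(line: str) -> dict[str, str]:
    """Split one tab-separated overview line into named fields.

    Extra trailing fields (e.g. "Xref: ...") are ignored in this sketch.
    """
    parts = line.rstrip("\r\n").split("\t")
    article = {"Number": parts[0]}
    for name, value in zip(OVERVIEW_FIELDS, parts[1:]):
        article[name] = value
    return article


def to_pseudo_message(article: dict[str, str]) -> str:
    """Build a headers-only message (empty body), skipping empty fields."""
    headers = [f"{name}: {article[name]}" for name in TOKEN_FIELDS if article.get(name)]
    return "\n".join(headers) + "\n\n"
```

    Everything here comes from the overview, so no article bodies have to
    be fetched to produce something the filter can score.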

    I decided to try it, and the answer is that it works! *but* it took a lot of training before it started to work.

    I used bogofilter (https://bogofilter.sourceforge.io/) and wrote a macro
    to pipe just the overview headers into it. It then auto-generates a
    scorefile.
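    A rough Python sketch of that pipeline follows; the actual macro is
    S-Lang and may work differently. It relies only on bogofilter's
    documented exit codes (0 spam, 1 non-spam, 2 unsure, 3 error), with
    "spam" here just meaning "not interested". The score values, the
    function names and the scorefile layout (one Message-Id entry per
    article, modelled loosely on slrn's scorefile syntax) are hypothetical.

```python
from __future__ import annotations

import subprocess

# bogofilter's documented exit codes.
BOGO_SPAM, BOGO_HAM, BOGO_UNSURE = 0, 1, 2

# Hypothetical mapping from bogofilter verdict to slrn score.
VERDICT_SCORES = {BOGO_HAM: 100, BOGO_UNSURE: 0, BOGO_SPAM: -100}

# Characters treated as regex metacharacters in this sketch; '<' and '>' are left alone.
_REGEX_SPECIALS = set("\\.^$*+?[]")


def classify(pseudo_message: str) -> int:
    """Pipe a headers-only message into bogofilter and return its verdict (exit code)."""
    result = subprocess.run(
        ["bogofilter"], input=pseudo_message, text=True, capture_output=True, check=False
    )
    if result.returncode not in VERDICT_SCORES:  # 3 (or anything unexpected) means an error
        raise RuntimeError(f"bogofilter failed: {result.stderr.strip()}")
    return result.returncode


def escape_regex(text: str) -> str:
    """Backslash-escape regex metacharacters so a Message-ID matches literally."""
    return "".join("\\" + c if c in _REGEX_SPECIALS else c for c in text)


def render_scorefile(group: str, verdicts: dict[str, int]) -> str:
    """Render one slrn-style score section with a Message-Id entry per scored article."""
    lines = [f"[{group}]"]
    for message_id, verdict in verdicts.items():
        score = VERDICT_SCORES[verdict]
        if score == 0:
            continue  # unsure: leave the article unscored
        lines += [f"Score: {score}", f"  Message-Id: {escape_regex(message_id)}", ""]
    return "\n".join(lines) + "\n"
```

    Scoring by Message-Id ties each verdict to exactly one article, rather
    than to a Subject or From pattern that could also catch unrelated posts.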

    For the last few months, it has been really accurate at identifying the
    messages I want to read, and I've found it really useful. If
    anyone else wants to try it out, here is the macro I used:

    https://lock.cmpxchg8b.com/files/bogofilter.sl

    The macro automatically learns any articles you read when you leave a
    group. If an article had a positive score, it learns it as good. If it
    had a very low score, it learns it as bad.
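    That learning rule can be sketched in a few lines of Python. `-n` and
    `-s` are bogofilter's flags for registering a message as non-spam and
    spam; the -50 cut-off standing in for "a very low score" and the
    function names are hypothetical, not values taken from the macro.

```python
from __future__ import annotations

import subprocess

# Hypothetical threshold for "a very low score"; the real macro's value may differ.
VERY_LOW_SCORE = -50


def training_flag(score: int) -> str | None:
    """Pick the bogofilter registration flag for an article that was read."""
    if score > 0:
        return "-n"  # register as non-spam: an article worth reading
    if score <= VERY_LOW_SCORE:
        return "-s"  # register as spam: an article not worth reading
    return None  # in between: don't train on it


def train_on_leave(read_articles: list[tuple[int, str]]) -> None:
    """On leaving a group, register each read article's headers-only message."""
    for score, pseudo_message in read_articles:
        flag = training_flag(score)
        if flag is None:
            continue
        result = subprocess.run(
            ["bogofilter", flag], input=pseudo_message, text=True, capture_output=True, check=False
        )
        if result.returncode == 3:  # bogofilter uses 3 for I/O and other errors
            raise RuntimeError(f"bogofilter failed: {result.stderr.strip()}")
```

    Leaving middling scores untrained is one way to avoid reinforcing the
    filter's own uncertain guesses.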

    Tavis.

    --
    _o) $ lynx lock.cmpxchg8b.com
    /\\ _o) _o) $ finger taviso@sdf.org
    _\_V _( ) _( ) @taviso

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)