Forum: >>> Magnum BBS <<<

Getting spamassassin and clamav as inn filters

From The Doctor@21:1/5 to All on Mon Oct 9 14:57:15 2023

Any recipes how?
--
Member - Liberal International This is doctor@nk.ca Ici doctor@nk.ca
Yahweh, King & country!Never Satan President Republic!Beware AntiChrist rising! Look at Psalms 14 and 53 on Atheism https://www.empire.kred/ROOTNK?t=94a1f39b An oil stain on the carpet is not removed by picking up the litter. -unknown Beware https://mindspring.com

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Gea-Suan Lin@21:1/5 to The Doctor on Thu Oct 12 04:26:16 2023

On 2023-10-09, The Doctor <doctor@doctor.nl2k.ab.ca> wrote:

Any recipes how?

Yeah, I just implemented a simple hack in `cleanfeed.local`. I have
tried it, but it is not very useful so far: a lot of spam still gets into
comp.lang.c and other groups.

The most effective way to avoid Google Groups spam for now is simply to
drop everything that comes from Google Groups.

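```
use Mail::SpamAssassin;

# Create the SpamAssassin agent once, when the filter is loaded.
my $sa_agent = Mail::SpamAssassin->new();

sub local_filter_last {
    # Only check articles that came through Google Groups.
    return unless $hdr{Path} =~ /google-groups\.googlegroups\.com/;

    # Copy the headers, leaving out cleanfeed's pseudo-headers.
    my %myhdr = %hdr;
    delete $myhdr{__BODY__};
    delete $myhdr{__LINES__};

    # Rebuild a mail-like message: headers, blank line, body.
    my $header_str  = join "\n", map { "$_: $myhdr{$_}" } keys %myhdr;
    my $article_str = "$header_str\n\n$hdr{__BODY__}";

    my $mail    = $sa_agent->parse($article_str);
    my $status  = $sa_agent->check($mail);
    my $is_spam = $status->is_spam();

    # Release the per-message objects before returning, spam or not.
    $status->finish();
    $mail->finish();

    return reject("Reject Google Groups posting to $hdr{Newsgroups} by SpamAssassin")
        if $is_spam;

    return;
}
```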
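
The header-rebuilding step in the hack above can also be pulled out into a small helper, which makes it easier to check on its own. This is only a sketch: `build_article` is a made-up name, and it assumes that every key starting with two underscores (such as `__BODY__` and `__LINES__`) is a pseudo-header that should not be sent to SpamAssassin as a header line.

```perl
use strict;
use warnings;

# Rebuild an article from a cleanfeed-style header hash (passed by reference).
# Keys starting with two underscores are assumed to be pseudo-headers
# and are not emitted as header lines.
sub build_article {
    my ($hdr) = @_;
    my @names = sort grep { !/^__/ } keys %$hdr;
    my $head  = join "\n", map { "$_: $hdr->{$_}" } @names;
    my $body  = defined $hdr->{__BODY__} ? $hdr->{__BODY__} : '';
    # Headers, one blank line, then the body.
    return "$head\n\n$body";
}
```

Inside `local_filter_last` it would be called as `build_article(\%hdr)`. The header names are sorted only so that the output is stable from one run to the next.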

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Ray Banana@21:1/5 to All on Thu Oct 12 05:44:28 2023

* Gea-Suan Lin wrote:

On 2023-10-09, The Doctor <doctor@doctor.nl2k.ab.ca> wrote:

Any recipes how?

Yeah, I just implemented a simple hack within `cleanfeed.local`. Have
tried, but not so useful. Still many spam into comp.lang.c and other
groups.

[...]

OK, now you need a ~/.spamassassin directory for your news user and a user_prefs
file in that directory. After that you can start adding rules for Usenet spam.
You will also need to feed several hundred spam and ham articles to
sa-learn --spam or sa-learn --ham as the news user. After that, SpamAssassin
will gradually improve.
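
The training step can be scripted. The following is a minimal sketch, assuming the spam and ham articles have been saved one per file in two directories. The function names and the paths are made up for illustration; only `sa-learn --spam` and `sa-learn --ham` come from the description above.

```perl
use strict;
use warnings;

# Build the argument list for one sa-learn run.
# $class must be 'spam' or 'ham'; $dir is a directory of saved articles.
sub sa_learn_command {
    my ($class, $dir) = @_;
    die "class must be 'spam' or 'ham'\n" unless $class =~ /\A(?:spam|ham)\z/;
    return ('sa-learn', "--$class", $dir);
}

# Run sa-learn once per class. Meant to be run as the news user, so that the
# Bayes database ends up under that user's ~/.spamassassin directory.
sub train {
    my (%dirs) = @_;
    for my $class (sort keys %dirs) {
        my @cmd = sa_learn_command($class, $dirs{$class});
        system(@cmd) == 0 or warn "sa-learn failed for $class: $?\n";
    }
    return;
}

# Example call (placeholder paths):
# train(spam => '/path/to/spam-articles', ham => '/path/to/ham-articles');
```

Passing the command as a list to `system` avoids going through a shell, so directory names with spaces need no quoting.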

--
Пу́тін — хуйло́
http://www.eternal-september.org

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Julien ÉLIE@21:1/5 to All on Thu Oct 12 08:58:19 2023

Hi Gea-Suan Lin,

Any recipes how?

Yeah, I just implemented a simple hack within `cleanfeed.local`. Have
tried, but not so useful. Still many spam into comp.lang.c and other
groups.

FWIW, there's a doc in French to set up a "spamchk" funnel to
SpamAssassin in the newsfeeds file:

https://web.archive.org/web/20230901182332/https://git.alphanet.ch/gitweb/?p=inn-install;a=blob_plain;f=README.html;hb=HEAD#filtrer-le-spam-avec-spamassassin

--
Julien ÉLIE

« Medicus curat, natura sanat. »

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Gea-Suan Lin@21:1/5 to Ray Banana on Thu Oct 12 09:00:09 2023

Thanks for the information.

I added a setting to ~/.spamassassin/user_prefs so that Bayes also takes
tokens from MIME parts:

#
bayes_token_sources all

Then I manually selected 200+ hams and 200+ spams from comp.lang.c,
50+ spams from comp.lang.python, and 200+ spams from sci.crypt, and fed
all of them to sa-learn.

The result looks pretty good so far. Almost all new spam posted to
comp.lang.c has been blocked by SpamAssassin.

I put my trained files here, so you can just reuse them:

https://newsfeed.hasname.com/files/usenet-spamassassin-20231012.tar.gz

--
Resistance is futile.
https://blog.gslin.org/ & <gslin@gslin.org>

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From yamo'@21:1/5 to All on Sun Jan 28 10:05:49 2024

Hi Julien,

Julien ÉLIE wrote on 12/10/2023 08:58:

Hi Gea-Suan Lin,

Any recipes how?

Yeah, I just implemented a simple hack within `cleanfeed.local`. Have
tried, but not so useful. Still many spam into comp.lang.c and other
groups.

FWIW, there's a doc in French to set up a "spamchk" funnel to
SpamAssassin in the newsfeeds file:

https://web.archive.org/web/20230901182332/https://git.alphanet.ch/gitweb/?p=inn-install;a=blob_plain;f=README.html;hb=HEAD#filtrer-le-spam-avec-spamassassin

The spamchk funnel is slower than calling SpamAssassin from cleanfeed.local.
After some tests, I've adopted Gea-Suan Lin's technique, which can
be found here:
<http://al.howardknight.net/?STYPE=msgid&MSGI=%3Cug7sh8%24pcc%241%40colo-sc-1.gslin.com%3E>

I will update the French documentation:
<https://git.mcos.nc/INN/inn_install>

--
Stéphane
GOOGLE GROUPS USERS, you will soon lose access to Usenet. <https://support.google.com/groups/answer/11036538>
Free replacement servers: <http://usenet-fr.yakakwatik.org>
Software: <http://usenet-fr.yakakwatik.org/lecteurs-de-news.html>

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Ray Banana@21:1/5 to All on Sun Jan 28 11:22:51 2024

Thus spake yamo' <yamo@beurdin.invalid>

[...]

After some tests, I've adopted the technique from Gea-Suan Lin, it could
be found here :
<http://al.howardknight.net/?STYPE=msgid&MSGI=%3Cug7sh8%24pcc%241%40colo-sc-1.gslin.com%3E>

For performance reasons, especially if you receive a full text feed, I
would recommend using spamd instead of starting SpamAssassin for every article:

    use Mail::SpamAssassin::Client;

    # Rebuild the article text without cleanfeed's pseudo-headers.
    my %myhdr = %hdr;
    delete $myhdr{__BODY__};
    delete $myhdr{__LINES__};
    my $header_str  = join "\n", map { "$_: $myhdr{$_}" } keys %myhdr;
    my $article_str = "$header_str\n\n$hdr{__BODY__}";

    # $spamd_port and $spamd_host are placeholders: set them to your spamd port and host.
    my $spamtest = Mail::SpamAssassin::Client->new({
        port     => $spamd_port,
        host     => $spamd_host,
        username => 'news',    # Use ~news/.spamassassin/user_prefs
    });

    my $result = $spamtest->process($article_str);
    my $score  = $result->{score};

    INN::syslog('notice', $hdr{'Message-ID'} . " Score: $score, isspam: " . $result->{isspam});
    if ($result->{isspam} eq 'True') {
        [...] # local processing, nocemize etc.
        return 'SPAM';
    } else {
        [...] # local processing
    }
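
If spamd is unreachable, the result may not be usable, so it can help to separate the decision from the filter body. This is a sketch only: `spamd_verdict` is a made-up helper, and it assumes the result is a hash reference with `isspam` and `score` keys as used above. Treating a missing result as "accept" is a design choice, not something the client library prescribes.

```perl
use strict;
use warnings;

# Turn a spamd client result into a simple verdict hash.
# A missing or malformed result is reported as not checked and not spam,
# so that a spamd outage does not reject every article.
sub spamd_verdict {
    my ($result) = @_;
    return { spam => 0, score => undef, checked => 0 }
        unless ref($result) eq 'HASH';
    my $is_spam = (defined $result->{isspam} && $result->{isspam} eq 'True') ? 1 : 0;
    return { spam => $is_spam, score => $result->{score}, checked => 1 };
}
```

In the filter, `spamd_verdict($result)->{spam}` would take the place of the direct test on `$result->{isspam}`.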

--
Пу́тін — хуйло́
https://www.eternal-september.org

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From yamo'@21:1/5 to All on Sun Jan 28 19:58:20 2024

Hi Ray,

Ray Banana wrote on 28/01/2024 11:22:

Thus spake yamo' <yamo@beurdin.invalid>

[...]

After some tests, I've adopted the technique from Gea-Suan Lin, it could
be found here :
<http://al.howardknight.net/?STYPE=msgid&MSGI=%3Cug7sh8%24pcc%241%40colo-sc-1.gslin.com%3E>

For performance reasons, especially if you receive a full text feed, I
would recommend to use spamd instead of starting spamassassin for every article:

Thanks!

It works but I have to test a little more.

--
Stéphane
GOOGLE GROUPS USERS, you will soon lose access to Usenet. <https://support.google.com/groups/answer/11036538>
Free replacement servers: <http://usenet-fr.yakakwatik.org>
Software: <http://usenet-fr.yakakwatik.org/lecteurs-de-news.html>

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
