Hi all,
Jan 31 15:56:12 www innd: rejecting[perl] <T5gCL.2931033$miq3.1112593@usenetxs.com> 439 Binary: misplaced binary
Jan 31 15:56:14 www innd: rejecting[perl] <U5gCL.2931041$miq3.2915980@usenetxs.com> 439 Binary: misplaced binary
I'm getting hundreds, if not thousands of these.
It doesn't really tell me which news server is sending out misplaced
binaries or which newsgroup is the culprit.
Open to suggestions on figuring this one out. I'd really like it to
stop. I don't accept binaries for a reason. I don't have unlimited
bandwidth so I'd like to nip this junk in the bud.
Thanks,
Jan 31 15:56:12 www innd: rejecting[perl] <T5gCL.2931033$miq3.1112593@usenetxs.com> 439 Binary: misplaced binary
Jan 31 15:56:14 www innd: rejecting[perl] <U5gCL.2931041$miq3.2915980@usenetxs.com> 439 Binary: misplaced binary
I'm getting hundreds, if not thousands of these.
It doesn't really tell me which news server is sending out misplaced
binaries or which newsgroup is the culprit.
You can see which feed is sending those posts in /var/log/news/news, but
the first server is not in the log files.
The first server is to some extent fundamentally unknowable since the Path header can be manipulated, but you could make the Perl filter log the Path header as well if you wanted to try to track things down to that extent.
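As a sketch of that idea (a hypothetical helper, not part of INN or pyClean): the rightmost real element of Path is the apparent injection point, so a filter hook could extract and log it for each rejected binary. In Python, roughly:

```python
import re

# Hypothetical helper: pick the apparent origin out of a Path header.
# The rightmost element (after dropping placeholders like "not-for-mail"
# or ".POSTED") is the injecting server -- but Path can be forged, so
# treat the result as a hint, not proof.
def apparent_origin(path):
    hops = path.split("!")
    while hops and re.match(r"(not-for-mail|\.POSTED.*)$", hops[-1]):
        hops.pop()
    return hops[-1] if hops else None

print(apparent_origin("feeder.example!usenetxs.com!not-for-mail"))
# -> usenetxs.com
```

From there it's a small step to have the filter syslog the result alongside the Message-ID of each rejected article.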
I'm not propagating the articles, and have asked them to filter them out, but no response yet.
On Feb 3, 2023 at 4:41:53 AM CST, "yamo'" <yamo@beurdin.invalid> wrote:
Hi,
Jesse Rehmer wrote on 03/02/2023 at 11:23:
The original post was referencing "alt.b", which I was getting a ton of for months and filtering. Your example is from a.b.erotica, and should have been rejected by pyClean. I'm investigating, but have added more specific group filters to your feed in the meantime.
Please reach out to me with more examples if it is still an issue.
You may have found the bug, the last one filtered by cleanfeed is :
Feb 3 11:05:55.650 - usenet.blueworldhosting.com
<HX4DL.844661$US27.24444@usenetxs.com> 439 Binary: misplaced binary
Thanks!
pyClean didn't work on my server...
I think I figured it out, or at least I am seeing it reject misplaced binaries
now. There were no error messages to be found, but on a hunch I removed pyclean/lib/* and restarted INN.
2023-02-03 04:45:09 INFO reject: mid=<5b323cbb146d4e14a9ec0afabb7d3660@ngPost>, reason=Binary (yEnc)
2023-02-03 04:45:11 INFO reject: mid=<63800.TQ.20230203.114510@teamquest.pl>, reason=EMP Body Reject
Hi,
Jesse Rehmer wrote on 01/02/2023 at 04:13:
I'm not propagating the articles, and have asked them to filter them out, but no response yet.
There may be something wrong in your configuration.
You send me some of this : <https://pasdenom.info/usenet/news-notice.2023.02.02-06.15.04.html#inn_unwanted>
I think there should be an entry "Binary" in this table.
Example (today) : <48a6d37ee00645eaa197950f15bd9c66@ngPost>
The original post was referencing "alt.b", which I was getting a ton of for months and filtering. Your example is from a.b.erotica, and should have been rejected by pyClean. I'm investigating, but have added more specific group filters to your feed in the meantime.
Please reach out to me with more examples if it is still an issue.
As Russ stated, it's the filtering that's eating up CPU cycles. I can handle >50Mbps of incoming and >80Mbps outgoing traffic using ~20% of the CPU without
filtering, but when I turn on Cleanfeed or PyClean innd maxes out its core and
upstream peers start to spool.
Hi Jesse,
Diablo's "feeder" design is more scalable, even with Cleanfeed in
the mix, but I'll admit INN works best as a backend spool for me. I like
having working control message processing, Cancel-Lock support, native TLS, etc.
FWIW, Miquel van Smoorenburg added support in INN for Diablo's hashfeed.
It's the Q flag value in newsfeeds. It permits scaling feeders like
Diablo does with backend-servers.
You may then have the following architecture:
- a front innd transit server without overview/reader/filtering, just
feeding to N internal innd feeders (using Q to split the feed in N parts
in newsfeeds; or binaries to some feeders, text to others);
- N internal innd feeders doing filtering, without overview and reader;
- a final innd/nnrpd serving server, without filtering, but numbering
and storing articles, generating overview data and handling readers.
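As a sketch of what the front server's newsfeeds entries for step one might look like (peer names are placeholders, and the exact Q value syntax should be checked against newsfeeds(5)):

```
# Front transit innd: split the stream in two by hashfeed,
# half of the article flow to each internal feeder.
feeder1.internal.example:*:Tf,Wnm,Q1/2:
feeder2.internal.example:*:Tf,Wnm,Q2/2:
```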
I don't know whether it could be of help for your usage, notably for the reduced CPU amount needed for filtering across several intermediate feeders.
Hi Jesse,
As Russ stated, it's the filtering that's eating up CPU cycles. I can handle >50Mbps of incoming and >80Mbps outgoing traffic using ~20% of the CPU without filtering, but when I turn on Cleanfeed or PyClean innd maxes out its core and upstream peers start to spool.
Do you think reordering the checks Cleanfeed does would be of help?
For instance, running the is_binary() check as early as possible instead of always doing body parsing like:
$body = lc substr($hdr{__BODY__}, 0, 4000);
$state{badlines}++ while $hdr{__BODY__} =~ /[^\r]\n/g;
$hdr{__BODY__} =~ /
^[Bb][Ee][Gg][Ii][Nn]$hws+[0-7]{3,4}$hws+ # begin 666
(...long regexp for UUencoded...)
/mx;
I'm sure it would help. If binary detection were performed first, and an article identified as a binary were passed through without further checks, that would eliminate a lot of unnecessary cycles.
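A minimal sketch of that ordering (hypothetical check names, not pyClean's or Cleanfeed's actual API): the cheap binary test runs first and short-circuits, so binaries never reach the expensive EMP/body checks.

```python
# Hypothetical filter skeleton illustrating check ordering: the cheap
# binary test short-circuits, so binaries skip the expensive checks.
def looks_binary(body):
    # Stand-in heuristic: yEnc header near the top of the body.
    return "=ybegin" in body[:4000].lower()

def filter_article(group, body):
    """Return a rejection reason, or None to accept."""
    if looks_binary(body):
        if not group.startswith("alt.binaries."):
            return "Binary: misplaced binary"
        return None                    # binary group: accept, skip EMP
    if expensive_emp_check(body):      # only non-binaries get here
        return "EMP Body Reject"
    return None

def expensive_emp_check(body):
    return False                       # placeholder for the costly part

print(filter_article("comp.lang.perl", "=ybegin part=1 ..."))
# -> Binary: misplaced binary
```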
I've been able to strip Cleanfeed down to a version that only checks for misplaced binaries, and the CPU impact is negligible compared to before. Beyond ripping everything out, I'm not a developer, but I sometimes manage to modify things to suit my needs. Likely not efficiently. :)
The one thing I noticed early on with PyClean was that binary articles pass through all the other checks, or at least the EMP check, and every so often one would be rejected by that part of the filter. That always seemed inefficient, and I experimented with the variable it uses to exclude groups, but that didn't seem to exclude them from the EMP filter.
If anyone has modified versions or patches they'd be interested in having me test for performance, I'm capable of tossing a steady stream of mixed article flow at them.
Hi Jesse,
I'm sure it would help. If binary detection were performed first, and an article identified as a binary were passed through without further checks, that would eliminate a lot of unnecessary cycles.
Does it mean that if you turn off binary detection in Cleanfeed or
PyClean, there's no longer any huge CPU load?
This resulted in a very mixed article stream (lots of misplaced
binaries on commercial spools!), and is when I first noticed the
filtering bottleneck.
Can you even fully disable the EMP filter in pyClean?
Your suggestion of:
emp_exclude = ['\.']
is a good one that I didn't think to try, but will!
I didn't understand what the test groups definition in pyClean was for, so I didn't experiment with that either.
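For what it's worth, here is why that suggestion should work, assuming pyClean treats emp_exclude entries as regexes searched against the group name (my reading, not a verified fact): the pattern '\.' matches any group name containing a dot, i.e. essentially every group. A quick sketch of that matching logic:

```python
import re

# Sketch of regex-based group exclusion (assumed pyClean-like semantics).
emp_exclude = [r'\.']          # a literal dot matches nearly every group

def excluded_from_emp(group):
    return any(re.search(pat, group) for pat in emp_exclude)

print(excluded_from_emp("alt.binaries.erotica"))
# -> True
```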