• github.com/go-while/nntp-overview

    From go-while@21:1/5 to All on Thu Jul 13 02:12:31 2023
    Hello World!

    just want to create a topic for my repo. maybe someone finds it useful.

    https://github.com/go-while/nntp-overview

    it is running great so far, connected to a usenet server i'm writing.

    i'm feeding it mbox files from archive.org as hard as i can, and the
    nntp-storage module (not released yet) in combo with nntp-overview
    writes articles faster than i can extract them from the mbox files and
    feed them in via nntp :=D

    why am i doing this?
    i tried importing a few TB from archive.org and did not want to wait for
    months. inn2 is too slow: limited to a single core, and it gets slower and
    slower with every incoming article it has to check against its history.
    diablo is hell to compile... config... documentation?
    do any (performant) alternatives even exist?

    my code works without a history file. it can write one, but i see no need
    for it: the overview files represent a perfect history, as long as the app
    doesn't crash and crush the memory-mapped overview files xD

    i'm simply storing articles under the sha256 of their message-id in a flat
    file structure. checking if an article exists is easy: stat the filesystem
    for the head file (and the body too if you want).

    the storage engine splits head and body into separate files and stores them
    under different parent dirs, 3 levels deep [a-f0-9], so heads and bodies can
    go to different hdd/ssd/nfs storage, for example.
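
    here is a rough sketch of that layout in go (illustrative only, not the
    actual nntp-storage code; names and paths are made up):

    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
        "os"
        "path/filepath"
    )

    // msgidPath builds cache/<head|body>/<h0>/<h1>/<h2>/<rest of hash>.<head|body>
    // from the sha256 hex of the message-id.
    func msgidPath(base, kind, messageID string) string {
        sum := sha256.Sum256([]byte(messageID))
        hash := hex.EncodeToString(sum[:])
        return filepath.Join(base, kind, hash[0:1], hash[1:2], hash[2:3], hash[3:]+"."+kind)
    }

    // articleExists just stats the head file (stat the body too if you want).
    func articleExists(base, messageID string) bool {
        _, err := os.Stat(msgidPath(base, "head", messageID))
        return err == nil
    }

    func main() {
        msgid := "<example-1234@test.local>"
        fmt.Println(msgidPath("cache", "head", msgid), "exists:", articleExists("cache", msgid))
    }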

    good old reiserfs on zfs is quite fast with enough disks and ram :D
    still not sure about the recordsize, need more benchmarking. a bigger
    recordsize gives better compression, but write amplification gets nasty.
    it's set to 32K at the moment, but i've got a broken disk and am waiting
    for the resilvering to finish. my playground is almost done! :D :=)


    state: DEGRADED
    status: One or more devices is currently being resilvered.
    The pool will continue to function, possibly in a degraded state.
    action: Wait for the resilver to complete.
    scan: resilver in progress since Wed Jul 12 16:32:32 2023
    39.2T scanned at 1.66G/s, 38.0T issued at 1.61G/s, 39.2T total
    1.79T resilvered, 96.95% done, 00:12:42 to go


    redis is also already included in the storage engine, to map group/msgnum
    to storage/msgidhash, and/or it creates softlinks, both derived from the
    overview Xref information. that's already working and you can set as many
    workers as you need =) even more workers can push to redis, while limiting
    syncwrites for the softlinks ;) maybe i'll use redis, mongo or rocksdb as
    storage too, or invent cyclic buffers in go, but i only have 2 hands and a
    wife, and i love both more than anything, my wife and go, gogo power rangers!
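
    to illustrate the softlink variant, a minimal sketch in go (made-up names
    and layout, not the real code; the redis part is left out):

    package main

    import (
        "fmt"
        "os"
        "path/filepath"
    )

    // linkArticle creates groups/<group>/<msgnum> pointing at the stored head
    // file, derived from the overview Xref information.
    func linkArticle(group string, msgnum uint64, headPath string) error {
        dir := filepath.Join("groups", group)
        if err := os.MkdirAll(dir, 0o755); err != nil {
            return err
        }
        return os.Symlink(headPath, filepath.Join(dir, fmt.Sprintf("%d", msgnum)))
    }

    func main() {
        // e.g. taken from an Xref line "alt.test:12345"
        if err := linkArticle("alt.test", 12345, "../../cache/head/a/b/c/defg.head"); err != nil {
            fmt.Println("symlink:", err)
        }
    }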

    what do you think?
    does the world need another usenet server written in go?
    i'm doing it anyway, as far as i can ;)

    i'll update when the new server arrives and public access to the
    still-importing archive is possible, maybe this weekend.

    a patch-1 is queued and i hope to find some beta testers here.

    go-while, over and out.



    https://github.com/go-while/nntp-overview

    nntp-overview

    nntp-overview generates .overview files per group from incoming usenet
    headers (POST, IHAVE, TAKETHIS).

    Generation is done concurrently and files are mmap'ed while open.

    Overview file content is human readable, based on the RFC OVERVIEW.FMT format.
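
    for illustration only, a sketch of such a line using the usual OVERVIEW.FMT
    field order (article number, Subject, From, Date, Message-ID, References,
    byte count, line count, Xref), tab-separated; this is not the library's
    actual writer:

    package main

    import (
        "fmt"
        "strings"
    )

    // overviewLine joins the standard overview fields with tabs.
    func overviewLine(msgnum int64, subject, from, date, msgid, refs string, byteCount, lineCount int, xref string) string {
        fields := []string{
            fmt.Sprint(msgnum), subject, from, date, msgid, refs,
            fmt.Sprint(byteCount), fmt.Sprint(lineCount), "Xref: " + xref,
        }
        return strings.Join(fields, "\t")
    }

    func main() {
        fmt.Println(overviewLine(1, "Hello World!", "go-while <no-reply@no.spam>",
            "Thu, 13 Jul 2023 02:12:31 +0000", "<abc123@example.invalid>", "",
            1234, 42, "example alt.test:1"))
    }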

    OV_Handler

    OV_Handler processes MMAP open/retrieve/park/close requests and
    schedules workers for writing overview data.

    The system keeps track of the last message number per group when adding new
    overview entries to a group.

    When integrated into a usenet server it works as a central message
    numbering station per group.
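
    as a rough illustration of the numbering idea (simplified, not the real
    OV_Handler code): a per-group counter behind a mutex hands out strictly
    increasing message numbers to concurrent writers.

    package main

    import (
        "fmt"
        "sync"
    )

    type numberer struct {
        mu   sync.Mutex
        last map[string]uint64 // last assigned message number per group
    }

    // next hands out the next message number for a group, safe for concurrent use.
    func (n *numberer) next(group string) uint64 {
        n.mu.Lock()
        defer n.mu.Unlock()
        n.last[group]++
        return n.last[group]
    }

    func main() {
        n := &numberer{last: make(map[string]uint64)}
        var wg sync.WaitGroup
        for i := 0; i < 4; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                fmt.Println("alt.test got msgnum", n.next("alt.test"))
            }()
        }
        wg.Wait()
    }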

    Example integration in repo:
    https://github.com/go-while/nntp-overview_test


    License
    MIT

    Author
    go-while

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From rek2 hispagatos@21:1/5 to go-while on Thu Jul 13 02:52:43 2023
    On 2023-07-13, go-while <no-reply@no.spam> wrote:
    > Hello World!
    >
    > just want to create a topic for my repo. maybe someone finds it useful.
    >
    > https://github.com/go-while/nntp-overview

    Very cool.
    Rek2

    --
    {gemini,https}://{,rek2.}hispagatos.org
    https://hispagatos.space/@rek2

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From go-while@21:1/5 to All on Sun Jul 16 15:09:28 2023
    TEST DATASET: split into 202G head + 232G body parts = 434G total

    imported from archive.org mbox files or other sources:
    utzoo, netnews cdroms, funet.fi, ancientfj-2004, socmotsscrape
    usenet-[af-ch], parts of usenet-comp, usenet0203.tar and
    some more ...

    TEST DATASET on reiserfs3.6 on zfs:
    head and body are stored by the sha256sum of the message-id in folders:
    cache/[head|body]/[a-f0-9]/[a-f0-9]/[a-f0-9]/hash[3:].[head|body]

    the distribution over all 16 parent folders is very even, and the tables
    below show the size of only 1 parent directory,
    e.g. a/*/*/.. or 0/*/*/..

    every head/[a-f0-9]/ dir occupies ~12.6G on reiserfs = 202G total
    every body/[a-f0-9]/ dir occupies ~14.5G on reiserfs = 232G total
    and they got compressed down to head 82G + body 118G = 200G on zfs


    did some tests with different mkfs.XFS options
    -b size=___
    -i size=___
    -m crc=0 -i maxpct=100
    as guest on ZFS with compression=lz4 and recordsize=128K

    fallocate'd 11TB partitions on zfs
    => mkfs.XFS -m crc=0 -i maxpct=100 -b size=___ -i size=___ /dev/loopN
    => mount /dev/loopN ... and moved the dataset in

    size of one /head/ parent directory:

    # XFS    | XFS -b | XFS -i | used space | used space | CompRatio | Ratio
    # Inodes | block  | inode  | ZFS /head/ | XFS /head/ | ZFS head  | XFS / ZFS
    # 42b    |   512  |   256  |   2.2 GB   |   5.1 GB   |   2.7x    |   2.3
    # 42b    |  1024  |   256  |   2.5 GB   |   5.8 GB   |   2.9x    |   2.3
    # 21b    |  1024  |   512  |   2.5 GB   |   6.5 GB   |   3.3x    |   2.6
    # 42b    |  2048  |   256  |   2.7 GB   |   7.4 GB   |   3.6x    |   2.7
    # 10b    |  2048  |  1024  |   2.7 GB   |   9.7 GB   |   4.4x    |   3.6
    # 21b    |  4096  |   512  |   3.0 GB   |  15.0 GB   |   5.6x    |   5.0

    size of one /body/ parent directory:

    # XFS    | XFS -b | XFS -i | used space | used space | CompRatio | Ratio
    # Inodes | block  | inode  | ZFS /body/ | XFS /body/ | ZFS body  | XFS / ZFS
    # 42b    |   512  |   256  |   4.1 GB   |   6.9 GB   |   1.9x    |   1.7
    # 42b    |  1024  |   256  |   4.3 GB   |   7.9 GB   |   2.1x    |   1.8
    # 21b    |  1024  |   512  |   4.3 GB   |   8.5 GB   |   2.3x    |   2.0
    # 42b    |  2048  |   256  |   4.6 GB   |   9.9 GB   |   2.6x    |   2.2
    # 10b    |  2048  |  1024  |   4.7 GB   |  13.0 GB   |   3.1x    |   2.8
    # 21b    |  4096  |   512  |   4.9 GB   |  16.0 GB   |   3.8x    |   3.3


    what can we see?
    1. higher XFS inode size: fewer xfs inodes
    2. higher XFS block size:
       more space usage on zfs and even worse on xfs,
       higher (more, not better*) zfs compression

    * "more" zfs compression is probably just null compression of wasted space,
      i.e. more used space for no reason

    sadly no performance data, because zfs was degraded while testing

    i think i'll go with '-b size=512 -i size=256' for now
    + less zfs compression but more xfs inodes
    + less space usage on both filesystems (zfs host and xfs guest)
    + 42 is the answer to everything!


    happy sunday!

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From go-while@21:1/5 to All on Mon Jul 24 15:25:56 2023
    XFS partitions on ZFS
    compression=lz4 recordsize=32K

    tank0/xfs/head compressratio 2.91x
    tank0/xfs/body compressratio 1.80x

    XFS partitions need this space:

    Filesystem Used Mounted
    tank0/xfs/head/a 13G /tank0/xfs/head/a
    tank0/xfs/body/a 28G /tank0/xfs/body/a
    tank0/xfs/head/b 13G /tank0/xfs/head/b
    tank0/xfs/body/b 28G /tank0/xfs/body/b
    tank0/xfs/head/c 13G /tank0/xfs/head/c
    tank0/xfs/body/c 28G /tank0/xfs/body/c
    tank0/xfs/head/d 13G /tank0/xfs/head/d
    tank0/xfs/body/d 28G /tank0/xfs/body/d
    tank0/xfs/head/e 13G /tank0/xfs/head/e
    tank0/xfs/body/e 28G /tank0/xfs/body/e
    tank0/xfs/head/f 13G /tank0/xfs/head/f
    tank0/xfs/body/f 28G /tank0/xfs/body/f
    tank0/xfs/head/0 13G /tank0/xfs/head/0
    tank0/xfs/body/0 28G /tank0/xfs/body/0
    tank0/xfs/head/1 13G /tank0/xfs/head/1
    tank0/xfs/body/1 28G /tank0/xfs/body/1
    tank0/xfs/head/2 13G /tank0/xfs/head/2
    tank0/xfs/body/2 28G /tank0/xfs/body/2
    tank0/xfs/head/3 13G /tank0/xfs/head/3
    tank0/xfs/body/3 28G /tank0/xfs/body/3
    tank0/xfs/head/4 13G /tank0/xfs/head/4
    tank0/xfs/body/4 28G /tank0/xfs/body/4
    tank0/xfs/head/5 13G /tank0/xfs/head/5
    tank0/xfs/body/5 28G /tank0/xfs/body/5
    tank0/xfs/head/6 13G /tank0/xfs/head/6
    tank0/xfs/body/6 28G /tank0/xfs/body/6
    tank0/xfs/head/7 13G /tank0/xfs/head/7
    tank0/xfs/body/7 28G /tank0/xfs/body/7
    tank0/xfs/head/8 13G /tank0/xfs/head/8
    tank0/xfs/body/8 28G /tank0/xfs/body/8
    tank0/xfs/head/9 13G /tank0/xfs/head/9
    tank0/xfs/body/9 28G /tank0/xfs/body/9


    XFS itself says it uses this space:

    Filesystem Used Mounted
    /dev/loop32 34G /mnt/xfs/head/a
    /dev/loop33 34G /mnt/xfs/head/b
    /dev/loop34 34G /mnt/xfs/head/c
    /dev/loop35 34G /mnt/xfs/head/d
    /dev/loop36 34G /mnt/xfs/head/e
    /dev/loop37 34G /mnt/xfs/head/f
    /dev/loop38 34G /mnt/xfs/head/0
    /dev/loop39 34G /mnt/xfs/head/1
    /dev/loop40 34G /mnt/xfs/head/2
    /dev/loop41 34G /mnt/xfs/head/3
    /dev/loop42 34G /mnt/xfs/head/4
    /dev/loop43 34G /mnt/xfs/head/5
    /dev/loop44 34G /mnt/xfs/head/6
    /dev/loop45 34G /mnt/xfs/head/7
    /dev/loop46 34G /mnt/xfs/head/8
    /dev/loop47 34G /mnt/xfs/head/9
    /dev/loop48 49G /mnt/xfs/body/a
    /dev/loop49 49G /mnt/xfs/body/b
    /dev/loop50 49G /mnt/xfs/body/c
    /dev/loop51 49G /mnt/xfs/body/d
    /dev/loop52 49G /mnt/xfs/body/e
    /dev/loop53 49G /mnt/xfs/body/f
    /dev/loop54 49G /mnt/xfs/body/0
    /dev/loop55 49G /mnt/xfs/body/1
    /dev/loop56 49G /mnt/xfs/body/2
    /dev/loop57 49G /mnt/xfs/body/3
    /dev/loop58 49G /mnt/xfs/body/4
    /dev/loop59 49G /mnt/xfs/body/5
    /dev/loop60 49G /mnt/xfs/body/6
    /dev/loop61 49G /mnt/xfs/body/7
    /dev/loop62 49G /mnt/xfs/body/8
    /dev/loop63 49G /mnt/xfs/body/9


    moving the same data (raw .head + .body files)
    from XFS into a ZFS dataset


    tank0/cache/head compressratio 1.00x
    tank0/cache/body compressratio 1.17x

    results in this used space on ZFS?!

    tank0/cache/head/a 116G /tank0/cache/head/a
    tank0/cache/head/b 116G /tank0/cache/head/b
    ..
    tank0/cache/body/a 124G /tank0/cache/body/a
    tank0/cache/body/b 124G /tank0/cache/body/b
    ..

    i stopped moving it...

    looks like zfs does not like many very small files (aka usenet articles) and
    compression does not help here: presumably lz4 works per record, so files
    smaller than one record gain almost nothing, while each file still costs at
    least one block plus metadata.


    finally, inode counts: XFS vs the raw ZFS dataset are almost identical
    (columns: filesystem / total inodes / used inodes / mountpoint)

    /dev/loop32 42846221114 19964404 /mnt/xfs/head/a
    /dev/loop33 42846243638 19958208 /mnt/xfs/head/b
    /dev/loop48 42783061340 19964404 /mnt/xfs/body/a
    /dev/loop49 42783346284 19958208 /mnt/xfs/body/b
    ...

    tank0/cache/head/a 50835458823 19964404 /tank0/cache/head/a
    tank0/cache/head/b 50835452630 19958211 /tank0/cache/head/b
    tank0/cache/body/a 50835159146 19664727 /tank0/cache/body/a
    tank0/cache/body/b 50835452630 19958211 /tank0/cache/body/b
    ...



    Kind Regards

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to go-while on Wed Jul 26 15:27:32 2023
    On Jul 24, 2023 at 8:25:56 AM CDT, "go-while" <no-reply@no.spam> wrote:

    > XFS partitions on ZFS
    > compression=lz4 recordsize=32K
    >
    > tank0/xfs/head compressratio 2.91x
    > tank0/xfs/body compressratio 1.80x
    >
    > [...]
    >
    > tank0/cache/head compressratio 1.00x
    > tank0/cache/body compressratio 1.17x
    >
    > [...]

    For what it's worth, using INN with tradspool I don't see hardly any compression (ZFS), but when using CNFS buffers I get a little over 3x.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Garrett Wollman@21:1/5 to jesse.rehmer@blueworldhosting.com on Wed Jul 26 17:10:22 2023
    In article <u9re14$ecv$1@nnrp.usenet.blueworldhosting.com>,
    Jesse Rehmer <jesse.rehmer@blueworldhosting.com> wrote:

    > For what it's worth, using INN with tradspool I don't see hardly any
    > compression (ZFS), but when using CNFS buffers I get a little over 3x.

    On my small text-only server with tradspool I see:

    NAME                              PROPERTY       VALUE  SOURCE
    rootvg/root/usr/local/news/spool  compressratio  1.43x  -

    ...but that's less than 10 GiB (about 880,000 articles).

    -GAWollman

    --
    Garrett A. Wollman    | "Act to avoid constraining the future; if you can,
    wollman@bimajority.org| act to remove constraint from the future. This is
    Opinions not shared by| a thing you can do, are able to do, to do together."
    my employers.         | - Graydon Saunders, _A Succession of Bad Days_ (2015)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to All on Wed Jul 26 19:16:00 2023
    On Jul 26, 2023 at 12:10:22 PM CDT, "Garrett Wollman" <Garrett Wollman> wrote:

    > In article <u9re14$ecv$1@nnrp.usenet.blueworldhosting.com>,
    > Jesse Rehmer <jesse.rehmer@blueworldhosting.com> wrote:
    >
    >> For what it's worth, using INN with tradspool I don't see hardly any
    >> compression (ZFS), but when using CNFS buffers I get a little over 3x.
    >
    > On my small text-only server with tradspool I see:
    >
    > NAME                              PROPERTY       VALUE  SOURCE
    > rootvg/root/usr/local/news/spool  compressratio  1.43x  -
    >
    > ...but that's less than 10 GiB (about 880,000 articles).
    >
    > -GAWollman

    Just under 185,000,000 articles here, duplicated between two boxes, approximately 900GB of articles and 130GB overview:

    tradspool:

    $ zfs get compressratio
    NAME PROPERTY VALUE SOURCE
    zroot compressratio 1.11x -

    CNFS:

    $ zfs get compressratio
    NAME PROPERTY VALUE SOURCE
    zroot compressratio 3.03x -

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andre@21:1/5 to jesse.rehmer@blueworldhosting.com on Thu Jul 27 11:07:04 2023
    On 26 Jul 2023 at 21:16:00 CEST, "Jesse Rehmer" <jesse.rehmer@blueworldhosting.com> wrote:

    > Just under 185,000,000 articles here, duplicated between two boxes,
    > approximately 900GB of articles and 130GB overview:
    >
    > tradspool:
    >
    > $ zfs get compressratio
    > NAME   PROPERTY       VALUE  SOURCE
    > zroot  compressratio  1.11x  -
    >
    > CNFS:
    >
    > $ zfs get compressratio
    > NAME   PROPERTY       VALUE  SOURCE
    > zroot  compressratio  3.03x  -

    Thanks for the info. That's really interesting.

    Here: 451,167,356 text-only articles on ZFS with lz4 compression.

    timecaf:
    compressratio 2.08x
    used 568G
    logicalused 1.16T

    overview tradindexed (xfs):
    264G

    So it seems that CNFS works better for compression than timecaf.
    Do you also use lz4?

    --
    Thanks,
    Andre.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From go-while@21:1/5 to Andre on Thu Jul 27 15:47:37 2023
    On 27.07.23 13:07, Andre wrote:
    > So it seems that CNFS works better for compression than timecaf.
    > Do you also use lz4?


    i'm not sure if "more" compression with cnfs is really better.
    maybe it's mostly null/sparse compression of not completely filled blocks?
    but who cares. lz4 does the job, and vs. tradspool it wins without question.


    cnfs storage and buffindexed overview
    space on disk after compression

    tank0/cycbufs/gwene compressratio 5.13x - 302G
    tank0/history/gwene compressratio 1.87x - 21G
    tank0/overview/gwene compressratio 2.42x - 47G

    tank0/cycbufs/gmane compressratio 2.97x - 685G
    tank0/history/gmane compressratio 1.87x - 11G
    tank0/overview/gmane compressratio 2.97x - 28G

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)