• Good compression

    From sundaresh.venugopal@gmail.com@21:1/5 to All on Wed Jun 10 07:03:08 2020
    The web site is www.complimentaryware.unaux.com . I have an excellent method on PDF, an unpublished research of my own but this newsgroup does not allow attachments.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Elhana@21:1/5 to All on Tue Jun 16 06:11:28 2020
    sundaresh...@gmail.com:

    The web site is www.complimentaryware.unaux.com . I have an excellent method on
    PDF, an unpublished research of my own but this newsgroup does not allow attachments.

    Show us the code.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From sundaresh.venugopal@gmail.com@21:1/5 to All on Tue Jun 16 08:37:40 2020
    I started coding but shelved it and put it in the background since l am also simultaneously developing and coding other s/w as well. But why do you ask ? I am quite certain there is nothing wrong with the method.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli the Bearded@21:1/5 to sundaresh.venugopal@gmail.com on Wed Jun 17 22:02:05 2020
    In comp.compression, <sundaresh.venugopal@gmail.com> wrote:
    I started coding but shelved it and put it in the background since l
    am also simultaneously developing and coding other s/w as well. But
    why do you ask ? I am quite certain there is nothing wrong with the
    method.

    This group is littered with the dead claims of people who said they had
    a better compression method but could never prove it with code.

    Elijah
    ------
    "compression" seems to attrack kooks as much as "perpetual motion"

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Helm@21:1/5 to sundaresh.venugopal@gmail.com on Wed Jun 17 17:58:45 2020
    On 2020-06-10 14:03:08 +0000, sundaresh.venugopal@gmail.com said:

    The web site is www.complimentaryware.unaux.com . I have an excellent method on
    PDF, an unpublished research of my own but this newsgroup does not
    allow attachments.

    You can always attach it if you use a newsreader instead of the Google
    Groups interface (that also puts your IP in the headers by the way).
    Don't do it however since all the free newsservers won't be able to
    hold your post thus reducing the visibility and reach. Instead, upload
    it to a website like WeTransfer and post the link in the newsgroup.

    Good research though! Look forward to learning more.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From sundaresh.venugopal@gmail.com@21:1/5 to Helm on Wed Jun 17 20:59:23 2020
    On Thursday, June 18, 2020 at 3:28:47 AM UTC+5:30, Helm wrote:
    On 2020-06-10 14:03:08 +0000, sundaresh.venugopal@gmail.com said:

    The web site is www.complimentaryware.unaux.com . I have an excellent method on
    PDF, an unpublished research of my own but this newsgroup does not
    allow attachments.

    You can always attach it if you use a newsreader instead of the Google
    Groups interface (that also puts your IP in the headers by the way).
    Don't do it however since all the free newsservers won't be able to
    hold your post thus reducing the visibility and reach. Instead, upload
    it to a website like WeTransfer and post the link in the newsgroup.

    Good research though! Look forward to learning more.

    Well thanks ! One encouraging response . Here is the link

    https://drive.google.com/file/d/1w5sNV_YVM9XGymz8A0MT0V_cubxMPYCM/view?usp=sharing

    If it does not work, let me know.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From sundaresh.venugopal@gmail.com@21:1/5 to Eli the Bearded on Wed Jun 17 21:58:37 2020
    On Thursday, June 18, 2020 at 3:32:07 AM UTC+5:30, Eli the Bearded wrote:
    In comp.compression, <sundaresh.venugopal@gmail.com> wrote:
    I started coding but shelved it and put it in the background since l
    am also simultaneously developing and coding other s/w as well. But
    why do you ask ? I am quite certain there is nothing wrong with the
    method.

    This group is littered with the dead claims of people who said they had
    a better compression method but could never prove it with code.

    Elijah
    ------

    "compression" seems to attrack kooks as much as "perpetual motion"

    The little(of the other stuff) I have coded works extremely well, just as I expected, in fact in most cases surprisingly better. All I am saying is, proof of the recipe is different from proof of the pudding, a truth which is even more exacting w.r.t
    recipe's and pudding's in the world of computing or as you may call it the virtual world. Speaking only for myself, and in all candor I will admit, having tried my hand at cooking, I am a much better cook today than I was when I started off cooking, both
    on my kitchen table and on my computer table.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott@21:1/5 to *@eli.users.panix.com on Thu Jun 18 16:13:42 2020
    On Wed, 17 Jun 2020 22:02:05 +0000 (UTC), Eli the Bearded <*@eli.users.panix.com> wrote:

    In comp.compression, <sundaresh.venugopal@gmail.com> wrote:
    I started coding but shelved it and put it in the background since l
    am also simultaneously developing and coding other s/w as well. But
    why do you ask ? I am quite certain there is nothing wrong with the
    method.

    This group is littered with the dead claims of people who said they had
    a better compression method but could never prove it with code.

    Oh, it's not that bad. Recursive compression is actually pretty easy
    to do. The decompressor turns out to be a good deal trickier, though.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From BGB@21:1/5 to Eli the Bearded on Thu Jun 18 14:39:10 2020
    On 6/17/2020 5:02 PM, Eli the Bearded wrote:
    In comp.compression, <sundaresh.venugopal@gmail.com> wrote:
    I started coding but shelved it and put it in the background since l
    am also simultaneously developing and coding other s/w as well. But
    why do you ask ? I am quite certain there is nothing wrong with the
    method.

    This group is littered with the dead claims of people who said they had
    a better compression method but could never prove it with code.

    Elijah
    ------
    "compression" seems to attrack kooks as much as "perpetual motion"


    Yep.


    Luckily at least, data compression is more objectively measurable.


    Things like audio and video compression also involve a lot of subjective tradeoffs, like how do the artifacts of one codec look or sound relative
    to another, and I have noted that it seems my perceptions don't really
    agree with others.

    Eg: I seem to prefer a loss of dynamic range and an increase in
    "sharpness" a lot more readily than a loss of detail and blurring;
    whereas many other people would apparently much rather have a blurry
    mess than sharp high-contrast edges.

    Similarly for audio: I prefer codecs which decay into a "metallic" sound
    over ones which introduce a bunch of warbling and whistling at lower
    bitrates.

    Eg: MP3 below ~ 60 kbps sounds a bit unpleasant with all the whistling
    and warbling and similar, whereas 11KHz ADPCM still sounds pretty
    reasonable in comparison.

    The result of this has partly been a lot of my own codecs using
    Color-Cell and VQ style technology, along with ADPCM style audio codecs,
    which apparently a lot of other people perceive as looking and sounding
    awful.



    But, yeah, as for data compression, don't have all that much notable.

    I had a few compressors, which did pretty good (vs, eg, LZ4 and Zstd) at
    being fast on my past computers (K10 and FX), but after switching to a
    Ryzen then both LZ4 and Zstd got a pretty significant speed-up.

    Getting high compression ratios is, granted, a little harder...


    Recently, managed to pull off something which gets slightly better
    compression at comparable decode speeds to LZ4 (on my Ryzen), and is
    faster to decode than LZ4 on my own ISA (BJX2).

    Decode speeds on a 3.7 GHz Ryzen in both cases is a little over 2GB/sec.
    I haven't yet gotten around to running tests on ARM.


    Compression for many files seems to be part-way between LZ4 and Deflate.
    Note that, like LZ4, it is byte oriented with no entropy coding.

    <--

    BtRP2 (Transposed, LE):
    * dddddddd-dlllrrr0 (l=3..10, d=0..511, r=0..7)
    * dddddddd-dddddlll-lllrrr01 (l=4..67, d=0..8191)
    * dddddddd-dddddddd-dlllllll-llrrr011 (l=4..515, d=0..131071)
    * rrrr0111 (Raw Bytes, r=(r+1)*8, 8..128)
    * * rrr01111 (Long Match)
    * rr011111 (r=1..3 bytes, 0=EOB)
    * rrrrrrrr-r0111111 (Long Raw, r=(r+1)*8, 8..4096)
    ** d: Distance
    ** l: Match Length
    ** r: Literal Length

    Values are encoded in little-endian order, with tag bits located in the
    LSB. Bits will be contiguous within the value, with shift-and-mask being
    used to extract individual elements.

    Long Match will encode length and distance using variable-length
    encodings directly following the initial tag byte.

    Length VLN:
    lllllll0, 4.. 131
    llllllll-llllll01, 132..16383

    Distance VLN:
    dddddddd-ddddddd0, 32K (0..32767)
    dddddddd-dddddddd-dddddd01, 4M




    The compression could be improved a little more in the Long Match case
    via a few more conjoined cases, but the gain in compression would be
    pretty small relative to the amount of code added (as the first few
    cases handle the vast majority of LZ matches).

    I had also left out adding 1 to the distance, mostly because in my tests
    this had very little effect on compression, but a somewhat more
    noticeable effect on decode speed.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From icepic.dz@gmail.com@21:1/5 to All on Thu Jul 2 06:38:21 2020
    Den torsdag 18 juni 2020 kl. 05:59:27 UTC+2 skrev sundaresh...@gmail.com:

    The web site is www.complimentaryware.unaux.com . I have an excellent method on
    PDF, an unpublished research of my own but this newsgroup does not
    allow attachments.

    You can always attach it if you use a newsreader instead of the Google Groups interface (that also puts your IP in the headers by the way).
    Don't do it however since all the free newsservers won't be able to
    hold your post thus reducing the visibility and reach. Instead, upload
    it to a website like WeTransfer and post the link in the newsgroup.

    Good research though! Look forward to learning more.

    Well thanks ! One encouraging response . Here is the link

    https://drive.google.com/file/d/1w5sNV_YVM9XGymz8A0MT0V_cubxMPYCM/view?usp=sharing

    If it does not work, let me know.

    It doesn't work.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From sundaresh.venugopal@gmail.com@21:1/5 to icep...@gmail.com on Thu Jul 2 20:38:41 2020
    On Thursday, July 2, 2020 at 7:08:23 PM UTC+5:30, icep...@gmail.com wrote:
    Den torsdag 18 juni 2020 kl. 05:59:27 UTC+2 skrev sundaresh...@gmail.com:

    The web site is www.complimentaryware.unaux.com . I have an excellent method on
    PDF, an unpublished research of my own but this newsgroup does not allow attachments.

    You can always attach it if you use a newsreader instead of the Google Groups interface (that also puts your IP in the headers by the way). Don't do it however since all the free newsservers won't be able to
    hold your post thus reducing the visibility and reach. Instead, upload
    it to a website like WeTransfer and post the link in the newsgroup.

    Good research though! Look forward to learning more.

    Well thanks ! One encouraging response . Here is the link

    https://drive.google.com/file/d/1w5sNV_YVM9XGymz8A0MT0V_cubxMPYCM/view?usp=sharing

    If it does not work, let me know.

    It doesn't work.

    Really, have you tried coding it. If you have, can I see the code ? You can do just what I have done. In the meantime I have made significant improvements to it, and it should be visible in the link and I will be coding this new version.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From =?UTF-8?B?RsO2bGRlcyBMw6FzemzDsw==?@21:1/5 to sundaresh...@gmail.com on Fri Jul 3 03:58:44 2020
    On Friday, July 3, 2020 at 5:38:42 AM UTC+2, sundaresh...@gmail.com wrote:
    On Thursday, July 2, 2020 at 7:08:23 PM UTC+5:30, icep...@gmail.com wrote:
    Den torsdag 18 juni 2020 kl. 05:59:27 UTC+2 skrev sundaresh...@gmail.com:

    The web site is www.complimentaryware.unaux.com . I have an excellent method on
    PDF, an unpublished research of my own but this newsgroup does not allow attachments.

    You can always attach it if you use a newsreader instead of the Google Groups interface (that also puts your IP in the headers by the way). Don't do it however since all the free newsservers won't be able to hold your post thus reducing the visibility and reach. Instead, upload it to a website like WeTransfer and post the link in the newsgroup.

    Good research though! Look forward to learning more.

    Well thanks ! One encouraging response . Here is the link

    https://drive.google.com/file/d/1w5sNV_YVM9XGymz8A0MT0V_cubxMPYCM/view?usp=sharing

    If it does not work, let me know.

    It doesn't work.

    Really, have you tried coding it. If you have, can I see the code ? You can do just what I have done. In the meantime I have made significant improvements to it, and it should be visible in the link and I will be coding this new version.

    This paper talks about a method with an example applied to a single code but a single code has no entropy (on which compression methods work).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From icepic.dz@gmail.com@21:1/5 to All on Fri Jul 3 06:28:10 2020
    Den fredag 3 juli 2020 kl. 05:38:42 UTC+2 skrev sundaresh...@gmail.com:
    On Thursday, July 2, 2020 at 7:08:23 PM UTC+5:30, icep...@gmail.com wrote:
    Den torsdag 18 juni 2020 kl. 05:59:27 UTC+2 skrev sundaresh...@gmail.com:

    The web site is www.complimentaryware.unaux.com . I have an excellent method on
    PDF, an unpublished research of my own but this newsgroup does not allow attachments.

    You can always attach it if you use a newsreader instead of the Google Groups interface (that also puts your IP in the headers by the way). Don't do it however since all the free newsservers won't be able to hold your post thus reducing the visibility and reach. Instead, upload it to a website like WeTransfer and post the link in the newsgroup.

    Good research though! Look forward to learning more.

    Well thanks ! One encouraging response . Here is the link

    https://drive.google.com/file/d/1w5sNV_YVM9XGymz8A0MT0V_cubxMPYCM/view?usp=sharing

    If it does not work, let me know.

    It doesn't work.

    Really, have you tried coding it. If you have, can I see the code ? You can do just what I have done. In the meantime I have made significant improvements to it, and it should be visible in the link and I will be coding this new version.

    Please post the code you have then, and the results.

    You are the one making a claim that a large number M can be written as a smaller number S by transforming it in one of several ways, which the pigeonhole principle will show anyone will not work for all numbers from 0 to M, since they can't all fit in S.

    You can find S numbers out of M that do, but even selecting which set of Ses is out of the set 0->M will cost bits.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Sundaresh Venugopal@21:1/5 to icep...@gmail.com on Wed Aug 19 02:45:23 2020
    On Friday, July 3, 2020 at 6:58:11 PM UTC+5:30, icep...@gmail.com wrote:
    Den fredag 3 juli 2020 kl. 05:38:42 UTC+2 skrev sundaresh...@gmail.com:
    On Thursday, July 2, 2020 at 7:08:23 PM UTC+5:30, icep...@gmail.com wrote:
    Den torsdag 18 juni 2020 kl. 05:59:27 UTC+2 skrev sundaresh...@gmail.com:

    The web site is www.complimentaryware.unaux.com . I have an excellent method on
    PDF, an unpublished research of my own but this newsgroup does not allow attachments.

    You can always attach it if you use a newsreader instead of the Google
    Groups interface (that also puts your IP in the headers by the way). Don't do it however since all the free newsservers won't be able to hold your post thus reducing the visibility and reach. Instead, upload
    it to a website like WeTransfer and post the link in the newsgroup.

    Good research though! Look forward to learning more.

    Well thanks ! One encouraging response . Here is the link

    https://drive.google.com/file/d/1w5sNV_YVM9XGymz8A0MT0V_cubxMPYCM/view?usp=sharing

    If it does not work, let me know.

    It doesn't work.

    Really, have you tried coding it. If you have, can I see the code ? You can do just what I have done. In the meantime I have made significant improvements to it, and it should be visible in the link and I will be coding this new version.
    Please post the code you have then, and the results.

    You are the one making a claim that a large number M can be written as a smaller number S by transforming it in one of several ways, which the pigeonhole principle will show anyone will not work for all numbers from 0 to M, since they can't all fit in
    S.

    You can find S numbers out of M that do, but even selecting which set of Ses is out of the set 0->M will cost bits.
    I will see what I an do.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Sundaresh Venugopal@21:1/5 to Sundaresh Venugopal on Wed Aug 19 03:23:18 2020
    On Wednesday, August 19, 2020 at 3:15:24 PM UTC+5:30, Sundaresh Venugopal wrote:
    On Friday, July 3, 2020 at 6:58:11 PM UTC+5:30, icep...@gmail.com wrote:
    Den fredag 3 juli 2020 kl. 05:38:42 UTC+2 skrev sundaresh...@gmail.com:
    On Thursday, July 2, 2020 at 7:08:23 PM UTC+5:30, icep...@gmail.com wrote:
    Den torsdag 18 juni 2020 kl. 05:59:27 UTC+2 skrev sundaresh...@gmail.com:

    The web site is www.complimentaryware.unaux.com . I have an excellent method on
    PDF, an unpublished research of my own but this newsgroup does not
    allow attachments.

    You can always attach it if you use a newsreader instead of the Google
    Groups interface (that also puts your IP in the headers by the way).
    Don't do it however since all the free newsservers won't be able to
    hold your post thus reducing the visibility and reach. Instead, upload
    it to a website like WeTransfer and post the link in the newsgroup.

    Good research though! Look forward to learning more.

    Well thanks ! One encouraging response . Here is the link

    https://drive.google.com/file/d/1w5sNV_YVM9XGymz8A0MT0V_cubxMPYCM/view?usp=sharing

    If it does not work, let me know.

    It doesn't work.

    Really, have you tried coding it. If you have, can I see the code ? You can do just what I have done. In the meantime I have made significant improvements to it, and it should be visible in the link and I will be coding this new version.
    Please post the code you have then, and the results.

    You are the one making a claim that a large number M can be written as a smaller number S by transforming it in one of several ways, which the pigeonhole principle will show anyone will not work for all numbers from 0 to M, since they can't all fit
    in S.

    You can find S numbers out of M that do, but even selecting which set of Ses is out of the set 0->M will cost bits.
    I will see what I an do.
    Sorry. Meant, I will see what I can do, but not to convince people like you.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From icepic.dz@gmail.com@21:1/5 to All on Wed Aug 19 09:59:42 2020
    Please post the code you have then, and the results.

    You are the one making a claim that a large number M can be written as a smaller number S by transforming it in one of several ways, which the pigeonhole principle will show anyone will not work for all numbers from 0 to M, since they can't all fit
    in S.

    You can find S numbers out of M that do, but even selecting which set of Ses is out of the set 0->M will cost bits.
    I will see what I an do.
    Sorry. Meant, I will see what I can do, but not to convince people like you.

    I am not expecting you to do any work for me, I was suggesting you do the work for *you*, so you see how it does, or does not work for any numbers or all numbers.

    It is quite common for people to assume things like "I can find my wanted large number in the decimals of PI and will refer to it by its position" which is true, but is not usable for compression since the pointer will be as long, or longer than the
    number when you take all possible combinations into account.

    If you are only envisioning "I will look for the number 141592653589" then yes, you can find a very short index for it. If you instead look for the far shorter number "93774" then it will not appear in the first 100000 digits.

    So what you need to do in order to create fame for yourself with a new novel way of compressing data is 1) understand the pigeon hole principle (everyone in compression research must understand this one) and 2) make some kind of implementation of your
    idea to know when it will *not* work.

    Good compressors work on something like 1% of the inputs, they just chose the 1% which are commonly found in files on computers. For all the 99% other kinds of data which are uncommon, they will expand, but if you don't hand such data to your compressor,
    noone notices and all is fine.

    So by making the implementation, you have a good chance of finding out what your "99%" are.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Sundaresh Venugopal@21:1/5 to icep...@gmail.com on Wed Aug 19 17:08:52 2020
    On Wednesday, August 19, 2020 at 10:29:45 PM UTC+5:30, icep...@gmail.com wrote:
    Please post the code you have then, and the results.

    You are the one making a claim that a large number M can be written as a smaller number S by transforming it in one of several ways, which the pigeonhole principle will show anyone will not work for all numbers from 0 to M, since they can't all
    fit in S.

    You can find S numbers out of M that do, but even selecting which set of Ses is out of the set 0->M will cost bits.
    I will see what I an do.
    Sorry. Meant, I will see what I can do, but not to convince people like you.
    I am not expecting you to do any work for me, I was suggesting you do the work for *you*, so you see how it does, or does not work for any numbers or all numbers.

    It is quite common for people to assume things like "I can find my wanted large number in the decimals of PI and will refer to it by its position" which is true, but is not usable for compression since the pointer will be as long, or longer than the
    number when you take all possible combinations into account.

    If you are only envisioning "I will look for the number 141592653589" then yes, you can find a very short index for it. If you instead look for the far shorter number "93774" then it will not appear in the first 100000 digits.

    So what you need to do in order to create fame for yourself with a new novel way of compressing data is 1) understand the pigeon hole principle (everyone in compression research must understand this one) and 2) make some kind of implementation of your
    idea to know when it will *not* work.

    Good compressors work on something like 1% of the inputs, they just chose the 1% which are commonly found in files on computers. For all the 99% other kinds of data which are uncommon, they will expand, but if you don't hand such data to your
    compressor, noone notices and all is fine.

    So by making the implementation, you have a good chance of finding out what your "99%" are.
    I am not the one making a false claim here. Do not mislead people.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Matthias Waldhauer@21:1/5 to BGB on Wed Aug 26 00:50:59 2020
    BGB wrote on 18th June 2020 at 21:39:13 UTC+2:
    Compression for many files seems to be part-way between LZ4 and Deflate.
    Note that, like LZ4, it is byte oriented with no entropy coding.

    <--

    BtRP2 (Transposed, LE):
    * dddddddd-dlllrrr0 (l=3..10, d=0..511, r=0..7)
    * dddddddd-dddddlll-lllrrr01 (l=4..67, d=0..8191)
    * dddddddd-dddddddd-dlllllll-llrrr011 (l=4..515, d=0..131071)
    * rrrr0111 (Raw Bytes, r=(r+1)*8, 8..128)
    * * rrr01111 (Long Match)
    * rr011111 (r=1..3 bytes, 0=EOB)
    * rrrrrrrr-r0111111 (Long Raw, r=(r+1)*8, 8..4096)
    ** d: Distance
    ** l: Match Length
    ** r: Literal Length

    Values are encoded in little-endian order, with tag bits located in the
    LSB. Bits will be contiguous within the value, with shift-and-mask being
    used to extract individual elements.

    Interesting tokens! You're using unary coding for the different tokens. In the last one (Long Raw) you might switch the 0 for another r bit, as a limit on code length (6 bits, 111111) would be enough to differentiate the last 2 cases.

    I guess with some effort (PhD thesis), one could develop a method to formally describe all possible token variants via a grammar. And based on that, both the efficiency and the decoder complexity could be calculated and optimized.

    M.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Janne Johansson@21:1/5 to All on Wed Aug 26 03:07:08 2020
    torsdag 20 augusti 2020 kl. 02:08:54 UTC+2 skrev Sundaresh Venugopal:
    On Wednesday, August 19, 2020 at 10:29:45 PM UTC+5:30, icep...@gmail.com wrote:
    Please post the code you have then, and the results.

    You are the one making a claim that a large number M can be written as a smaller number S by transforming it in one of several ways, which the pigeonhole principle will show anyone will not work for all numbers from 0 to M, since they can't all
    fit in S.

    So by making the implementation, you have a good chance of finding out what your "99%" are.
    I am not the one making a false claim here. Do not mislead people.

    False or not remains to be seen. You are making a claim though. We are waiting for some proof of it.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From BGB@21:1/5 to Matthias Waldhauer on Wed Aug 26 14:29:08 2020
    On 8/26/2020 2:50 AM, Matthias Waldhauer wrote:
    BGB wrote on 18th June 2020 at 21:39:13 UTC+2:
    Compression for many files seems to be part-way between LZ4 and Deflate.
    Note that, like LZ4, it is byte oriented with no entropy coding.

    <--

    BtRP2 (Transposed, LE):
    * dddddddd-dlllrrr0 (l=3..10, d=0..511, r=0..7)
    * dddddddd-dddddlll-lllrrr01 (l=4..67, d=0..8191)
    * dddddddd-dddddddd-dlllllll-llrrr011 (l=4..515, d=0..131071)
    * rrrr0111 (Raw Bytes, r=(r+1)*8, 8..128)
    * * rrr01111 (Long Match)
    * rr011111 (r=1..3 bytes, 0=EOB)
    * rrrrrrrr-r0111111 (Long Raw, r=(r+1)*8, 8..4096)
    ** d: Distance
    ** l: Match Length
    ** r: Literal Length

    Values are encoded in little-endian order, with tag bits located in the
    LSB. Bits will be contiguous within the value, with shift-and-mask being
    used to extract individual elements.

    Interesting tokens! You're using unary coding for the different tokens. In the last one (Long Raw) you might switch the 0 for another r bit, as a limit on code length (6 bits, 111111) would be enough to differentiate the last 2 cases.


    Leaving it as 0 leaves room for possible further expansion; also the
    "Long Raw" case is infrequent enough (and big enough) that 1 more bit
    wont make a big difference (worst case overhead is ~ 0.05%).

    I guess with some effort (PhD thesis), one could develop a method to formally describe all possible token variants via a grammar. And based on that, both the efficiency and the decoder complexity could be calculated and optimized.


    Dunno.

    As noted, compression ratios tend to generally be slightly better than
    LZ4 in this case (including if encoded using the same match algo and the
    same match length/distance settings).

    This seems to be more so for "general data compression", however for compressing PE/COFF binaries with my ISA, compression ratio seems to be
    roughly break-even with LZ4.


    An older "fast" LZ encoder, which did everything in terms of 32-bit
    DWORD values, also did reasonably well for binaries, but worse for most
    other data compression.

    A likely factor though is that the ISA is mostly a mix of 16 and 32-bit instructions, and sequences of 32-bit instructions also tend to be
    32-bit aligned (though, this is not enforced as doing so in-general has
    an adverse effect on code-density).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)