• A Random Data Compressor to solve

    From Gerald Tamayo@21:1/5 to All on Sat May 18 02:30:30 2019
    Do you have a library to do arithmetic using Very Large Integers, BigNums or BigInts, or Infinite-Precision Integers? Then maybe you can use it for the compression algorithm I describe here. But really, we need to keep the integers (frequencies) as small as possible to avoid very long computations.

    The random data compression algorithm is best understood if you "visualize" a bar chart or a histogram, where each new symbol's frequency is always pushed above the current highest frequency: we add to the current maximum its delta from the new symbol's frequency. The result becomes the new symbol's frequency and the new maximum; put simply, the newest symbol must always have the highest frequency. So at most, the new highest frequency can only "double", i.e. grow by 1 bit in length. (In decoding, the symbol with the highest frequency is the symbol to decode; this makes it stack based. We add the delta to the highest frequency during encoding so we can get back to the symbol's previous frequency when decoding.) The output is actually just the frequency table, which should be easy to compress or generate?

    Algorithm pseudo-code:

    /* freq[] and max would need BigInts in practice (see below); plain integers are used here for clarity. */
    unsigned long long freq[256], max;
    int i, c;

    /* initialize frequency table: entries are distinct and nonzero. */
    for (i = 0; i < 256; i++) freq[i] = i + 1;
    max = freq[255];

    do {
        c = get_byte(infile);              /* next input byte, EOF at end of file */
        if (c == EOF) break;
        freq[c] = max + (max - freq[c]);   /* new frequency = old max + delta */
        max = freq[c];                     /* the new symbol now holds the maximum */
    } while (1);


    No "runs" of a single character allowed in the input, no "run-lengths" as much as possible. "Random data" indeed.

    New or recalled observations:

    1. This algorithm ironically "expands" the frequencies at first? LOL

    We're back to the early days of information theory or data compression history!

    2. The bombshell: it takes more than 1 added bit to encode very small frequencies which suddenly must become the maximum. The solution might be to "swap" them, but this requires new information or codes. This is back to delta coding. Sigh.

    3. But a total cycling of the frequency table might work...

    4. Instead of 8-bit bytes, use 4-bit symbols;

    ***

    This is similar, I think, to WEB Technologies' algorithm as featured in BYTE magazine in 1992 and noted in the comp.compression FAQ:
    "WEB, in fact, says that virtually any amount of data can be
    squeezed to under 1024 bytes by using DataFiles/16 to compress
    its own output multiple times."

    I think they were using or playing with a frequency table too, 256 32-bit frequencies = 1K.

    They might have had to output the MSBit of the highest frequency, the result of which may equal another byte's frequency (or frequencies)?

    That's why they had the problem of 4 numbers in a matrix being equal, a rare case in their algorithm.

    Just maybe.

    (Ideally, there is at most a 1-bit increase in the frequency of the output or new symbol, but the bombshell precludes that. If they are of the same bit size, then there is only a 1-bit increase in the new max frequency.

    The current symbol always has the highest frequency.

    You decode backwards, from last symbol to first; the symbol with the highest frequency is the current symbol.

    One parameter in decoding is the famed file_size().
    )
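
    Here is a minimal sketch of the backward decoder this implies. It is only an illustration under assumptions of mine: plain long long stands in for the BigInts a real run would need, the input obeyed the "no runs" rule, and the first input byte is not 0xFF (so the initial maximum holder, freq[255], is never the first symbol encoded).

    /* Recover the input in reverse from the emitted table.
     * freq[] is the final frequency table, n = file_size(), out has room for n bytes.
     * At every step the largest entry belongs to the last symbol encoded, and the
     * second-largest entry is the maximum that was in force just before it, so the
     * update freq[c] = max + (max - freq[c]) can be undone exactly. */
    void decode(long long freq[256], long n, unsigned char *out)
    {
        for (long i = n - 1; i >= 0; i--) {
            int cur  = (freq[0] >= freq[1]) ? 0 : 1;   /* index of the largest entry */
            int prev = 1 - cur;                        /* index of the second largest */
            for (int c = 2; c < 256; c++) {
                if (freq[c] > freq[cur])       { prev = cur; cur = c; }
                else if (freq[c] > freq[prev]) { prev = c; }
            }
            out[i] = (unsigned char)cur;               /* highest frequency = current symbol */
            /* undo new = max + (max - old)  =>  old = 2*max - new, where max = freq[prev] */
            freq[cur] = 2 * freq[prev] - freq[cur];
        }
    }

    This is also what makes the method "stack based": the decoder pops symbols in the reverse order of encoding.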

    The problem with the algorithm is that the emitted frequency table could be very large, due to very large frequencies, if you implement it by really using BigNums or BigInts; you then have to compress that very large frequency table.

    Maybe, to achieve compression, you can just consider the MSBit after the arithmetic (addition) operation.
    Or the solution is nearly just MTF: you have to output the character that *doubled* (MSBit activated).
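
    For reference, a minimal move-to-front transform over bytes looks like the sketch below; this is plain MTF, not the "doubled"/MSBit variant hinted at above.

    /* Plain move-to-front: each input byte is replaced by its current position
     * in a recency list, and that byte is then moved to the front, so recently
     * seen bytes map to small output values. */
    void mtf_encode(const unsigned char *in, unsigned char *out, long n)
    {
        unsigned char table[256];
        for (int i = 0; i < 256; i++) table[i] = (unsigned char)i;

        for (long k = 0; k < n; k++) {
            unsigned char c = in[k];
            int pos = 0;
            while (table[pos] != c) pos++;     /* find c's current rank */
            out[k] = (unsigned char)pos;       /* emit the rank */
            for (int j = pos; j > 0; j--)      /* move c to the front */
                table[j] = table[j - 1];
            table[0] = c;
        }
    }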

    WEB Technologies' DataFiles/16 algorithm is clearly designed for compression of *random* data, and recursive compression, both of which are futile indeed.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Gerald Tamayo@21:1/5 to All on Sun May 19 01:15:18 2019
    4. Instead of 8-bit bytes, use 4-bit symbols;

    How about 2-bit (base-4) symbols? Or maybe even better, a data source of base-3 symbols ??

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Gerald Tamayo@21:1/5 to All on Mon May 20 23:47:38 2019
    You get it, compression of random data and recursive. One of my early "probably breakthrough" algorithms. Definitely 2006-2007 ideas.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Scott@21:1/5 to compgt@gmail.com on Tue May 21 17:02:10 2019
    On Mon, 20 May 2019 23:47:38 -0700 (PDT), Gerald Tamayo
    <compgt@gmail.com> wrote:

    You get it, compression of random data and recursive. One of my early "probably breakthrough" algorithms. Definitely 2006-2007 ideas.

    Recursive compression is a well-researched subject and there exist
    many examples of working compressors. Research into recursive
    decompression has lagged though, and it's hard to find a working
    reference implementation.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Gerald Tamayo@21:1/5 to All on Wed Jul 24 00:02:32 2019
    If compression is recursive, then the corresponding decompression procedure is fundamentally recursive too.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Gerald Tamayo@21:1/5 to All on Wed Jul 24 00:09:07 2019
    The problem with inventing a random data compressor is that it would stifle storage-device innovation and arouse disgust in storage manufacturers. So, will one be safe if such breakthrough algorithms exist? Even if you post your algorithm online, will it remain open and free? Will the programmer not be harassed and his invention stolen?

    The story of Jan Sloot and his purported data compression breakthrough is a classic example of how serious the Feds or the storage industry are about data compression. His sudden death was mysterious. To me, the spread of this news is a mechanism to dissuade researchers from inventing new super-compression algorithms.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Gerald Tamayo@21:1/5 to All on Sun Sep 29 00:14:49 2019
    On random data, the above was once posted here:

    https://grtamayoblog.blogspot.com/2018/10/random-data.html?m=1

    Cold War Zip Files:

    https://grtamayoblog.blogspot.com/2018/10/zip-file-compression.html?m=1

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Elhana@21:1/5 to All on Fri Oct 11 01:29:59 2019
    Gerald Tamayo:

    Cold War Zip Files:

    Holy f**k.

    https://www.youtube.com/watch?v=1aW3MXrrUfU

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)