-
One Way To Output LZ77 codes
From
Gerald Tamayo@21:1/5 to
All on Wed Feb 12 08:35:33 2020
Reduced Length LZ (RLLZ)
Let me share here my one way to output LZ77 codes, which may improve LZ77/LZSS or LZW.
LZ77:
LZ77 coding transmits <offset, length> codes. Many times in the compression process, the same matching string is encoded with an <offset> and the same <length> code. We can avoid transmitting the <length> code for many strings as well as not outputting a
bit to identify a literal (as in LZSS) by the following:
Instead, read the whole input or block and gather duplicated or similar strings and encode <the whole string> (e.g., <length of the string (n>=2)> plus the actual string of characters) only this one time, <the number of the same matching strings (i.e.,
number of following offset codes)>, and its succeeding occurrences in the input block by transmitting only the said <offset> codes. (Or you can use an escape code to end the offset codes, perhaps BLOCK_SIZE.) Do this for all duplicated strings. This is
actually a "Reduced Length LZ (RLLZ)".
The literals are outputted last *without bit flags* since they fall into the block buffers not covered or "not activated" by encoded strings. So just one array of literals can be outputted maybe at the end of the file, or, since block-based coding is
more practical for shorter offset codes, at the end of the appropriately-sized block of <offsets>.
During decoding, the strings are written first in the output or write buffer using the offsets, and the literals "fill in" the unwritten positions or "holes" in the write buffer. This makes very compact LZ.
***
No need to output bit flags for literals or the number of following consecutive literals in some algorithms.
Then the completely filled write buffer is written to file.
That is, e.g. after gathering all distinct repeated strings on a block,
1) no transmission of <length> code for the next occurrences of the same string (but, in the simplest way, you have to output the number of succeeding strings);
2) transmitting the literals last (in the output buffer) means no need to transmit *bit flags* for literals (and matches). This is the novel or surprising idea here: deferred literals output;
3) if you know (LZT) LZ-Tamayo (2008) algorithm where it was demonstrated complete exclusion of the <length> code, this might as well be "LZ-Tamayo2". Should improve LZ77/LZSS/LZW based compressors. Decoding is also straight-forward.
Sorry that only now after a decade of LZT (2008 ) i am releasing this. I stopped coding compression in 2010 or 2011. (Note: it seems there is already "LZT" before i named my algorithm. LZ-Tischer.)
***
I have similar ideas as "rep-codes" in 2000s. My compression ideas are from the late 90s when i became interested again in data compression.
The ones i call here "holes" they call "gaps", even in LZW. But the idea of deferring transmission of literals for an output buffer avoids bit flags for both literals and matches which is still used in some explanations of ROLZ. Other algorithms still
need to output the number of following literals, or literal_length. Deferred, meaning this is not an "online" algorithm.
-- Gerald R. Tamayo
(reposted, edited from Sept. 2018)
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
-
From
Gerald Tamayo@21:1/5 to
All on Wed Feb 12 08:42:21 2020
There are no "size" of holes to transmit, there are just characters written to holes, whatever the sizes of those holes, 1 char, 2 chars, or n chars, the literals are just written one by one into the holes.
Decode Output buffer (one appropriately-sized block):
[STRING..STRING....STRING.STRING.....STRING.STRING]
(2 holes, 4 holes, 1 hole, 5 holes, 1 hole)
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
-
From
Gerald Tamayo@21:1/5 to
All on Wed Feb 12 08:51:57 2020
The literals "nicely fit" into the holes in the write buffer, and they are in their *correct positions* in the file!, like the strings. Simply put, we still preserve the order or sequence of literals and strings.
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
-
From
Gerald Tamayo@21:1/5 to
All on Wed Feb 12 21:29:55 2020
Edit: <BLOCK_SIZE-1>
<the number of the same matching strings (i.e., number of following offset codes)>, and its succeeding occurrences in the input block by transmitting only the said <offset> codes. (Or you can use an escape code to end the offset codes, perhaps <BLOCK_
SIZE-1>.)
-- Gerald
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
-
From
Harry Potter@21:1/5 to
All on Thu Mar 26 14:35:47 2020
I like your idea but admit that I don't fully understand it. :( Can you post some P-code to describe the technique? :)
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
-
From
Harry Potter@21:1/5 to
Harry Potter on Thu Mar 26 17:04:30 2020
On Thursday, March 26, 2020 at 5:35:49 PM UTC-4, Harry Potter wrote:
I like your idea but admit that I don't fully understand it. :( Can you post some P-code to describe the technique? :)
Never mind: I get it. :) I'm applying your ideas now, but I'm running into some problems with your technique. :(
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
-
From
Gerald Tamayo@21:1/5 to
All on Mon Jun 29 04:48:35 2020
Do not immediately write the codes to output file. (That's what most LZ compressors do.) Buffer it. So you can output strings to the output buffer first, then the literals. Then write the whole buffer to output file. The original sequence or order of the
literals and strings is still preserved.
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
-
From
Gerald Tamayo@21:1/5 to
All on Mon Jun 29 04:45:06 2020
Do not immediately write the codes to output file. (That's what most LZ compressors do.) Buffer it. So you can output strings to the output buffer first, then the literals. Then write the whole buffer to output file. The original sequence or order of the
literals is still preserved.
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
-
From
Gerald Tamayo@21:1/5 to
All on Mon Jun 29 04:50:49 2020
Do not immediately write the codes to output file. (That's what most LZ compressors do.) Buffer it. So you can output strings to the output buffer first, then the literals. Then write the whole buffer to output file. The original sequence or order of the
literals and strings is still preserved.
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
-
From
Matthias Waldhauer@21:1/5 to
Harry Potter on Wed Aug 26 00:54:33 2020
Harry Potter schrieb am Freitag, 27. März 2020 um 01:04:32 UTC+1:
On Thursday, March 26, 2020 at 5:35:49 PM UTC-4, Harry Potter wrote:
I like your idea but admit that I don't fully understand it. :( Can you post some P-code to describe the technique? :)
Never mind: I get it. :) I'm applying your ideas now, but I'm running into some problems with your technique. :(
What's your progress so far?
On encode.su I wrote some thoughts about it and where it might loose efficiency. Some improvement ideas are also there. The main benefits in my humble view might come with text compression, as there are a lot of reused strings, where removing the length
encoding for all of them could offset the need to have pointers into the decompressed block including for the first occurence of the string.
M.
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
-
From
Harry Potter@21:1/5 to
Gerald R. Tamayo on Fri Sep 11 13:28:05 2020
I like the sound of your technique, but I don't fully understand how it works. How do I specify where to insert an offset block?
On Wednesday, February 12, 2020 at 11:35:36 AM UTC-5, Gerald R. Tamayo wrote:
Reduced Length LZ (RLLZ)
Let me share here my one way to output LZ77 codes, which may improve LZ77/LZSS or LZW.
LZ77:
LZ77 coding transmits <offset, length> codes. Many times in the compression process, the same matching string is encoded with an <offset> and the same <length> code. We can avoid transmitting the <length> code for many strings as well as not outputting
a bit to identify a literal (as in LZSS) by the following:
Instead, read the whole input or block and gather duplicated or similar strings and encode <the whole string> (e.g., <length of the string (n>=2)> plus the actual string of characters) only this one time, <the number of the same matching strings (i.e.,
number of following offset codes)>, and its succeeding occurrences in the input block by transmitting only the said <offset> codes. (Or you can use an escape code to end the offset codes, perhaps BLOCK_SIZE.) Do this for all duplicated strings. This is
actually a "Reduced Length LZ (RLLZ)".
The literals are outputted last *without bit flags* since they fall into the block buffers not covered or "not activated" by encoded strings. So just one array of literals can be outputted maybe at the end of the file, or, since block-based coding is
more practical for shorter offset codes, at the end of the appropriately-sized block of <offsets>.
During decoding, the strings are written first in the output or write buffer using the offsets, and the literals "fill in" the unwritten positions or "holes" in the write buffer. This makes very compact LZ.
***
No need to output bit flags for literals or the number of following consecutive literals in some algorithms.
Then the completely filled write buffer is written to file.
That is, e.g. after gathering all distinct repeated strings on a block,
1) no transmission of <length> code for the next occurrences of the same string (but, in the simplest way, you have to output the number of succeeding strings);
2) transmitting the literals last (in the output buffer) means no need to transmit *bit flags* for literals (and matches). This is the novel or surprising idea here: deferred literals output;
3) if you know (LZT) LZ-Tamayo (2008) algorithm where it was demonstrated complete exclusion of the <length> code, this might as well be "LZ-Tamayo2". Should improve LZ77/LZSS/LZW based compressors. Decoding is also straight-forward.
Sorry that only now after a decade of LZT (2008 ) i am releasing this. I stopped coding compression in 2010 or 2011. (Note: it seems there is already "LZT" before i named my algorithm. LZ-Tischer.)
***
I have similar ideas as "rep-codes" in 2000s. My compression ideas are from the late 90s when i became interested again in data compression.
The ones i call here "holes" they call "gaps", even in LZW. But the idea of deferring transmission of literals for an output buffer avoids bit flags for both literals and matches which is still used in some explanations of ROLZ. Other algorithms still
need to output the number of following literals, or literal_length. Deferred, meaning this is not an "online" algorithm.
-- Gerald R. Tamayo
(reposted, edited from Sept. 2018)
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
-
From
Harry Potter@21:1/5 to
All on Fri Jan 15 11:26:43 2021
Hi, again! I just figured out how to use your technique: I missed the part where you stated how to write the compressed block pointers. I get it now, and, well, your technique worked! :) I had to kill a few of my techniques, but it was well worth it:
I'm doing much better than before. Thank you!
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
-
From
Harry Potter@21:1/5 to
All on Wed Feb 17 10:16:54 2021
I was wrong. I found a bug in my software,and now, I'm doing worse than without your technique, and I couldn't get back the numbers. :(
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
-
From
Gerald R. Tamayo@21:1/5 to
All on Wed Mar 17 02:45:27 2021
Perhaps this simple illustration will help:
Reduced Length LZ (RLLZ)
(A very compact LZ. Made possible by deferred literals output.)
aacbbdeaabb <- input source
012345678910 <- index
encode all strings first:
(size of string, string, number of occurrences of strings), [positions in file or block]
aa: (2, aa, 2), [0, 7]
bb: (2, bb, 2), [3, 9]
encode literals last: [cde]
Decompress:
Decode all strings first;
Decode literals.
(The strings and literals are nicely in their correct order or sequences.)
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
-
From
Gerald R. Tamayo@21:1/5 to
All on Wed Mar 17 02:17:44 2021
Perhaps this simple illustration will help:
Reduced Length LZ (RLLZ)
(A very compact LZ. Made possible by deferred literals output.)
aacbbdeaab <- input source
0123456789 <- index
Compress:
Encode all strings first:
(size of string, string, number of occurrences of strings), [positions in file or block]
aa: (2, aa, 2), [0, 7]
bb: (2, bb, 1), [3]
Encode literals last: [cdeb]
Decompress:
Decode all strings first;
Decode literals.
(The strings and literals are nicely in their correct order or sequences.)
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
-
From
Harry Potter@21:1/5 to
Gerald R. Tamayo on Thu Sep 2 11:17:49 2021
On Wednesday, March 17, 2021 at 5:45:28 AM UTC-4, Gerald R. Tamayo wrote:
Perhaps this simple illustration will help:
Reduced Length LZ (RLLZ)
(A very compact LZ. Made possible by deferred literals output.)
aacbbdeaabb <- input source
012345678910 <- index
encode all strings first:
(size of string, string, number of occurrences of strings), [positions in file or block]
aa: (2, aa, 2), [0, 7]
bb: (2, bb, 2), [3, 9]
encode literals last: [cde]
Decompress:
Decode all strings first;
Decode literals.
(The strings and literals are nicely in their correct order or sequences.)
I did that then some performance optimizations and am doing *worse* than without your technique. Maybe I'm doing something wrong. I need to try it again. :)
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
-
From
Gerald R. Tamayo@21:1/5 to
All on Sat Oct 2 06:19:17 2021
I did that then some performance optimizations and am doing *worse* than without your technique. Maybe I'm doing something wrong. I need to try it again. :)
Maybe the following improved example will help.
Here is simple illustration for RLLZ, to be complete:
Reduced Length LZ (RLLZ)
(A very compact LZ. Made possible by deferred literals output.)
aacbcdeaabc <- input source
012345678910 <- index
Compress:
Encode all duplicated strings first:
Very simple:
(<size of string>, <string>, <number of "next" occurrences of string>), [positions in file or block (no match lengths needed)]
aa: (2, aa, 1), [0, 7]
bc: (2, bc, 1), [3, 9]
Better:
aa: (2, aa, 1-1=0), [0, 7]
bc: (2, bc, 1-1=0), [3, 9]
Encode literals last: [cde]
Decompress:
Decode all strings first;
Read/Get literals. (The literals are then inserted into the holes. Whatever the size of those holes randomly positioned in the write buffer, they are filled by the literals.)
The strings and literals are nicely in their correct order or sequences. Write to actual file.
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
-
From
Harry Potter@21:1/5 to
All on Sat Oct 2 07:10:13 2021
Maybe there's a problem with my code. (?) I pasted the relevant code here. It uses Assembler, but I believe the original C code is still in the file.
--------------------------------
void CompBin01 (void)
{
//signed char c2; //Holds value for len. current
// int c2;
// unsigned /*j1,*/ j3, j4, j5;
// unsigned i, j, j2, k, m, n, o=0;
//vz.InPos=0;
k=0; vz.InPos=0;
for (; vz.InPos<vz.InEnd; )
{
//printc ('.');
printu (vz.InPos); printcr ();
/* if (Buffer2[vz.InPos>>3]&(1<<(vz.InPos&7))) {
//getkey ();
for (i=j=m=0; i<NumRLLZEnt; ++i) {
if (RLLZBuf[i].len>j && !memcmp(&InBuffer[vz.InPos], &InBuffer[RLLZBuf[i].loc], RLLZBuf[i].len)) {
j=RLLZBuf[i].len; m=i;
}
}
//if (j==0) {CompBin01U (); ++vz.InPos; continue;}
if (j==0) {WriteB (InBuffer[vz.InPos]); ++vz.InPos; continue;}
//CompBin01Ue ();
++backcol;
vz.InPos+=j;
} else {
//CompBin01U (); ++vz.InPos;
WriteB (InBuffer[vz.InPos]); ++vz.InPos;
}*/
//if (!(Buffer2[vz.InPos>>3]&(1<<(vz.InPos&7)))) WriteB (InBuffer[vz.InPos]);
//else ++backcol;
// if (!(Buffer2[vz.InPos>>3]&(1<<(vz.InPos&7)))) {
// WriteB (InBuffer[vz.InPos]);
// //++backcol;
// }
// ++vz.InPos;
if ((Buffer2[vz.InPos>>3]&(1<<(vz.InPos&7)))) {
//CompBin01Ue ();
writebuffer ();
for (; Buffer2[vz.InPos>>3]&(1<<(vz.InPos&7)); ++vz.InPos);
//++backcol;
} else {
CompBin01U ();
++vz.InPos;
}
}
writeoutlast ();
}
static unsigned p;
void GetRLLZBlocks (void)
{
unsigned i, j, k, l, m, n, o;
unsigned q;
struct RLLZBuf* r;
for (i=NumRLLZEnt=0; i<vz.InEnd; ) {
printu (i); printcr ();
if ((lenc=GetLZW (i))>=4) {
o=techLZW.pos-(unsigned)&InBuffer;
//if (GetLZW(i+1)>=lenc+1) {++i; continue;}
for (j=0; j<NumRLLZEnt; ++j) {
if (RLLZBuf[j].len==lenc &&
!memcmp (&InBuffer[o], &InBuffer[RLLZBuf[j].loc], RLLZBuf[j].len))
break;
}
if (j==NumRLLZEnt) {
if (NumRLLZEnt>=1000) {
//++backcol;
++i; continue;
}
RLLZBuf[j].loc=o;
RLLZBuf[j].len=lenc;
for (k=0; k<lenc; k++) {
//Buffer3[(o+k)>>3]|=1<<(o+k&7);
}
RLLZBuf[j].numoccur=1;
++NumRLLZEnt;
}
++RLLZBuf[j].numoccur;
for (k=0; k<lenc; k++) {
//++backcol;
//Buffer3[(i+k)>>3]|=1<<(i+k&7);
}
i+=techLZW.size;
} else {
a: ++i;
}
}
for (i=0; i<NumRLLZEnt; ++i) {
//++backcol;
if (RLLZBuf[i].numoccur<5-(RLLZBuf[i].len>=5?2:0)) {
for (j=i; j<NumRLLZEnt; ++j) {
memcpy (&RLLZBuf[j], &RLLZBuf[j+1], sizeof (RLLZBuf[0]));
} --NumRLLZEnt; --i;
}
}
prints ("rllz="); printu (NumRLLZEnt); printcr (); getkey ();
WriteDist (NumRLLZEnt, 1024);
for (o=0; o<NumRLLZEnt; ++o) {
i=2;
lenc=RLLZBuf[o].len;
if (lenc<6) {
WriteDist (lenc-2+1-i, 6-i);
} else if (lenc< 11) {
WriteDist (0, 6-i);
//writeoutf_10 ();
WriteDist (lenc-6, 11-6);
} else if (lenc<18) {
WriteDist (5-i, 6-i);
//writeoutf_10 ();
WriteDist ((lenc- 11), (22- 11));
} else {
WriteDist (5-i, 6-i);
WriteDist ((18- 11)+(lenc-18)%4, (22- 11));
//writeoutf_01 ();
//WriteDist (11-7+((lenc-11)&5), 17-7);
WriteDist ((lenc-18)/4, (38-18+3)/4);
//WriteDist ((lenc-18), (57-18));
}
for (j=0; j<lenc; ++j) {
WriteB (InBuffer[RLLZBuf[o].loc+j]);
} k=0; m=-1;
for (l=n=0; l<vz.InEnd; )
{
//++backcol;
lenc=0; p=-1;
//if (Buffer2[vz.InPos>>3]&(1<<(vz.InPos&7))) {
//if (GetLZW(l)>=4) {
//if (!(Buffer3[l]>>(l>>3)&1<<(i&7))) {++l; continue;}
//if (InBuffer[l]!=InBuffer)
//r=&RLLZBuf[0];
cin=&InBuffer[l];
/*for (j=0; j<NumRLLZEnt; ++j) {
if ((q=r->len)>lenc &&
*cin==InBuffer[i=r->loc] &&
//(Buffer3[l>>3]&1<<(l&7)) &&
memcmp2(&InBuffer[i], q))
{lenc=q; p=j;}
++r;
}*/
__asm__ (
"\tlda\t#0\n"
"\tsta\t_i\n"
//"\tlda\tNumRLLZEnt+1\n"
"\tsta\t_i+1\n"
"\tlda\t#<_RLLZBuf\n"
"\tsta\tptr2\n"
"\tlda\t#>_RLLZBuf\n"
"\tsta\tptr2+1\n"
// "\tlda\t_cin\n"
// "\tsta\tptr4\n"
// "\tlda\t_cin+1\n"
// "\tsta\tptr4+1\n"
"@lp1:\n"
"\tldy\t#2\n"
"\tlda\t_lenc\n"
"\tcmp\t(ptr2),y\n"
"\tbeq\t@skip\n"
"\tbcs\t@skip\n"
"\tlda\t#<_InBuffer\n"
"\tldy\t#0\n"
"\tclc\n"
"\tadc\t(ptr2),y\n"
"\tsta\tptr3\n"
"\tiny\n"
"\tlda\t#>_InBuffer\n"
"\tadc\t(ptr2),y\n"
"\tsta\tptr3+1\n"
"\tldy\t#0\n"
"\tlda\t(_cin),y\n"
"\tcmp\t(ptr3),y\n"
"\tbne\t@skip\n"
"\tldy\t#2\n"
"\tlda\t(ptr2),y\n"
"\ttax\n"
"\ttay\n"
"\tdey\n"
"@lp2:\n"
"\tlda\t(_cin),y\n"
"\tcmp\t(ptr3),y\n"
"\tbne\t@skip\n"
"\tdey\n"
"\tbne\t@lp2\n"
"\tlda\t_i\n"
"\tsta\t_p\n"
"\tlda\t_i+1\n"
"\tsta\t_p+1\n"
"\tstx\t_lenc\n"
"@skip:\n"
"\tlda\tptr2\n"
"\tclc\n"
"\tadc\t#8\n"
"\tsta\tptr2\n"
"\tbcc\t@skip2\n"
"\tinc\tptr2+1\n"
"@skip2:\n"
"\tinc\t_i\n"
"\tbne\t@skip3\n"
"\tinc\t_i+1\n"
"@skip3:\n"
"\tlda\t_i+1\n"
"\tcmp\t_NumRLLZEnt+1\n"
"\tbcc\t@lp1\n"
"\tlda\t_i\n"
"\tcmp\t_NumRLLZEnt\n"
"\tbcc\t@lp1\n"
);
if (lenc && p==o) {
//++backcol;
for (i=0; i<lenc; i++) {
Buffer2[(l+i)>>3]|=1<<(l+i&7); //++backcol;
}
WriteDist (l-k, vz.InEnd-k+1);
//k=l; l+=lenc;
k=l+lenc; l+=lenc;
} else {
++l;
}
} WriteDist (vz.InEnd-k, vz.InEnd+1-k);
printu (o); printcr ();
}
//printf (">>> %X\n", &Buffer2); getkey ();
}
--------------------------------
WriteDist() writes the information using as few bits as possible.
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
-
From
Gerald R. Tamayo@21:1/5 to
All on Sat Oct 2 08:54:17 2021
Maybe the following improved example will help.
Here is simple illustration for RLLZ, to be complete:
Reduced Length LZ (RLLZ)
(A very compact LZ. Made possible by deferred literals output.)
aacbcdeaabc <- input source
012345678910 <- index
Compress:
Encode all duplicated strings first:
Very simple:
(<size of string>, <string>, <number of "next" occurrences of string>), [positions in file or block (no match lengths needed)],
:
aa: (2, aa, 1), [0, 7]
bc: (2, bc, 1), [3, 9]
Better:
aa: (2, aa, 1-1=0), [0, 7]
bc: (2, bc, 1-1=0), [3, 9]
Encode end of string code stream: (0, , );
Encode literals last: [cde].
Decompress:
Decode all strings first;
Read/Get literals. (The literals are then inserted into the holes. Whatever the size of those holes randomly positioned in the write buffer, they are filled by the literals.)
The strings and literals are nicely in their correct order or sequences. Write to actual file.
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
-
From
Gerald R. Tamayo@21:1/5 to
Gerald R. Tamayo on Tue Mar 7 11:28:22 2023
On Saturday, October 2, 2021 at 11:54:18 PM UTC+8, Gerald R. Tamayo wrote:
Maybe the following improved example will help.
Here is simple illustration for RLLZ, to be complete:
Reduced Length LZ (RLLZ)
(A very compact LZ. Made possible by deferred literals output.)
aacbcdeaabc <- input source
012345678910 <- index
Compress:
Encode all duplicated strings first:
Very simple:
(<size of string>, <string>, <number of "next" occurrences of string>), [positions in file or block (no match lengths needed)],
:
aa: (2, aa, 1), [0, 7]
bc: (2, bc, 1), [3, 9]
Better:
aa: (2, aa, 1-1=0), [0, 7]
bc: (2, bc, 1-1=0), [3, 9]
Encode end of string code stream: (0, , );
Encode literals last: [cde].
Decompress:
Decode all strings first;
Read/Get literals. (The literals are then inserted into the holes. Whatever the size of those holes randomly positioned in the write buffer, they are filled by the literals.)
The strings and literals are nicely in their correct order or sequences. Write to actual file.
"Deferred literals transmission" in Reduced Length LZ (#RLLZ) #algorithm actually has many practical uses in #data #compression! Not just in the way I described RLLZ.
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
-
From
Gerald R. Tamayo@21:1/5 to
All on Tue Mar 7 13:10:01 2023
"Deferred literals transmission" in Reduced Length LZ (#RLLZ) #algorithm actually has many practical uses in #data #compression! Not just in the way I described RLLZ.
You can use "deferred literals transmission" if you want. Permission granted, no strings attached. ;)
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
-
From
Harry Potter@21:1/5 to
All on Tue Mar 7 13:23:04 2023
I want to implement this on small 8-bit and 16-bit platforms where I might not have enough memory to store a whole file. Is there
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
-
From
Harry Potter@21:1/5 to
All on Tue Mar 7 13:25:01 2023
I want to implement this on small 8-bit and 16-bit platforms where I might not have enough memory to store a whole file. Is there a way to get this to work only on a piece of a file? Or maight I have to divide the file into smaller pieces?
BTW, sorry about the last, incomplete post. I pressed the wrong key. :(
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)