• A software for combining text files to obtain high quality pseudo-rando

    From Mok-Kong Shen@21:1/5 to All on Sun Jul 9 12:08:50 2017
    An estimate of the entropy of English text is 1.34 bits per letter [1]. This
    implies that, if the letters are coded into 5 bits, one needs to
    appropriately combine 4 text files in order to obtain bit sequences of full
    entropy, since 4*1.34 = 5.36 > 5. The method used in our software is to sum
    (mod 32) the 5-bit coded values (a-z mapped to 0-25) of the corresponding
    letters of the text files.
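
    The combining step described above can be sketched in Python as follows.
    This is a hypothetical illustration, not the code of TEXTCOMBINE-SP: the
    function names are invented here, and the sketch assumes that non-letters
    are dropped, case is ignored, and the output is cut to the length of the
    shortest text. It shows only the mechanics (the 4*1.34 arithmetic and the
    mod-32 sum), not whether the output is actually random.

```python
import math
import string


def min_files_needed(bits_per_symbol: int = 5, entropy_per_letter: float = 1.34) -> int:
    """Smallest number of texts whose estimated per-letter entropies add up to
    at least bits_per_symbol: ceil(5 / 1.34) = 4, since 4 * 1.34 = 5.36 > 5."""
    return math.ceil(bits_per_symbol / entropy_per_letter)


def letter_codes(text: str) -> list[int]:
    """Map a-z (case-insensitively) to 0-25.

    Assumption: every character that is not an ASCII letter is dropped."""
    return [ord(ch) - ord("a") for ch in text.lower() if ch in string.ascii_lowercase]


def combine_texts(texts: list[str]) -> list[int]:
    """Sum the codes of the letters at the same position in every text, mod 32.

    Each result fits in 5 bits (0-31). zip() stops at the shortest filtered
    text, so the output is truncated to that length (an assumption)."""
    streams = [letter_codes(t) for t in texts]
    return [sum(column) % 32 for column in zip(*streams)]


if __name__ == "__main__":
    sample = [
        "The quick brown fox",
        "jumps over the lazy dog",
        "Pack my box with",
        "five dozen liquor jugs",
    ]
    print(min_files_needed())  # 4
    print(combine_texts(sample))
```

    Each output value is a 5-bit symbol in the range 0-31; how such symbols
    are packed into bytes is not stated in the post and is left out here.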

    There are plenty of other schemes for obtaining high-quality pseudo-random
    sequences in practice, e.g. AES in counter mode. However, our scheme seems
    to be much simpler, both in the underlying logic (understandability) and in
    implementation, and is thus a viable alternative that one could use under
    certain circumstances.

    The software, TEXTCOMBINE-SP, is available at mok-kong-shen.de

    [1] T. M. Cover, R. C. King, "A Convergent Gambling Estimate of the Entropy
    of English," IEEE Trans. Inf. Theory, vol. 24, 1978, pp. 413-421.


    M. K. Shen

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From William Unruh@21:1/5 to Mok-Kong Shen on Sun Jul 9 16:56:36 2017
    On 2017-07-09, Mok-Kong Shen <mok-kong.shen@t-online.de> wrote:
    > An estimate of the entropy of English text is 1.34 bits per letter [1].
    > This implies that, if the letters are coded into 5 bits, one needs to
    > appropriately combine 4 text files in order to obtain bit sequences of
    > full entropy, since 4*1.34 = 5.36 > 5. The method used in our software is
    > to sum (mod 32) the 5-bit coded values (a-z mapped to 0-25) of the
    > corresponding letters of the text files.

    That is a very bad estimate: it is basically the estimate of the entropy
    if you pick one letter out at random from the text file. It does NOT take
    into account correlations between the letters, of which there are loads
    and loads. I.e., if you pick three letters in sequence, there is a high
    probability that they are correlated, which would be disastrous for a
    pseudo random number generator. Also, text is an extremely biased source.
    E.g., in English the letter z occurs with a somewhat different frequency
    than e. Exactly why you would want to do what you do is entirely unclear,
    since there are lots of extremely good pseudo random number generators out
    there, ones not based on a half-assed theory.

    > There are plenty of other schemes for obtaining high-quality pseudo-random
    > sequences in practice, e.g. AES in counter mode. However, our scheme seems
    > to be much simpler, both in the underlying logic (understandability) and
    > in implementation, and is thus a viable alternative that one could use
    > under certain circumstances.

    It is NOT viable, unless you want a complete cock-up of a random number
    generator.

  • From Mok-Kong Shen@21:1/5 to All on Mon Jul 10 06:56:27 2017
    On 09.07.2017 at 18:56, William Unruh wrote:
    > On 2017-07-09, Mok-Kong Shen <mok-kong.shen@t-online.de> wrote:
    >> An estimate of the entropy of English text is 1.34 bits per letter [1].
    >> This implies that, if the letters are coded into 5 bits, one needs to
    >> appropriately combine 4 text files in order to obtain bit sequences of
    >> full entropy, since 4*1.34 = 5.36 > 5. The method used in our software is
    >> to sum (mod 32) the 5-bit coded values (a-z mapped to 0-25) of the
    >> corresponding letters of the text files.

    > That is a very bad estimate: it is basically the estimate of the entropy
    > if you pick one letter out at random from the text file. It does NOT take
    > into account correlations between the letters, of which there are loads
    > and loads. I.e., if you pick three letters in sequence, there is a high
    > probability that they are correlated, which would be disastrous for a
    > pseudo random number generator. Also, text is an extremely biased source.
    > E.g., in English the letter z occurs with a somewhat different frequency
    > than e. Exactly why you would want to do what you do is entirely unclear,
    > since there are lots of extremely good pseudo random number generators out
    > there, ones not based on a half-assed theory.

    Note that Shannon, who introduced the concept of entropy, did similar
    work; Cover and King's work merely followed his. Cover wrote a book on
    information theory, so I suppose he knew what he was doing. Note,
    further, that my example includes a test of the resulting byte sequence
    with Maurer's test, and the sequence passes that test. The other points
    you raised are dealt with in my OP (which you quoted here).
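
    For readers unfamiliar with the Maurer's test mentioned above, here is a
    minimal sketch of Maurer's universal statistical test for 8-bit blocks,
    following the description in NIST SP 800-22. It is not the test code from
    the TEXTCOMBINE-SP example; the function and constant names are
    hypothetical. It assumes the input is a byte string (one 8-bit block per
    byte), with Q = 2560 initialisation blocks and, ideally, K = 256000 test
    blocks; 7.1836656 and 3.238 are the tabulated expected value and variance
    for L = 8.

```python
import math
import os

# Tabulated values for block length L = 8 (NIST SP 800-22, universal test).
L = 8
EXPECTED_L8 = 7.1836656
VARIANCE_L8 = 3.238


def maurer_universal_bytes(data: bytes, Q: int = 2560) -> float:
    """Return Maurer's statistic f_n, treating each byte as one 8-bit block.

    The first Q blocks only initialise the table of last occurrences;
    the remaining K = len(data) - Q blocks are tested."""
    K = len(data) - Q
    if K <= 0:
        raise ValueError("data must contain more than Q bytes")
    # 1-based position where each block value last occurred (0 = not seen yet).
    last_seen = [0] * (1 << L)
    for i in range(1, Q + 1):
        last_seen[data[i - 1]] = i
    total = 0.0
    for i in range(Q + 1, Q + K + 1):
        block = data[i - 1]
        total += math.log2(i - last_seen[block])  # log of distance to previous occurrence
        last_seen[block] = i
    return total / K


def maurer_z_score(fn: float, K: int) -> float:
    """Standardise f_n with the NIST correction factor c(L, K) for L = 8."""
    c = 0.7 - 0.8 / L + (4 + 32 / L) * K ** (-3 / L) / 15
    sigma = c * math.sqrt(VARIANCE_L8 / K)
    return (fn - EXPECTED_L8) / sigma


if __name__ == "__main__":
    Q, K = 2560, 256000
    sample = os.urandom(Q + K)  # stand-in input; substitute the byte sequence under test
    fn = maurer_universal_bytes(sample, Q)
    print(f"f_n = {fn:.5f} (expected about {EXPECTED_L8}), z = {maurer_z_score(fn, K):.2f}")
```

    A value of f_n close to 7.18 (small |z|) is what a random source should
    give for this statistic; like any single statistical test, it checks one
    particular kind of regularity only.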

    M. K. Shen


  • From Mok-Kong Shen@21:1/5 to All on Fri Jul 14 17:40:12 2017
    I am extremely sorry to say that I was unfortunately misled by some
    erroneous computations in the design stage, so that I would like to
    retract this software (instead of attempting a more complicated redesign)
    and sincerely ask the readers of this thread to pardon me for having
    wasted their precious time.

    M. K. Shen

  • From William Unruh@21:1/5 to Mok-Kong Shen on Fri Jul 14 15:55:57 2017
    On 2017-07-14, Mok-Kong Shen <mok-kong.shen@t-online.de> wrote:
    > I am extremely sorry to say that I was unfortunately misled by some
    > erroneous computations in the design stage, so that I would like to
    > retract this software (instead of attempting a more complicated redesign)
    > and sincerely ask the readers of this thread to pardon me for having
    > wasted their precious time.

    Excellent. Thanks for admitting it.

