In fact it is sufficient to change the seeding function of bcd32 like
...
a = seed[0] ^ 0xffffffff;
b = seed[1];
c = seed[2];
d = seed[3];
t = a + b + c + d;
On Tuesday, October 13, 2015 at 1:24:40 PM UTC-4, Karl-Uwe Frank wrote:
In fact it is sufficient to change the seeding function of bcd32 like
...
a = seed[0] ^ 0xffffffff;
b = seed[1];
c = seed[2];
d = seed[3];
t = a + b + c + d;
What happens if the user, in an effort to force in a bunch of 1s, tries the seed array {0xFFFFFFFF,0,0,0} or the equivalent {-1,0,0,0} ?
As a general rule, you can't prevent a bad starting state simply by doing state = g(state), where g is an invertible function in the state space. This only changes which state is bad (from the caller's view).
Bob H
On Tuesday, October 13, 2015 at 1:24:40 PM UTC-4, Karl-Uwe Frank wrote:
In fact it is sufficient to change the seeding function of bcd32 like
...
a = seed[0] ^ 0xffffffff;
b = seed[1];
c = seed[2];
d = seed[3];
t = a + b + c + d;
What happens if the user, in an effort to force in a bunch of 1s,
tries the seed array {0xFFFFFFFF,0,0,0} or the equivalent {-1,0,0,0}
?
As a general rule, you can't prevent a bad starting state simply by
doing state = g(state), where g is an invertible function in the
state space. This only changes which state is bad (from the caller's
view).
Bob H
On Tuesday, October 13, 2015 at 1:24:40 PM UTC-4, Karl-Uwe Frank wrote:
In fact it is sufficient to change the seeding function of bcd32 like
...
a = seed[0] ^ 0xffffffff;
b = seed[1];
c = seed[2];
d = seed[3];
t = a + b + c + d;
What happens if the user, in an effort to force in a bunch of 1s,
tries the seed array {0xFFFFFFFF,0,0,0} or the equivalent {-1,0,0,0}
?
As a general rule, you can't prevent a bad starting state simply by
doing state = g(state), where g is an invertible function in the
state space. This only changes which state is bad (from the caller's
view).
Bob H
On 2015-10-14,
rsharris.bx.psu.edu@gmail.com<rsharris.bx.psu.edu@gmail.com> wrote:
On Tuesday, October 13, 2015 at 1:24:40 PM UTC-4, Karl-Uwe Frank
wrote:
In fact it is sufficient to change the seeding function of bcd32
like ... a = seed[0] ^ 0xffffffff; b = seed[1]; c = seed[2]; d =
seed[3]; t = a + b + c + d;
What happens if the user, in an effort to force in a bunch of 1s,
tries the seed array {0xFFFFFFFF,0,0,0} or the equivalent
{-1,0,0,0} ?
As a general rule, you can't prevent a bad starting state simply by
doing state = g(state), where g is an invertible function in the
state space. This only changes which state is bad (from the
caller's view).
But if g is sufficiently weird (Ie it is sufficiently hard to invert
it) and the bad states are sufficiently rare, then the caller may
have no way of finding our using that state. And if the state space
is sufficiently large, the probablity of hitting it by accident can
be extremely small ( eg such that the computer being hit by an
asteroid while using it for that problem is higher).
On 2015-10-14, rsharris.bx.psu.edu@gmail.com<rsharris.bx.psu.edu@gmail.com> wrote:tries the seed array {0xFFFFFFFF,0,0,0} or the equivalent {-1,0,0,0} ?
On Tuesday, October 13, 2015 at 1:24:40 PM UTC-4, Karl-Uwe Frank wrote:
In fact it is sufficient to change the seeding function of bcd32 like
...
a = seed[0] ^ 0xffffffff;
b = seed[1];
c = seed[2];
d = seed[3];
t = a + b + c + d;
What happens if the user, in an effort to force in a bunch of 1s,
doing state = g(state), where g is an invertible function in the state
As a general rule, you can't prevent a bad starting state simply
by
But if g is sufficiently weird (Ie it is sufficiently hard to invert it)
and the bad states are sufficiently rare, then the caller may have no
way of finding our using that state. And if the state space is
sufficiently large, the probablity of hitting it by accident can be
extremely small ( eg such that the computer being hit by an asteroid
while using it for that problem is higher).
Bob H
Somehow meanwhile I think that I'm a laughing stock on that issue.
On Wednesday, October 14, 2015 at 5:45:18 PM UTC-4, Karl-Uwe Frank
wrote:
Somehow meanwhile I think that I'm a laughing stock on that issue.
I don't believe anyone thinks of you as a laughing stock.
The reason I've been posting the things I have is to try to give you
a gentle nudge toward considering analysis techniques that might be beneficial in the design of a PRNG.
When I did the loop analysis of your ctr variable, I actually thought
the variable performed pretty well. The shorter loops I observed
occurred with low probability, the loops that it usually ended up in
were on the order of what would be expected from a truly random
mapping. My analysis was brief, though, only the tip of the
iceberg.
The point I've been trying to make about seeding is not that there is
this one initial state (all zeros) that's a problem. A more thorough analysis would try to characterize the degenerate states, estimate
what fraction of the state space they occupy, etc, and how the output
is affected in those states.
Bob H
When I did the loop analysis of your ctr variable, I actually thought
the variable performed pretty well. The shorter loops I observed
occurred with low probability, the loops that it usually ended up in
were on the order of what would be expected from a truly random
mapping. My analysis was brief, though, only the tip of the
iceberg.
The point I've been trying to make about seeding is not that there is
this one initial state (all zeros) that's a problem. A more thorough analysis would try to characterize the degenerate states, estimate
what fraction of the state space they occupy, etc, and how the output
is affected in those states.
Bob H
But in heavy contrast, the test you made by just "tipping the iceberg" revealed a grave design error, because the counter running very soon
into a loop.
This is in fact really fascinating. Perhaps you are using some new
analysis techniques which can reveal design mistakes that TestU01 and
other test batteries don't recognise as such.
So my question now is, how did you manage to find the looping counter in bcd32_ctr?
So my question now is, how did you manage to find the looping
counter in bcd32_ctr?
Floyd's cycle-finding algorithm. https://en.wikipedia.org/wiki/Cycle_detection.
So one thing you might try is to do something like that to identify
loops in your ctr. Then for each different loop, do some number of
tests that start with ctr in that loop and run the output of your
generator through the test battery to see if any ctr loop induces
bias in your output.
Bob H
Thanks for sharing. I think I will not use a counter currently but the function might also help finding potential problem with a,b,c or d.
On 20.10.15 20:41, I wrote:
If it were me, I'd try it on your a,d,t update function.
One additional question: did you initialise a,d,t through the seeding function or did you set them to arbitrary selected values directly?
On Monday, October 19, 2015 at 3:41:58 PM UTC-4, Karl-Uwe Frank
wrote:
Thanks for sharing. I think I will not use a counter currently but
the function might also help finding potential problem with a,b,c
or d.
For a large state space you may want to stop the cycle-detector's
iteration after some sensible value, e.g. corresponding to a minute.
If it were me, I'd try it on your a,d,t update function. I already identified a couple of short loops in that, but there might be
others.
Bob H
Did you find a short loop inside of 300,000,000, 1,500,000,000 or 4,294,967,295 outputs or on a higher volume?
Did you just test one of the three vars (a,d,t) when searching for a
loop or a combination of all three at once?
Perhaps you can name me just one initial setup for a,b,c,d which produce such a short loop you found.
I need to get back on this issue because I would like to ask you for a
more specific explanation on how you found those loops.
If it were me, I'd try it on your a,d,t update function. I already identified a couple of short loops in that, but there might be
others.
On Monday, October 19, 2015 at 3:41:58 PM UTC-4, Karl-Uwe Frank
wrote:
Thanks for sharing. I think I will not use a counter currently but
the function might also help finding potential problem with a,b,c
or d.
For a large state space you may want to stop the cycle-detector's
iteration after some sensible value, e.g. corresponding to a minute.
If it were me, I'd try it on your a,d,t update function. I already identified a couple of short loops in that, but there might be
others.
Bob H
And when it sees the same interval, for a, twice in a row, it would
make the unsubstantiated claim that there *is* a loop, without
bothering to check whether the other variables have repeated.
The routine I use checks first for any recurrence of "a" in the given
range (100,000,000 or 300,000,000 for example) and if a recurrence is
found it extend the test length by adding 4294967295 plus eight times
the first recurrence distance.
The source code is over here as well as some test results
http://www.freecx.co.uk/bcd32/loop_finder/
if ((a + b + c + d) == 0) raise exception
a = a ^ 0xffffffff;
t = a + b + c + d;
On Tuesday, November 3, 2015 at 10:25:55 AM UTC-5, Karl-Uwe Frank
wrote:
The routine I use checks first for any recurrence of "a" in the
given range (100,000,000 or 300,000,000 for example) and if a
recurrence is found it extend the test length by adding 4294967295
plus eight times the first recurrence distance.
The source code is over here as well as some test results
http://www.freecx.co.uk/bcd32/loop_finder/
Looking at your source code, if I understand it correctly, I think it
won't report most loops even if it is in one. And what loops it does
report probably aren't loops.
Consider this hypothetical loop, which repeats with period 4006.
n d t a
0000 11111111 22222222 33333333
1000 00000000 22222222 33333333
2001 11111111 00000000 33333333
3003 11111111 22222222 00000000
4006 11111111 22222222 33333333
Your program, looking at the intervals between re-occurrences of a,
would see intervals of 1000, 1001, and 2005. These intervals would
repeat over and over and the program, not seeing the same interval
twice in a row, would make the unsubstantiated claim that there is no
loop.
The only loop I think it will report is one where the value of a only
occurs once within the loop.
The same would be true if you were looking at re-occurrences of d or
t instead of a, obviously.
And when it sees the same interval, for a, twice in a row, it would
make the unsubstantiated claim that there *is* a loop, without
bothering to check whether the other variables have repeated.
I suppose there could be some way to work that into a valid loop
detector, but it would be much simpler to just check whether the
whole state repeats (a,d,t or a,b,c,d,t depending on viewpoint). The
cycle detection algorithm I posted earlier is simple to implement,
easily proved correct, and (possibly with variations) widely used.
But since your a,d,t operation is invertible you can simply just run
until you see your initial a,d,t state again. You are 100% guaranteed
to get back to the initial state (a,d,t not necessarily a,b,c,d,t).
The only question is how soon.
On an unrelated topic; from the seeding part of that program:
if ((a + b + c + d) == 0) raise exception
a = a ^ 0xffffffff;
t = a + b + c + d;
The test for the exception is in the wrong place. And is incomplete.
The test does not prevent the initial state of a=b=c=d=t=0, because
you modify a after making the test.
And it doesn't matter what b and c are. The initial state to avoid
would be a=0, d<32, t<32, regardless of b and c, because you'll end
up in a loop with period 4.
The simplest correction may be to just do away with the
test/exception and set a to some value you know is good. In other
words, take the 128 bits of psuedo-randomness you're currently
sticking into a,b,c,d and instead stick it into b,c,d,t and set
a=0x15BADB04 (or some other non-zero value). This assumes states with a=15BADB04 aren't in short loops, of course, which is something you'd
want to test/verify.
Bob H
I've used this function to find loops for the version bcd32_ctr and it
found a lot loops in the counter variable (ctr).
In your hypothetical example there is no obvious simple direct distant
loop of "a".
… if a variable has a recurrence with the same distance more than twice it must considered a potential loop and the initial seed combination should
be verified more deeply. The function currently bail out if a potential loop is found ...
On Friday, November 6, 2015 at 5:00:21 PM UTC-5, Karl-Uwe Frank wrote:rate of the generator by itself. It's well enough known that it has a wikipedia page.
For example, consider what happens when you start with ctr=DD56D3A3.
On the next step (n=2) it enters a cycle that has period 250. Of
course, you can't know that until you see n=252.
n ctr
1 DD56D3A3
2 7901AE18
...
251 88017E4E
252 7901AE18
Your program is never going to see DD56D3A3 again, so it's not going
to report a loop, even though it spent all but one step firmly
entrenched in a cycle.
Well, would your recommended cycle-detector find any loop?
Yes! That's how I found the example. My actual starting value was something like 1200 steps before the one listed there as n=1. As I posted earlier, it is simple to implement and easy to prove correct. Hardly takes any memory. Runs about 1/3 the
I used that cycle detector again today, to look at your ctr function. I realize you are no longer using that, but it's a nice example of a function that can run for a while before it enters a loop.such that if x+y == 0xFFFFFFFD then there is a much-better-than-expected chance that f(x)+f(y) == 0xFFFFFFFD. So for example the two period-3069 cycles are essentially the same, the value in one just being the negative of the value in the other, offset
I tried 10,000 random starting values and let it run until it got into a cycle. The cycles it got into are shown below, along with their cycle length.
02F6FD77 period 250
007BA0F7 period 853
008A1BDA period 853
008C3FC9 period 1812
0006B6E1 period 2244
000F5672 period 3069
0024B858 period 3069
000B95A9 period 4480
00061FEE period 9730
001A84BD period 9730
00011A83 period 29566
Among the things that stood out are the cycles that have the same period. This suggests they are isomorphic. So looked for a reason. To make a long story short it looks like your ctr update function, which I will write as f(x) = rotl(x+1,29) + x+1 is
I also found a fixed point, using algebra. f(0xFFFFFFFE) = 0xFFFFFFFE. Fortunately there is no other x s.t. f(x) = 0xFFFFFFFE. So unless you start in that state, you won't enter it. But if you start in it you won't leave it.
It should be possible to to work out by hand whether there are any period-2 loops and whether they can be entered.
Bob H
On Thursday, November 5, 2015 at 2:44:08 PM UTC-5, Karl-Uwe Frank
wrote:
I've used this function to find loops for the version bcd32_ctr and
it found a lot loops in the counter variable (ctr).
And missed many.
Your ctr variable does not depend on any other variables. So when you
see a re-occurence of a value, you are in a cycle. You will see the
same interval between re-occurences of that value, every time. That eliminates the problem of your program falsely reporting a loop, in
this special case where the one variable you are looking at doesn't
depend on any other state variables.
But your program will miss loops in ctr because it is only testing
for a re-occurence of the initial value. Since your ctr update is not invertible, ctr can (and does) go through non-cyclic values before it
enters a cycle. Unless your program happens to start within a cycle,
it won't report one.
For example, consider what happens when you start with ctr=DD56D3A3.
On the next step (n=2) it enters a cycle that has period 250. Of
course, you can't know that until you see n=252.
n ctr
1 DD56D3A3
2 7901AE18
…
251 88017E4E
252 7901AE18
Your program is never going to see DD56D3A3 again, so it's not going
to report a loop, even though it spent all but one step firmly
entrenched in a cycle.
Many other initial values outside the cycle -- at least 1900 but
probably many many more -- will get you into that same short loop.
Even if those 2150 (1900 outside, 250 inside) are the only ones that
get into that cycle, you only have a ≈12% chance of realizing that
you are in that loop when you are spinning your wheels in it.
That assumes I implemented your ctr update correctly -- [(ctr+1)
rotated left 29] + (ctr+1). But even if I got it wrong, the same
logic will be true for whatever the update function is, as long as it
can't be inverted.
In your hypothetical example there is no obvious simple direct
distant loop of "a".
I disagree. The whole d,t,a state repeats every 4006 steps. Obviously
"a" will also repeat every 4006 steps.
If you would like a non-hypothetical example, make a PRNG that has
two single bit state variables called a and b. The update is temp=b;
b=a; a=(a+temp)&1. This will generate the fibonacci sequence modulo
2 (assuming you don't start with a=b=0) -- the sequence will be 0,1,1,0,1,1,0,1,1,0,1,1,0,1,1,0,1,1 … and obviously has a period of 3
(this is true for both a and b). Your program will detect the loop
only if it sees 0 as the initial value.
… if a variable has a recurrence with the same distance more than
twice it must considered a potential loop and the initial seed
combination should be verified more deeply. The function currently
bail out if a potential loop is found ...
The confusion was mine. I misinterpreted what the program means when
it announces LOOP FOUND as it "bails out".
Bob H
Interesting though the loop value 29566 which I have found in every test so far.
It seems that you have chosen the value a=0x15BADB04 very carefully.
2) does the 100% recurrence of the initial state for (a,d,t) affect the output quality in a negative way?
3) will the cycle detector function you are using find loops in one of the vars (a,d,t)?
For example, consider what happens when you start with ctr=DD56D3A3.
On the next step (n=2) it enters a cycle that has period 250. Of
course, you can't know that until you see n=252.
n ctr
1 DD56D3A3
2 7901AE18
...
251 88017E4E
252 7901AE18
Your program is never going to see DD56D3A3 again, so it's not going
to report a loop, even though it spent all but one step firmly
entrenched in a cycle.
Well, would your recommended cycle-detector find any loop?
There is a period-29566 cycle which most randomly chosen starting
points will end up in. It's not the only cycle though. I think if
you have run more than 100 tests and they all ended up in that cycle,
something is wrong.
Not that much but the results mostly look like this one
4294967295) 2021ee5f 85873670 0a11d53c 4fff3c6a f19f153c | 5cbbdc5f |
4294967295
4294996861) 78cf9471 ce40f024 d12d7ef9 8cf873cf dfa75ef9 | 5cbbdc5f |
29566
4295026427) 58aba37a 2181b9fd c7e035e9 240a2e43 0da895e9 | 5cbbdc5f |
29566
4295055993) fce64fe5 534ebea6 48fe8951 0d6e245b 0d89e951 | 5cbbdc5f |
29566
didn't know ... what invertible meant.
On Friday, November 6, 2015 at 9:53:30 PM UTC-5, Karl-Uwe Frank
wrote:
Interesting though the loop value 29566 which I have found in every
test so far.
There is a period-29566 cycle which most randomly chosen starting
points will end up in. It's not the only cycle though. I think if
you have run more than 100 tests and they all ended up in that cycle, something is wrong.
Ha, I really like that :-))It seems that you have chosen the value a=0x15BADB04 very
carefully.
I didn't. It is just my favorite constant because it looks is "is bad
boy".
The simplest correction may be to just do away with the
test/exception and ==> set a to some value you know is good <== ...
and set a=0x15BADB04
2) does the 100% recurrence of the initial state for (a,d,t) affect
the output quality in a negative way?
This is a consequence of the a,d,t update being invertible. For PRNG
this is often considered a desirable property.
3) will the cycle detector function you are using find loops in one
of the vars (a,d,t)?
I have no idea what that means. Are you thinking that, for example,
a will repeat with one period while d and t repeat with a longer
period? I guess that is what happens when a=0, d<32, t<32, since a
and t repeat with period=1 and d repeats with period 2. I don't know
if Floyd's cycle detector can be modified to look for that.
Bob H
On 08.11.15 18:57, Karl-Uwe Frank wrote:
This means that the seeding function with
a = 0x15BADB04;
b = 0x9E3779B9;
c = 0;
d = 0;
t = a + b + c + d;
represent the very basic setup.
This initial setup passed the Crush-Test
========= Summary results of Crush =========
Version: TestU01 1.2.3
Generator: bcd32 32-bit
Number of statistics: 144
Total CPU time: 01:14:08.00
All tests were passed
Changing the seeding function this way
// The Seeding Function
void seed_bcd32(uint32_t seed[]) {
a = seed[0];
b = seed[1];
c = seed[2];
d = seed[3];
if (a==0) a = 0x15BADB04;
if (b==0) b = 0x9E3779B9;
t = a + b + c + d;
}
will keep the known good basic state (which hopefully will survive the BigCrush-Test) *and* allow the passing of an arbitrary seed with 128bit entropy.
There is a period-29566 cycle which most randomly chosen
starting points will end up in. It's not the only cycle though.
I think if you have run more than 100 tests and they all ended up
in that cycle, something is wrong.
Not that much but the results mostly look like this one
4294967295) 2021ee5f 85873670 0a11d53c 4fff3c6a f19f153c | 5cbbdc5f
| 4294967295 4294996861) 78cf9471 ce40f024 d12d7ef9 8cf873cf
dfa75ef9 | 5cbbdc5f | 29566 4295026427) 58aba37a 2181b9fd c7e035e9
240a2e43 0da895e9 | 5cbbdc5f | 29566 4295055993) fce64fe5 534ebea6
48fe8951 0d6e245b 0d89e951 | 5cbbdc5f | 29566
Does the first column mean you ran 4 billion steps of your generator
every time?
The cycle detection should run pretty fast since you're working in C.
My implementation is pure python, which I expect to be at least a
factor of 4 slower than C. Yet, I ran the test from 10 thousand
randomly chosen starting points and it took only a few hours. But I
stopped, each time, when it detected a cycle.
Of those 10K, about 80% ended up in the period-29566 cycle. Four
other cycles -- with periods 9730, 9730, 3069 and 3069 -- were each discovered about 400-500 times. Some other rarer ones were
encountered.
There are also other cycles but they were observed more rarely. And
a different program looking only for short periods identified periods
of 6, 6, 8, 8, 21, 21, 66, and 66 in addition to the period-1 and
period-250 mentioned earlier.
didn't know ... what invertible meant.
Invertible means the process can theoretically be run in reverse,
because for any state s there is only one state s' that you can give
your update function and get s.
That your t,d,a update is invertible is obvious and easily computed,
like this:
t = t - a
d = (d - a) ^ t
a = a - (d>> 5)
That your ctr update is not invertible is obvious because (among
other reasons) there are two states, DD56D3A3 and 88017E4E, that
produce the same result.
Sometimes you may be able to prove the update is invertible even
though you can't easily write code to run it in reverse.
Bob H
This means that the seeding function with
a = 0x15BADB04;
b = 0x9E3779B9;
c = 0;
d = 0;
t = a + b + c + d;
represent the very basic setup.
This initial setup passed the Crush-Test
========= Summary results of Crush =========
Version: TestU01 1.2.3
Generator: bcd32 32-bit
Number of statistics: 144
Total CPU time: 01:14:08.00
All tests were passed
On 08.11.15 21:39, Karl-Uwe Frank wrote:
On 08.11.15 18:57, Karl-Uwe Frank wrote:
This means that the seeding function with
a = 0x15BADB04;
b = 0x9E3779B9;
c = 0;
d = 0;
t = a + b + c + d;
represent the very basic setup.
This initial setup passed the Crush-Test
========= Summary results of Crush =========
Version: TestU01 1.2.3
Generator: bcd32 32-bit
Number of statistics: 144
Total CPU time: 01:14:08.00
All tests were passed
Changing the seeding function this way
// The Seeding Function
void seed_bcd32(uint32_t seed[]) {
a = seed[0];
b = seed[1];
c = seed[2];
d = seed[3];
if (a==0) a = 0x15BADB04;
if (b==0) b = 0x9E3779B9;
t = a + b + c + d;
}
will keep the known good basic state (which hopefully will survive the
BigCrush-Test) *and* allow the passing of an arbitrary seed with 128bit
entropy.
OMG - tired, need to sleep and think with a clear mind - because
obviously this seeding will not prevent d<32 and a=1 running into a
"dead" loop ...
Will
a = 0x15BADB04;
b = 0;
c = 0;
d = 0;
t = 0x9E3779B9;
giving a try with TestU01 tomorow ...
------------------------------------------------
Additionally I've tested with
seed[0]=0x61C88648 // ==> b=1
seed[1]=0
seed[2]=0
and it passed the Crush-Test and the BigCrush too.
Sysop: | Keyop |
---|---|
Location: | Huddersfield, West Yorkshire, UK |
Users: | 112 |
Nodes: | 8 (1 / 7) |
Uptime: | 09:27:10 |
Calls: | 2,468 |
Files: | 8,620 |
Messages: | 1,889,475 |