(These attack orders can be numbered from 1 to 24 or 0 to 23 and then it's attack order/permutation can be looked up to safe memory.)
Each number has it's own attack order and thus this leads to the following combinational computational problem:
All combinations of permutations in which order group X can attack Group Y
and vice versa:
Group X = 24 x 24 x 24 x 24
Group Y = 24 x 24 x 24 x 24
So this is 24 possibility to the power of 8.
Final computation complexity at the very minimum is (24 to the power of 8) multiplied by roughly 4 attacks perhaps even 5 or 6 to finally destroy a
group of numbers.
24 to the power of 8 = 110.075.314.176
I have already written a computer program which can solve this, however the computer program estimates it takes 19 hours on a 2.0 gigahertz AMD Athlon
Using dual core could solve the problem over night, though I do not feel comfortable running my PC at night unattended.
So now I have the idea to make this program run when my computer is idling during the day, it should also be able to store it's progress so that it can continue after it was shutdown.
(Idea for now is to make it multi threaded and assign a low thread priority
so it can run during the day when I use my computer and it's not doing much
so it can use the reserve computational horse power).
(I still have to try these "idle/reverse" ideas to see which one works best without interrupting my web browsing or music listening too much ;))
My system has 4 GB of ram, so other ideas could be to store a data structure partially which could keep some computations so that it doesn't have to be
done again... Though memory lookups might be a bit slow so not sure if that makes any sense.
I might also try GPU/Cuda since there seems to be lots of loops/reoccurences
of the same computations that will happen over and over again... So maybe
cuda can detect "same branch execution" and some "computations" and might
speed it up, not sure about that.
Just the 8 index loops already cost a lot of instructions. Since there are
only 24 permutation it would be enough to store it in 5 bits. Perhaps a
rounded floating point which increases by 1/24 might be enough to trigger
the 4th bit from incrementing when it actually needs to.
2x2x2x2x2 = 32 (it should increment bit 6 when the 5 bits reach 24).
So the idea here was to create 8 indexes from just 1 index being incremented
to create the 8 combinations of indexes "instruction cheaply".
Not sure if this will work, just an idea I might try :)
Then those bits would still need to be extract and makes. So perhaps on
older systems this is not efficient.
The 8 indexes need at least 3 instructions, 1 index increment, 1
comparision, 1 jump.
The inner loop contains some while loops to increment attack index per
Each number has a "alive" variable which starts at 1.0 and is decreased everytime it's attacked.
Once a number is dead below 0.0000001 it's considered dead and can no longer attack.
(Since victory table was described as integers above this can also be read
as: Alive starts at 100 and once it goes zero or negative it's dead).
Anyway the main reason why I wrote to this/these groups is that the numbers themselfes are not that larger and can fit in a byte, even a few bits.
Thus they will fit into SSE registers and such and I also know SSE has some permutations instructions.
I am not expert at SSE but I am wondering if there are perhaps some SSE1/2/3/4/5 instructions which can help at solving/computing this problem ?
Now the question you might be wondering about is ? What am I actually trying
to compute ?
Well the final output could simply be the number of victories that each
attack order has had per number. I am particullary interested in victories
of number 4. So that would be sufficient for now.
Number 4 has 24 permutations.
So I want to know how each permutation of number 4 performs under all other circumstances/permutations of all other numbers/combinations and so forth.
This basically requires the entire set to be computed and then record number
of victories for for example Group X number 4.
Only the results of one group needs to be outputted since the two groups are
a mirror of each other.
Also during the development of the computer program I finally had a
successfull implementation by keeping it as simple as possible and working
with direct variables, instead of much more complex arrays.
Later on I made a more code efficient version which uses arrays/lookups and such. It was much harder to try the array approach at first since it becomes mindly complex and can obscure "vision to a solution".
So I suggest anybody trying to implement a solver for this to keep the code
as simple as possible at first, since this is already an 8 dimensional
problem with a 9th dimension.
Also it was interesting to see the Delphi for loops not allowing array
fields as loop variables, so those loops will need to be fleshed out anyway,
or one big linear loop could be used and then 8 indexes calculated from
That approach will probably be necessary for a cuda solution anyway... 32
bit might be sufficient if remainders are reincluded in the conversion from thread/block dimensions to linear dimension back to 8 dimensional indexes.
Perhaps such computation might even be a bit quicker than the 3 instructions per loop.
For example 8 divisions and 8 remainders will be necessary to compute 8
indexes from one big linear indexes. Since div/mod can be the same
instruction this might only require 8 instructions to compute these loop indexes from a big linear index.
However computing the big linear index already requires between 6 or 10 instructions.
So just to compute those 100+ billion data entries requires something like
20 instructions just to be able to compute those loop indexes.
This is my finding with bigger data sets which consists out of small little data units... The closer the number of entries/data items get to the
actually processing computing power the more time/processing/instructions
are wasted on just computing these loop indexes.
It makes sense from this perspective: A problem of 100 items only needs 100 loops. Thus lots of processing power remains for the data. But for 2 billion data items, the number of items already matches the gigaherts of the core...
so the core is already swamped... with just loop index calculations.
I hope this reasoning/explanation convinces some CPU/GPU designers to
include more instructions to compute loop indexes in all kinds of ways that could speed that up somewhat and safe some processing time.
This idea was inspired by decision trees. However currently nothing is multiplied. Only victory chances are subtracted from an alive variable. Perhaps this is the wrong way to go about it and chances should be multiplied instead ?
Perhaps alive/death should be ignored or something ?
Though subtracting something from a "health" variable seems somewhat more realistic... and necessary ? Since just multiplieing by itself will not reduce health to zero... it will drive it downwards though.
Perhaps I just should compute "overall" chances of staying alive and ignore the possibility of death ?
So perhaps I should ignore the "dynamic nature" of this problem and just assume that every number has a chance of actually surviving no matter what the odds are against it ?!
That's kind of an interesting way of looking at it... I think I might try this as well ! ;)
This would probably lead to a faster computation... then using while loops and if statements and such to determine if it died or not !
Hmmmmm interesting idea...
For now I tried the somewhat dynamic approach which included death... dead numbers cannot attack... but next I could try a static approach.
Also another idea I just had is perhaps reduce the chances of victory/damage that numbers can do on their already chances of being alive.
So the less chance a number has of being alive... the less chance it has of actually achieving victory against another number... This also seems interesting... hmm. Just writing to this group was a bit usefull for me ! ;)