Forum: >>> Magnum BBS <<<

My experiment to compare checker skill vs cube skill in gamblegammon

From MK@21:1/5 to All on Wed Nov 16 00:18:45 2022

I just ran an experiment with Gnubg playing 500 money
games against itself in each of the four scenarios
below, with different combinations of player settings.

The results are followed by my interpretations about
what they mean regarding checker skill vs. cube skill.

Scenario 1: grandmaster vs expert with default settings.
==========================================
After 503 games: gnubg = 2136, expert = 873 points

Chequer Play Statistics:
Error total EMG (Points) ... -0.000 ( -0.000) ... -34.460 (-57.855)
Chequerplay rating ... Supernatural ... World class
Cube Statistics:
Error total EMG (Points) ... -0.000 ( -0.000) ... -3.654 ( -5.120)
Cube decision rating ... Supernatural ... World class
Overall Statistics:
Error total EMG (Points) ... -0.000 ( -0.000) ... -38.114 (-62.975)
Overall rating ... Supernatural ... World class
Actual result ... +91.000 ... -91.000
Luck adjusted result ... +71.376 ... -71.376

My comments: nothing unusual, even with a considerable
skill difference and some cube errors, bot never beavers itself.

Scenario 2: grandmaster vs random checker + random cube.
==========================================
After 510 games: gnubg = 6993, mutant = 10 points

Chequer Play Statistics:
Error total EMG (Points) ... -0.000 ( -0.000) ... -1157.490 (-5365.473)
Chequerplay rating ... Supernatural ... Awful!
Cube Statistics:
Error total EMG (Points) ... -0.000 ( -0.000) ... -30.705 (-122.124)
Cube decision rating ... Supernatural ... Awful!
Overall Statistics:
Error total EMG (Points) ... -0.000 ( -0.000) ... -1188.194 (-5487.597)
Overall rating ... Supernatural ... Awful!
Actual result ... +6983.000 ... -6983.000
Luck adjusted result ... +6685.349 ... -6685.349

My comments: again nothing unexpected, zero checker
skill + zero cube skill doesn't stand a chance, wins only
10 points due to luck, in almost all games grandmaster
doubles, mutant beavers, grandmaster raccoons and wins.

The above two runs were made to "calibrate" the tool and
establish a baseline, so to speak.

Scenario 3: grandmaster vs random checker + grandmaster cube.
==========================================
After 500 games: gnubg = 1462, mutant = 4 points

Chequer Play Statistics:
Error total EMG (Points) ... -0.000 ( -0.000) ... -907.672 (-1386.576)
Chequerplay rating ... Supernatural ... Awful!
Cube Statistics:
Error total EMG (Points) ... -0.000 ( -0.000) ... -0.000 ( -0.000)
Cube decision rating ... Supernatural ... Supernatural
Overall Statistics:
Error total EMG (Points) ... -0.000 ( -0.000) ... -907.672 (-1386.576)
Overall rating ... Supernatural ... Awful!
Actual result ... +1458.000 ... -1458.000
Luck adjusted result ... +1416.664 ... -1416.664

My comments: very unexpected results, mutant wins even less,
only 4 points due to luck, no cube errors, not even one beaver,
grandmaster level cube skill is useless without checker skill.

Scenario 4: grandmaster vs grandmaster checker + random cube.
==========================================
After 503 games: gnubg = 2136, mutant = 873 points

Chequer Play Statistics:
Error total EMG (Points) ... -0.000 ( -0.000) ... -0.000 ( -0.000)
Chequerplay rating ... Supernatural ... Supernatural
Cube Statistics:
Error total EMG (Points) ... -0.000 ( -0.000) ... -220.525 (-600.634)
Cube decision rating ... Supernatural ... Awful!
Overall Statistics:
Error total EMG (Points) ... -0.000 ( -0.000) ... -220.525 (-600.634)
Overall rating ... Supernatural ... Intermediate
Actual result ... +1263.000 ... -1263.000
Luck adjusted result ... +1344.480 ... -1344.480

My comments: very unexpected results, wow!, no checker errors,
lots of beavers and raccoons like in scenario 2 above, but in this
case even with zero cube skill, mutant manages to win 30% due
to checker skill alone and achieves an overall "Intermediate"
level performance!
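
As a quick check of the 30% figure, here is a minimal Python sketch. It assumes the percentage refers to the mutant's share of all points scored rather than of games won, and the helper name `point_share` is made up for illustration. The numbers are copied from the score lines of scenarios 2 to 4.

```python
def point_share(mutant_points: int, gnubg_points: int) -> float:
    """Mutant's share of all points scored in a session, as a percentage."""
    return 100.0 * mutant_points / (mutant_points + gnubg_points)


# (gnubg points, mutant points) as reported in the score lines above.
sessions = {
    "scenario 2": (6993, 10),
    "scenario 3": (1462, 4),
    "scenario 4": (2136, 873),
}

for name, (gnubg, mutant) in sessions.items():
    print(f"{name}: mutant share = {point_share(mutant, gnubg):.2f}%")
# scenario 2: mutant share = 0.14%
# scenario 3: mutant share = 0.27%
# scenario 4: mutant share = 29.01%
```

Under that reading, "30%" is a rounding of roughly 29% of the points played in scenario 4.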

Notes:
==========================================
Not all four sessions are exactly 500 games because I couldn't
monitor and stop the self-playing bot at exactly 500 games each time.

Random play and zero skill refer to maximum noise settings
of 1.0 for checker and cube level, which is assumed to be near
random play.

If anyone wants to see them, I can make the sessions in SGF
or MAT format, as well as the TXT analyses, all in one ZIP file.

Conclusion:
==========================================
I know normally 500 games wouldn't be significant enough but
in this case the difference between the accomplishment of zero
checker skill and the accomplishment of zero cube skill is so
dramatically huge that I think it's proof enough, at least to show
that properly conducted, long enough experiments will be worth
the time and effort of doing them.

I believe the above experiment is a valid way of pitting the cube
skill against the checker skill, by isolating each in turn, to see
which is more decisive in gamblegammon. Clearly the extent of
the "cube skill theory" bullshit is of very limited use/value, mostly
towards the ends of games, compared to the big hype about it.

XG doesn't allow fiddling with player strength settings but Gnubg
does, (which is a big plus for Gnubg). So, you don't have to take
my word for it. You can easily do your own experiments to see
what kinds of results you will get.

Once you all convince yourselves well enough, we can proceed to
doing experiments, similar to the half-ass one that Axel has done,
to demonstrate once and for all that the cube doesn't add any skill
to backgammon and matters in gamblegammon much less than it's
claimed to.

MK

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Timothy Chow@21:1/5 to All on Wed Nov 16 09:45:13 2022

On 11/16/2022 3:18 AM, MK wrote:

Scenario 3: grandmaster vs random checker + grandmaster cube.
==========================================
After 500 games: gnubg = 1462, mutant = 4 points

My comments: very unexpected results, mutant wins even less,
only 4 points due to luck, no cube errors, not even one beaver,
grandmaster level cube skill is useless without checker skill.

Scenario 4: grandmaster vs grandmaster checker + random cube.
==========================================
After 503 games: gnubg = 2136, mutant = 873 points

My comments: very unexpected results, wow!, no checker errors,
lots of beavers and raccoons like in scenario 2 above, but in this
case even with zero cube skill, mutant manages to win 30% due
to checker skill alone and achieves an overall "Intermediate"
level performance!

If you found these results surprising then you have a long way to
go before you achieve any understanding of this topic. Your results
are completely in line with what the conventional wisdom would predict.

---
Tim Chow

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Axel Reichert@21:1/5 to Timothy Chow on Thu Nov 17 09:09:41 2022

Timothy Chow <tchow12000@yahoo.com> writes:

If you found these results surprising then you have a long way to go
before you achieve any understanding of this topic. Your results are
completely in line with what the conventional wisdom would predict.

I tend to agree, and I can offer more of my own results that might
give Murat food for thought. I had completed my experiments two months
ago, but did not find time to write up a post. But now it seems like
the perfect moment.

Let me introduce the main characters of my fictitious chouette. All
are expert checker players, but have very poor understanding of cube
skill. They are, however, able to assess who is favourite (> 50 %
winning chances).

- Clarence Careful: He will never double and only take if favourite.

- Danny Dropper: He will always double if favourite, but only take if
favourite.

- Toby Taker: The opposite of Danny, he will never double, but always
take.

- William Wildcube: He will always double if favourite and always take.

The chouette host of these mutants is Edward Equity, who is on expert
level with respect to checker play and has world class cube
handling. In Edward's chouette, the Jacoby rule is used, but Beavers
are forbidden. There is no consulting. The session lasts 3000
games. And of course there was no chouette, that is just the
background story. Instead, I had my simple bot mimic gnubg's checker
play and use one of the 4 mutant cubing strategies in turn against
gnubg set to Expert checker play and World Class cube handling.

The null hypothesis was that the respective mutant's cube strategy is
as good as the world class cube handling by Edward/gnubg. For all four
mutants it could be rejected with a sigma level > 4.5. Surprise,
surprise. (-;
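
The post does not say how the sigma levels were computed. One common way to arrive at such a number, shown here purely as an assumption, is a one-sample z-score on the mutant's net points per game: under the null hypothesis the mean net result is zero, and the sigma level is the observed mean divided by its standard error. The function name `sigma_level` and the short result list are hypothetical and are not data from the experiment.

```python
import math
import statistics


def sigma_level(per_game_net: list[float]) -> float:
    """Absolute z-score of the mean net result against a null mean of zero."""
    n = len(per_game_net)
    if n < 2:
        raise ValueError("need at least two games")
    mean = statistics.fmean(per_game_net)
    sd = statistics.stdev(per_game_net)  # sample standard deviation
    if sd == 0:
        raise ValueError("all games had the same result")
    return abs(mean) / (sd / math.sqrt(n))


# Made-up net points per game from the mutant's side, for illustration only.
toy_results = [-2, -1, 1, -4, -2, 2, -1, -2]
print(f"sigma level: {sigma_level(toy_results):.2f}")
```

With 3000 games per session, even a modest average loss per game would push this statistic well above the thresholds quoted here, provided the per-game spread is not too large.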

After that was done, I did further experiments, generalizing the
mutant's "strategies". Let me introduce Rory Random:

- Never doubles if losing (non-favourite)

- If favourite, doubles with probability d/6 (d from 0 to 6)

- Always takes if winning (favourite)

- If losing (non-favourite), takes with probability t/6 (t from 0 to 6)

So Clarence Careful is a special case of Rory Random, namely d = 0 and
t = 0. Likewise, Danny Dropper uses d = 6 and t = 0, Toby Taker uses d
= 0 and t = 6, and William Wildcube uses d = 6 and t = 6 as a
"strategy".

Overall in this framework, there are 49 mutant strategies, some wild,
some not so wild. It should be clear that the wilder mutants drive the
cube up and thus the average game will have more points at stake than,
say, a session with Clarence Careful. Hence it makes sense to relate
the average loss of the mutant strategies not to the number of games,
but rather to the number of points.
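
Rory Random's rules and the grid of 49 strategies can be sketched in a few lines of Python. This is only an illustration of the rules as described above, not the bot that was actually used. The function names, the `win_prob` argument (the deciding player's estimated winning chance) and the use of `random.Random` are all assumptions.

```python
import random


def rory_doubles(win_prob: float, d: int, rng: random.Random) -> bool:
    """Never double as non-favourite; as favourite, double with probability d/6."""
    if win_prob <= 0.5:
        return False
    return rng.random() < d / 6


def rory_takes(win_prob: float, t: int, rng: random.Random) -> bool:
    """Always take as favourite; as non-favourite, take with probability t/6."""
    if win_prob > 0.5:
        return True
    return rng.random() < t / 6


# The four chouette characters are the corner cases of the (d, t) grid.
CORNERS = {
    "Clarence Careful": (0, 0),
    "Danny Dropper": (6, 0),
    "Toby Taker": (0, 6),
    "William Wildcube": (6, 6),
}

# d and t each run from 0 to 6, giving 7 * 7 strategies.
ALL_STRATEGIES = [(d, t) for d in range(7) for t in range(7)]
print(len(ALL_STRATEGIES))  # 49

rng = random.Random(2022)  # fixed seed so a run can be repeated
d, t = CORNERS["Danny Dropper"]
print(rory_doubles(0.6, d, rng), rory_takes(0.4, t, rng))  # True False
```

Because `rng.random()` returns a value in [0, 1), d = 6 or t = 6 always fires and d = 0 or t = 0 never does, which is what makes the four characters deterministic.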

For example, Edward Equity (= gnubg) versus William Wildcube ended
11392 versus 8066 after 3000 games, a net win for gnubg of 3326. This
amounts to more than 1.1 points per game, but a more meaningful number
is (11392-8066)/(11392+8066) = 0.17 points won per points played
(pwppp).
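
The arithmetic for this example can be reproduced with a short Python sketch. The function name `pwppp` simply mirrors the abbreviation used here and is otherwise an assumption. The inputs are the session totals quoted above.

```python
def pwppp(winner_points: int, loser_points: int) -> float:
    """Points won per points played: net win divided by all points scored."""
    total = winner_points + loser_points
    if total == 0:
        raise ValueError("no points were played")
    return (winner_points - loser_points) / total


gnubg_points, mutant_points, games = 11392, 8066, 3000

net_win = gnubg_points - mutant_points
print(net_win)                                        # 3326
print(round(net_win / games, 3))                      # 1.109
print(round(pwppp(gnubg_points, mutant_points), 2))   # 0.17
```

Dividing by points played rather than by games keeps sessions with a high cube comparable to sessions where the cube rarely moves.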

Here are these results for all the mutants I tested (the former 4
chouette characters are the "corner cases"). All could be dismissed with
a sigma level > 2.9.

| pwppp         | Random take |      |      |      |      |      |      |
|---------------+-------------+------+------+------+------+------+------|
| Random double | 0/6         | 1/6  | 2/6  | 3/6  | 4/6  | 5/6  | 6/6  |
|---------------+-------------+------+------+------+------+------+------|
| 0/6           | 0.38        |      |      | 0.27 |      |      | 0.25 |
| 1/6           |             | 0.15 |      |      |      | 0.16 |      |
| 2/6           |             |      | 0.13 |      | 0.14 |      |      |
| 3/6           | 0.15        |      |      | 0.14 |      |      | 0.13 |
| 4/6           |             |      | 0.17 |      | 0.17 |      |      |
| 5/6           |             | 0.16 |      |      |      | 0.16 |      |
| 6/6           | 0.15        |      |      | 0.14 |      |      | 0.17 |

The timid non-doubling strategies (first row with "0/6" doubling
probability, d = 0) fare much worse than the mutants who dare to
double at all (d > 0). These latter ones all get roughly similar
results, around 0.15 pwppp. Now before one falsely believes that this
shows that cube strategies do not matter and switches to random
doubling/taking, you should realize that 0.15 pwppp is not "quite an
achievement", but it is pretty bad. Here is a table of three different
players (set up by using numerical noise in gnubg for checker play and
cube handling) all achieving roughly the same pwppp of 0.15:

| Checker noise | Cube noise | Checker rating | Cube rating  | Overall      |
|---------------+------------+----------------+--------------+--------------|
| 0.000         | ? (Mutant) | Expert         | Awful        | Intermediate |
| 0.016         | 0.016      | Advanced       | Beginner     | Intermediate |
| 0.022         | 0.000      | Intermediate   | Supernatural | Intermediate |

In my opinion the performance of the mutant cubers is not surprisingly
good, but rather expected and not something to be proud of. Any
ambitious backgammon player should strive for more than "Advanced" and
"Beginner" as ratings (the row with 0.016 noise), which is about as
good as the mutants.

So the bottom line is: Much ado about nothing. An interesting study
for me nevertheless, and it might be fun in your next live session to
roll a dice before a cube decision. The face of your opponent might be
worth the price you will be paying for that stunt. Which, by the way,
has a prominent precedent in Phil Simborg:

https://www.bkgm.com/articles/Simborg/ACoinToss/

There were two things that puzzled me a bit, but I will address them
in a different post.

Best regards

Axel

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From MK@21:1/5 to Tim Chow on Fri Nov 18 22:56:45 2022

On November 16, 2022 at 7:45:15 AM UTC-7, Tim Chow wrote:

On 11/16/2022 3:18 AM, MK wrote:

Scenario 3: grandmaster vs random checker + grandmaster cube.
After 500 games: gnubg = 1462, mutant = 4 points

Scenario 4: grandmaster vs grandmaster checker + random cube.
After 503 games: gnubg = 2136, mutant = 873 points

If you found these results surprising

By "results", I'm referring to the points won/lost,
not to my observations/comments. Is that also
how you understood?

then you have a long way to go before you
achieve any understanding of this topic.

In order to make this statement, you must have
at least some (i.e. more than "any") or possibly
much understanding of "this topic".

Would you please briefly define what you mean
by "this topic" first? Then help me (and perhaps
others) achieve "any understanding" of this topic?

Your results are completely in line with what
the conventional wisdom would predict.

Would you please briefly define what you mean
by "conventional wisdom" within this context?

Since you find my results completely in line, you
must possess whatever wisdom would have
predicted my results and you should be able
to use it to predict my next results.

Before I go into this deeper, I would like to know
how efficient you are at using that wisdom.

If I give you the detailed settings for a pair of bot
player levels, how long would it take you to come
up with your prediction of their win/lose results?
Ten minutes? Half hour?

Considering that running a session of 500 games
can be done in less than an hour on a decent PC,
you can't take more time for your prediction. ;)

So, let me know and let's play... :)

MK

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From MK@21:1/5 to Axel Reichert on Mon Nov 28 14:16:39 2022

On November 17, 2022 at 1:09:43 AM UTC-7, Axel Reichert wrote:

Timothy Chow <tchow...@yahoo.com> writes:

Your results are completely in line with what the
conventional wisdom would predict.

I tend to agree, and I can offer more of my own
results that might give Murat food for thought.

The key word in Tim's comment is "predict"! The
problem with you two is that you make generic,
after-the-fact assertions on the results that are
worthless without making predictions first.

I had completed my experiments two months
ago, but did not find time to write up a post.

It is great that you are still doing experiments. It
means there is still hope for you. ;) You deserve
praise for keeping an open mind and showing the
courage to stick your neck out by publishing your
results.

I wanted to wait a while before responding to you,
in order to give other "mathematicians" and all a
chance to contribute, as well as to give you time to
add more to it like you said you would. It's noteworthy
that nobody else said anything good or bad. I think
it's because they lack the brains and/or the guts to
open their mouths.

They are, however, able to assess who is favourite
(> 50 % winning chances).

It has to be noted that this introduces bias into your
experiment by assuming that the bot's assessments
are accurate, which I don't agree with, but we have no
choice other than to make do with what we have now.

... will never double and only take if favourite.
... will always double if favourite, but only take if
favourite.
... will never double, but always take.
... will always double if favourite and always take.

You have quite a circus going with your characters :)
but I'm not sure what they will accomplish other than
just satisfying some curiosity.

Jacoby rule is used, but Beavers are forbidden.

Why Jacoby? It's stupid as it defeats the concept of
correct cube decision. Forbidding Beavers is worse
as it makes your "cube skill" experiment worthless,
unless you agree to revise your "cube skill theory" to
completely eliminate Beavers, Raccoons, etc. which
are currently argued to require more skill than simple
double/take decisions.

I had my simple bot mimic gnubg's checker play
and use one of the 4 mutant cubing strategies in
turn against gnubg set to Expert checker play and
World Class cube handling.

If I could have convinced the Gnubg team to add the
option to let users make such selections in the player
settings, everyone (including you) could do their own
experiments without having to create mutant bots but
they seem afraid to do anything that could undermine
the dogmas of the gamblegammon establishment. :(

The null hypothesis was that the respective mutant's
cube strategy is as good as the world class cube

Why do you keep going to this? The original reason
for the mutant experiment that I had proposed was
to debunk the so-called "cube skill theory" based on
some jackoffski formulas, etc. You accomplished
that in your first experiment already. Why don't you
try to refine that instead...?

For all four mutants it could be rejected with a sigma
level > 4.5. Surprise, surprise. (-;

No surprise indeed. Just more pointless mathshitting.

After that was done, I did further experiments.....

Overall in this framework, there are 49 mutant
strategies, some wild, some not so wild.....
.....
Here are these results for all the mutants.....
All could be dismissed with a sigma level > 2.9.

I can't believe that you have gone to such an extent
of spending time and effort doing more mathshitting
with no apparent benefit but you should be praised
for doing something rather than nothing, as something
may somehow come out of all this someday...

These latter ones all get roughly similar results

I hope you realise that this is quite telling in itself,
regardless of the magnitude of the actual pwppp.

It shows that it doesn't matter which of those mutant
strategies is used against world class cube handling.

Now before one falsely believes that this shows that
cube strategies do not matter

There is no "falsely" about it. Your results prove that
any random or random-like cube strategy is as good
as or close to any other, without one emerging as the
clearly superior, which would help demonstrate skill.

you should realize that 0.15 pwppp is not "quite an
achievement", but it is pretty bad.

As you guys keep saying "no surprise", "bad" is indeed
what would be expected from random or random-like
cube strategies but the real question is: "how bad"!?

In other words, how do the actual results compare
to the predicted results according to calculations per
the "cube skill theory"!?

Thus far, we really don't know because you have never
made any predictions!, not even retroactive ones!!

Here is a table of three different players (set up by
using numerical noise in gnubg for checker play and
cube handling) all achieving roughly the same pwppp
of 0.15:

Now you are comparing apples to oranges. There is
no "strategy" in the noise inserted by Gnubg. You can't
compare it to anything that you use the word "strategy"
for, such as mutant cube handling in a consistent way.

Your efforts like your above experiments are nothing
more than desperate, helpless flappings of a fish in a
bucket, in your trying to continue existing in denial of
reality, (perhaps even ironically "mathematical reality").

So the bottom line is: Much ado about nothing. An
interesting study for me nevertheless,

I agree. Now, what about sharing your data this time...?
There may be more "interestings" to be found in there. ;)

and it might be fun in your next live session to roll a
dice before a cube decision.

Again, the mutant experiment I had proposed was not
about random cube decision and your first experiment
had proven painful for you...

I doubt mocking your own results will ease your pain.

has a prominent precedent in Phil Simborg:
https://www.bkgm.com/articles/Simborg/ACoinToss/

Decorating your bullshit with more shit from others is
pathetically self-destructive. :( You can do better than this.

There were two things that puzzled me a bit, but I will
address them in a different post.

I will be looking out for it.

MK

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)
