• My experiment to compare checker skill vs cube skill in gamblegammon

    From MK@21:1/5 to All on Wed Nov 16 00:18:45 2022
    I just ran an experiment with Gnubg playing against
    itself 500 money games in each of the four scenarios
    below, with different player settings combinations.

    The results are followed by my interpretations about
    what they mean regarding checker skill vs. cube skill.

    Scenario 1: grandmaster vs expert with default settings.
    ==========================================
    After 503 games: gnubg = 2136, expert = 873 points

    Chequer Play Statistics:
    Error total EMG (Points) ... -0.000 ( -0.000) ... -34.460 (-57.855)
    Chequerplay rating ... Supernatural ... World class
    Cube Statistics:
    Error total EMG (Points) ... -0.000 ( -0.000) ... -3.654 ( -5.120)
    Cube decision rating ... Supernatural ... World class
    Overall Statistics:
    Error total EMG (Points) ... -0.000 ( -0.000) ... -38.114 (-62.975)
    Overall rating ... Supernatural ... World class
    Actual result ... +91.000 ... -91.000
    Luck adjusted result ... +71.376 ... -71.376

    My comments: nothing unusual, even with a considerable
    skill difference and some cube errors, bot never beavers itself.

    Scenario 2: grandmaster vs random checker + random cube.
    ==========================================
    After 510 games: gnubg = 6993, mutant = 10 points

    Chequer Play Statistics:
    Error total EMG (Points) ... -0.000 ( -0.000) ... -1157.490 (-5365.473)
    Chequerplay rating ... Supernatural ... Awful!
    Cube Statistics:
    Error total EMG (Points) ... -0.000 ( -0.000) ... -30.705 (-122.124)
    Cube decision rating ... Supernatural ... Awful!
    Overall Statistics:
    Error total EMG (Points) ... -0.000 ( -0.000) ... -1188.194 (-5487.597)
    Overall rating ... Supernatural ... Awful!
    Actual result ... +6983.000 ... -6983.000
    Luck adjusted result ... +6685.349 ... -6685.349

    My comments: again nothing unexpected, zero checker
    skill + zero cube skill doesn't stand a chance, wins only
    10 points due to luck, in almost all games grandmaster
    doubles, mutant beavers, grandmaster raccoons and wins.

    The above two runs were made to "calibrate" the tool and
    establish a baseline, so to speak.

    Scenario 3: grandmaster vs random checker + grandmaster cube.
    ==========================================
    After 500 games: gnubg = 1462, mutant = 4 points

    Chequer Play Statistics:
    Error total EMG (Points) ... -0.000 ( -0.000) ... -907.672 (-1386.576)
    Chequerplay rating ... Supernatural ... Awful!
    Cube Statistics:
    Error total EMG (Points) ... -0.000 ( -0.000) ... -0.000 ( -0.000)
    Cube decision rating ... Supernatural ... Supernatural
    Overall Statistics:
    Error total EMG (Points) ... -0.000 ( -0.000) ... -907.672 (-1386.576)
    Overall rating ... Supernatural ... Awful!
    Actual result ... +1458.000 ... -1458.000
    Luck adjusted result ... +1416.664 ... -1416.664

    My comments: very unexpected results, mutant wins even less,
    only 4 points due to luck, no cube errors, not even one beaver,
    grandmaster level cube skill is useless without checker skill.

    Scenario 4: grandmaster vs grandmaster checker + random cube.
    ==========================================
    After 503 games: gnubg = 2136, mutant = 873 points

    Chequer Play Statistics:
    Error total EMG (Points) ... -0.000 ( -0.000) ... -0.000 ( -0.000)
    Chequerplay rating ... Supernatural ... Supernatural
    Cube Statistics:
    Error total EMG (Points) ... -0.000 ( -0.000) ... -220.525 (-600.634)
    Cube decision rating ... Supernatural ... Awful!
    Overall Statistics:
    Error total EMG (Points) ... -0.000 ( -0.000) ... -220.525 (-600.634)
    Overall rating ... Supernatural ... Intermediate
    Actual result ... +1263.000 ... -1263.000
    Luck adjusted result ... +1344.480 ... -1344.480

    My comments: very unexpected results, wow!, no checker errors,
    lots of beavers and raccoons like in scenario 2 above, but in this
    case even with zero cube skill, mutant manages to win 30% due
    to checker skill alone and achieves an overall "Intermediate"
    level performance!
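
    The 30% above appears to be the mutant's share of all points scored (873 of 3009) rather than a share of games won. A minimal Python sketch of that arithmetic, using only the scores reported in the scenario headers; the helper name `points_share` is hypothetical and chosen just for illustration:

```python
def points_share(winner_points: int, loser_points: int) -> float:
    """Fraction of all points scored in the session that went to the losing side."""
    return loser_points / (winner_points + loser_points)


# (gnubg points, mutant points) as reported in the scenario headers.
scenarios = {
    "2: random checker + random cube": (6993, 10),
    "3: random checker + grandmaster cube": (1462, 4),
    "4: grandmaster checker + random cube": (2136, 873),
}

for name, (gnubg, mutant) in scenarios.items():
    share = points_share(gnubg, mutant)
    print(f"Scenario {name}: mutant share of points = {share:.1%}")
```

    Note that this measures points, not games, so a session with many beavers and raccoons weighs more heavily than one played at low cube values.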

    Notes:
    ==========================================
    The sessions aren't all exactly 500 games because I couldn't monitor
    and stop the self playing bot exactly at 500 games each time.

    Random play and zero skill refer to maximum noise settings
    of 1.0 for checker and cube level, which is assumed to be near
    random play.

    If anyone wants to see them, I can make the sessions in SGF
    or MAT format, as well as the TXT analyses, all in one ZIP file.

    Conclusion:
    ==========================================
    I know normally 500 games wouldn't be significant enough but
    in this case the difference between the accomplishment of zero
    checker skill and the accomplishment of zero cube skill is so
    dramatically huge that I think it's proof enough, at least to show
    that properly conducted, long enough experiments will be worth
    the time and effort of doing them.

    I believe the above experiment is a valid way of pitting the cube
    skill against the checker skill, by isolating each in turn, to see
    which is more decisive in gamblegammon. Clearly the extent of
    the "cube skill theory" bullshit is of very limited use/value, mostly
    towards the ends of games, compared to the big hype about it.

    XG doesn't allow fiddling with player strength settings but Gnubg
    does (which is a big plus for Gnubg). So, you don't have to take
    my word for it. You can easily do your own experiments to see
    what kinds of results you will get.

    Once you all convince yourselves well enough, we can proceed to
    doing experiments, similar to the half-ass one that Axel has done,
    to demonstrate once and for all that the cube doesn't add any skill
    to backgammon and matters in gamblegammon much less than it's
    claimed to do.

    MK

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Timothy Chow@21:1/5 to All on Wed Nov 16 09:45:13 2022
    On 11/16/2022 3:18 AM, MK wrote:

    Scenario 3: grandmaster vs random checker + grandmaster cube.
    ==========================================
    After 500 games: gnubg = 1462, mutant = 4 points

    My comments: very unexpected results, mutant wins even less,
    only 4 points due to luck, no cube errors, not even one beaver,
    grandmaster level cube skill is useless without checker skill.

    Scenario 4: grandmaster vs grandmaster checker + random cube.
    ==========================================
    After 503 games: gnubg = 2136, mutant = 873 points

    My comments: very unexpected results, wow!, no checker errors,
    lots of beavers and raccoons like in scenario 2 above, but in this
    case even with zero cube skill, mutant manages to win 30% due
    to checker skill alone and achieves an overall "Intermediate"
    level performance!

    If you found these results surprising then you have a long way to
    go before you achieve any understanding of this topic. Your results
    are completely in line with what the conventional wisdom would predict.

    ---
    Tim Chow

  • From Axel Reichert@21:1/5 to Timothy Chow on Thu Nov 17 09:09:41 2022
    Timothy Chow <tchow12000@yahoo.com> writes:

    If you found these results surprising then you have a long way to go
    before you achieve any understanding of this topic. Your results are
    completely in line with what the conventional wisdom would predict.

    I tend to agree, and I can offer more of my own results that might
    give Murat food for thought. I had completed my experiments two months
    ago, but did not find time to write up a post. But now it seems like
    the perfect moment.

    Let me introduce the main characters of my fictitious chouette. All
    are expert checker players, but have very poor understanding of cube
    skill. They are, however, able to assess who is favourite (> 50 %
    winning chances).

    - Clarence Careful: He will never double and only take if favourite.

    - Danny Dropper: He will always double if favourite, but only take if
    favourite.

    - Toby Taker: The opposite of Danny, he will never double, but always
    take.

    - William Wildcube: He will always double if favourite and always take.

    The chouette host of these mutants is Edward Equity, who is on expert
    level with respect to checker play and has world class cube
    handling. In Edward's chouette, the Jacoby rule is used, but Beavers
    are forbidden. There is no consulting. The session lasts 3000
    games. And of course there was no chouette, that is just the
    background story. Instead, I had my simple bot mimic gnubg's checker
    play and use one of the 4 mutant cubing strategies in turn against
    gnubg set to Expert checker play and World Class cube handling.

    The null hypothesis was that the respective mutant's cube strategy is
    as good as the world class cube handling by Edward/gnubg. For all four
    mutants it could be rejected with a sigma level > 4.5. Surprise,
    surprise. (-;
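
    The post does not say how the sigma level was obtained. One common approach, assumed here and not necessarily the one actually used, is a one-sample z-test on the per-game net results: under the null hypothesis the mean net result is zero, and the sigma level is the observed mean divided by its standard error. The function name `sigma_level` and the sample data below are made up for illustration:

```python
import math


def sigma_level(per_game_net):
    """z-score of the mean per-game net result against a true mean of zero.

    per_game_net: points won (+) or lost (-) by the reference player in each game.
    Under the null hypothesis (both cube strategies equally good) the expected
    mean is 0, so z = mean / standard_error.
    """
    n = len(per_game_net)
    mean = sum(per_game_net) / n
    # Sample variance with Bessel's correction (n - 1 in the denominator).
    variance = sum((x - mean) ** 2 for x in per_game_net) / (n - 1)
    std_err = math.sqrt(variance / n)
    return mean / std_err


# Made-up per-game results, NOT data from the experiment.
example = [1, -1, 1, 1]
print(sigma_level(example))
```

    With real session data, the input would be the list of signed points per game from gnubg's point of view.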

    After that was done, I did further experiments, generalizing the
    mutant's "strategies". Let me introduce Rory Random:

    - Never doubles if losing (non-favourite)

    - If favourite, doubles with probability d/6 (d from 0 to 6)

    - Always takes if winning (favourite)

    - If losing (non-favourite), takes with probability t/6 (t from 0 to 6)

    So Clarence Careful is a special case of Rory Random, namely d = 0 and
    t = 0. Likewise, Danny Dropper uses d = 6 and t = 0, Toby Taker uses d
    = 0 and t = 6, and William Wildcube uses d = 6 and t = 6 as a
    "strategy".
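
    The Rory Random rules can be sketched in a few lines of Python. This is only an illustration of the rules as stated above, not the bot actually used; the function names and the `win_prob` argument (the estimated winning chance of the player making the decision) are assumptions:

```python
import random


def rory_doubles(win_prob: float, d: int, rng: random.Random) -> bool:
    """Never double as non-favourite; as favourite, double with probability d/6."""
    if win_prob <= 0.5:
        return False
    # rng.random() is in [0, 1), so d = 0 never doubles and d = 6 always doubles.
    return rng.random() < d / 6


def rory_takes(win_prob: float, t: int, rng: random.Random) -> bool:
    """Always take as favourite; as non-favourite, take with probability t/6."""
    if win_prob > 0.5:
        return True
    return rng.random() < t / 6


# The four chouette characters as (d, t) corner cases of Rory Random.
CORNER_CASES = {
    "Clarence Careful": (0, 0),
    "Danny Dropper": (6, 0),
    "Toby Taker": (0, 6),
    "William Wildcube": (6, 6),
}

rng = random.Random(2022)  # fixed seed only to make the demo repeatable
for name, (d, t) in CORNER_CASES.items():
    print(name, rory_doubles(0.6, d, rng), rory_takes(0.4, t, rng))
```

    Using d/6 and t/6 mirrors rolling a single die before each cube decision.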

    Overall in this framework, there are 49 mutant strategies, some wild,
    some not so wild. It should be clear that the wilder mutants drive the
    cube up and thus the average game will have more points at stake than,
    say, a session with Clarence Careful. Hence it makes sense to relate
    the average loss of the mutant strategies not to the number of games,
    but rather to the number of points.

    For example, Edward Equity (= gnubg) versus William Wildcube ended
    11392 versus 8066 after 3000 games, a net win for gnubg of 3326. This
    amounts to more than 1.1 points per game, but a more meaningful number
    is (11392-8066)/(11392+8066) = 0.17 points won per points played
    (pwppp).
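
    The same arithmetic as a minimal Python sketch, using only the numbers quoted above (the function name `pwppp` is just a label for the formula):

```python
def pwppp(winner_points: int, loser_points: int) -> float:
    """Points won per points played: net win divided by all points scored."""
    return (winner_points - loser_points) / (winner_points + loser_points)


gnubg_points, mutant_points, games = 11392, 8066, 3000

net_win = gnubg_points - mutant_points
print("net win:", net_win)
print("points per game:", round(net_win / games, 2))
print("pwppp:", round(pwppp(gnubg_points, mutant_points), 2))
```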

    Here are these results for all the mutants I tested (the former 4
    chouette characters are the "corner cases"). All could be dismissed with
    a sigma level > 2.9.

    | pwppp         | Random take |      |      |      |      |      |      |
    |---------------+-------------+------+------+------+------+------+------|
    | Random double | 0/6         | 1/6  | 2/6  | 3/6  | 4/6  | 5/6  | 6/6  |
    |---------------+-------------+------+------+------+------+------+------|
    | 0/6           | 0.38        |      |      | 0.27 |      |      | 0.25 |
    | 1/6           |             | 0.15 |      |      |      | 0.16 |      |
    | 2/6           |             |      | 0.13 |      | 0.14 |      |      |
    | 3/6           | 0.15        |      |      | 0.14 |      |      | 0.13 |
    | 4/6           |             |      | 0.17 |      | 0.17 |      |      |
    | 5/6           |             | 0.16 |      |      |      | 0.16 |      |
    | 6/6           | 0.15        |      |      | 0.14 |      |      | 0.17 |

    The timid non-doubling strategies (first row with "0/6" doubling
    probability, d = 0) fare much worse than the mutants who dare to
    double at all (d > 0). These latter ones all get roughly similar
    results, around 0.15 pwppp. Now before one falsely believes that this
    shows that cube strategies do not matter and switches to random
    doubling/taking, you should realize that 0.15 pwppp is not "quite an
    achievement", but it is pretty bad. Here is a table of three different
    players (set up by using numerical noise in gnubg for checker play and
    cube handling) all achieving roughly the same pwppp of 0.15:

    | Checker noise | Cube noise | Checker rating | Cube rating  | Overall      |
    |---------------+------------+----------------+--------------+--------------|
    | 0.000         | ? (Mutant) | Expert         | Awful        | Intermediate |
    | 0.016         | 0.016      | Advanced       | Beginner     | Intermediate |
    | 0.022         | 0.000      | Intermediate   | Supernatural | Intermediate |

    In my opinion the performance of the mutant cubers is not surprisingly
    good, but rather expected and not something to be proud of. Any
    ambitious backgammon player should strive for more than "Advanced" and
    "Beginner" as ratings (the row with 0.016 noise), which is about as
    good as the mutants.

    So the bottom line is: Much ado about nothing. An interesting study
    for me nevertheless, and it might be fun in your next live session to
    roll a die before a cube decision. The face of your opponent might be
    worth the price you will be paying for that stunt. Which, by the way,
    has a prominent precedent in Phil Simborg:

    https://www.bkgm.com/articles/Simborg/ACoinToss/

    There were two things that puzzled me a bit, but I will address them
    in a different post.

    Best regards

    Axel

  • From MK@21:1/5 to Tim Chow on Fri Nov 18 22:56:45 2022
    On November 16, 2022 at 7:45:15 AM UTC-7, Tim Chow wrote:

    On 11/16/2022 3:18 AM, MK wrote:

    Scenario 3: grandmaster vs random checker + grandmaster cube.
    After 500 games: gnubg = 1462, mutant = 4 points

    Scenario 4: grandmaster vs grandmaster checker + random cube.
    After 503 games: gnubg = 2136, mutant = 873 points

    If you found these results surprising

    By "results", I'm referring to the points won/lost,
    not to my observations/comments. Is that also
    how you understood it?

    then you have a long way to go before you
    achieve any understanding of this topic.

    In order to make this statement, you must have
    at least some (i.e. more than "any") or possibly
    much understanding of "this topic".

    Would you please briefly define what you mean
    by "this topic" first? Then help me (and perhaps
    others) achieve "any understanding" of this topic?

    Your results are completely in line with what
    the conventional wisdom would predict.

    Would you please briefly define what you mean
    by "conventional wisdom" within this context?

    Since you find my results completely in line, you
    must possess whatever wisdom would have
    predicted my results, and you should be able to
    use it to predict my next results.

    Before I go into this deeper, I would like to know
    how efficient you are at using that wisdom.

    If I give you the detailed settings for a pair of bot
    player levels, how long would it take you to come
    up with your prediction of their win/lose results?
    Ten minutes? Half hour?

    Considering that running a session of 500 games
    can be done in less than an hour on a decent PC,
    you can't take more time for your prediction. ;)

    So, let me know and let's play... :)

    MK

  • From MK@21:1/5 to Axel Reichert on Mon Nov 28 14:16:39 2022
    On November 17, 2022 at 1:09:43 AM UTC-7, Axel Reichert wrote:

    Timothy Chow <tchow...@yahoo.com> writes:

    Your results are completely in line with what the
    conventional wisdom would predict.

    I tend to agree, and I can offer more of my own
    results that might give Murat food for thought.

    The key word in Tim's comment is "predict"! The
    problem with you two is that you make generic,
    after-the-fact assertions on the results that are
    worthless without making predictions first.

    I had completed my experiments two months
    ago, but did not find time to write up a post.

    It is great that you are still doing experiments. It
    means there is still hope for you. ;) You deserve
    praise for keeping an open mind and showing the
    courage to stick your neck out by publishing your
    results.

    I wanted to wait a while before responding to you,
    in order to give other "mathematicians" and all a
    chance to contribute, as well as to give you time to
    add more to it like you said you would. It's noteworthy
    that nobody
    else said anything good or bad. I think because
    they lack the brains and/or the guts to open their
    mouths.

    They are, however, able to assess who is favourite
    (> 50 % winning chances).

    It has to be noted that this introduces bias into your
    experiment by assuming that the bot's assessments
    are accurate, which I don't agree with, but we have no
    choice other than to make do with what we have now.

    ... will never double and only take if favourite.
    ... will always double if favourite, but only take if
    favourite.
    ... will never double, but always take.
    ... will always double if favourite and always take.

    You have quite a circus going with your characters :)
    but I'm not sure what they will accomplish other than
    just satisfy some curiosity.

    Jacoby rule is used, but Beavers are forbidden.

    Why Jacoby? It's stupid as it defeats the concept of
    correct cube decision. Forbidding Beavers is worse
    as it makes your "cube skill" experiment worthless,
    unless you agree to revise your "cube skill theory" to
    completely eliminate Beavers, Raccoons, etc. which
    are currently argued to require more skill than simple
    double/take decisions.

    I had my simple bot mimic gnubg's checker play
    and use one of the 4 mutant cubing strategies in
    turn against gnubg set to Expert checker play and
    World Class cube handling.

    If I could have convinced the Gnubg team to add the
    option to let users make such selections in the player
    settings, everyone (including you) could do their own
    experiments without having to create mutant bots but
    they seem afraid to do anything that could undermine
    the dogmas of the gamblegammon establishment. :(

    The null hypothesis was that the respective mutant's
    cube strategy is as good as the world class cube

    Why do you keep going to this? The original reason
    for the mutant experiment that I had proposed was
    to debunk the so-called "cube skill theory" based on
    some jackoffski formulas, etc. You accomplished
    that in your first experiment already. Why don't you
    try to refine that instead...?

    For all four mutants it could be rejected with a sigma
    level > 4.5. Surprise, surprise. (-;

    No surprise indeed. Just more pointless mathshitting.

    After that was done, I did further experiments.....

    Overall in this framework, there are 49 mutant
    strategies, some wild, some not so wild.....
    .....
    Here are these results for all the mutants.....
    All could be dismissed with a sigma level > 2.9.

    I can't believe that you have gone to such an extent
    of spending time and effort doing more mathshitting
    with no apparent benefit but you should be praised
    for doing something rather than nothing, as something may
    somehow come out of all this someday...

    These latter ones all get roughly similar results

    I hope you realise that this is quite telling in itself,
    regardless of the magnitude of the actual pwppp.

    It shows that it doesn't matter which of those mutant
    strategies is used against world class cube handling.

    Now before one falsely believes that this shows that
    cube strategies do not matter

    There is no "falsely" about it. Your results prove that
    any random or random-like cube strategy is as good
    as or close to any other, without one emerging as the
    clearly superior, which would help demonstrate skill.

    you should realize that 0.15 pwppp is not "quite an
    achievement", but it is pretty bad.

    As you guys keep saying "no surprise", "bad" is indeed
    what would be expected from random or random-like
    cube strategies but the real question is: "how bad"!?

    In other words, "how do the actual results compare
    to the predicted results according to calculations per
    the 'cube skill theory'?"

    Thus far, we really don't know because you have never
    made any predictions!, not even retroactive ones!!

    Here is a table of three different players (set up by
    using numerical noise in gnubg for checker play and
    cube handling) all achieving roughly the same pwppp
    of 0.15:

    Now you are comparing apples to oranges. There is
    no "strategy" in the noise inserted by Gnubg. You can't
    compare it to anything that you use the word "strategy"
    for, such as mutant cube handling in a consistent way.

    Your efforts like your above experiments are nothing
    more than desperate, helpless flappings of a fish in a
    bucket, in your trying to continue existing in denial of
    reality, (perhaps even ironically "mathematical reality").

    So the bottom line is: Much ado about nothing. An
    interesting study for me nevertheless,

    I agree. Now, what about sharing your data this time...?
    There may be more "interestings" to be found in there. ;)

    and it might be fun in your next live session to roll a
    die before a cube decision.

    Again, the mutant experiment I had proposed was not
    about random cube decision and your first experiment
    had proven painful for you...

    I doubt mocking your own results will ease your pain.

    has a prominent precedent in Phil Simborg:
    https://www.bkgm.com/articles/Simborg/ACoinToss/

    Decorating your bullshit with more shit from others is
    pathetically self-destructive. :( You can do better than this.

    There were two things that puzzled me a bit, but I will
    address them in a different post.

    I will be looking out for it.

    MK
