• A few tests of Isight

    From pepstein5@gmail.com@21:1/5 to All on Mon Apr 18 07:07:27 2022
    I have 6 checkers on my acepoint and you have 5 on your acepoint.
    Will you correctly pass and will I correctly double?

    Let's try with and without adjustment.
    I'll try not to refer to the paper and just use my memory.
    6 (my raw pip count) + 2 (my acepoint stack) = 8.
    5 (your pip count) + 2 (your acepoint stack) = 7.
    8 + 8/6 = 9 1/3. 9 1/3 - 7 is less than 5 so I correctly double.
    However, this number is unfortunately more than 2 so you wrongly take.

    However, this application of Isight is much too literal and unthinking.
    The choice of raw pip counts of 5 and 6 is completely arbitrary and
    it makes no sense to give myself 6 and you only 5. Any sensible
    adjustment leads to a correct D/P.

    There are two sensible ways to apply Isight here.
    For all intensive purposes (deliberate error for humour),
    the raw pip counts should actually be considered the same.
    But both 5 and 6 are valid choices.
    If we choose 5 for both, we get 7 + 7/6 - 7 = 7/6 for a correct
    pass. If we choose 6 for both, we get 9 1/3 - 8 = 1 1/3 for another
    correct pass.

    Suppose I have 5 checkers on my ace point and one checker on my
    3 point. You have 6 checkers on your ace point. Will you correctly take?
    Will I correctly double?

    My raw is 8, add 2 for 10. Add 1/6 of that for 11 2/3. Yours is 8.
    Yes, the diff is 3 2/3. A correct D/T.
    How about if I give you only 5 checkers? This makes the diff 4 2/3
    which remains D/T territory. Isight handles this well.
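
    The hand calculations above can be checked with a minimal sketch. It
    assumes only what the post itself uses: the roller's adjusted count is
    increased by one sixth, a difference below 5 means double, and a
    difference above 2 means take. The function name, the parameter names
    and the strict inequalities at the boundaries are illustrative
    assumptions, not taken from the paper.

```python
from fractions import Fraction


def isight_cube_action(roller_count, opponent_count, double_below=5, take_above=2):
    """Return (diff, action) using the thresholds quoted in the post.

    Both counts are the already-adjusted pip counts. Strict inequalities
    at the boundaries are an assumption; the post never lands on 2 or 5.
    """
    # Increase the roller's count by one sixth, then compare with the opponent.
    diff = roller_count + Fraction(roller_count, 6) - opponent_count
    doubles = diff < double_below
    takes = diff > take_above
    action = ("D" if doubles else "ND") + "/" + ("T" if takes else "P")
    return diff, action


# The five pairs of adjusted counts worked through in the post.
for mine, yours in [(8, 7), (7, 7), (8, 8), (10, 8), (10, 7)]:
    diff, action = isight_cube_action(mine, yours)
    print(f"{mine} vs {yours}: diff = {diff}, {action}")
```

    Exact fractions keep values such as 2 1/3 and 4 2/3 free of rounding.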

    As background info, let's compute the cubeless winning probability for
    the roller in 6 on the acepoint vs 6 on the acepoint.
    I lose if A) I roll non-doubles, you roll doubles and I roll non-doubles
    or B) I roll non-doubles, you roll non-doubles, I roll non-doubles, you
    roll doubles. So my losing probability is 5/6 * (5/36 + 25/216) =
    5/6 * 55/216 = 275/1296 which is between 21 and 22% I think.
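
    The 275/1296 figure can be confirmed exactly with a short recursion.
    This is a sketch under the assumption that every checker sits on the
    ace point, so a double (probability 1/6) bears off 4 checkers and any
    other roll bears off 2. The function name win_prob is hypothetical.

```python
from fractions import Fraction
from functools import lru_cache

P_DOUBLE = Fraction(1, 6)


@lru_cache(maxsize=None)
def win_prob(mine, yours):
    """Cubeless probability that the player on roll bears off first.

    mine and yours are the numbers of checkers on each ace point.
    """
    total = Fraction(0)
    for prob, borne_off in ((P_DOUBLE, 4), (1 - P_DOUBLE, 2)):
        left = mine - borne_off
        if left <= 0:
            total += prob  # the roller is off and wins immediately
        else:
            # Otherwise the opponent is on roll against what is left.
            total += prob * (1 - win_prob(yours, left))
    return total


loss = 1 - win_prob(6, 6)
print(loss, float(loss))
```

    The loss probability comes out as 275/1296, which is about 21.2%.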

    But what happens if the roller has only 5 on the acepoint
    and one on the 3 point? This is a well-known D/T but I'm not sure
    if it's also true that the roller's cubeless winning probability is lower
    than 76%. It may be true.

    As if this isn't all exciting and thrilling enough, if you really want
    every organ in your body to be saturated with pleasure, we can even
    do the adjustment with 80 and l/3 and all that, and all in the privacy
    of your own home!!

    So what happens if I have 6 checkers and you have 5 checkers,
    with all checkers on their respective acepoints?
    80 - 8/3 - 2 * 1 = 75 1/3. This is (wrongly) D/T territory.
    However, I deliberately tried to fool Isight (shame on me!!) by
    my nasty parity trick. It is only slightly wrong, and I'm sure Isight
    is correct if I act a bit more responsibly.
    (I got confused here around l and delta l so I had to cheat to refer to
    the paper. Now I understand the problem and, unfairly as usual,
    I'm going to blame Axel for my confusion! The problem is that we
    use the raw pip count l/3 but use "l" which reminds me of "lead".
    A bit too inconsistent for my tastes. Maybe p for the pip count and
    l for the lead.)

    What happens if I have 5 on the ace point and 1 on the 3 point
    and you have 6 on the ace point?
    80 - 10/3 - 4 = 72 2/3 for a correct D/T.
    And my parity games with you having 5 on the ace point?
    80 - 10/3 - 6 = 70 2/3.
    Correct both times.

    Paul

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Axel Reichert@21:1/5 to peps...@gmail.com on Mon Apr 18 17:05:23 2022
    "peps...@gmail.com" <pepstein5@gmail.com> writes:

    I'll try not to refer to the paper and just use my memory.

    You better had. Page 28: "Add 2 pips for each checker more than 2 on
    point 1." You just added 2 pips in total, not per "additional" checker.
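
    The difference between the two readings is easy to see in a small
    sketch. It applies only the rule quoted above (2 pips per checker
    beyond the second on point 1) and ignores every other adjustment in
    the paper. The function name is hypothetical.

```python
def acepoint_stack_penalty(checkers_on_ace):
    """Extra pips: 2 for each checker beyond the second on point 1."""
    return 2 * max(0, checkers_on_ace - 2)


# For a pure acepoint stack the raw pip count equals the number of checkers.
for n in (2, 5, 6):
    print(n, "checkers ->", n + acepoint_stack_penalty(n), "pips after this adjustment")
```

    With 6 checkers on the ace point this gives 6 + 8 = 14 rather than the
    6 + 2 = 8 used in the original post.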

    There are two sensible ways to apply Isight here.

    In my opinion, there is no sensible way to apply the Isight method
    here. Acepoint stacks are about the only position where even I do not
    use my own method. Because I know them by heart:

    - Less than 4 rolls: D/P
    - Exactly 4 rolls: R/T
    - More than 4 rolls: D/T
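
    The three items above amount to a lookup, sketched here. The sketch
    assumes both players need the same number of rolls and that rolls are
    counted without doubles, i.e. two checkers per roll. Both function
    names are hypothetical.

```python
import math


def rolls_needed(checkers_on_ace):
    """Rolls to bear off an acepoint stack when no doubles are rolled."""
    return math.ceil(checkers_on_ace / 2)


def n_roll_cube_action(rolls):
    """Cube action for an n-roll versus n-roll position, per the list above."""
    if rolls < 4:
        return "D/P"
    if rolls == 4:
        return "R/T"
    return "D/T"


for checkers in (6, 8, 15):
    rolls = rolls_needed(checkers)
    print(checkers, "checkers each:", rolls, "rolls,", n_roll_cube_action(rolls))
```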

    You also know them by heart, I am sure, as probably does every player
    who is ambitious enough to be interested in my paper.

    The Isight method, even when applied correctly, will probably result in
    a lot of errors for acepoint stacks. But I did not even bother to
    check. It does not make sense to augment it with "ceiling" functions and
    other oddities: This would be more complex than learning the three items
    from above.

    (I got confused here around l and delta l so I had to cheat to refer
    to the paper. Now I understand the problem and, unfairly as usual,
    I'm going to blame Axel for my confusion! The problem is that we use
    the raw pip count l/3 but use "l" which reminds me of "lead". A bit
    too inconsistent for my tastes. Maybe p for the pip count and l for
    the lead.)

    l is the "length" of the race. I did not use p, because that is reserved
    for the probability in the

    p = 80 - l/3 + 2 Delta l

    formula.
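
    As a worked check, the formula can be evaluated for the three positions
    in the original post. This sketch assumes p is a percentage, l is the
    roller's adjusted count as used in that post, and Delta l is the
    opponent's count minus the roller's count (negative when the roller
    trails). The function name isight_p is hypothetical.

```python
from fractions import Fraction


def isight_p(length, lead):
    """Evaluate p = 80 - l/3 + 2 * Delta l with exact fractions."""
    return 80 - Fraction(length, 3) + 2 * lead


# (l, Delta l) pairs taken from the arithmetic in the original post.
for length, lead in [(8, -1), (10, -2), (10, -3)]:
    p = isight_p(length, lead)
    print(f"l = {length}, Delta l = {lead}: p = {p} (about {float(p):.2f})")
```

    The results are 226/3, 218/3 and 212/3, that is 75 1/3, 72 2/3 and
    70 2/3, matching the figures computed by hand.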

    Best regards

    Axel

  • From pepstein5@gmail.com@21:1/5 to Axel Reichert on Mon Apr 18 15:46:15 2022
    On Monday, April 18, 2022 at 4:05:25 PM UTC+1, Axel Reichert wrote:
    "peps...@gmail.com" <peps...@gmail.com> writes:

    I'll try not to refer to the paper and just use my memory.
    You better had. Page 28: "Add 2 pips for each checker more than 2 on
    point 1." You just added 2 pips in total, not per "additional" checker.
    There are two sensible ways to apply Isight here.
    In my opinion, there is no sensible way to apply the Isight method
    here. Acepoint stacks are about the only position where even I do not
    use my own method. Because I know them by heart:

    - Less than 4 rolls: D/P
    - Exactly 4 rolls: R/T
    - More than 4 rolls: D/T

    I don't think so. 6 or 7 rolls each is ND/T, I think.
    I meant the situation where only one player has an acepoint stack.
    There you can apply two different valid conventions.
    You can make the acepoint stack even or you can make it odd.

    These can be difficult. For example, player A has 6 on the acepoint.
    Player B has 5 on the acepoint and one on the three point.
    This is R/T if B is on roll.

    It's a good paper. I expected to have more solid feedback on it.
    My only valuable comment is that not enough discussion (if any)
    is contained on separating the in-sample results from the out-of-sample results.
    But my impression is that you've investigated thoroughly with out-of-sample tests
    that you haven't documented.
    It doesn't make sense to use the same data to calibrate the parameters as you use
    to measure the algorithm's effectiveness. But the paper is written in such a way as
    to suggest this incorrect practice.

    It seems like kind of a lucky fluke that there were effective parameter choices consistent
    with easy applicability.

    Paul

  • From Axel Reichert@21:1/5 to peps...@gmail.com on Tue Apr 19 08:40:22 2022
    "peps...@gmail.com" <pepstein5@gmail.com> writes:

    On Monday, April 18, 2022 at 4:05:25 PM UTC+1, Axel Reichert wrote:

    - Less than 4 rolls: D/P
    - Exactly 4 rolls: R/T
    - More than 4 rolls: D/T

    I don't think so. 6 or 7 rolls each is ND/T, I think.

    No, I just looked it up in Danny Kleinman's "Vision laughs at counting"
    ("The complete ace-point bear-off"). Even 8 rolls (15 checkers each on
    point 1) is D/T. A rollout confirms these results. My list above is
    correct.

    I meant the situation where only one player has an acepoint stack.

    I understand now. I did nothing specific for these kinds of positions,
    but there should be a bunch of these positions in Tom's database. Hence
    these cases were part of the fit and I would apply the method literally.

    For example, player A has 6 on the acepoint. Player B has 5 on the
    acepoint and one on the three point. This is R/T if B is on roll.

    ... which my method gives correctly if taken literally.

    It's a good paper.

    Thank you.

    not enough discussion (if any) is contained on separating the
    in-sample results from the out-of-sample results.

    This is correct. We discussed this already, and my main reply was this
    one from Message-ID: <m28s4vit0m.fsf@axel-reichert.de>:

    [snip]
    Thanks for your principled remarks. In theory this is of course correct
    best practice, but I have some doubts that an essentially linear
    "Ansatz" with 23 parameters is able to overfit a database with more than
    50000 positions involving nonlinear effects.

    Having said this, precisely this concern was raised shortly after I
    published my paper 7 years ago. So I had GNU Backgammon generate another
    database of 50000 pure race positions (similar to Tom Keith's data
    gathered from FIBS). The results:

    1. Again, the Isight count fared best on this database compared to other
    counts and combinations of adjustments and decision criteria.

    2. Again, later tuning to this new database yielded exactly the same
    parameter values that were published.
    [snip]

    And, by the way, in November 2020 you already pointed out the "optional
    pass" error. I commented on this in Message-ID: <m2ft4sny3c.fsf@axel-reichert.de>.

    Best regards

    Axel

  • From pepstein5@gmail.com@21:1/5 to Axel Reichert on Tue Apr 19 01:53:32 2022
    Thanks for the correction on n roll positions.
    I remember Robertie stating the rule for 5 rolls / 4 rolls / 3 rolls.
    I assumed (wrongly) that he said "5" rather than "more than 4" because the 6, 7, 8 situations were
    ND/T. I don't know why he didn't say "more than 4" like you did. The result must have been known
    at the time of Robertie.
    One possible reason might be that the 6,7,8 situation is so marginal that anything even slightly
    worse strays into ND/T making the rule misleading because a pure large acepoint stack is unusual.
    Another (more likely) reason is that Robertie's books, great as they are, aren't perfect.

    One observation that might make practical development of racing tools challenging is that
    serious backgammon players can't mention which algorithm they use because the info could
    clearly be exploited by the opposition who would know exactly how their opponents would handle
    the cube in any race.

    Paul
