Forum: >>> Magnum BBS <<<

A few tests of Isight

From pepstein5@gmail.com@21:1/5 to All on Mon Apr 18 07:07:27 2022

I have 6 checkers on my acepoint and you have 5 on your acepoint.
Will you correctly pass and will I correctly double?

Let's try with and without adjustment.
I'll try not to refer to the paper and just use my memory.
6 (my raw pip count) + 2 (my acepoint stack) = 8.
5 (your raw pip count) + 2 (your acepoint stack) = 7.
8 + 8/6 = 9 1/3. 9 1/3 - 7 = 2 1/3, which is less than 5, so I correctly double.
However, this number is unfortunately more than 2, so you wrongly take.

However, this application of Isight is much too literal and unthinking.
The choice of raw pip counts of 5 and 6 is completely arbitrary and
it makes no sense to give myself 6 and you only 5. Any sensible
adjustment leads to a correct D/P.

There are two sensible ways to apply Isight here.
For all intensive purposes (deliberate error for humour),
the raw pip counts should actually be considered the same.
But both 5 and 6 are valid choices.
If we choose 5 for both, we get 7 + 7/6 - 7 = 7/6 for a correct
pass. If we choose 6 for both, we get 9 1/3 - 8 = 1 1/3 for another
correct pass.

Suppose I have 5 checkers on my ace point and one checker on my
3 point. You have 6 checkers on your ace point. Will you correctly take?
Will I correctly double?

My raw count is 8; add 2 for 10, then add 1/6 of that for 11 2/3. Yours is 8.
Yes, the diff is 3 2/3. A correct D/T.
How about if I give you only 5 checkers? This makes the diff 4 2/3,
which remains D/T territory. Isight handles this well.
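
The count comparisons above all use the same arithmetic, which can be sketched in Python. This is only an illustration of the calculation as done in this post: the function names are made up, the inputs are the already-adjusted counts used above, and the thresholds (double below 5, take above 2) are the ones recalled from memory here, not checked against the paper.

```python
from fractions import Fraction

def isight_margin(roller_count, opponent_count):
    """Roller's count plus one sixth of itself, minus the opponent's count."""
    roller = Fraction(roller_count)
    return roller + roller / 6 - Fraction(opponent_count)

def cube_action(margin, double_below=5, take_above=2):
    """Classify a margin using the thresholds quoted in this post (assumed)."""
    if margin >= double_below:
        return "ND/T"  # assumed reading: too far behind to double
    return "D/T" if margin > take_above else "D/P"

# (roller's adjusted count, opponent's adjusted count) as used in this post
examples = [(8, 7), (7, 7), (8, 8), (10, 8), (10, 7)]
for roller, opponent in examples:
    m = isight_margin(roller, opponent)
    print(roller, opponent, m, cube_action(m))
```

The first pair reproduces the "wrong take" from the literal application, the next two the passes, and the last two the D/T results.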

As background info, let's compute the cubeless winning probability for
the roller in 6 on the acepoint vs 6 on the acepoint.
I lose if A) I roll non-doubles, you roll doubles and I roll non-doubles
or B) I roll non-doubles, you roll non-doubles, I roll non-doubles, you
roll doubles. So my losing probability is 5/6 * (5/36 + 25/216) =
5/6 * 55/216 = 275/1296 which is between 21 and 22% I think.
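
The 275/1296 figure can be checked with exact fractions. A minimal Python sketch, assuming every non-double bears off 2 checkers and every double bears off 4 (true when all checkers sit on the ace point); `win_prob` is an illustrative name, not something from the paper.

```python
from fractions import Fraction
from functools import lru_cache

P_DOUBLE = Fraction(1, 6)  # 6 of the 36 possible rolls are doubles

@lru_cache(maxsize=None)
def win_prob(on_roll, other):
    """Cubeless win probability for the player on roll (ace-point stacks only)."""
    total = Fraction(0)
    for prob, borne_off in ((P_DOUBLE, 4), (1 - P_DOUBLE, 2)):
        left = on_roll - borne_off
        if left <= 0:
            total += prob  # last checker borne off: the player on roll wins
        else:
            # Otherwise the opponent is on roll against the remaining stack.
            total += prob * (1 - win_prob(other, left))
    return total

loss = 1 - win_prob(6, 6)
print(loss, float(loss))
```

This gives a losing probability of 275/1296, about 21.2%, so the roller wins about 78.8% cubeless.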

But what happens if the roller has only 5 on the acepoint
and one on the 3 point? This is a well-known D/T but I'm not sure
if it's also true that the roller's cubeless winning probability is lower
than 76%. It may be true.

As if this isn't all exciting and thrilling enough, if you really want
every organ in your body to be saturated with pleasure, we can even
do the adjustment with 80 and l/3 and all that, and all in the privacy
of your own home!!

So what happens if I have 6 checkers and you have 5 checkers,
with all checkers on their respective acepoints?
80 - 8/3 - 2 * 1 = 75 1/3. This is (wrongly) D/T territory.
However, I deliberately tried to fool Isight (shame on me!!) by
my nasty parity trick. It is only slightly wrong, and I'm sure Isight
is correct if I act a bit more responsibly.
(I got confused here around l and delta l so I had to cheat to refer to
the paper. Now I understand the problem and, unfairly as usual,
I'm going to blame Axel for my confusion! The problem is that we
use the raw pip count l/3 but use "l" which reminds me of "lead".
A bit too inconsistent for my tastes. Maybe p for the pip count and
l for the lead.)

What happens if I have 5 on the ace point and 1 on the 3 point
and you have 6 on the ace point?
80 - 10/3 - 4 = 72 2/3 for a correct D/T.
And my parity games with you having 5 on the ace point?
80 - 10/3 - 6 = 70 2/3.
Correct both times.

Paul

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Axel Reichert@21:1/5 to peps...@gmail.com on Mon Apr 18 17:05:23 2022

"peps...@gmail.com" <pepstein5@gmail.com> writes:

> I'll try not to refer to the paper and just use my memory.

You better had. Page 28: "Add 2 pips for each checker more than 2 on
point 1." You just added 2 pips in total, not per "additional" checker.

> There are two sensible ways to apply Isight here.

In my opinion, there is no sensible way to apply the Isight method
here. Acepoint stacks are about the only position where even I do not
use my own method. Because I know them by heart:

- Less than 4 rolls: D/P
- Exactly 4 rolls: R/T
- More than 4 rolls: D/T

You also know them by heart, I am sure. And probably every player who is
ambitious enough to be interested in my paper.

The Isight method, even when applied correctly, will probably result in
a lot of errors for acepoint stacks. But I did not even bother to
check. It does not make sense to augment it with "ceiling" functions and
other oddities: This would be more complex than learning the three items
from above.
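
For reference, the three items, together with the roll count for a pure ace-point stack, fit in a few lines of Python. This is only a sketch: it assumes both players hold nothing but ace-point stacks needing the same number of rolls, and `rolls_needed` and `acepoint_cube_action` are illustrative names.

```python
import math

def rolls_needed(checkers_on_ace_point):
    """Non-double rolls needed to bear off a stack on the ace point."""
    return math.ceil(checkers_on_ace_point / 2)

def acepoint_cube_action(rolls):
    """Cube action for an n-roll vs. n-roll position, per the list above."""
    if rolls < 4:
        return "D/P"
    if rolls == 4:
        return "R/T"
    return "D/T"

for checkers in (5, 6, 8, 15):
    n = rolls_needed(checkers)
    print(checkers, n, acepoint_cube_action(n))
```

With 6 checkers against 5, both sides need 3 rolls, so the list gives D/P.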

> (I got confused here around l and delta l so I had to cheat to refer
> to the paper. Now I understand the problem and, unfairly as usual,
> I'm going to blame Axel for my confusion! The problem is that we use
> the raw pip count l/3 but use "l" which reminds me of "lead". A bit
> too inconsistent for my tastes. Maybe p for the pip count and l for
> the lead.)

l is the "length" of the race. I did not use p, because that is reserved
for the probability in the

p = 80 - l/3 + 2 Delta l

formula.
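
The formula can be checked against the three values computed earlier in the thread with a short Python sketch. Two readings are assumed here: l is the roller's adjusted count, and Delta l is the opponent's count minus the roller's (so it is negative when the roller trails). The function name is illustrative.

```python
from fractions import Fraction

def isight_percent(length, delta):
    """p = 80 - l/3 + 2 * Delta l, returned as an exact fraction (percent)."""
    return 80 - Fraction(length) / 3 + 2 * Fraction(delta)

# (l, Delta l) pairs from the three calculations earlier in the thread
for length, delta in ((8, -1), (10, -2), (10, -3)):
    print(length, delta, isight_percent(length, delta))
```

The results are 226/3, 218/3 and 212/3, i.e. 75 1/3, 72 2/3 and 70 2/3.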

Best regards

Axel

From pepstein5@gmail.com@21:1/5 to Axel Reichert on Mon Apr 18 15:46:15 2022

On Monday, April 18, 2022 at 4:05:25 PM UTC+1, Axel Reichert wrote:

> "peps...@gmail.com" <peps...@gmail.com> writes:

>> I'll try not to refer to the paper and just use my memory.

> You better had. Page 28: "Add 2 pips for each checker more than 2 on
> point 1." You just added 2 pips in total, not per "additional" checker.

>> There are two sensible ways to apply Isight here.

> In my opinion, there is no sensible way to apply the Isight method
> here. Acepoint stacks are about the only position where even I do not
> use my own method. Because I know them by heart:

> - Less than 4 rolls: D/P
> - Exactly 4 rolls: R/T
> - More than 4 rolls: D/T

I don't think so. 6 or 7 rolls each is ND/T, I think.
I meant the situation where only one player has an acepoint stack.
There you can apply two different valid conventions.
You can make the acepoint stack even or you can make it odd.

These can be difficult. For example, player A has 6 on the acepoint.
Player B has 5 on the acepoint and one on the three point.
This is R/T if B is on roll.

It's a good paper. I expected to have more solid feedback on it.
My only valuable comment is that not enough discussion (if any)
is contained on separating the in-sample results from the out-of-sample results.
But my impression is that you've investigated thoroughly with out-of-sample tests
that you haven't documented.
It doesn't make sense to use the same data to calibrate the parameters as you use
to measure the algorithm's effectiveness. But the paper is written in such a way as
to suggest this incorrect practice.
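
A minimal Python sketch of the separation meant here: the parameters would be tuned on one part of the position database and the error rates reported on the other. The function name, the 20% hold-out fraction and the integer placeholders are illustrative assumptions, not anything taken from the paper.

```python
import random

def split_positions(positions, holdout_fraction=0.2, seed=0):
    """Shuffle and split into (calibration, evaluation) lists with no overlap."""
    shuffled = list(positions)
    random.Random(seed).shuffle(shuffled)
    n_eval = int(len(shuffled) * holdout_fraction)
    return shuffled[n_eval:], shuffled[:n_eval]

# Placeholder integers stand in for real race positions.
calibration, evaluation = split_positions(range(1000))
print(len(calibration), len(evaluation))
```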

It seems like kind of a lucky fluke that there were effective parameter choices consistent
with easy applicability.

Paul

From Axel Reichert@21:1/5 to peps...@gmail.com on Tue Apr 19 08:40:22 2022

"peps...@gmail.com" <pepstein5@gmail.com> writes:

> On Monday, April 18, 2022 at 4:05:25 PM UTC+1, Axel Reichert wrote:

>> - Less than 4 rolls: D/P
>> - Exactly 4 rolls: R/T
>> - More than 4 rolls: D/T

> I don't think so. 6 or 7 rolls each is ND/T, I think.

No, I just looked it up in Danny Kleinman's "Vision laughs at counting"
("The complete ace-point bear-off"). Even 8 rolls (15 checkers each on
point 1) is D/T. A rollout confirms these results. My list above is
correct.

> I meant the situation where only one player has an acepoint stack.

I understand now. I did nothing specific for this kind of position,
but there should be a bunch of these positions in Tom's database. Hence
these cases were part of the fit and I would apply the method literally.

> For example, player A has 6 on the acepoint. Player B has 5 on the
> acepoint and one on the three point. This is R/T if B is on roll.

... which my method gives correctly if taken literally.

> It's a good paper.

Thank you.

> not enough discussion (if any) is contained on separating the
> in-sample results from the out-of-sample results.

This is correct. We discussed this already, and my main reply was this
one from Message-ID: <m28s4vit0m.fsf@axel-reichert.de>:

[snip]
Thanks for your principled remarks. In theory this is of course correct
best practice, but I have some doubts that an essentially linear
"Ansatz" with 23 parameters is able to overfit a database with more than
50000 positions involving nonlinear effects.

Having said this, precisely this concern was raised shortly after I
published my paper 7 years ago. So I had GNU Backgammon generate another
database of 50000 pure race positions (similar to Tom Keith's data
gathered from FIBS). The results:

1. Again, the Isight count fared best on this database compared to other
counts and combinations of adjustments and decision criteria.

2. Again, later tuning to this new database yielded exactly the same
parameter values that were published.
[snip]

And, by the way, in November 2020 you already pointed out the "optional
pass" error. I commented on this in Message-ID:
<m2ft4sny3c.fsf@axel-reichert.de>.

Best regards

Axel

From pepstein5@gmail.com@21:1/5 to Axel Reichert on Tue Apr 19 01:53:32 2022

Thanks for the correction on n-roll positions.
I remember Robertie stating the rule for 5 rolls / 4 rolls / 3 rolls.
I assumed (wrongly) that he said "5" rather than "more than 4" because the 6, 7, 8 situations were
ND/T. I don't know why he didn't say "more than 4" like you did. The result must have been known
at the time of Robertie.
One possible reason might be that the 6,7,8 situation is so marginal that anything even slightly
worse strays into ND/T making the rule misleading because a pure large acepoint stack is unusual.
Another (more likely) reason is that Robertie's books, great as they are, aren't perfect.

One observation that might make practical development of racing tools challenging is that
serious backgammon players can't mention which algorithm they use because the info could
clearly be exploited by the opposition who would know exactly how their opponents would handle
the cube in any race.

Paul
