Forum: >>> Magnum BBS <<<

XG's predilection for blotty boards

From Timothy Chow@21:1/5 to All on Tue Oct 25 21:26:01 2022

In the position below, I played 8/4 8/2, and was surprised that XGR+
said that my play was 0.057 worse than its play of 8/2 5/1. (Story
continues below.)

XGID=---AABEBBb---B-a-abbcc--a-:0:0:1:64:0:0:0:0:10

Score is X:0 O:0. Unlimited Game
+13-14-15-16-17-18------19-20-21-22-23-24-+
| X     O     O  O |   | O  O  O        O |
| X              O |   | O  O  O          |
|                  |   |    O  O          |
|                  |   |                  |
|                  |   |                  |
|                  |BAR|                  |
|                  |   | X                |
|                  |   | X                |
|                  |   | X                |
|          O  X  X |   | X  X             |
|          O  X  X |   | X  X  X  X       |
+12-11-10--9--8--7-------6--5--4--3--2--1-+
Pip count X: 103 O: 104 X-O: 0-0
Cube: 1
X to play 64

1. XG Roller+ 8/2 5/1 eq:+0.391
Player: 62.83% (G:0.67% B:0.00%)
Opponent: 37.17% (G:0.28% B:0.01%)

2. XG Roller+ 7/1 6/2 eq:+0.340 (-0.052)
Player: 61.98% (G:0.27% B:0.04%)
Opponent: 38.02% (G:0.22% B:0.02%)

3. XG Roller+ 8/4 8/2 eq:+0.334 (-0.057)
Player: 62.05% (G:0.33% B:0.01%)
Opponent: 37.95% (G:0.28% B:0.00%)

4. XG Roller+ 8/2 7/3 eq:+0.333 (-0.058)
Player: 61.95% (G:0.38% B:0.01%)
Opponent: 38.05% (G:0.30% B:0.00%)

5. XG Roller+ 8/4 7/1 eq:+0.332 (-0.059)
Player: 61.93% (G:0.33% B:0.01%)
Opponent: 38.07% (G:0.34% B:0.01%)

6. XG Roller+ 8/2 6/2 eq:+0.331 (-0.060)
Player: 61.79% (G:0.32% B:0.01%)
Opponent: 38.21% (G:0.22% B:0.00%)

7. XG Roller+ 7/3 7/1 eq:+0.327 (-0.064)
Player: 61.61% (G:0.48% B:0.01%)
Opponent: 38.39% (G:0.36% B:0.00%)

8. XG Roller+ 7/1 5/1 eq:+0.308 (-0.084)
Player: 60.96% (G:0.31% B:0.01%)
Opponent: 39.04% (G:0.24% B:0.00%)

eXtreme Gammon Version: 2.19.211.pre-release

I tried appealing to XGR++, and it narrowed the gap between the two
plays, but still insisted that creating five blots in its board was
the best play.

XGID=---AABEBBb---B-a-abbcc--a-:0:0:1:64:0:0:0:0:10

Score is X:0 O:0. Unlimited Game
+13-14-15-16-17-18------19-20-21-22-23-24-+
| X     O     O  O |   | O  O  O        O |
| X              O |   | O  O  O          |
|                  |   |    O  O          |
|                  |   |                  |
|                  |   |                  |
|                  |BAR|                  |
|                  |   | X                |
|                  |   | X                |
|                  |   | X                |
|          O  X  X |   | X  X             |
|          O  X  X |   | X  X  X  X       |
+12-11-10--9--8--7-------6--5--4--3--2--1-+
Pip count X: 103 O: 104 X-O: 0-0
Cube: 1
X to play 64

1. XG Roller++ 8/2 5/1 eq:+0.375
Player: 62.31% (G:0.51% B:0.01%)
Opponent: 37.69% (G:0.33% B:0.01%)

2. XG Roller++ 8/4 8/2 eq:+0.351 (-0.024)
Player: 62.02% (G:0.39% B:0.00%)
Opponent: 37.98% (G:0.34% B:0.00%)

3. XG Roller++ 7/1 6/2 eq:+0.343 (-0.031)
Player: 61.82% (G:0.29% B:0.00%)
Opponent: 38.18% (G:0.29% B:0.00%)

4. XG Roller++ 8/4 7/1 eq:+0.342 (-0.033)
Player: 61.71% (G:0.35% B:0.00%)
Opponent: 38.29% (G:0.36% B:0.00%)

5. XG Roller++ 7/3 7/1 eq:+0.341 (-0.034)
Player: 61.61% (G:0.38% B:0.01%)
Opponent: 38.39% (G:0.29% B:0.01%)

6. XG Roller++ 8/2 6/2 eq:+0.338 (-0.037)
Player: 61.61% (G:0.27% B:0.01%)
Opponent: 38.39% (G:0.23% B:0.00%)

7. XG Roller++ 8/2 7/3 eq:+0.337 (-0.038)
Player: 61.58% (G:0.28% B:0.00%)
Opponent: 38.42% (G:0.28% B:0.00%)

eXtreme Gammon Version: 2.19.211.pre-release

Finally, I decided to do a rollout, with stronger parameters than
usual. I was pleased to see that sanity was restored. But this
position illustrates what I believe is a systematic weakness in XG,
which is that it doesn't evaluate blotty boards very well. See also
this old BGOnline post:

http://timothychow.net/cg/www.bgonline.org/forums/164769.html

XGID=---AABEBBb---B-a-abbcc--a-:0:0:1:64:0:0:0:0:10

Score is X:0 O:0. Unlimited Game
+13-14-15-16-17-18------19-20-21-22-23-24-+
| X     O     O  O |   | O  O  O        O |
| X              O |   | O  O  O          |
|                  |   |    O  O          |
|                  |   |                  |
|                  |   |                  |
|                  |BAR|                  |
|                  |   | X                |
|                  |   | X                |
|                  |   | X                |
|          O  X  X |   | X  X             |
|          O  X  X |   | X  X  X  X       |
+12-11-10--9--8--7-------6--5--4--3--2--1-+
Pip count X: 103 O: 104 X-O: 0-0
Cube: 1
X to play 64

1. Rollout¹ 8/4 8/2 eq:+0.343
Player: 61.74% (G:0.31% B:0.01%)
Opponent: 38.26% (G:0.27% B:0.01%)
Confidence: ±0.008 (+0.335..+0.352) - [41.7%]

2. Rollout¹ 8/4 7/1 eq:+0.342 (-0.002)
Player: 61.70% (G:0.30% B:0.01%)
Opponent: 38.30% (G:0.28% B:0.01%)
Confidence: ±0.008 (+0.334..+0.350) - [24.7%]

3. Rollout¹ 7/3 7/1 eq:+0.342 (-0.002)
Player: 61.38% (G:0.49% B:0.04%)
Opponent: 38.62% (G:0.37% B:0.01%)
Confidence: ±0.008 (+0.334..+0.349) - [23.1%]

4. Rollout¹ 7/1 6/2 eq:+0.338 (-0.006)
Player: 61.54% (G:0.32% B:0.01%)
Opponent: 38.46% (G:0.31% B:0.01%)
Confidence: ±0.008 (+0.330..+0.346) - [5.2%]

5. Rollout¹ 8/2 6/2 eq:+0.338 (-0.006)
Player: 61.49% (G:0.26% B:0.01%)
Opponent: 38.51% (G:0.22% B:0.01%)
Confidence: ±0.008 (+0.330..+0.346) - [4.8%]

6. Rollout¹ 8/2 7/3 eq:+0.334 (-0.010)
Player: 61.47% (G:0.29% B:0.01%)
Opponent: 38.53% (G:0.30% B:0.01%)
Confidence: ±0.007 (+0.327..+0.341) - [0.4%]

7. Rollout¹ 8/2 5/1 eq:+0.326 (-0.018)
Player: 61.13% (G:0.44% B:0.02%)
Opponent: 38.87% (G:0.48% B:0.02%)
Confidence: ±0.008 (+0.318..+0.334) - [0.0%]

¹ 1296 Games rolled with Variance Reduction.
Dice Seed: 271828
Moves and cube decisions: XG Roller+
Search interval: Large

eXtreme Gammon Version: 2.19.211.pre-release
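
One rough way to read the confidence column, offered as an illustrative sketch rather than anything XG itself does: treat each ± figure as a half-width around the equity, and ask which plays have an upper bound that falls below the lower bound of the top play. The numbers are copied from the rollout above, in thousandths of a point. The dictionary and function names are made up for the example, and non-overlap of intervals is a deliberately conservative test.

```python
# (equity, half-width) in thousandths, copied from the rollout report above.
ROLLOUT = {
    "8/4 8/2": (343, 8),
    "8/4 7/1": (342, 8),
    "7/3 7/1": (342, 8),
    "7/1 6/2": (338, 8),
    "8/2 6/2": (338, 8),
    "8/2 7/3": (334, 7),
    "8/2 5/1": (326, 8),
}


def interval(play):
    """Return (low, high) bounds for a play, in thousandths."""
    equity, half_width = ROLLOUT[play]
    return equity - half_width, equity + half_width


def clearly_worse_than(best):
    """Plays whose whole interval lies below the interval of `best`."""
    best_low, _ = interval(best)
    return [play for play in ROLLOUT if interval(play)[1] < best_low]


print(clearly_worse_than("8/4 8/2"))
```

With these rounded figures, only 8/2 5/1 is separated from 8/4 8/2 by this test; the other five plays overlap with it.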

---
Tim Chow

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From pepstein5@gmail.com@21:1/5 to Tim Chow on Wed Oct 26 00:52:24 2022

I strongly suspect flawed thinking on your part.
I disagree that you've found evidence of a weakness.
For example, it's perfectly possible that the rollout
was somehow biased against the blotty play and that
XG's original play only loses 0.01 instead of 0.018.

But let's assume that the rollout is correct, and that XG's play does indeed lose 0.018 equity. How bad is this?
Well, suppose that you were playing O against a world-class human X.
And suppose that you were able to pay X to make the play of 8/2 5/1 and you were able to negotiate a price for this. The price would normally be much more than just 0.018.
XG is simply trying to maximise the equity, and the above thought experiment (if correct)
shows that XG understands the play much better than most humans do, by being wrong about
the blotty play by only 0.018.

Your fallacy is to mark out the zero-equity level as being particularly significant.
The difference between losing zero equity (by the optimal play) and losing 0.018 equity
is no more significant than the difference between losing 0.1 equity and 0.118 equity.

Suppose there was some position which everyone (humans and bots) systematically got wrong.
However, some got it wrong by 0.1 and some got it wrong by 0.118.
Would you make a big deal out of this discrepancy between the 0.1 errors and the 0.118 errors?
I bet you wouldn't.

So your thinking seems flawed and inconsistent to me.

Paul

From Timothy Chow@21:1/5 to peps...@gmail.com on Wed Oct 26 08:26:30 2022

On 10/26/2022 3:52 AM, peps...@gmail.com wrote:

> I disagree that you've found evidence of a weakness.

This is just one example out of many similar examples I've
encountered over the years. When neither player is likely to
leave a blot in the next couple of rolls, XG frequently makes
all kinds of nutty plays, making a complete mess of its board
for no good reason.

> But let's assume that the rollout is correct, and that XG's play does indeed
> lose 0.018 equity. How bad is this?

The issue isn't that XG's play loses 0.018 equity. The issue
is that when we pass from XGR+ to a rollout, there's a swing from
-0.057 to +0.018, showing that XG doesn't understand the position
very well. This would be true even if XGR+ is "right" and the rollout
is "wrong." If Alice says yes and Bob says no, at most one of them
can be right, and if they disagree strongly then at least one of them
is misinformed.
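
For concreteness, the swing can be worked out from the equities printed in the three reports. This is only an arithmetic sketch: the values are the rounded figures from those reports, in thousandths of a point, and the names are invented for the example. Because XG computes its differences from unrounded equities, the rollout margin comes out here as +0.017 rather than the 0.018 shown in the report.

```python
# Equities (in thousandths) for the two plays, copied from the three reports.
EQUITY = {
    "XGR+":    {"8/4 8/2": 334, "8/2 5/1": 391},
    "XGR++":   {"8/4 8/2": 351, "8/2 5/1": 375},
    "rollout": {"8/4 8/2": 343, "8/2 5/1": 326},
}


def margin(evaluation):
    """Equity of 8/4 8/2 minus equity of 8/2 5/1, in thousandths."""
    plays = EQUITY[evaluation]
    return plays["8/4 8/2"] - plays["8/2 5/1"]


margins = {name: margin(name) for name in EQUITY}
swing = margins["rollout"] - margins["XGR+"]

for name, value in margins.items():
    print(f"{name:8s} {value / 1000:+.3f}")
print(f"swing    {swing / 1000:+.3f}")
```

On the rounded numbers, the margin moves from -0.057 to -0.024 to +0.017, a swing of about 0.074 between XGR+ and the rollout.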

---
Tim Chow

From Stick Rice@21:1/5 to Tim Chow on Wed Oct 26 12:46:31 2022

On Wednesday, October 26, 2022 at 8:26:33 AM UTC-4, Tim Chow wrote:

> This is just one example out of many similar examples I've
> encountered over the years. When neither player is likely to
> leave a blot in the next couple of rolls, XG frequently makes
> all kinds of nutty plays, making a complete mess of its board
> for no good reason.

But there is good reason to make a mess of our board here. By far the most likely scenario is for this game to turn into a race (which we're currently leading semi-comfortably), so I make a racing-driven play. It's a fine line between maintaining some timing, so we are able to clear the midpoint without leaving a shot, and distributing perfectly for the race bear-in/bear-off, but that's why I'd have played 7/1 6/2 OtB, and I maintain it's best.

Stick

From pepstein5@gmail.com@21:1/5 to Tim Chow on Wed Oct 26 14:07:44 2022

On Wednesday, October 26, 2022 at 1:26:33 PM UTC+1, Tim Chow wrote:

> The issue isn't that XG's play loses 0.018 equity. The issue
> is that when we pass from XGR+ to a rollout, there's a swing from
> -0.057 to +0.018, showing that XG doesn't understand the position
> very well. This would be true even if XGR+ is "right" and the rollout
> is "wrong." If Alice says yes and Bob says no, at most one of them
> can be right, and if they disagree strongly then at least one of them
> is misinformed.

Assuming Alice and Bob are in a relationship, it's quite likely that when one of them says "yes", the other always says "no" (and vice versa) on principle, regardless of what they really think.

Paul

From Timothy Chow@21:1/5 to Stick Rice on Wed Oct 26 23:44:37 2022

On 10/26/2022 3:46 PM, Stick Rice wrote:

> But there is good reason to make a mess of our board here. By far the most
> likely scenario is for this game to turn into a race (which we're currently
> leading semi-comfortably), so I make a racing-driven play. It's a fine line
> between maintaining some timing, so we are able to clear the midpoint without
> leaving a shot, and distributing perfectly for the race bear-in/bear-off, but
> that's why I'd have played 7/1 6/2 OtB, and I maintain it's best.

How is dumping checkers on low points good for the race?
Generally speaking, we should be trying to avoid wastage,
and not worry about gaps on the 1pt and 2pt. Suppose we
remove the four checkers in the outfield that are creating
contact. Admittedly the resulting position is artificial,
but it illustrates the point that 7/1 6/2 doesn't seem to
be good for the race.

XGID=---AABEBB------a-abbcc--a-:0:0:1:64:0:0:0:0:10

Score is X:0 O:0. Unlimited Game
+13-14-15-16-17-18------19-20-21-22-23-24-+
|       O     O  O |   | O  O  O        O |
|                O |   | O  O  O          |
|                  |   |    O  O          |
|                  |   |                  |
|                  |   |                  |
|                  |BAR|                  |
|                  |   | X                |
|                  |   | X                |
|                  |   | X                |
|             X  X |   | X  X             |
|             X  X |   | X  X  X  X       |
+12-11-10--9--8--7-------6--5--4--3--2--1-+
Pip count X: 77 O: 72 X-O: 0-0
Cube: 1
X to play 64

1. Rollout¹ 8/4 7/1 eq:+0.095
Player: 53.26% (G:0.00% B:0.00%)
Opponent: 46.74% (G:0.00% B:0.00%)
Confidence: ±0.004 (+0.091..+0.098) - [75.0%]

2. Rollout¹ 8/4 8/2 eq:+0.093 (-0.002)
Player: 53.22% (G:0.00% B:0.00%)
Opponent: 46.78% (G:0.00% B:0.00%)
Confidence: ±0.003 (+0.089..+0.096) - [25.0%]

3. Rollout¹ 8/2 7/3 eq:+0.080 (-0.015)
Player: 52.76% (G:0.00% B:0.00%)
Opponent: 47.24% (G:0.00% B:0.00%)
Confidence: ±0.004 (+0.076..+0.083) - [0.0%]

4. Rollout¹ 7/3 7/1 eq:+0.071 (-0.023)
Player: 52.37% (G:0.00% B:0.00%)
Opponent: 47.63% (G:0.00% B:0.00%)
Confidence: ±0.004 (+0.068..+0.075) - [0.0%]

5. Rollout¹ 7/1 6/2 eq:+0.064 (-0.030)
Player: 52.11% (G:0.00% B:0.00%)
Opponent: 47.89% (G:0.00% B:0.00%)
Confidence: ±0.004 (+0.061..+0.068) - [0.0%]

¹ 1296 Games rolled with Variance Reduction.
Dice Seed: 271828
Moves: 3-ply, cube decisions: XG Roller

eXtreme Gammon Version: 2.19.211.pre-release
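
The pip counts in both diagrams can be checked directly from the XGID strings. The sketch below assumes the usual reading of the position field: 26 characters, with the first and last being bar slots, the middle 24 being points 1 to 24 from X's side, uppercase letters for X, lowercase for O, and A/a = 1 checker, B/b = 2, and so on. The function name is made up for the example, and the bar handling is an assumption that neither position here exercises.

```python
def pip_counts(xgid):
    """Return (X pips, O pips) from the position field of an XGID string."""
    # Keep only the 26-character position field between "XGID=" and the first ":".
    position = xgid.split("=", 1)[-1].split(":", 1)[0]
    x_pips = o_pips = 0
    for point, symbol in enumerate(position):
        if symbol == "-":
            continue  # empty point or empty bar slot
        if symbol.isupper():
            # X counts pips from its own side: point n costs n pips per checker.
            x_pips += (ord(symbol) - ord("A") + 1) * point
        else:
            # O moves the other way, so the same point costs 25 - n pips.
            o_pips += (ord(symbol) - ord("a") + 1) * (25 - point)
    return x_pips, o_pips


original = "XGID=---AABEBBb---B-a-abbcc--a-:0:0:1:64:0:0:0:0:10"
stripped = "XGID=---AABEBB------a-abbcc--a-:0:0:1:64:0:0:0:0:10"
print(pip_counts(original))
print(pip_counts(stripped))
```

It gives 103-104 for the original position and 77-72 for the stripped one, matching XG's reports: removing the four outfield checkers takes 26 pips off X's count and 32 off O's.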

---
Tim Chow

From Stick Rice@21:1/5 to Tim Chow on Thu Oct 27 06:53:28 2022

On Wednesday, October 26, 2022 at 11:44:39 PM UTC-4, Tim Chow wrote:

> How is dumping checkers on low points good for the race?
> Generally speaking, we should be trying to avoid wastage,
> and not worry about gaps on the 1pt and 2pt. Suppose we
> remove the four checkers in the outfield that are creating
> contact. Admittedly the resulting position is artificial,
> but it illustrates the point that 7/1 6/2 doesn't seem to
> be good for the race.

As I said, it's a fine line between distributing for the race and keeping some timing so we are able to clear the midpoint without leaving a shot. Putting one checker on a lower point does no real harm race-wise.

Stick

From Timothy Chow@21:1/5 to Stick Rice on Sun Oct 30 17:41:23 2022

On 10/27/2022 9:53 AM, Stick Rice wrote:

> As I said, it's a fine line between distributing for the race and keeping some
> timing so we are able to clear the midpoint without leaving a shot. Putting one
> checker on a lower point does no real harm race-wise.

Okay. At least it seems we agree that there's no good reason
for XGR+ to insist that 5/1 is best by a clear margin.

---
Tim Chow

From MK@21:1/5 to Tim Chow on Tue Nov 1 14:54:34 2022

On October 30, 2022 at 3:41:25 PM UTC-6, Tim Chow wrote:

> On 10/27/2022 9:53 AM, Stick Rice wrote:
>
>> As I said, it's a fine line distributing for the race.....
>
> At least it seems we agree that there's no good reason
> for XGR+ to insist that 5/1 is best by a clear margin.

I'm glad to see comments like this but also sad to see
that examples like this don't really do you any lasting
good or get you anywhere because you all ignore the
implications of your own acknowledgements.

After finding so many cases like this, how can you be
sure that there aren't many thousands more of them
that you haven't encountered or recognized yet? How
many straws does it take to break the camel's back??

MK

From MK@21:1/5 to Tim Chow on Tue Nov 1 14:42:03 2022

On October 26, 2022 at 6:26:33 AM UTC-6, Tim Chow wrote:

> The issue isn't that XG's play loses 0.018 equity.
> The issue is that when we pass from XGR+ to a
> rollout, there's a swing from -0.057 to +0.018...

This is a very interesting example. It's not a case
where the top two or three plays trade places; rather,
the rankings of all plays scramble all over the place
across XGR+, XGR++ and the rollout.

In addition to your correctly making the important
point that 8/4 8/2 goes from -0.057 to +0.018, the
"raw equity" goes from +0.334 in XGR+ to +0.351 in
XGR++ and back to +0.343 in rollout, while the top
play 8/2 5/1 goes even more drastically from +0.391
in XGR+ to +0.375 in XGR++ and then further down
to +0.326 in rollout, i.e. -0.065 difference across the
three evaluations.

> This would be true even if XGR+ is "right" and the
> rollout is "wrong."

True indeed and this comment adds to the credibility
of your objectivity on the subject.

> If Alice says yes and Bob says no, at most one of
> them can be right, and if they disagree strongly
> then at least one of them is misinformed.

Then how would you decide which one is right?

Let me ask this question first by renaming
Alice and Bob as Gnubg and XG.

Second and more importantly, in your example it's
not two people (or bots) contradicting each other.
It's the same bot XG contradicting itself. I'm sure
the same would be true for Gnubg also.

Thus the question becomes "how do you *trust*"
that XG or Gnubg is right in any given evaluation?

Would you go by a ratio? And if so, what would be
your threshold? Would it be enough for you if the bot
was 10% right? 20%? 30%?

And that, of course, assuming that you can decide if
XGR+ or XGR++ or XG-rollout is "right"...

I'm just wondering what would it take for you folks
to some day say enough is enough, these bots are
just unpredictable, unreliable pieces of shit...??

MK

From Simon Woodhead@21:1/5 to All on Wed Nov 2 08:08:44 2022

On 2/11/2022 7:42 am, MK wrote:

> I'm just wondering what would it take for you folks
> to some day say enough is enough, these bots are
> just unpredictable, unreliable pieces of shit...??

It would take a better bot.

From pepstein5@gmail.com@21:1/5 to All on Tue Nov 1 15:49:56 2022

On Tuesday, November 1, 2022 at 9:42:04 PM UTC, MK wrote:

> I'm just wondering what would it take for you folks
> to some day say enough is enough, these bots are
> just unpredictable, unreliable pieces of shit...??

I think we're impressed by the bots because they're so
clearly better than the best humans. I think that's what
commands respect.
That seemed to be the case with chess computers.
They were laughed at when all experts could beat all
chess computers easily, and respected around the time
that they could compete with the world's strongest grandmasters.

And I don't think you ever claim to actually be able to beat a bot consistently.

Paul

From Timothy Chow@21:1/5 to All on Tue Nov 1 19:59:43 2022

On 11/1/2022 5:42 PM, MK wrote:

> Thus the question becomes "how do you *trust*"
> that XG or Gnubg is right in any given evaluation?

It's a reasonable question.

I would say that if a bot disagrees with itself then that is a
good reason *not* to trust it.

If it mostly agrees with itself when you perform various cross-
checks, then that doesn't prove that it is trustworthy, just as
when a lawyer cross-examines a witness and finds no contradictions,
it doesn't prove the witness is telling the truth. But as Paul
said, if the bot plays well overall, generally outperforming human
beings, then that's some evidence that it "knows what it is doing."

One can of course insist on adopting a skeptical posture under
all circumstances. This might mean that you avoid getting fooled
by lies, but it also means that you risk missing the truth. It's
up to every individual to decide how to make that tradeoff.

---
Tim Chow

From pepstein5@gmail.com@21:1/5 to Tim Chow on Tue Nov 1 17:23:36 2022

On Tuesday, November 1, 2022 at 11:59:48 PM UTC, Tim Chow wrote:

> One can of course insist on adopting a skeptical posture under
> all circumstances. This might mean that you avoid getting fooled
> by lies, but it also means that you risk missing the truth. It's
> up to every individual to decide how to make that tradeoff.

And an individual might make that tradeoff very differently, depending
on the matter that is being evaluated. They might be very skeptical
about statistical claims about non-randomness of dice, but not at all
skeptical about beliefs that conform to the religious or philosophical traditions that they identify with.

Paul

From MK@21:1/5 to peps...@gmail.com on Tue Nov 1 18:40:31 2022

On November 1, 2022 at 6:23:38 PM UTC-6, peps...@gmail.com wrote:

> On November 1, 2022 at 11:59:48 PM UTC, Tim Chow wrote:
>
>> as Paul said, if the bot plays well overall, generally
>> outperforming human beings, then that's some
>> evidence that it "knows what it is doing."

If this was tested and proven, I wouldn't object to it.
One big problem is that the bots' "performance" has
never been blind-tested. All of the compared human
gamblegammon players try to play like the few bots
(which are all descendants of TD-Gammon v.2), so
much so that lately they have started to compete in
lowering their PR's (as computed by the same bots)
instead of achieving more wins against humans or
bots. This is dog chasing its tail...

> And an individual might make that tradeoff very
> differently, depending on the matter that is being
> evaluated.

Yes. My example would be people who believe that
human players would but bot players wouldn't cheat.

> They might be very skeptical about statistical claims
> about non-randomness of dice, but not at all skeptical
> about beliefs that conform to the religious or
> philosophical traditions that they identify with.

I don't think sceptical is the counterpart of believer. A
believer can believe in what may be true or what may
be false. Often, no proof will change one's belief. For
example, no amount of mutant bot experiments will
be enough to convince "cube skill theory believers"
that it's bullshit. If you hear someone say "I can't
believe my eyes", consider believing that they can't...

MK

From MK@21:1/5 to Simon Woodhead on Tue Nov 1 19:02:39 2022

On November 1, 2022 at 4:08:47 PM UTC-6, Simon Woodhead wrote:

> On 2/11/2022 7:42 am, MK wrote:
>
>> I'm just wondering what would it take for you folks
>> to some day say enough is enough, these bots are
>> just unpredictable, unreliable pieces of shit...??
>
> It would take a better bot.

A better bot would surely do that with the caveat
that the better bot can also be merely a better (or
worse depending on whether shit means positive
or negative) piece of shit... ;)

Even so, I've always said that we need and we can
easily develop better bots with today's computing
power.

One way would be to go back to TD-Gammon v.01
(i.e. the version prior to Tesauro's bastardizing his
own bot in seeking validation/recognition from bg
gamblers), and do it the right way from there on...

BTW: I'm not asking you to do it. So, don't tell me to
do it myself.

MK

From MK@21:1/5 to peps...@gmail.com on Tue Nov 1 19:32:59 2022

On November 1, 2022 at 4:49:57 PM UTC-6, peps...@gmail.com wrote:

> I think we're impressed by the bots because
> they're so clearly better than the best humans.

There is no proof of this. At least nothing that
you yourself would call "rigorous". ;)

> And I don't think you ever claim to actually be
> able to beat a bot consistently.

I have made that claim. I conducted numerous
experiments and played quite a number of long
sessions to show that I could achieve it, (which
I shared at my web site, some accompanied by
YouTube videos recorded in real time); see:

http://montanaonline.net/backgammon/xg.php

But you don't need to trust me if you don't want
to. That's why I urged you all for years to do your
own experiments. Though half-ass, the one that
Axel has done showed that even a crude "mutant"
can do well beyond expectations against a strong
bot. If you do better, more extensive, "rigorous" ;)
experiments, the proof will surely become clearer.

MK
