In the position below, I played 8/4 8/2, and was surprised that XGR+
said that my play was 0.057 worse than its play of 8/2 5/1. (Story
continues below.)
XGID=---AABEBBb---B-a-abbcc--a-:0:0:1:64:0:0:0:0:10
Score is X:0 O:0. Unlimited Game
+13-14-15-16-17-18------19-20-21-22-23-24-+
| X     O     O  O |   | O  O  O        O |
| X              O |   | O  O  O          |
|                  |   |    O  O          |
|                  |   |                  |
|                  |   |                  |
|                  |BAR|                  |
|                  |   | X                |
|                  |   | X                |
|                  |   | X                |
|          O  X  X |   | X  X             |
|          O  X  X |   | X  X  X  X       |
+12-11-10--9--8--7-------6--5--4--3--2--1-+
Pip count X: 103 O: 104 X-O: 0-0
Cube: 1
X to play 64
1. XG Roller+ 8/2 5/1 eq:+0.391
Player: 62.83% (G:0.67% B:0.00%)
Opponent: 37.17% (G:0.28% B:0.01%)
2. XG Roller+ 7/1 6/2 eq:+0.340 (-0.052)
Player: 61.98% (G:0.27% B:0.04%)
Opponent: 38.02% (G:0.22% B:0.02%)
3. XG Roller+ 8/4 8/2 eq:+0.334 (-0.057)
Player: 62.05% (G:0.33% B:0.01%)
Opponent: 37.95% (G:0.28% B:0.00%)
4. XG Roller+ 8/2 7/3 eq:+0.333 (-0.058)
Player: 61.95% (G:0.38% B:0.01%)
Opponent: 38.05% (G:0.30% B:0.00%)
5. XG Roller+ 8/4 7/1 eq:+0.332 (-0.059)
Player: 61.93% (G:0.33% B:0.01%)
Opponent: 38.07% (G:0.34% B:0.01%)
6. XG Roller+ 8/2 6/2 eq:+0.331 (-0.060)
Player: 61.79% (G:0.32% B:0.01%)
Opponent: 38.21% (G:0.22% B:0.00%)
7. XG Roller+ 7/3 7/1 eq:+0.327 (-0.064)
Player: 61.61% (G:0.48% B:0.01%)
Opponent: 38.39% (G:0.36% B:0.00%)
8. XG Roller+ 7/1 5/1 eq:+0.308 (-0.084)
Player: 60.96% (G:0.31% B:0.01%)
Opponent: 39.04% (G:0.24% B:0.00%)
eXtreme Gammon Version: 2.19.211.pre-release
I tried appealing to XGR++, and it narrowed the gap between the two
plays, but still insisted that creating five blots in its board was
the best play.
XGID=---AABEBBb---B-a-abbcc--a-:0:0:1:64:0:0:0:0:10
Score is X:0 O:0. Unlimited Game
+13-14-15-16-17-18------19-20-21-22-23-24-+
| X     O     O  O |   | O  O  O        O |
| X              O |   | O  O  O          |
|                  |   |    O  O          |
|                  |   |                  |
|                  |   |                  |
|                  |BAR|                  |
|                  |   | X                |
|                  |   | X                |
|                  |   | X                |
|          O  X  X |   | X  X             |
|          O  X  X |   | X  X  X  X       |
+12-11-10--9--8--7-------6--5--4--3--2--1-+
Pip count X: 103 O: 104 X-O: 0-0
Cube: 1
X to play 64
1. XG Roller++ 8/2 5/1 eq:+0.375
Player: 62.31% (G:0.51% B:0.01%)
Opponent: 37.69% (G:0.33% B:0.01%)
2. XG Roller++ 8/4 8/2 eq:+0.351 (-0.024)
Player: 62.02% (G:0.39% B:0.00%)
Opponent: 37.98% (G:0.34% B:0.00%)
3. XG Roller++ 7/1 6/2 eq:+0.343 (-0.031)
Player: 61.82% (G:0.29% B:0.00%)
Opponent: 38.18% (G:0.29% B:0.00%)
4. XG Roller++ 8/4 7/1 eq:+0.342 (-0.033)
Player: 61.71% (G:0.35% B:0.00%)
Opponent: 38.29% (G:0.36% B:0.00%)
5. XG Roller++ 7/3 7/1 eq:+0.341 (-0.034)
Player: 61.61% (G:0.38% B:0.01%)
Opponent: 38.39% (G:0.29% B:0.01%)
6. XG Roller++ 8/2 6/2 eq:+0.338 (-0.037)
Player: 61.61% (G:0.27% B:0.01%)
Opponent: 38.39% (G:0.23% B:0.00%)
7. XG Roller++ 8/2 7/3 eq:+0.337 (-0.038)
Player: 61.58% (G:0.28% B:0.00%)
Opponent: 38.42% (G:0.28% B:0.00%)
eXtreme Gammon Version: 2.19.211.pre-release
Finally, I decided to do a rollout, with stronger parameters than
usual. I was pleased to see that sanity was restored. But this
position illustrates what I believe is a systematic weakness in XG,
which is that it doesn't evaluate blotty boards very well. See also
this old BGOnline post:
http://timothychow.net/cg/www.bgonline.org/forums/164769.html
XGID=---AABEBBb---B-a-abbcc--a-:0:0:1:64:0:0:0:0:10
Score is X:0 O:0. Unlimited Game
+13-14-15-16-17-18------19-20-21-22-23-24-+
| X     O     O  O |   | O  O  O        O |
| X              O |   | O  O  O          |
|                  |   |    O  O          |
|                  |   |                  |
|                  |   |                  |
|                  |BAR|                  |
|                  |   | X                |
|                  |   | X                |
|                  |   | X                |
|          O  X  X |   | X  X             |
|          O  X  X |   | X  X  X  X       |
+12-11-10--9--8--7-------6--5--4--3--2--1-+
Pip count X: 103 O: 104 X-O: 0-0
Cube: 1
X to play 64
1. Rollout¹ 8/4 8/2 eq:+0.343
Player: 61.74% (G:0.31% B:0.01%)
Opponent: 38.26% (G:0.27% B:0.01%)
Confidence: ±0.008 (+0.335..+0.352) - [41.7%]
2. Rollout¹ 8/4 7/1 eq:+0.342 (-0.002)
Player: 61.70% (G:0.30% B:0.01%)
Opponent: 38.30% (G:0.28% B:0.01%)
Confidence: ±0.008 (+0.334..+0.350) - [24.7%]
3. Rollout¹ 7/3 7/1 eq:+0.342 (-0.002)
Player: 61.38% (G:0.49% B:0.04%)
Opponent: 38.62% (G:0.37% B:0.01%)
Confidence: ±0.008 (+0.334..+0.349) - [23.1%]
4. Rollout¹ 7/1 6/2 eq:+0.338 (-0.006)
Player: 61.54% (G:0.32% B:0.01%)
Opponent: 38.46% (G:0.31% B:0.01%)
Confidence: ±0.008 (+0.330..+0.346) - [5.2%]
5. Rollout¹ 8/2 6/2 eq:+0.338 (-0.006)
Player: 61.49% (G:0.26% B:0.01%)
Opponent: 38.51% (G:0.22% B:0.01%)
Confidence: ±0.008 (+0.330..+0.346) - [4.8%]
6. Rollout¹ 8/2 7/3 eq:+0.334 (-0.010)
Player: 61.47% (G:0.29% B:0.01%)
Opponent: 38.53% (G:0.30% B:0.01%)
Confidence: ±0.007 (+0.327..+0.341) - [0.4%]
7. Rollout¹ 8/2 5/1 eq:+0.326 (-0.018)
Player: 61.13% (G:0.44% B:0.02%)
Opponent: 38.87% (G:0.48% B:0.02%)
Confidence: ±0.008 (+0.318..+0.334) - [0.0%]
¹ 1296 Games rolled with Variance Reduction.
Dice Seed: 271828
Moves and cube decisions: XG Roller+
Search interval: Large
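As a rough aid to reading the rollout table above, the sketch below checks which plays have a confidence interval that still reaches the top play's interval. It assumes the reported ± figures are interval half-widths that can be compared this way, and it ignores any correlation between plays rolled with the same dice seed. The names `rollout` and `overlaps_best` are illustrative only and are not part of XG.

```python
# Rollout equities and reported +/- half-widths, copied from the table above.
rollout = {
    "8/4 8/2": (0.343, 0.008),
    "8/4 7/1": (0.342, 0.008),
    "7/3 7/1": (0.342, 0.008),
    "7/1 6/2": (0.338, 0.008),
    "8/2 6/2": (0.338, 0.008),
    "8/2 7/3": (0.334, 0.007),
    "8/2 5/1": (0.326, 0.008),
}


def overlaps_best(results):
    """Map each play to True if its interval reaches the best play's lower bound."""
    best_eq, best_hw = max(results.values())  # tuple comparison: highest equity first
    best_low = round(best_eq - best_hw, 3)
    return {
        play: round(eq + hw, 3) >= best_low
        for play, (eq, hw) in results.items()
    }


for play, overlap in overlaps_best(rollout).items():
    status = "overlaps the top play" if overlap else "interval entirely below the top play"
    print(f"{play}: {status}")
```

Under these assumptions, only 8/2 5/1 has an interval lying entirely below that of 8/4 8/2; the other six plays are not separated from the top play by this rollout.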
I disagree that you've found evidence of a weakness.
But let's assume that the rollout is correct, and that XG's play does indeed lose 0.018 equity. How bad is this?
On 10/26/2022 3:52 AM, peps...@gmail.com wrote:
I disagree that you've found evidence of a weakness.
This is just one example out of many similar examples I've
encountered over the years. When neither player is likely to
leave a blot in the next couple of rolls, XG frequently makes
all kinds of nutty plays, making a complete mess of its board
for no good reason.
But let's assume that the rollout is correct, and that XG's play does indeed
lose 0.018 equity. How bad is this?
The issue isn't that XG's play loses 0.018 equity. The issue
is that when we pass from XGR+ to a rollout, there's a swing from
-0.057 to +0.018, showing that XG doesn't understand the position
very well. This would be true even if XGR+ is "right" and the rollout
is "wrong." If Alice says yes and Bob says no, at most one of them
can be right, and if they disagree strongly then at least one of them
is misinformed.
---
Tim Chow
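The swing described in the post above can be recomputed from the equities listed earlier in the thread. This is only an arithmetic sketch; the names `evals` and `margin` are illustrative. Because it works from the rounded equities shown in the tables, the rollout margin comes out as +0.017 rather than the displayed +0.018, presumably a rounding effect in the displayed figures.

```python
# Equities for the two plays under discussion, copied from the three tables.
evals = {
    "XGR+":    {"8/2 5/1": 0.391, "8/4 8/2": 0.334},
    "XGR++":   {"8/2 5/1": 0.375, "8/4 8/2": 0.351},
    "rollout": {"8/2 5/1": 0.326, "8/4 8/2": 0.343},
}


def margin(evaluation, play, reference):
    """Equity of `play` minus equity of `reference`, rounded to 3 places."""
    return round(evaluation[play] - evaluation[reference], 3)


# Margin of 8/4 8/2 over 8/2 5/1 in each evaluation.
margins = {name: margin(e, "8/4 8/2", "8/2 5/1") for name, e in evals.items()}

# Total swing in that margin between XGR+ and the rollout.
swing = round(margins["rollout"] - margins["XGR+"], 3)

for name, m in margins.items():
    print(f"{name}: {m:+.3f}")
print(f"swing from XGR+ to rollout: {swing:+.3f}")
```

From the rounded table values the swing is about 0.074, or 0.075 using the displayed differences, which is several times larger than the rollout's reported ±0.008.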
But there is good reason to make a mess of our board here. By far the most likely scenario is for this game to turn into a race (which we're currently leading semi-comfortably), so I make a racing-driven play. It's a fine line in maintaining some timing so we are able to clear the midpoint without leaving a shot and distributing perfectly for the race bear-in/bear-off, but that's why I'd have played 7/1 6/2 OtB and maintain it's best.
On 10/26/2022 3:46 PM, Stick Rice wrote:
But there is good reason to make a mess of our board here. By far the most likely scenario is for this game to turn into a race (which we're currently leading semi-comfortably), so I make a racing-driven play. It's a fine line in maintaining some timing so we are able to clear the midpoint without leaving a shot and distributing perfectly for the race bear-in/bear-off, but that's why I'd have played 7/1 6/2 OtB and maintain it's best.
How is dumping checkers on low points good for the race?
Generally speaking, we should be trying to avoid wastage,
and not worry about gaps on the 1pt and 2pt. Suppose we
remove the four checkers in the outfield that are creating
contact. Admittedly the resulting position is artificial,
but it illustrates the point that 7/1 6/2 doesn't seem to
be good for the race.
XGID=---AABEBB------a-abbcc--a-:0:0:1:64:0:0:0:0:10
Score is X:0 O:0. Unlimited Game
+13-14-15-16-17-18------19-20-21-22-23-24-+
|       O     O  O |   | O  O  O        O |
|                O |   | O  O  O          |
|                  |   |    O  O          |
|                  |   |                  |
|                  |   |                  |
|                  |BAR|                  |
|                  |   | X                |
|                  |   | X                |
|                  |   | X                |
|             X  X |   | X  X             |
|             X  X |   | X  X  X  X       |
+12-11-10--9--8--7-------6--5--4--3--2--1-+
Pip count X: 77 O: 72 X-O: 0-0
Cube: 1
X to play 64
1. Rollout¹ 8/4 7/1 eq:+0.095
Player: 53.26% (G:0.00% B:0.00%)
Opponent: 46.74% (G:0.00% B:0.00%)
Confidence: ±0.004 (+0.091..+0.098) - [75.0%]
2. Rollout¹ 8/4 8/2 eq:+0.093 (-0.002)
Player: 53.22% (G:0.00% B:0.00%)
Opponent: 46.78% (G:0.00% B:0.00%)
Confidence: ±0.003 (+0.089..+0.096) - [25.0%]
3. Rollout¹ 8/2 7/3 eq:+0.080 (-0.015)
Player: 52.76% (G:0.00% B:0.00%)
Opponent: 47.24% (G:0.00% B:0.00%)
Confidence: ±0.004 (+0.076..+0.083) - [0.0%]
4. Rollout¹ 7/3 7/1 eq:+0.071 (-0.023)
Player: 52.37% (G:0.00% B:0.00%)
Opponent: 47.63% (G:0.00% B:0.00%)
Confidence: ±0.004 (+0.068..+0.075) - [0.0%]
5. Rollout¹ 7/1 6/2 eq:+0.064 (-0.030)
Player: 52.11% (G:0.00% B:0.00%)
Opponent: 47.89% (G:0.00% B:0.00%)
Confidence: ±0.004 (+0.061..+0.068) - [0.0%]
¹ 1296 Games rolled with Variance Reduction.
Dice Seed: 271828
Moves: 3-ply, cube decisions: XG Roller
eXtreme Gammon Version: 2.19.211.pre-release
---
Tim Chow
As I said, it's a fine line distributing for the race and keeping some timing so we are able to clear the midpoint without leaving a shot. Putting one checker on a lower point does no real harm race-wise.
On 10/27/2022 9:53 AM, Stick Rice wrote:
As I said, it's a fine line distributing for the race.....
At least it seems we agree that there's no good reason
for XGR+ to insist that 5/1 is best by a clear margin.
On October 26, 2022 at 6:26:33 AM UTC-6, Tim Chow wrote:
The issue isn't that XG's play loses 0.018 equity.
The issue is that when we pass from XGR+ to a
rollout, there's a swing from -0.057 to +0.018...
This is a very interesting example. It's not a case
where the top two or three plays trade places; rather,
the rankings of all plays scramble all over the place
across XGR+, XGR++ and rollout.
In addition to your correctly making the important
point that 8/4 8/2 goes from -0.057 to +0.018, the
"raw equity" goes from +0.334 in XGR+ to +0.351 in
XGR++ and back to +0.343 in rollout, while the top
play 8/2 5/1 goes even more drastically from +0.391
in XGR+ to +0.375 in XGR++ and then further down
to +0.326 in rollout, i.e. a -0.065 difference across the
three evaluations.
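The reshuffling of rankings mentioned above can be tabulated directly from the three result lists posted earlier in the thread. This is only an illustrative sketch; `orders` and `ranks` are made-up names, and the orderings are copied by hand from the XGR+, XGR++ and rollout tables.

```python
# Play orderings copied from the three result lists in this thread (best first).
orders = {
    "XGR+": ["8/2 5/1", "7/1 6/2", "8/4 8/2", "8/2 7/3",
             "8/4 7/1", "8/2 6/2", "7/3 7/1", "7/1 5/1"],
    "XGR++": ["8/2 5/1", "8/4 8/2", "7/1 6/2", "8/4 7/1",
              "7/3 7/1", "8/2 6/2", "8/2 7/3"],
    "rollout": ["8/4 8/2", "8/4 7/1", "7/3 7/1", "7/1 6/2",
                "8/2 6/2", "8/2 7/3", "8/2 5/1"],
}


def ranks(play):
    """Return the 1-based rank of a play in each evaluation (None if not listed)."""
    return tuple(
        order.index(play) + 1 if play in order else None
        for order in orders.values()
    )


# Print (XGR+, XGR++, rollout) ranks for every play that was rolled out.
for play in orders["rollout"]:
    print(play, ranks(play))
```

For instance, 8/2 5/1 ranks first, first and seventh, while 8/4 8/2 ranks third, second and first.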
This would be true even if XGR+ is "right" and the
rollout is "wrong."
True indeed and this comment adds to the credibility
of your objectivity on the subject.
If Alice says yes and Bob says no, at most one of
them can be right, and if they disagree strongly
then at least one of them is misinformed.
Then how would you decide which one is right?
Let me ask this question first by renaming
Alice and Bob as Gnubg and XG.
Second and more importantly, in your example it's
not two people (or bots) contradicting each other.
It's the same bot XG contradicting itself. I'm sure
the same would be true for Gnubg also.
Thus the question becomes "how do you *trust*"
that XG or Gnubg is right in any given evaluation?
Would you go by a ratio? And if so, what would be
your threshold? Would it be enough for you if the bot
was 10% right? 20%? 30%?
And that is, of course, assuming that you can decide if
XGR+ or XGR++ or XG-rollout is "right"...
I'm just wondering what would it take for you folks
to some day say enough is enough, these bots are
just unpredictable, unreliable pieces of shit...??
On 11/1/2022 5:42 PM, MK wrote:
Thus the question becomes "how do you *trust*"
that XG or Gnubg is right in any given evaluation?
It's a reasonable question.
I would say that if a bot disagrees with itself then that is a
good reason *not* to trust it.
If it mostly agrees with itself when you perform various cross-
checks, then that doesn't prove that it is trustworthy, just as
when a lawyer cross-examines a witness and finds no contradictions,
it doesn't prove the witness is telling the truth. But as Paul
said, if the bot plays well overall, generally outperforming human
beings, then that's some evidence that it "knows what it is doing."
One can of course insist on adopting a skeptical posture under
all circumstances. This might mean that you avoid getting fooled
by lies, but it also means that you risk missing the truth. It's
up to every individual to decide how to make that tradeoff.
On November 1, 2022 at 11:59:48 PM UTC, Tim Chow wrote:
as Paul said, if the bot plays well overall, generally
outperforming human beings, then that's some
evidence that it "knows what it is doing."
And an individual might make that tradeoff very
differently, depending on the matter that is being
evaluated.
They might be very skeptical about statistical claims
about non-randomness of dice, but not at all skeptical
about beliefs that conform to the religious or
philosophical traditions that they identify with.
On 2/11/2022 7:42 am, MK wrote:
I'm just wondering what would it take for you folks
to some day say enough is enough, these bots are
just unpredictable, unreliable pieces of shit...??
It would take a better bot.
I think we're impressed by the bots because
they're so clearly better than the best humans.
And I don't think you ever claim to actually be
able to beat a bot consistently.