Tim has spoken earlier about the need to be statistically careful
before reaching conclusions in backgammon. A particular
example of this is that people can naively see unusual runs
of the dice and assert non-randomness without testing.
So, surely we should apply similar statistical discipline to the
idea that XG has a specific problem with blotty boards.
First, let's assume that the concept of a blotty-board position can be well-defined. This might mean that the board is already blotty or it
might mean that blots can be created in the inner board.
At a minimum, this assertion should be supported by some evidence for the following:
1) XG tends to lose more equity in blotty-board positions than in other non-contact positions.
2) XG loses more equity by choosing unnecessarily blotty plays than
by missing opportunities to correctly leave inner board blots.
3) There is an identifiable category of blotty-board positions where XG's play is worse than the best humans. [I would rigorously define "not good" play by a bot as being below best-human standard, but that might be idiosyncratic].
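Point 1, for example, admits a standard test. Here is a minimal sketch of a permutation test, assuming one had per-decision equity-loss samples from rollouts (the numbers below are invented placeholders, not real XG data):

```python
import random

random.seed(0)

# Mean equity lost per decision (invented illustration numbers, not
# real rollout data).
blotty_losses = [0.012, 0.031, 0.005, 0.044, 0.019, 0.027, 0.008, 0.036]
other_losses  = [0.009, 0.014, 0.003, 0.021, 0.011, 0.006, 0.017, 0.010]

def mean(xs):
    return sum(xs) / len(xs)

observed = mean(blotty_losses) - mean(other_losses)

# Permutation test: under the null hypothesis that the "blotty" label
# doesn't matter, random relabelings of the pooled samples should
# produce a difference at least as large as the observed one
# reasonably often.
pooled = blotty_losses + other_losses
n = len(blotty_losses)
trials = 10_000
count = 0
for _ in range(trials):
    random.shuffle(pooled)
    if mean(pooled[:n]) - mean(pooled[n:]) >= observed:
        count += 1

p_value = count / trials
print(f"observed difference: {observed:.4f}, p-value: {p_value:.3f}")
```

With real data one would of course need far more than eight positions per group, but the shape of the test would be the same.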
I don't see Tim looking into any of the above three points.
At the moment, what Tim seems to be doing is noting what good
statistical practice is, and then doing exactly the opposite. He seems
to be doing this:
1) Cherry-pick positions where XG surprisingly leaves blots.
2) Roll these out.
3) Report it whenever XG has made an error.
4) Ignore the situation whenever XG has been correct [this is admittedly
a guess].
5) Naively assert that he has discovered a problem.
Paul
On 10/27/2022 4:02 AM, peps...@gmail.com wrote:
Tim has spoken earlier about the need to be statistically careful
before reaching conclusions in backgammon. A particular
example of this is that people can naively see unusual runs
of the dice and assert non-randomness without testing.
So, surely we should apply similar statistical discipline to the
idea that XG has a specific problem with blotty boards.

One could do that, of course. But for comparison, let's look at
how serious backgammon players try to lower their PR. What do they
do? Do they apply rigorous statistical discipline as you suggest?
No. They run their games through a bot, examine what the bot says
are errors, try to understand them, and then try to adjust their
play accordingly. I don't know of anyone who applies rigorous
statistical procedures to determine whether (for example) they should
step out to the bar point more often if they want to lower their PR. Nevertheless, by all accounts, this non-rigorous procedure seems to
work.
You could take the point of view that what's going on is that people
are fooling themselves. Maybe their PR isn't actually getting better,
or maybe it's getting better for reasons that have nothing to do with
the patterns they think they have discerned. Maybe their PR would
decrease even faster if they were to *not* consult XG at all. I don't
find these hypotheses plausible, but I'm also not going to invest any
time trying to support or refute them using rigorous statistical
methodology.
The observation about XG and blotty boards is similar. If you pay
some attention, you'll see the pattern for yourself, just as when I
pay attention to XG's evaluations of my play, I notice that I cash
when TG far too often. Can I show you a rigorous statistical experiment
that proves that I cash when TG too often? No. Do I care that I have
no such experiment? No.
To be clear, I'm not saying that XG's blotty-board tendencies lose a
lot of equity. In fact, I would say they typically don't, and it's
precisely *because* they usually don't matter much that XG does this
sort of thing. I have a folder with over 400 positions I've collected
where XG makes errors I found interesting, and not very many of these
are blotty-board errors, because after I learned that they usually
don't cost a lot of equity, I mostly stopped collecting them. What
I'm doing by posting to r.g.b. is offering some free advice that if
XGR+ dings you with a 0.057 "error" for making a natural play instead
of its nutty 5/1 board-breaking play, then you should take it with a
grain of salt. If you prefer to ignore the advice until you see
statistical proof, you're of course free to do so.
No, I don't want to ignore your advice. I think the blotty play you allude to
might be a bit less significant than you think it is, but, of course, I'm not telepathic
and don't know your exact thoughts.
On 10/27/2022 8:32 AM, peps...@gmail.com wrote:
No, I don't want to ignore your advice. I think the blotty play you allude to
might be a bit less significant than you think it is, but, of course, I'm not telepathic
and don't know your exact thoughts.

As a point of clarification, there are two different types
of "errors" by XG that we might care about (and by "errors"
I mean a play for which XG's verdict is different depending
on whether you roll it out or use a lower-strength evaluation
or truncated rollout, and where we assume that the rollout
with the strongest settings is "correct"). For simplicity,
let's say that there are just two plausible candidate plays,
A and B, and let's use "XGR+" to refer to the weaker setting.
Without loss of generality, assume that the rollout favors
play A and XGR+ favors play B.
1. The rollout says the equity difference is large.
2. XGR+ says the equity difference is large but the rollout
says the equity difference is small.
(The remaining possibility, that both settings say that the
equity difference is small, I don't really care about.)
People mostly pay attention to 1. This is understandable
if your goal is to assess how well XGR+ plays; in case 2,
the error that XGR+ makes is small, so who cares? But note
that increasingly, computers are being used to *assess human
play*. The BMAB awards titles based on how XG rates your play.
People privately use XG to identify their own errors, usually
focusing on cases where XG says they made a big error. If XG
is playing this kind of role, then case 2 matters just as much
as case 1.
I think that the importance of case 2 errors has been
underestimated or even ignored, because people fail to grasp
the difference between XGR+ as a player and XGR+ as a judge.
So part of the reason I posted that position was to give an
example of a case 2 error. The smallness of the rollout equity
difference was therefore a feature and not a bug.
---
Tim Chow
Could a solution to all this be to find a bg expert who has access
to extremely powerful supercomputers? I would think there are
powerful technologies that could use parallel processing to do
all rollouts instantly for an entire match.
With there being a significant intersection between highly skilled maths/computing people and bg people, I would have thought finding
such a person could be feasible.
Or is all this harder than I think it is?
On 10/27/2022 9:43 AM, peps...@gmail.com wrote:
Could a solution to all this be to find a bg expert who has access
to extremely powerful supercomputers? I would think there are
powerful technologies that could use parallel processing to do
all rollouts instantly for an entire match.
With there being a significant intersection between highly skilled maths/computing people and bg people, I would have thought finding
such a person could be feasible.
Or is all this harder than I think it is?

First of all, my impression is that the BMAB runs on a shoestring
budget, so what is theoretically possible may not be doable in
practice. I recall that one thing Stick wanted was for players to
be able to mark some plays in advance for rolling out (so that he
wouldn't be penalized for plays which he knew would be misevaluated
by XGR+), but the BMAB does not do this. I don't know the reasons,
but my guess is that it would require too much work for them to
accommodate Stick's request.
The other thing is that I don't think XG is designed to run on a
cloud. I'm also not sure there's an easy way to get it to roll
out every last candidate for every decision. For starters, it
has a built-in limit of 32 checker-play candidates for each move.
(Maybe that's enough for BMAB purposes, though.) Probably one
would have to pay Xavier to do some development work to enable
what you're suggesting, and again, presumably the BMAB doesn't
think this is an effective use of whatever limited money it has.
I really like Stick's idea on the rollout (which I can remember from
previous threads). One possible objection (which I don't share) is that
it creates confusion to combine the task of competing with the post-competition
evaluation. However, there is a neat precedent for this.
On 10/28/2022 10:50 AM, peps...@gmail.com wrote:
I really like Stick's idea on the rollout (which I can remember from
previous threads). One possible objection (which I don't share) is that
it creates confusion to combine the task of competing with the post-competition
evaluation. However, there is a neat precedent for this.

Again, I don't know the true objections, but I know that if I were
in charge, what I would dread most would be the logistics.
How do people submit their candidates, how do you do this in a
standardized manner, how do you take elementary precautions against
people trying to cheat by secretly checking a bot before submitting
their candidates, how do you handle lost records, how do you settle
disputes, etc. It's just a nightmare. I'm sure that even with the
current relatively simple system, irregularities occur with some
frequency and cause more headaches than the BMAB would like. In the
end, it's going to make only a small difference for a small number
of people. Not much bang for the buck from the BMAB's point of view.
---
Tim Chow
For example, if an acclaimed mathematician (say someone with a
postdoctoral position at Harvard) claimed that they tried the most
recent Putnam exam by themselves without cheating, and scored 100%,
don't you think people would believe the mathematician rather
than suspect that they're just lying and actually googled the
solutions?
On 10/29/2022 4:51 AM, peps...@gmail.com wrote:
For example, if an acclaimed mathematician (say someone with a
postdoctoral position at Harvard) claimed that they tried the most
recent Putnam exam by themselves without cheating, and scored 100%,
don't you think people would believe the mathematician rather
than suspect that they're just lying and actually googled the
solutions?

Some people would believe it and others would not. People cheat all the
time, sometimes for seemingly inexplicable reasons. And even if they
don't cheat, others will suspect them of cheating.
Years ago, Iancho Hristov was informally tracking various people's
PR's, based on available data from online and in-person recorded
matches. Stick was at the top of Iancho's list, based on his online
play. At one point, Stick's PR, averaged over 50 matches, was 2.0.
Many people were convinced that Stick was cheating, presumably by
consulting a bot at crucial moments. (For the record, at the time,
I was one of the few people posting to BGO who was saying that I
didn't believe that Stick was cheating.) They were saying he should demonstrate a 2.0 PR in live play, or submit to live proctoring.
On another occasion, Neil Kazaross was playing an on-line match
against the rest of BGO and was achieving very low PR's. Again,
there were accusations that he was consulting a bot at crucial moments.
You must have heard about the current hullabaloo over cheating in
chess at high levels. Some people, it seems, cheat *more* when nothing
is at stake; others cheat more when the stakes are higher.
Having said all that, I do think there is one potential benefit to your suggestion, which is that it would probably generate a lot of debate
about whether cheating was going on, and the extra publicity would
probably be good for the BMAB. There's nothing like a good controversy
to get people to pay attention to something they would otherwise have
no interest in.
For statistical purposes, 50 matches might not be all that significant.
A PR of 3.0 is very believable for a world-class player and this might be obtained with long stretches
averaging 4.0 and long stretches averaging 2.0. Finding a specific sample with a PR of 2.0 might
be cherry-picking. So yes, there is no evidence of cheating in your post.
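A toy simulation illustrates the point, under an invented noise model (alternating 100-match stretches with mean PRs of 4.0 and 2.0, per-match noise added on top):

```python
# Toy model: a player whose long-run PR averages about 3.0, produced
# by alternating 100-match stretches averaging 4.0 and 2.0. Some
# 50-match windows then sit near (or below) 2.0, so quoting one such
# window in isolation is cherry-picking. The noise model is invented
# purely for illustration.
import random

random.seed(1)

matches = []
for phase in range(10):                      # 10 stretches of 100 matches
    mean_pr = 4.0 if phase % 2 == 0 else 2.0
    matches += [random.gauss(mean_pr, 0.8) for _ in range(100)]

window = 50
window_means = [
    sum(matches[i:i + window]) / window
    for i in range(len(matches) - window + 1)
]

overall = sum(matches) / len(matches)
best = min(window_means)
print(f"overall PR: {overall:.2f}")
print(f"cherry-picked best 50-match window: {best:.2f}")
```

The overall average comes out near 3.0, while the best 50-match window sits near 2.0, which is exactly the cherry-picking worry.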
As far as I know, the main hullabaloo in chess related to this is over cheating accusations, rather than
actual cheating. The evidence against Niemann is just pitifully weak.
Of course, he could have cheated anyway, despite there being no evidence, but that is obviously
not a fruitful or fair line of discourse.
On 10/29/2022 6:31 PM, peps...@gmail.com wrote:
For statistical purposes, 50 matches might not be all that significant.
A PR of 3.0 is very believable for a world-class player and this might be obtained with long stretches
averaging 4.0 and long stretches averaging 2.0. Finding a specific sample with a PR of 2.0 might
be cherry-picking. So yes, there is no evidence of cheating in your post.

I think that 50 matches is significant.
I'm not a sub-4.0 PR player, but it's not uncommon for me to play 10 consecutive 7-point matches where my overall PR is sub-4.0. But I
don't think I've ever managed to play 50 consecutive 7-point matches
with an overall PR under 4.0. I think the best I've managed is around
4.2 or maybe 4.1-something.
During the "Stick cheats" controversy, the debate wasn't about whether
50 matches was statistically significant. The debate was about how
much of a boost you get from playing online in the comfort of your
home, with the pip count conveniently displayed at all times. Stick
wasn't claiming at the time that he would be able to consistently
play a 2.0 in live play. Several other well-known players acknowledged
that favorable playing conditions would help somewhat, but didn't
believe that it was enough to fully explain Stick's 2.0 performance.
They suggested that a proctor pay Stick a visit at his home and observe
him playing online to confirm that he wasn't secretly consulting a bot,
but Stick said that the presence of a proctor would disturb his concentration. No "resolution" was ever reached, AFAIK.
As far as I know, the main hullabaloo in chess related to this is over cheating accusations, rather than
actual cheating. The evidence against Niemann is just pitifully weak.
Of course, he could have cheated anyway, despite there being no evidence, but that is obviously
not a fruitful or fair line of discourse.

Well, there's a distinction between online cheating and OTB cheating.
Niemann admitted to cheating online, and chess.com and Ken Regan say
they have no statistical evidence that Niemann has cheated OTB. So
far so good, but chess.com says that Niemann cheated a lot more online
than Niemann said he did. Do you think that chess.com's 70-page report
about Niemann's alleged online cheating is "pitifully weak"?
---
Tim Chow
On Sunday, October 30, 2022 at 4:13:10 AM UTC, Tim Chow wrote:
Do you think that chess.com's 70-page report
about Niemann's alleged online cheating is "pitifully weak"?

My phrase "evidence against Niemann" refers to the OTB allegations.
I think if all his OTB accusers can do is point to the online evidence, then their
case is pitifully weak. I haven't seen the chess.com report.
Re bg, an important question is whether it constitutes "cheating" to consult match equity tables. Furthermore, does it constitute cheating to take out
a pen and paper, and write down the computations rather than do them in your head?
I think that now (almost) everyone would say that all these behaviours constitute "cheating".
But I don't think that has always been the case.
Paul
https://www.chess.com/blog/CHESScom/hans-niemann-report
I find their evidence for Niemann's online cheating pretty strong.
Tim Chow
On Sunday, 30 October 2022 at 12:39:14 UTC, Tim Chow wrote:
https://www.chess.com/blog/CHESScom/hans-niemann-report
I find their evidence for Niemann's online cheating pretty strong.
Tim Chow
Your problem is that Carlsen and chess.com are staring down the barrels of a $100m libel lawsuit and "neither Carlsen nor Chess.com produced concrete evidence for their cheating accusations"
https://www.bbc.co.uk/news/world-us-canada-63338375
This isn't going to end nicely, I suspect Carlsen will be financially ruined unless he comes up with a lot more than suspicions.
Your problem is that Carlsen and chess.com are staring down the barrels of a $100m libel lawsuit and "neither Carlsen nor Chess.com produced concrete evidence for their cheating accusations"
So this is Tim's problem? I didn't know that Tim was acting as a guarantor for the pay awards.
There's no way Niemann will win this lawsuit, and I'm sure he knows
it. He's just making a political statement, and maybe hoping he'll
bluff one of the defendants into settling out of court.
---
Tim Chow
The burden of proof in a libel case lies solely with the person making the claim that led to the case being brought. Niemann doesn't have to do anything.
LOL. You don't know anything about law, do you?
---
Tim Chow
From my (aborted) law school days
On 11/3/2022 1:19 PM, Nasti Chestikov wrote:
From my (aborted) law school days
Did you go to law school in the U.S.? Libel law in the U.S. is
rather different from that of many other countries because of the
First Amendment.
The burden of proof is on the plaintiff to prove that the defendant's
claim is false. You don't need to go to law school to know that this
is the way things work in the U.S.