• The hypothesis that XG "doesn't evaluate blotty boards well."

    From pepstein5@gmail.com@21:1/5 to All on Thu Oct 27 01:02:59 2022
    Tim has spoken earlier about the need to be statistically careful
    before reaching conclusions in backgammon. A particular
    example of this is that people can naively see unusual runs
    of the dice and assert non-randomness without testing.

    So, surely we should apply similar statistical discipline to the
    idea that XG has a specific problem with blotty boards.
    First, let's assume that the concept of a blotty-board position can be
    well-defined. This might mean that the board is already blotty or it
    might mean that blots can be created in the inner board.

    At a minimum, this assertion should require at least some evidence for the following (a rough testing sketch follows the list):
    1) XG tends to lose more equity in blotty-board positions than in other non-contact positions.

    2) XG loses more equity by choosing unnecessarily blotty plays than
    by missing opportunities to correctly leave inner board blots.

    3) There is an identifiable category of blotty-board positions where XG's
    play is worse than the best humans. [I would rigorously define "not good"
    play by a bot as being below best-human standard, but that might be idiosyncratic].
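
    To make point 1 concrete, here is the kind of test I have in mind, as a
    rough sketch in Python. Everything in it is hypothetical: the per-position
    equity losses (rollout equity of the best play minus the equity of XG's
    choice) and their classification into "blotty" and "other" would have to
    be collected first.

      import random

      def permutation_test(blotty, other, trials=10_000):
          """One-sided permutation test: is XG's mean equity loss on
          blotty-board positions larger than on the other positions?"""
          observed = sum(blotty) / len(blotty) - sum(other) / len(other)
          pooled = list(blotty) + list(other)
          count = 0
          for _ in range(trials):
              random.shuffle(pooled)
              diff = (sum(pooled[:len(blotty)]) / len(blotty)
                      - sum(pooled[len(blotty):]) / len(other))
              if diff >= observed:
                  count += 1
          return count / trials  # a small p-value would support point 1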

    I don't see Tim looking into any of the above three points.
    At the moment, what Tim seems to be doing is to note what good
    statistical practice is, and then do exactly the opposite. He seems
    to be doing this:

    1) Cherry-pick positions where XG surprisingly leaves blots.
    2) Roll these out.
    3) Report it whenever XG has made an error.
    4) Ignore the situation whenever XG has been correct [this is admittedly
    a guess].
    5) Naively assert that he has discovered a problem.

    Paul

  • From pepstein5@gmail.com@21:1/5 to peps...@gmail.com on Thu Oct 27 04:44:48 2022
    On Thursday, October 27, 2022 at 9:03:01 AM UTC+1, peps...@gmail.com wrote:
    > [...]
    > 1) XG tends to lose more equity in blotty-board positions than in other
    > non-contact positions.
    > [...]

    Correction: non-contact -> contact.

  • From Timothy Chow@21:1/5 to peps...@gmail.com on Thu Oct 27 08:14:49 2022
    On 10/27/2022 4:02 AM, peps...@gmail.com wrote:
    > So, surely we should apply similar statistical discipline to the
    > idea that XG has a specific problem with blotty boards.

    One could do that, of course. But for comparison, let's look at
    how serious backgammon players try to lower their PR. What do they
    do? Do they apply rigorous statistical discipline as you suggest?
    No. They run their games through a bot, examine what the bot says
    are errors, try to understand them, and then try to adjust their
    play accordingly. I don't know of anyone who applies rigorous
    statistical procedures to determine whether (for example) they should
    step out to the bar point more often if they want to lower their PR.
    Nevertheless, by all accounts, this non-rigorous procedure seems to
    work.

    You could take the point of view that what's going on is that people
    are fooling themselves. Maybe their PR isn't actually getting better,
    or maybe it's getting better for reasons that have nothing to do with
    the patterns they think they have discerned. Maybe their PR would
    decrease even faster if they were to *not* consult XG at all. I don't
    find these hypotheses plausible, but I'm also not going to invest any
    time trying to support or refute them using rigorous statistical
    methodology.

    The observation about XG and blotty boards is similar. If you pay
    some attention, you'll see the pattern for yourself, just as when I
    pay attention to XG's evaluations of my play, I notice that I cash
    far too often when I'm too good (TG). Can I show you a rigorous statistical experiment
    that proves that I cash when TG too often? No. Do I care that I have
    no such experiment? No.

    To be clear, I'm not saying that XG's blotty-board tendencies lose a
    lot of equity. In fact, I would say they typically don't, and it's
    precisely *because* they usually don't matter much that XG does this
    sort of thing. I have a folder with over 400 positions I've collected
    where XG makes errors I found interesting, and not very many of these
    are blotty-board errors, because after I learned that they usually
    don't cost a lot of equity, I mostly stopped collecting them. What
    I'm doing by posting to r.g.b. is offering some free advice that if
    XGR+ dings you with a 0.057 "error" for making a natural play instead
    of its nutty 5/1 board-breaking play, then you should take it with a
    grain of salt. If you prefer to ignore the advice until you see
    statistical proof, you're of course free to do so.

    ---
    Tim Chow

  • From pepstein5@gmail.com@21:1/5 to Tim Chow on Thu Oct 27 05:32:37 2022
    On Thursday, October 27, 2022 at 1:14:52 PM UTC+1, Tim Chow wrote:
    > [...]

    A thoughtful response. Thanks.
    BTW, I've long had a hypothesis (also unproven and not very well
    substantiated) that sub-world-class players generally have a tendency to
    cash too readily in TG positions.
    Thanks for being one confirmatory data point for this totally biased hypothesis.

    No, I don't want to ignore your advice. I think the blotty play you allude to might be a bit less significant than you think it is, but, of course, I'm not telepathic
    and don't know your exact thoughts.

    Thanks,

    Paul

  • From Timothy Chow@21:1/5 to peps...@gmail.com on Thu Oct 27 09:07:40 2022
    On 10/27/2022 8:32 AM, peps...@gmail.com wrote:
    > No, I don't want to ignore your advice. I think the blotty play you
    > allude to might be a bit less significant than you think it is, but,
    > of course, I'm not telepathic and don't know your exact thoughts.

    As a point of clarification, there are two different types
    of "errors" by XG that we might care about (and by "errors"
    I mean a play for which XG's verdict is different depending
    on whether you roll it out or use a lower-strength evaluation
    or truncated rollout, and where we assume that the rollout
    with the strongest settings is "correct"). For simplicity,
    let's say that there are just two plausible candidate plays,
    A and B, and let's use "XGR+" to refer to the weaker setting.
    Without loss of generality, assume that the rollout favors
    play A and XGR+ favors play B.

    1. The rollout says the equity difference is large.

    2. XGR+ says the equity difference is large but the rollout
    says the equity difference is small.

    (The remaining possibility, that both settings say that the
    equity difference is small, I don't really care about.)
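
    In code, the taxonomy looks something like this (a toy sketch; the
    0.02 cutoff standing in for "small" is arbitrary and mine, not
    anything XG uses):

      def classify(rollout_diff, xgrplus_diff, small=0.02):
          """Both arguments are equity gaps between the two candidate
          plays: rollout_diff per the strong rollout (which favors A),
          xgrplus_diff per XGR+ (which favors B)."""
          if rollout_diff >= small:
              return "case 1: XGR+ picks B, but the rollout says A is much better"
          if xgrplus_diff >= small:
              return "case 2: the plays are nearly equal, but XGR+ dings A heavily"
          return "neither case: both settings call it close"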

    People mostly pay attention to 1. This is understandable
    if your goal is to assess how well XGR+ plays; in case 2,
    the error that XGR+ makes is small, so who cares? But note
    that increasingly, computers are being used to *assess human
    play*. The BMAB awards titles based on how XG rates your play.
    People privately use XG to identify their own errors, usually
    focusing on cases where XG says they made a big error. If XG
    is playing this kind of role, then case 2 matters just as much
    as case 1.

    I think that the importance of case 2 errors has been
    underestimated or even ignored, because people fail to grasp
    the difference between XGR+ as a player and XGR+ as a judge.
    So part of the reason I posted that position was to give an
    example of a case 2 error. The smallness of the rollout equity
    difference was therefore a feature and not a bug.

    ---
    Tim Chow

  • From pepstein5@gmail.com@21:1/5 to Tim Chow on Thu Oct 27 06:43:20 2022
    On Thursday, October 27, 2022 at 2:07:43 PM UTC+1, Tim Chow wrote:
    > [...]

    Could a solution to all this be to find a bg expert who has access
    to extremely powerful supercomputers? I would think there are
    powerful technologies that could use parallel processing to do
    all rollouts instantly for an entire match.
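
    Conceptually the parallel part seems straightforward, since rollouts of
    distinct positions are independent. Something like the sketch below, where
    rollout_position() is a made-up stand-in for a real engine call (as far as
    I know, XG exposes no such API):

      from concurrent.futures import ProcessPoolExecutor
      import random

      def rollout_position(position_id):
          # Stand-in for a real engine call: fake an equity with seeded
          # noise so the sketch runs end to end.
          rng = random.Random(position_id)
          return position_id, rng.uniform(-1.0, 1.0)

      def rollout_match(position_ids, workers=8):
          # Distinct positions share nothing, so they parallelize trivially.
          with ProcessPoolExecutor(max_workers=workers) as pool:
              return list(pool.map(rollout_position, position_ids))

      if __name__ == "__main__":
          print(rollout_match(range(10)))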

    With there being a significant intersection between highly skilled maths/computing people and bg people, I would have thought finding
    such a person could be feasible.

    Or is all this harder than I think it is?

    Paul

  • From Timothy Chow@21:1/5 to peps...@gmail.com on Fri Oct 28 08:23:51 2022
    On 10/27/2022 9:43 AM, peps...@gmail.com wrote:
    > Could a solution to all this be to find a bg expert who has access
    > to extremely powerful supercomputers? [...]
    >
    > Or is all this harder than I think it is?

    First of all, my impression is that the BMAB runs on a shoestring
    budget, so what is theoretically possible may not be doable in
    practice. I recall that one thing Stick wanted was for players to
    be able to mark some plays in advance for rolling out (so that he
    wouldn't be penalized for plays which he knew would be misevaluated
    by XGR+), but the BMAB does not do this. I don't know the reasons,
    but my guess is that it would require too much work for them to
    accommodate Stick's request.

    The other thing is that I don't think XG is designed to run on a
    cloud. I'm also not sure there's an easy way to get it to roll
    out every last candidate for every decision. For starters, it
    has a built-in limit of 32 checker-play candidates for each move.
    (Maybe that's enough for BMAB purposes, though.) Probably one
    would have to pay Xavier to do some development work to enable
    what you're suggesting, and again, presumably the BMAB doesn't
    think this is an effective use of whatever limited money it has.

    ---
    Tim Chow

  • From pepstein5@gmail.com@21:1/5 to Tim Chow on Fri Oct 28 07:50:20 2022
    On Friday, October 28, 2022 at 1:23:53 PM UTC+1, Tim Chow wrote:
    > [...] I recall that one thing Stick wanted was for players to be able
    > to mark some plays in advance for rolling out [...]

    I really like Stick's idea on the rollout (which I remember from
    previous threads). One possible objection (which I don't share) is that
    it creates confusion to combine the task of competing with the
    post-competition evaluation. However, there is a neat precedent for this.
    In some major soccer competitions (I think this includes the World Cup),
    players can "claim the goal" if they think that they scored but another
    player could also plausibly claim to have scored. Clearly, the identity
    of the scorers is not an issue for the purpose of identifying the winning
    team, but players who "claim the goal" have an eye to their evaluation
    afterwards.

    Paul

  • From Timothy Chow@21:1/5 to peps...@gmail.com on Fri Oct 28 22:35:40 2022
    On 10/28/2022 10:50 AM, peps...@gmail.com wrote:
    > I really like Stick's idea on the rollout (which I remember from
    > previous threads). [...]

    Again, I don't know the true objections, but I know that if I were
    in charge, what I would dread most would be the logistics.
    How do people submit their candidates, how do you do this in a
    standardized manner, how do you take elementary precautions against
    people trying to cheat by secretly checking a bot before submitting
    their candidates, how do you handle lost records, how do you settle
    disputes, etc. It's just a nightmare. I'm sure that even with the
    current relatively simple system, irregularities occur with some
    frequency and cause more headaches than the BMAB would like. In the
    end, it's going to make only a small difference for a small number
    of people. Not much bang for the buck from the BMAB's point of view.

    ---
    Tim Chow

  • From pepstein5@gmail.com@21:1/5 to Tim Chow on Sat Oct 29 01:51:17 2022
    On Saturday, October 29, 2022 at 3:35:43 AM UTC+1, Tim Chow wrote:
    > [...]

    Fair points. Here's a suggestion.
    Someone sets up a website (or a facility on an existing website) where,
    for a period of one week after any BMAB-regulated match, players can
    post plays, ticking a checkbox to affirm that they haven't previously
    rolled out those plays. Players agree (informally; no one checks this)
    to do rollouts (with clearly specified settings) for each play they've
    posted, then post the rollout results, and the website adjusts their PR
    accordingly. Of course, players can totally cheat this system if they
    want to. Players will then have an official BMAB PR and a self-reported
    adjusted BMAB PR.
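
    To make "adjusts their PR" concrete, here is a back-of-envelope sketch.
    I'm assuming the usual XG convention that PR is average equity error per
    unforced decision times 500; the adjustment scheme and all the names are
    mine.

      def adjusted_pr(total_error, decisions, corrections):
          """total_error: total equity lost per the official analysis.
          decisions: number of unforced decisions in the session.
          corrections: (xgrplus_penalty, rollout_penalty) pairs for each
          flagged play; the rollout figure replaces the XGR+ figure."""
          for xgrplus_penalty, rollout_penalty in corrections:
              total_error += rollout_penalty - xgrplus_penalty
          return 500 * total_error / decisions

      # One flagged play that XGR+ scored as a 0.057 error but a rollout
      # scores as 0.004: the PR drops from about 5.21 to about 4.99.
      print(adjusted_pr(1.25, 120, [(0.057, 0.004)]))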

    It's then up to the bg public whether they want to believe in the
    self-reported adjustments or be skeptical. I would be interested in the
    self-reported figures even though players can't prove these.
    (I'm certainly interested in Stick's claims about his abilities (and I certainly choose to
    believe him) even when I haven't seen any proof).
    And I think this lack of skepticism is somewhat natural, particularly when players have been
    proven (by the official BMAB results) to be strong players.
    For example, if an acclaimed mathematician (say someone with a postdoctoral position
    at Harvard) claimed that they tried the most recent Putnam exam by themselves without
    cheating, and scored 100%, don't you think people would believe the mathematician rather
    than suspect that they're just lying and actually googled the solutions?

    The only catch I can see is that I'm not sure how much work setting up the website would be.
    I certainly wouldn't know how to do this. But it must surely be far, far less work than setting up
    bglog.org for example.

    Some sort of website or semi-official reporting system does seem necessary though. Without something
    like this, it would be too easy for players to forget the cases where the adjustment goes against them
    and include the others.
    With my reporting system, I don't think that such a "forgetting" problem can happen
    (but clearly blatant cheating can).

    Paul

  • From Timothy Chow@21:1/5 to peps...@gmail.com on Sat Oct 29 08:04:22 2022
    On 10/29/2022 4:51 AM, peps...@gmail.com wrote:
    > For example, if an acclaimed mathematician (say someone with a
    > postdoctoral position at Harvard) claimed that they tried the most
    > recent Putnam exam by themselves without cheating, and scored 100%,
    > don't you think people would believe the mathematician rather
    > than suspect that they're just lying and actually googled the
    > solutions?

    Some people would believe and others would not. People cheat all the
    time, sometimes for seemingly inexplicable reasons. And even if they
    don't cheat, others will suspect them of cheating.

    Years ago, Iancho Hristov was informally tracking various people's
    PR's, based on available data from online and in-person recorded
    matches. Stick was at the top of Iancho's list, based on his online
    play. At one point, Stick's PR, averaged over 50 matches, was 2.0.
    Many people were convinced that Stick was cheating, presumably by
    consulting a bot at crucial moments. (For the record, at the time,
    I was one of the few people posting to BGO who was saying that I
    didn't believe that Stick was cheating.) They were saying he should demonstrate a 2.0 PR in live play, or submit to live proctoring.
    On another occasion, Neil Kazaross was playing an on-line match
    against the rest of BGO and was achieving very low PR's. Again,
    there were accusations that he was consulting a bot at crucial moments.

    You must have heard about the current hullabaloo over cheating in
    chess at high levels. Some people, it seems, cheat *more* when nothing
    is at stake; others cheat more when the stakes are higher.

    Having said all that, I do think there is one potential benefit to your suggestion, which is that it would probably generate a lot of debate
    about whether cheating was going on, and the extra publicity would
    probably be good for the BMAB. There's nothing like a good controversy
    to get people to pay attention to something they would otherwise have
    no interest in.

    ---
    Tim Chow

  • From pepstein5@gmail.com@21:1/5 to Tim Chow on Sat Oct 29 15:31:29 2022
    On Saturday, October 29, 2022 at 1:04:24 PM UTC+1, Tim Chow wrote:
    > [...]

    My point is that it may be of interest how accomplished people evaluate themselves, even if people are likely to exaggerate.

    I think that for a "you cheated" argument to have any credibility, actual positions
    need to be pointed out where strong human play is unlikely (or
    unlikely to be achievable with consistency).
    For example, in chess, to beat a GM where every single move of the winner follows Stockfish's
    first preference does, at first, sound suspicious. But the suspicion vanishes if it is discovered
    that the two players were following well-established opening theory until the GM blundered
    into a four-move winning combo (losing from the GM's point of view) which the GM's opponent spotted,
    and that the GM resigned at the end of the combo.

    For statistical purposes, 50 matches might not be all that significant.
    A PR of 3.0 is very believable for a world-class player and this might be obtained with long stretches
    averaging 4.0 and long stretches averaging 2.0. Finding a specific sample with a PR of 2.0 might
    be cherry-picking. So yes, there is no evidence of cheating in your post.

    As far as I know, the main hullabaloo in chess related to this is over cheating accusations, rather than
    actual cheating. The evidence against Niemann is just pitifully weak.
    Of course, he could have cheated anyway, despite there being no evidence, but that is obviously
    not a fruitful or fair line of discourse.

    Paul

  • From Timothy Chow@21:1/5 to peps...@gmail.com on Sun Oct 30 00:13:05 2022
    On 10/29/2022 6:31 PM, peps...@gmail.com wrote:
    > For statistical purposes, 50 matches might not be all that significant.
    > A PR of 3.0 is very believable for a world-class player and this might
    > be obtained with long stretches averaging 4.0 and long stretches
    > averaging 2.0. Finding a specific sample with a PR of 2.0 might be
    > cherry-picking. So yes, there is no evidence of cheating in your post.

    I think that 50 matches is significant.

    I'm not a sub-4.0 PR player, but it's not uncommon for me to play 10 consecutive 7-point matches where my overall PR is sub-4.0. But I
    don't think I've ever managed to play 50 consecutive 7-point matches
    with an overall PR under 4.0. I think the best I've managed is around
    4.2 or maybe 4.1-something.

    During the "Stick cheats" controversy, the debate wasn't about whether
    50 matches was statistically significant. The debate was about how
    much of a boost you get from playing online in the comfort of your
    home, with the pip count conveniently displayed at all times. Stick
    wasn't claiming at the time that he would be able to consistently
    play a 2.0 in live play. Several other well-known players acknowledged
    that favorable playing conditions would help somewhat, but didn't
    believe that it was enough to fully explain Stick's 2.0 performance.
    They suggested that a proctor pay Stick a visit at his home and observe
    him playing online to confirm that he wasn't secretly consulting a bot,
    but Stick said that the presence of a proctor would disturb his
    concentration. No "resolution" was ever reached, AFAIK.

    > As far as I know, the main hullabaloo in chess related to this is over
    > cheating accusations, rather than actual cheating. The evidence against
    > Niemann is just pitifully weak. Of course, he could have cheated anyway,
    > despite there being no evidence, but that is obviously not a fruitful
    > or fair line of discourse.

    Well, there's a distinction between online cheating and OTB cheating.
    Niemann admitted to cheating online, and chess.com and Ken Regan say
    they have no statistical evidence that Niemann has cheated OTB. So
    far so good, but chess.com says that Niemann cheated a lot more online
    than Niemann said he did. Do you think that chess.com's 70-page report
    about Niemann's alleged online cheating is "pitifully weak"?

    ---
    Tim Chow

  • From Timothy Chow@21:1/5 to peps...@gmail.com on Sun Oct 30 00:27:15 2022
    On 10/29/2022 6:31 PM, peps...@gmail.com wrote:
    > A PR of 3.0 is very believable for a world-class player and this might
    > be obtained with long stretches averaging 4.0 and long stretches
    > averaging 2.0.

    By the way, depending on what you mean by "long," I don't think that
    "long stretches averaging 4.0 and long stretches averaging 2.0" is a
    realistic scenario. If by "long" you mean 10 matches, then sure. But
    if you take one of the players with a BMAB PR of close to 3.0, I don't
    think you'll find stretches of 50 matches averaging 4.0 or stretches
    of 50 matches averaging 2.0. I'd guess anyone who can manage 50 matches
    in a row (out of a total of 1000 matches, say) with an average of 2.0
    is going to have an overall PR of 2.5 or better, and similarly anyone
    who plays 50 matches in a row (out of 1000) with an overall PR of 4.0
    is going to have an overall PR of 3.5 or worse.
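
    Here's a quick Monte Carlo sketch of that claim, under a deliberately
    crude model in which each match PR is normally distributed around the
    player's true level (the 1.0 standard deviation is my guess, not a
    measured figure):

      import random

      def extreme_windows(true_pr=3.0, sd=1.0, matches=1000, window=50):
          """Best and worst 50-match stretches in a simulated career."""
          prs = [random.gauss(true_pr, sd) for _ in range(matches)]
          means = [sum(prs[i:i + window]) / window
                   for i in range(matches - window + 1)]
          return min(means), max(means)  # low PR is good

      best, worst = extreme_windows()
      print(f"best 50-match stretch:  {best:.2f}")
      print(f"worst 50-match stretch: {worst:.2f}")

    Under this model the extreme 50-match stretches typically land around
    2.6 and 3.4, nowhere near 2.0 or 4.0.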

    ---
    Tim Chow

  • From pepstein5@gmail.com@21:1/5 to Tim Chow on Sun Oct 30 01:54:59 2022
    On Sunday, October 30, 2022 at 4:13:10 AM UTC, Tim Chow wrote:
    > [...]

    My phrase "evidence against Niemann" refers to the OTB allegations.
    I think if all his OTB accusers can do is point to the online evidence, then their
    case is pitifully weak. I haven't seen the chess.com report.

    Re bg, an important question is whether it constitutes "cheating" to consult match equity tables. Furthermore, does it constitute cheating to take out
    a pen and paper, and write down the computations rather than do them in your head?
    I think that now (almost) everyone would say that all these behaviours constitute "cheating".
    But I don't think that has always been the case.

    Paul

  • From pepstein5@gmail.com@21:1/5 to peps...@gmail.com on Sun Oct 30 03:51:28 2022
    On Sunday, October 30, 2022 at 8:55:00 AM UTC, peps...@gmail.com wrote:
    > [...]

    BTW, one of the "Niemann cheated" arguments is that Niemann was often not fully concentrating
    when playing Carlsen.
    I think this youtube video gives a powerful rebuttal to that line: https://www.youtube.com/watch?v=MeoZ0fKmOrk
    Look at how often Rodriguez has a completely non-concentrated expression, even on the verge of laughter sometimes,
    while playing excellent darts.
    I don't think pro darts requires less concentration than pro chess.

    Paul

  • From Timothy Chow@21:1/5 to peps...@gmail.com on Sun Oct 30 08:39:11 2022
    On 10/30/2022 4:54 AM, peps...@gmail.com wrote:
    > My phrase "evidence against Niemann" refers to the OTB allegations.
    > I think if all his OTB accusers can do is point to the online evidence,
    > then their case is pitifully weak. I haven't seen the chess.com report.

    I agree with you regarding the OTB allegations. Here's the chess.com
    report:

    https://www.chess.com/blog/CHESScom/hans-niemann-report

    I find their evidence for Niemann's online cheating pretty strong. If chess.com is right, then in Niemann's high-profile interview in which
    he was apparently "coming clean" and being totally honest about all his
    past cheating, he was actually blatantly lying and covering up most of
    his actual cheating. Of course, this doesn't mean that he cheated OTB,
    but I do sympathize with those who think that Niemann wasn't officially
    punished enough for his infractions. At the same time, I recognize that
    chess.com and FIDE are in a difficult position when they mostly have to
    rely on statistical evidence.

    > Re bg, an important question is whether it constitutes "cheating" to
    > consult match equity tables. Furthermore, does it constitute cheating
    > to take out a pen and paper, and write down the computations rather
    > than do them in your head? I think that now (almost) everyone would
    > say that all these behaviours constitute "cheating". But I don't think
    > that has always been the case.

    I agree with you that these are now considered "cheating" OTB. But in
    the two cases I mentioned (Neil Kazaross and Stick), I don't think these
    issues were in dispute---nobody on either side thought that doing these
    things would be sufficient to explain the low PRs.

    ---
    Tim Chow

  • From Nasti Chestikov@21:1/5 to Tim Chow on Sun Oct 30 09:22:44 2022
    On Sunday, 30 October 2022 at 12:39:14 UTC, Tim Chow wrote:

    > https://www.chess.com/blog/CHESScom/hans-niemann-report
    >
    > I find their evidence for Niemann's online cheating pretty strong.

    Your problem is that Carlsen and chess.com are staring down the barrels of a $100m libel lawsuit and "neither Carlsen nor Chess.com produced concrete evidence for their cheating accusations".

    https://www.bbc.co.uk/news/world-us-canada-63338375

    This isn't going to end nicely; I suspect Carlsen will be financially ruined unless he comes up with a lot more than suspicions.

  • From pepstein5@gmail.com@21:1/5 to Nasti Chestikov on Sun Oct 30 11:28:46 2022
    On Sunday, October 30, 2022 at 4:22:47 PM UTC, Nasti Chestikov wrote:
    > Your problem is that Carlsen and chess.com are staring down the barrels
    > of a $100m libel lawsuit [...]

    So this is Tim's problem? I didn't know that Tim was acting as a guarantor for the pay awards.

    Paul

  • From Timothy Chow@21:1/5 to Nasti Chestikov on Sun Oct 30 16:45:21 2022
    On 10/30/2022 12:22 PM, Nasti Chestikov wrote:
    > Your problem is that Carlsen and chess.com are staring down the barrels
    > of a $100m libel lawsuit and "neither Carlsen nor Chess.com produced
    > concrete evidence for their cheating accusations".

    There's no way Niemann will win this lawsuit, and I'm sure he knows
    it. He's just making a political statement, and maybe hoping he'll
    bluff one of the defendants into settling out of court.

    ---
    Tim Chow

  • From Timothy Chow@21:1/5 to peps...@gmail.com on Sun Oct 30 16:49:35 2022
    On 10/30/2022 2:28 PM, peps...@gmail.com wrote:
    > So this is Tim's problem? I didn't know that Tim was acting as a
    > guarantor for the pay awards.

    Oh yes, I have a large fleet of fast cars as collateral!

    ---
    Tim Chow

  • From Nasti Chestikov@21:1/5 to Tim Chow on Mon Oct 31 09:51:36 2022
    On Sunday, 30 October 2022 at 20:45:24 UTC, Tim Chow wrote:

    > There's no way Niemann will win this lawsuit, and I'm sure he knows
    > it. He's just making a political statement, and maybe hoping he'll
    > bluff one of the defendants into settling out of court.

    The burden of proof in a libel case lies solely with the person making the claim that led to the case being brought. Niemann doesn't have to do anything.

    So Carlsen is going to have to come up with **actual** proof to back up his claim that Niemann cheated....or he loses.

    It's that simple.

    Just saying "well a lot of his moves were the same as the ones that leading bots recommended" isn't enough.

  • From Timothy Chow@21:1/5 to Nasti Chestikov on Mon Oct 31 22:19:28 2022
    On 10/31/2022 12:51 PM, Nasti Chestikov wrote:
    > The burden of proof in a libel case lies solely with the person making
    > the claim that led to the case being brought. Niemann doesn't have to
    > do anything.

    LOL. You don't know anything about law, do you?

    ---
    Tim Chow

  • From MK@21:1/5 to peps...@gmail.com on Tue Nov 1 17:39:44 2022
    On October 27, 2022 at 7:43:22 AM UTC-6, peps...@gmail.com wrote:

    > ..... I would think there are powerful technologies
    > that could use parallel processing to do all rollouts
    > instantly for an entire match.

    I think this is an interesting thread that would
    be worth participating in but I seem to post in
    sporadic surges and my brain is starting to
    feel satiated for one day... :(

    But very briefly I would like to state/repeat my
    argument that rollouts don't/can't amount to a
    "rigorous procedure" to really prove anything.

    > With there being a significant intersection between
    > highly skilled maths/computing people and bg people

    If I can replace bg with gg (gamblegammon), I
    find the "significant intersection between highly
    skilled maths/computing people and gamblers"
    very intriguing... Does anyone know any serious
    research into this?

    MK

  • From Nasti Chestikov@21:1/5 to Tim Chow on Thu Nov 3 10:19:58 2022
    On Tuesday, 1 November 2022 at 02:19:34 UTC, Tim Chow wrote:

    > LOL. You don't know anything about law, do you?

    From my (aborted) law school days, the common laws of libel generally only require that the claimant prove that a statement was made by the defendant, and that it was defamatory.

    Ergo, for example, I claim that Stick's PR is so low that he must be cheating.

    Stick sues me, claiming that a statement was made by me and that it was defamatory.

    I then have to justify why I think Stick is cheating.

    You *do* see that?

    Or has too much exposure to exhaust fumes from those Lambos you're hawking around Las Vegas addled your brain?

    Shame. One of this newsgroup's most prolific posters has lost the plot.

  • From Timothy Chow@21:1/5 to Nasti Chestikov on Thu Nov 3 22:06:44 2022
    On 11/3/2022 1:19 PM, Nasti Chestikov wrote:

    > From my (aborted) law school days

    Did you go to law school in the U.S.? Libel law in the U.S. is
    rather different from that of many other countries because of the
    First Amendment.

    The burden of proof is on the plaintiff to prove that the defendant's
    claim is false. You don't need to go to law school to know that this
    is the way things work in the U.S.

    ---
    Tim Chow

  • From Timothy Chow@21:1/5 to All on Tue Nov 15 22:35:21 2022
    On 11/3/2022 10:06 PM, I wrote:
    > On 11/3/2022 1:19 PM, Nasti Chestikov wrote:
    >> From my (aborted) law school days
    >
    > Did you go to law school in the U.S.? [...]

    Here's an interesting Legal Eagle video about the lawsuit.

    https://www.youtube.com/watch?v=Gkd1Q0Ntt9s&t=605s

    On the topic of burden of proof, listen particularly to remarks starting
    at 13:33, 14:18, 16:37.

    I'm still curious: which law school did you attend?

    ---
    Tim Chow

  • From Timothy Chow@21:1/5 to All on Tue Nov 15 22:37:59 2022
    On 11/3/2022 10:06 PM, I wrote:
    > On 11/3/2022 1:19 PM, Nasti Chestikov wrote:
    >> From my (aborted) law school days
    >
    > Did you go to law school in the U.S.? [...]

    There's an interesting Legal Eagle video about Niemann's lawsuit.

    https://www.youtube.com/watch?v=Gkd1Q0Ntt9s&t=605s

    As you'll see, for Niemann to prevail, he would need to prove a lot of
    things that he probably can't prove.

    I'm still curious: which law school did you attend? How many classes did
    you complete?

    ---
    Tim Chow

  • From Timothy Chow@21:1/5 to Nasti Chestikov on Tue Jun 27 22:44:21 2023
    On 10/30/2022 12:22 PM, Nasti Chestikov wrote:
    > Your problem is that Carlsen and chess.com are staring down the barrels
    > of a $100m libel lawsuit [...]
    >
    > This isn't going to end nicely; I suspect Carlsen will be financially
    > ruined unless he comes up with a lot more than suspicions.

    Since Nasti Chestikov said this was my problem, I figured I should
    alert r.g.b. that my problem has been solved. Niemann's lawsuit has
    been dismissed, as you can easily confirm from any major news source.
    Evidently Nasti's law school credentials didn't help him accurately
    predict the future of Carlsen's finances.

    ---
    Tim Chow
