• Borrowing a page from Paul Epstein's book

    From Timothy Chow@21:1/5 to All on Wed May 24 08:29:40 2023
    Paul has more than once suggested that it might be nice to
    take into account not just a person's choice of play, but how
    confident the person is in the play. Here's an idea for how
    to score the Othello quiz in a way that partially takes into
    account your confidence.

    For each problem in the quiz, you may, in addition to selecting
    a play, optionally declare, "I am confident my play is correct."

    If you make such a declaration and your play is indeed correct,
    then you score 2 points. But if you make such a declaration and
    your play is incorrect, then you score -2 points.

    If you do not declare that you are confident, then you score
    1 point for a correct play and 0 points for an incorrect play
    (or no play at all).

    In effect, this scoring system allows you to offer 2 to 1 odds
    that your play is correct. That is, you should declare, "I am
    confident" if you believe your chances of being correct are at
    least 2/3; otherwise, you should remain silent.

    To get a feeling for this scoring system, let's consider some
    examples. Suppose someone indiscriminately declares, "I am
    confident" for every problem. This strategy will boost the
    contestant's score if the contestant gets at least 7 out of 10
    problems right, but will decrease the contestant's score
    otherwise. So weaker contestants cannot trivially boost their
    scores this way.

    On the other hand, someone who only gets 3 problems right but
    correctly declares "I am confident" for those 3 problems and no
    others will get credit for that confidence, and will score as
    well as someone who gets 6 problems right but isn't confident
    about any problems.
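
    The arithmetic in these examples can be checked with a short
    Python sketch. The names score, expected_score and quiz_total
    are made up for illustration and are not part of the quiz; the
    point values are the ones proposed above.

```python
from fractions import Fraction


def score(correct: bool, confident: bool) -> int:
    """Points for one problem: +2/-2 if confident, +1/0 otherwise."""
    if confident:
        return 2 if correct else -2
    return 1 if correct else 0


def expected_score(p: Fraction, confident: bool) -> Fraction:
    """Expected points when the play is correct with probability p."""
    return p * score(True, confident) + (1 - p) * score(False, confident)


def quiz_total(results) -> int:
    """Sum the points over (correct, confident) pairs, one per problem."""
    return sum(score(correct, confident) for correct, confident in results)


# Break-even: at p = 2/3, declaring and staying silent have equal expectation.
p = Fraction(2, 3)
print(expected_score(p, True), expected_score(p, False))

# Declaring confidence on all 10 problems, with 7 right versus 6 right.
for right in (7, 6):
    wrong = 10 - right
    bold = quiz_total([(True, True)] * right + [(False, True)] * wrong)
    quiet = quiz_total([(True, False)] * right + [(False, False)] * wrong)
    print(right, bold, quiet)

# 3 right and confident (7 wrong, silent) versus 6 right and silent.
selective = quiz_total([(True, True)] * 3 + [(False, False)] * 7)
cautious = quiz_total([(True, False)] * 6 + [(False, False)] * 4)
print(selective, cautious)
```

    With 7 right, blanket confidence scores 8 instead of 7; with
    6 right it scores 4 instead of 6. The selective and cautious
    contestants both finish on 6.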

    ---
    Tim Chow

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MK@21:1/5 to Timothy Chow on Wed May 24 13:06:52 2023
    On May 24, 2023 at 6:29:44 AM UTC-6, Timothy Chow wrote:

    > your play is incorrect, then you score -2 points.
    >
    > If you do not declare that you are confident,
    > then you score 1 point for a correct play and
    > 0 points for an incorrect play (or no play at all).

    0 for no play is okay but correct/incorrect
    plays should be +1/-1 similar to +2/-2

    MK

  • From pepstein5@gmail.com@21:1/5 to All on Wed May 24 15:55:51 2023
    On Wednesday, May 24, 2023 at 9:06:53 PM UTC+1, MK wrote:
    > On May 24, 2023 at 6:29:44 AM UTC-6, Timothy Chow wrote:
    >
    >> your play is incorrect, then you score -2 points.
    >>
    >> If you do not declare that you are confident,
    >> then you score 1 point for a correct play and
    >> 0 points for an incorrect play (or no play at all).
    >
    > 0 for no play is okay but correct/incorrect
    > plays should be +1/-1 similar to +2/-2
    >
    > MK

    This is a simpler scheme, and might be more practical
    because there are already clear precedents with many
    academic multiple-choice exams having negative grading.

    I think Tim's idea was to enable a solver to say "I think
    X is the play but I'm not confident."
    The 1/0/-1 system doesn't really cater to that.
    But I think the 1/0/-1 system is a better suggestion.
    It decreases the luck in the test because it discourages
    wild guessing.

    Paul

  • From Simon Woodhead@21:1/5 to peps...@gmail.com on Thu May 25 11:09:23 2023
    Rather than right or wrong, the /size/ of the error is what is
    most interesting (to me at least).

  • From MK@21:1/5 to peps...@gmail.com on Wed May 24 18:24:38 2023
    On May 24, 2023 at 4:55:52 PM UTC-6, peps...@gmail.com wrote:

    > On Wednesday, May 24, 2023 at 9:06:53 PM UTC+1, MK wrote:
    >
    >> 0 for no play is okay but correct/incorrect
    >> plays should be +1/-1 similar to +2/-2
    >
    > I think Tim's idea was to enable a solver to
    > say "I think X is the play but I'm not confident."

    I understood that. His scoring system is unfair
    regardless of what his initial idea/intention was.

    If you can't say it for fear of hurting his feelings,
    you're contributing to RGB incorrectly/negatively.

    MK

  • From MK@21:1/5 to Simon Woodhead on Wed May 24 19:15:03 2023
    On May 24, 2023 at 7:09:27 PM UTC-6, Simon Woodhead wrote:

    > Rather than right or wrong, the /size/ of the
    > error is what is most interesting (to me at least).

    I agree in principle but we would need better
    bots with consistently more accurate rollouts
    for that kind of measuring/scoring.

    I was just thinking about posting an article
    likening the current bots to accordions, biased
    rollouts to bellows and the estimated equities
    to the folds of an accordion's bellows.

    The maximum size of the "error" expands and
    contracts on multiple axes at once, while the
    distances between intermediate error values
    also expand and contract (but not proportionally),
    depending even on things like how many trials a
    rollout runs for.

    Until we have unbiased AI bots, it's as absurd
    to take quizzes, do rollouts, compare ERs/PRs,
    etc. as walking around measuring things with
    a rubber tape...

    MK

  • From pepstein5@gmail.com@21:1/5 to Simon Woodhead on Thu May 25 04:41:57 2023
    On Thursday, May 25, 2023 at 2:09:27 AM UTC+1, Simon Woodhead wrote:
    > Rather than right or wrong, the /size/ of the error is what is
    > most interesting (to me at least).

    Another brilliant point.
    We should perhaps create an advertising campaign along the lines of: "rec.games.backgammon --- simply the wisest voices on the web!"

    This idea has been discussed before, with Tim being a major participant.
    It suggests (to me, anyway) the idea of scoring Othello answers by lost equity, rather than the number of correct replies.
    However, the problem with this is that this lost equity is very hard to ascertain,
    and so scores will oscillate according to changes in bot technology and rollout settings etc.

    This raises the question: Has it ever happened that improvements in backgammon understanding
    or rollout settings have changed the consensus on an Othello answer?

    If so, what happens then?

    Paul

  • From Timothy Chow@21:1/5 to peps...@gmail.com on Thu May 25 09:01:06 2023
    On 5/24/2023 6:55 PM, peps...@gmail.com wrote:
    > I think Tim's idea was to enable a solver to say "I think
    > X is the play but I'm not confident."
    > The 1/0/-1 system doesn't really cater to that.
    > But I think the 1/0/-1 system is a better suggestion.
    > It decreases the luck in the test because it discourages
    > wild guessing.

    I'm not too happy with 0 for no answer because in a real game
    you have to make a play. So I don't think there should be any
    incentive to leave an answer blank.

    In any case, if someone decides to make a play, then under my
    scoring system you get

    +2 for correct + confident
    +1 for correct + not confident
    0 for incorrect + not confident
    -2 for incorrect + confident

    Under the system you and Murat are proposing, you get

    +2 for correct + confident
    +1 for correct + not confident
    -1 for incorrect + not confident
    -2 for incorrect + confident

    The trouble with the latter system is that if you're only 50/50
    about a play then you have no disincentive to claim you're
    confident. You gain 1 point if you're right and you lose 1
    point if you're wrong. Under my system, you gain 1 point if
    you're right and you lose 2 points if you're wrong. So you're
    disincentivized to claim confidence unless you're at least
    2/3 (about 67%) confident.
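
    The break-even point for each system can be worked out
    mechanically. What follows is only a sketch: the dictionary
    names and the helper confidence_threshold are invented for
    this illustration, and the point values are copied from the
    two lists above.

```python
from fractions import Fraction

# Payoff tables keyed by (correct, confident), copied from the two lists above.
TIM = {(True, True): 2, (True, False): 1, (False, False): 0, (False, True): -2}
PLUS_MINUS = {(True, True): 2, (True, False): 1, (False, False): -1, (False, True): -2}


def confidence_threshold(payoff) -> Fraction:
    """Probability of being right at which declaring confidence breaks even."""
    gain = payoff[(True, True)] - payoff[(True, False)]    # extra points if right
    loss = payoff[(False, False)] - payoff[(False, True)]  # extra points lost if wrong
    # Declaring pays when p * gain >= (1 - p) * loss, i.e. p >= loss / (gain + loss).
    return Fraction(loss, gain + loss)


print(confidence_threshold(TIM))         # 2/3
print(confidence_threshold(PLUS_MINUS))  # 1/2
```

    A threshold of 1/2 means that a pure coin flip already breaks
    even on the declaration, which is the objection raised above.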

    ---
    Tim Chow

  • From Timothy Chow@21:1/5 to Simon Woodhead on Thu May 25 09:12:42 2023
    On 5/24/2023 9:09 PM, Simon Woodhead wrote:
    > Rather than right or wrong, the /size/ of the error is what is
    > most interesting (to me at least).

    The book 'Backgammon Super Genius Quiz' scores things both ways,
    but the official ranking was based on 1 for correct and 0 for
    incorrect, rather than on equity. I don't entirely agree with
    the justification given in the book, though, which among other
    things assumes that it's not possible to come up with enough
    discriminatory problems with large equity differences. (The
    Othello quiz is full of counterexamples, after all!)

    There is a lot to be said for scoring based on the size of the
    error, but there are two problems I see with that.

    1. It means that for match play, you pretty much have to use EMG
    to measure error size, and EMG has well-known problems.

    http://www.fortuitouspress.com/emg

    2. Size of error depends too sensitively on the rollout settings,
    choice of bot, etc. The Othello quiz is carefully designed so that
    the correct answer is robust to these variations. You can go back
    to the earliest Othello quiz and use Snowie, GNU, BGBlitz, or XG,
    on anything but the very weakest settings, and they will all agree
    on what the top play is. But if you ask them for the size of the
    errors then they may disagree rather significantly.

    ---
    Tim Chow

  • From Timothy Chow@21:1/5 to All on Thu May 25 09:43:13 2023
    If one wishes to score "no answer" and "wrong answer"
    differently, then here would be my proposal.

    +3 for correct + confident
    +2 for correct + not confident
    0 for no answer (confidence is ignored)
    -1 for incorrect + not confident
    -3 for incorrect + confident

    Another feature of the Othello quiz problems is that there
    are usually many plausible options---at least 4 or 5 in most
    cases, and sometimes even more than that. The above scoring
    system encourages you to submit an answer as long as you
    think you have at least a 1/3 chance of being correct. This
    will discriminate between people who have absolutely no clue
    and people who are able to narrow down the choices to no
    more than 3 candidates.

    A declaration of confidence earns you 1 point if you are right
    and -2 points if you are wrong. So you are discouraged from
    declaring confidence unless you think your chances of being
    right are at least 2/3.
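
    To see both thresholds at once, one can compute the expected
    score of each of the three options for a given chance p of
    being right and pick the largest. This is a sketch under that
    assumption; POINTS, expected and best_action are names made up
    for the example, and the values come from the list above.

```python
from fractions import Fraction

# Points from the list above, keyed by action and then by whether the play is correct.
# "no answer" scores 0 either way, so both entries are 0.
POINTS = {
    "no answer": {True: 0, False: 0},
    "answer": {True: 2, False: -1},
    "answer + confident": {True: 3, False: -3},
}


def expected(action: str, p: Fraction) -> Fraction:
    """Expected points for an action when the play is right with probability p."""
    row = POINTS[action]
    return p * row[True] + (1 - p) * row[False]


def best_action(p: Fraction) -> str:
    """Action with the highest expected score (ties go to the earlier entry)."""
    return max(POINTS, key=lambda action: expected(action, p))


for p in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)):
    print(p, best_action(p))
```

    At p = 1/4 the best option is no answer, at p = 1/2 it is an
    unconfident answer, and at p = 3/4 it is a confident answer.
    The options tie exactly at p = 1/3 and p = 2/3.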

    ---
    Tim Chow

  • From Bradley K. Sherman@21:1/5 to tchow12000@yahoo.com on Thu May 25 13:30:56 2023
    Timothy Chow <tchow12000@yahoo.com> wrote:
    > ...
    > http://www.fortuitouspress.com/emg
    > ...

    Nice article. But I wish people would stop using light gray (light grey,
    Paul) text on white backgrounds. Paint it black!

    --bks
