Forum: >>> Magnum BBS <<<

Borrowing a page from Paul Epstein's book

From Timothy Chow@21:1/5 to All on Wed May 24 08:29:40 2023

Paul has more than once suggested that it might be nice to
take into account not just a person's choice of play, but how
confident the person is in the play. Here's an idea for how
to score the Othello quiz in a way that partially takes into
account your confidence.

For each problem in the quiz, you may, in addition to selecting
a play, optionally declare, "I am confident my play is correct."

If you make such a declaration and your play is indeed correct,
then you score 2 points. But if you make such a declaration and
your play is incorrect, then you score -2 points.

If you do not declare that you are confident, then you score
1 point for a correct play and 0 points for an incorrect play
(or no play at all).

In effect, this scoring system allows you to offer 2 to 1 odds
that your play is correct. That is, you should declare, "I am
confident" if you believe your chances of being correct are at
least 2/3; otherwise, you should remain silent.

To get a feeling for this scoring system, let's consider some
examples. Suppose someone indiscriminately declares, "I am
confident" for every problem. This strategy will boost the
contestant's score if the contestant gets at least 7 out of 10
problems right, but will decrease the contestant's score
otherwise. So weaker contestants cannot trivially boost their
scores this way.

On the other hand, someone who only gets 3 problems right but
correctly declares "I am confident" for those 3 problems and no
others will get credit for that confidence, and will score as
well as someone who gets 6 problems right but isn't confident
about any problems.
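
The arithmetic behind these examples can be checked with a short script. This is only a sketch: the names `score_answer` and `quiz_total` are made up for illustration, and a 10-problem quiz is assumed, as in the example above.

```python
def score_answer(correct: bool, confident: bool) -> int:
    """Score one problem: +2/-2 with a confidence declaration, +1/0 without."""
    if confident:
        return 2 if correct else -2
    return 1 if correct else 0


def quiz_total(n_correct: int, n_problems: int, always_confident: bool) -> int:
    """Total for a contestant who declares confidence on every problem or on none."""
    n_wrong = n_problems - n_correct
    return (n_correct * score_answer(True, always_confident)
            + n_wrong * score_answer(False, always_confident))


# Compare "never confident" with "always confident" for each number of correct plays.
for n_correct in range(11):
    plain = quiz_total(n_correct, 10, always_confident=False)
    bold = quiz_total(n_correct, 10, always_confident=True)
    verdict = "helps" if bold > plain else "does not help"
    print(n_correct, plain, bold, verdict)

# Three correct plays, confident on exactly those three and silent on the rest.
selective = 3 * score_answer(True, True) + 7 * score_answer(False, False)
print(selective, quiz_total(6, 10, always_confident=False))
```

The loop shows the blanket declaration paying off only from 7 correct plays upward, and the last line shows the selective 3-for-3 contestant matching six unconfident correct plays.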

---
Tim Chow

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From MK@21:1/5 to Timothy Chow on Wed May 24 13:06:52 2023

On May 24, 2023 at 6:29:44 AM UTC-6, Timothy Chow wrote:

your play is incorrect, then you score -2 points.

If you do not declare that you are confident,
then you score 1 point for a correct play and
0 points for an incorrect play (or no play at all).

0 for no play is okay but correct/incorrect
plays should be +1/-1 similar to +2/-2

MK

From pepstein5@gmail.com@21:1/5 to All on Wed May 24 15:55:51 2023

On Wednesday, May 24, 2023 at 9:06:53 PM UTC+1, MK wrote:

On May 24, 2023 at 6:29:44 AM UTC-6, Timothy Chow wrote:

your play is incorrect, then you score -2 points.

If you do not declare that you are confident,
then you score 1 point for a correct play and
0 points for an incorrect play (or no play at all).

0 for no play is okay but correct/incorrect
plays should be +1/-1 similar to +2/-2

MK

This is a simpler scheme, and might be more practical
because there are already clear precedents with many
academic multiple-choice exams having negative grading.

I think Tim's idea was to enable a solver to say "I think
X is the play but I'm not confident."
The 1/0/-1 system doesn't really cater to that.
But I think the 1/0/-1 system is a better suggestion.
It decreases the luck in the test because it discourages
wild guessing.

Paul

From Simon Woodhead@21:1/5 to peps...@gmail.com on Thu May 25 11:09:23 2023

Rather than right or wrong, the /size/ of the error is what is
most interesting (to me at least).

From MK@21:1/5 to peps...@gmail.com on Wed May 24 18:24:38 2023

On May 24, 2023 at 4:55:52 PM UTC-6, peps...@gmail.com wrote:

On Wednesday, May 24, 2023 at 9:06:53 PM UTC+1, MK wrote:

0 for no play is okay but correct/incorrect
plays should be +1/-1 similar to +2/-2

I think Tim's idea was to enable a solver to
say "I think X is the play but I'm not confident."

I understood that. His scoring system is unfair
regardless of what his initial idea/intention was.

If you can't say it for fear of hurting his feelings,
you're contributing to RGB incorrectly/negatively.

MK

From MK@21:1/5 to Simon Woodhead on Wed May 24 19:15:03 2023

On May 24, 2023 at 7:09:27 PM UTC-6, Simon Woodhead wrote:

Rather than right or wrong, the /size/ of the
error is what is most interesting (to me at least).

I agree in principle but we would need better
bots with consistently more accurate rollouts
for that kind of measuring/scoring.

I was just thinking about posting an article
likening the current bots to accordions, biased
rollouts to bellows and the estimated equities
to the folds of an accordion's bellows.

The maximum size of the "error" expands and
contracts on multiple axes at once, while the
distances between intermediate error values
also expand and contract (but not proportionally),
based even on things like how many trials a
rollout is run for.

Until we have unbiased AI bots, it's as absurd
to take quizzes, do rollouts, compare ERs/PRs,
etc. as walking around measuring things with
a rubber tape...

MK

From pepstein5@gmail.com@21:1/5 to Simon Woodhead on Thu May 25 04:41:57 2023

On Thursday, May 25, 2023 at 2:09:27 AM UTC+1, Simon Woodhead wrote:

Rather than right or wrong, the /size/ of the error is what is
most interesting (to me at least).

Another brilliant point.
We should perhaps create an advertising campaign along the lines of: "rec.games.backgammon --- simply the wisest voices on the web!"

This idea has been discussed before, with Tim being a major participant.
It suggests (to me, anyway) the idea of scoring Othello answers by lost equity, rather than the number of correct replies.
However, the problem with this is that this lost equity is very hard to ascertain,
and so scores will oscillate according to changes in bot technology and rollout settings etc.

This raises the question: Has it ever happened that improvements in backgammon understanding
or rollout settings have changed the consensus on an Othello answer?

If so, what happens then?

Paul

From Timothy Chow@21:1/5 to peps...@gmail.com on Thu May 25 09:01:06 2023

On 5/24/2023 6:55 PM, peps...@gmail.com wrote:

I think Tim's idea was to enable a solver to say "I think
X is the play but I'm not confident."
The 1/0/-1 system doesn't really cater to that.
But I think the 1/0/-1 system is a better suggestion.
It decreases the luck in the test because it discourages
wild guessing.

I'm not too happy with 0 for no answer because in a real game
you have to make a play. So I don't think there should be any
incentive to leave an answer blank.

In any case, if someone decides to make a play, then under my
scoring system you get

+2 for correct + confident
+1 for correct + not confident
0 for incorrect and not confident
-2 for incorrect and confident

Under the system you and Murat are proposing, you get

+2 for correct + confident
+1 for correct + not confident
-1 for incorrect and not confident
-2 for incorrect and confident

The trouble with the latter system is that if you're only 50/50
about a play then you have no disincentive to claim you're
confident. You gain 1 point if you're right and you lose 1
point if you're wrong. Under my system, you gain 1 point if
you're right and you lose 2 points if you're wrong. So you're
disincentivized to claim confidence unless you're at least
66% confident.
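
The two break-even points can be worked out mechanically from the tables above. The following is a rough sketch, not part of either proposal: the dictionary names `CHOW` and `SYMMETRIC` and the helper `break_even` are invented here for illustration.

```python
from fractions import Fraction

# Points as (correct, incorrect) pairs, copied from the two tables above.
CHOW = {"confident": (2, -2), "not_confident": (1, 0)}
SYMMETRIC = {"confident": (2, -2), "not_confident": (1, -1)}


def break_even(system: dict) -> Fraction:
    """Chance of being right at which declaring confidence neither gains nor loses."""
    # Extra points won by declaring when the play turns out to be right.
    gain = system["confident"][0] - system["not_confident"][0]
    # Extra points lost by declaring when the play turns out to be wrong.
    loss = system["not_confident"][1] - system["confident"][1]
    # Declaring pays when p * gain >= (1 - p) * loss, i.e. p >= loss / (gain + loss).
    return Fraction(loss, gain + loss)


print(break_even(CHOW))       # 2/3
print(break_even(SYMMETRIC))  # 1/2
```

With the 0 for "incorrect and not confident", the declaration risks 2 points to win 1, so it needs a 2/3 chance; with -1 in that slot it risks 1 to win 1, so a coin flip is enough.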

---
Tim Chow

From Timothy Chow@21:1/5 to Simon Woodhead on Thu May 25 09:12:42 2023

On 5/24/2023 9:09 PM, Simon Woodhead wrote:

Rather than right or wrong, the /size/ of the error is what is
most interesting (to me at least).

The book 'Backgammon Super Genius Quiz' scores things both ways,
but the official ranking was based on 1 for correct and 0 for
incorrect, rather than on equity. I don't entirely agree with
the justification given in the book, though, which among other
things assumes that it's not possible to come up with enough
discriminatory problems with large equity differences. (The
Othello quiz is full of counterexamples, after all!)

There is a lot to be said for scoring based on the size of the
error, but there are two problems I see with that.

1. It means that for match play, you pretty much have to use EMG
to measure error size, and EMG has well-known problems.

http://www.fortuitouspress.com/emg

2. Size of error depends too sensitively on the rollout settings,
choice of bot, etc. The Othello quiz is carefully designed so that
the correct answer is robust to these variations. You can go back
to the earliest Othello quiz and use Snowie, GNU, BGBlitz, or XG,
on anything but the very weakest settings, and they will all agree
on what the top play is. But if you ask them for the size of the
errors then they may disagree rather significantly.

---
Tim Chow

From Timothy Chow@21:1/5 to All on Thu May 25 09:43:13 2023

If one wishes to score "no answer" and "wrong answer"
differently, then here would be my proposal.

+3 for correct + confident
+2 for correct + not confident
0 for no answer (confidence is ignored)
-1 for incorrect + not confident
-3 for incorrect + confident

Another feature of the Othello quiz problems is that there
are usually many plausible options---at least 4 or 5 in most
cases, and sometimes even more than that. The above scoring
system encourages you to submit an answer as long as you
think you have at least a 1/3 chance of being correct. This
will distinguish people who have absolutely no clue from
people who are able to narrow down the choices to no more
than 3 candidates.

A declaration of confidence gains you 1 extra point if you are
right and costs you 2 points if you are wrong. So you are discouraged from
declaring confidence unless you think your chances of being
right are at least 2/3.
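
Both thresholds can be read off from the expected score of each option. Below is a minimal sketch under the five-way scoring listed above; `expected_scores` and `best_action` are hypothetical helper names, and `p` stands for the solver's own estimate of the chance that the play is correct.

```python
from fractions import Fraction


def expected_scores(p: Fraction) -> dict:
    """Expected points for each option, given a chance p of the play being correct."""
    return {
        "no answer": Fraction(0),
        "answer": 2 * p - 1 * (1 - p),              # +2 if right, -1 if wrong
        "answer + confident": 3 * p - 3 * (1 - p),  # +3 if right, -3 if wrong
    }


def best_action(p: Fraction) -> str:
    """Option with the highest expected score (on a tie, the earlier option wins)."""
    scores = expected_scores(p)
    return max(scores, key=scores.get)


for p in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)):
    print(p, best_action(p))
```

Answering has expected score 3p - 1, which is positive once p exceeds 1/3, and declaring confidence adds 3p - 2 on top of that, which is positive once p exceeds 2/3.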

---
Tim Chow

From Bradley K. Sherman@21:1/5 to tchow12000@yahoo.com on Thu May 25 13:30:56 2023

Timothy Chow <tchow12000@yahoo.com> wrote:

...
http://www.fortuitouspress.com/emg
...

Nice article. But I wish people would stop using light gray (light grey,
Paul) text on white backgrounds. Paint it black!

--bks

