• Re: Balancing number of plies and number of trials

    From Timothy Chow@21:1/5 to All on Wed Dec 13 09:27:53 2023
    On 12/13/2023 4:47 AM, MK wrote:
    For example, would a rollout with 5184 trials
    at xg-roller level be as reliable as a rollout
    with 2592 trials at xg-roller+ level and as a
    rollout with 1296 trials at xg-roller++ level?
    There's a subtle distinction between "precision" and "accuracy."

    An "accurate" verdict is one that gives the correct answer.

    A "precise" estimate has very little statistical noise.

    Increasing the number of trials increases the precision. If you
    have a lot of trials then you can be very confident that you are
    learning "what the bot really thinks" and that it is very unlikely
    to change its mind even if you increase the number of trials to
    infinity.
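    The precision point can be illustrated with a toy Monte Carlo sketch
    (made-up equity and noise figures, not a real rollout): the statistical
    noise in the average shrinks like one over the square root of the
    number of trials.

```python
import random
import statistics

def rollout_estimate(n_trials, true_equity=0.12, noise=1.0, seed=0):
    """Toy 'rollout' (made-up numbers, not a real bot): each trial
    returns the true equity plus independent game-to-game noise."""
    rng = random.Random(seed)
    samples = [true_equity + rng.gauss(0, noise) for _ in range(n_trials)]
    mean = statistics.fmean(samples)
    stderr = statistics.stdev(samples) / n_trials ** 0.5
    return mean, stderr

# Quadrupling the trials roughly halves the statistical noise.
for n in (1296, 2592, 5184):
    mean, stderr = rollout_estimate(n)
    print(f"{n:5d} trials: equity estimate {mean:+.3f} +/- {stderr:.3f}")
```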

    Accuracy is another matter. Murat of all people should understand
    that "what the bot thinks the correct play is" is not necessarily
    the same as "the correct play"; indeed, in some positions, it is
    debatable what "the correct play" is since that can depend on who
    your opponent is, what their emotional state is at the time, etc.
    But even setting those things aside, suppose for the sake of
    argument that we define "the correct play" as what game theorists
    would call an (expectiminimax) "equilibrium" play. We can ask whether
    stronger settings are more likely to yield the correct play. The
    answer is that we can't ever be completely sure, but one can give
    heuristic arguments in support of this principle. For example,
    equilibrium play has a certain self-consistency property, so you
    can "cross-examine" the bot and see its answers are self-consistent.
    Experience suggests that stronger settings exhibit greater
    self-consistency. Bob Wachtel's book "In the Game Until the End"
    has some examples of this. But again, the arguments are only
    heuristic, and we certainly can't be completely sure in any
    particular instance that stronger settings are giving us more
    "accurate" answers.

    ---
    Tim Chow

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bradley K. Sherman@21:1/5 to tchow12000@yahoo.com on Wed Dec 13 14:46:35 2023
    Timothy Chow <tchow12000@yahoo.com> wrote:
    ...
    Accuracy is another matter. Murat of all people should understand
    that "what the bot thinks the correct play is" is not necessarily
    the same as "the correct play"; indeed, in some positions, it is
    debatable what "the correct play" is since that can depend on who
    your opponent is, what their emotional state is at the time, etc.
    But even setting those things aside, suppose for the sake of
    argument that we define "the correct play" as what game theorists
    would call an (expectiminimax) "equilibrium" play. We can ask whether
    stronger settings are more likely to yield the correct play. The
    answer is that we can't ever be completely sure, but one can give
    heuristic arguments in support of this principle. For example,
    equilibrium play has a certain self-consistency property, so you
    can "cross-examine" the bot and see its answers are self-consistent. >Experience suggests that stronger settings exhibit greater
    self-consistency. Bob Wachtel's book "In the Game Until the End"
    has some examples of this. But again, the arguments are only
    heuristic, and we certainly can't be completely sure in any
    particular instance that stronger settings are giving us more
    "accurate" answers.

    Related:
    |
    | Man beats machine at Go in human victory over AI
    |
    | Amateur exploited weakness in systems that have otherwise
    | dominated grandmasters.
    | ... <https://arstechnica.com/information-technology/2023/02/man-beats-machine-at-go-in-human-victory-over-ai/>

    --bks

  • From Timothy Chow@21:1/5 to All on Sat Dec 23 08:57:17 2023
    On 12/22/2023 12:18 PM, MK wrote:
    On December 13, 2023 at 7:27:56 AM UTC-7, Timothy Chow wrote:
    If you have a lot of trials then you can be very
    confident that you are learning "what the bot
    really thinks" and that it is very unlikely to
    change its mind even if you increase the number
    of trials to infinity.

    This isn't necessarily true and is indeed incomplete.

    While random errors decrease, systematic errors
    may increase (accumulate and compound), thus
    causing the bot to change its mind.

    No, this is not correct, at least when you are simply extending
    a specific rollout. Systematic errors can indeed accumulate and
    compound over the course of a game, but a rollout trial repeatedly
    samples an entire game, so *each individual* trial is subject to
    the accumulated systematic error. There will be some randomness
    involved from trial to trial, of course; some trials may be "lucky"
    enough to avoid the variations that suffer from a lot of accumulated
    systematic error, while other trials may be "unlucky" enough to hit
    those variations, but in the long run these fluctuations will even
    out, and the rollout will converge. The final result will be an
    average over all accumulated systematic errors.
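    A minimal sketch of the point above, with made-up numbers: if every
    trial is produced by the same flawed evaluator, extra trials shrink the
    random scatter, but the estimate converges to the biased value, not to
    the true equity.

```python
import random
import statistics

def biased_rollout(n_trials, true_equity=0.10, bias=0.05, noise=0.8, seed=1):
    """Toy sketch (made-up numbers): every trial replays the whole game
    with the same flawed evaluator, so an identical systematic bias
    enters each sample along with the random noise."""
    rng = random.Random(seed)
    samples = [true_equity + bias + rng.gauss(0, noise) for _ in range(n_trials)]
    return statistics.fmean(samples)

# More trials shrink the random scatter, but the estimate settles on
# true_equity + bias = 0.15, not on the true equity 0.10.
for n in (100, 10_000, 100_000):
    print(f"{n:6d} trials: {biased_rollout(n):+.4f}")
```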

    I assume you mean look-ahead plies? Can you (or
    someone else) expand on this and explain/clarify
    how plies work during play and during rollouts?

    The GNU team can answer this better than I can. One thing to note
    is that during rollouts, the bots will apply some kind of move
    filter to screen out unpromising plays. That is, if you perform
    a 3-ply rollout, the bot doesn't necessarily evaluate every legal
    move at 3-ply and pick the highest-scoring one. It will evaluate
    all the options at the lowest ply but then discard a lot of them
    as not likely to emerge as the top play.

    I won't argue against self-consistency if you can
    prove that your equilibrium play is actually that.

    The *theoretical* equilibrium play is *defined* in terms of a
    system of equations that expresses self-consistency. If you insist
    on an empirical definition, though, then self-consistency can't be
    proved.

    so you can "cross-examine" the bot and see its
    answers are self-consistent.

    This would be most interesting for me to see. Has
    any bot been cross-examined for this and how?

    I don't know if anyone has done this in a systematic fashion, but
    certainly, if you take some crazy superbackgame or containment
    position, you can observe inconsistency yourself. Note down the
    3-ply equity (for example). Then run through all the possible rolls,
    and note down their 3-ply equities. Average them, and you'll find
    that they don't average out to the original 3-ply equity. This means
    that the 3-ply equity isn't (entirely) self-consistent. In many
    positions, the top play will still be the top play, but in the crazy
    superbackgame positions, this experiment can result in wild swings
    that drastically change the top play.
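    The cross-examination described above can be sketched as follows. The
    `evaluate` and `best_play` callables are hypothetical stand-ins, not a
    real bot API; only the dice arithmetic is concrete, and the sign
    convention (negating the child equity because the opponent is on roll)
    is a modeling assumption.

```python
from itertools import combinations_with_replacement

def roll_weights():
    """The 21 distinct rolls; doubles occur 1 way in 36, non-doubles 2."""
    return {(a, b): (1 if a == b else 2)
            for a, b in combinations_with_replacement(range(1, 7), 2)}

def consistency_gap(position, ply, evaluate, best_play):
    """Difference between evaluate(position, ply) and the probability-
    weighted average of the same evaluator one roll later. A perfectly
    self-consistent evaluator returns a gap of zero."""
    direct = evaluate(position, ply)
    total = 0.0
    for roll, weight in roll_weights().items():
        child = best_play(position, roll, ply)
        total += weight * -evaluate(child, ply)  # opponent on roll: negate
    return direct - total / 36.0
```

    With real n-ply evaluations plugged in, a large gap in a crazy
    superbackgame position is exactly the inconsistency described above.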

    But again, the arguments are only heuristic, and
    we certainly can't be completely sure in any
    particular instance that stronger settings are
    giving us more "accurate" answers.

    I argue that we can if we have unbiased bots that
    are trained not only through cubeless, single-game
    play but also through cubeful and "matchful" play,
    eliminating extrapolated cubeful/matchful equities.

    There are certainly ways to improve the way bots are trained, but it
    will still be true that we won't be *completely* sure that we're getting
    more accurate answers in every position. That would require more
    computing power than is available in the observable universe.

    ---
    Tim Chow

  • From Timothy Chow@21:1/5 to Tikli Chestikov on Tue Dec 26 23:08:39 2023
    On 12/26/2023 1:03 PM, Tikli Chestikov wrote:
    The fact that the main protagonist here has a ridiculous interest in
    hawking fast cars around various US "strips" is irrelevant.

    He's back!! Yay!!

    Been busy helping Hans Niemann file lawsuits, I presume?

    ---
    Tim Chow

  • From Timothy Chow@21:1/5 to All on Wed Dec 27 07:22:02 2023
    On 12/27/2023 2:16 AM, MK wrote:
    Now that I do, my immediate reaction is that it
    sounds really bad. Shouldn't it be the other way
    around? That is, evaluate at a higher ply first?

    It's done for speed. Each additional ply slows things
    down by a factor of (about) 21.

    ---
    Tim Chow

  • From Philippe Michel@21:1/5 to Timothy Chow on Thu Dec 28 22:09:57 2023
    On 2023-12-23, Timothy Chow <tchow12000@yahoo.com> wrote:

    On 12/22/2023 12:18 PM, MK wrote:

    I assume you mean look-ahead plies? Can you (or
    someone else) expand on this and explain/clarify
    how plies work during play and during rollouts?

    The GNU team can answer this better than I can. One thing to note
    is that during rollouts, the bots will apply some kind of move
    filter to screen out unpromising plays. That is, if you perform
    a 3-ply rollout, the bot doesn't necessarily evaluate every legal
    move at 3-ply and pick the highest-scoring one. It will evaluate
    all the options at the lowest ply but then discard a lot of them
    as not likely to emerge as the top play.

    This is not specific to rollouts. Interactive play, hints, and analysis
    all use this.

    To answer issues raised later in the thread by Murat, this is done for
    speed as already mentioned by Timothy.

    The cost in accuracy seems perfectly acceptable although it is not
    entirely negligible. For instance there are two predefined 2-ply
    settings: world class and supremo.

    The first one evaluates at 2-ply up to the top 8 0-ply moves, provided
    they are no more than 0.16 points weaker than the best. The second one
    evaluates up to 16 moves within 0.32 points. On the Depreli benchmark,
    the cost of errors from world class is about 4% more than from supremo.

    The differences between either 1-ply or 3-ply and either of these
    2-ply settings are much larger than this.

    You can change this in the analysis or rollout settings (look for
    Advanced settings and then Move filter). As far as I know, the default
    settings are conservative compared to what is used by the similar
    feature in eXtreme Gammon.
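    The move-filter idea described above can be sketched like this (an
    illustration of the idea, not actual GNU Backgammon code): rank all
    legal plays with a cheap 0-ply evaluation, keep at most `max_moves`
    plays within `threshold` of the best, and re-evaluate only the
    survivors at the expensive higher ply. The defaults mirror the world
    class numbers quoted above (top 8 within 0.16).

```python
def filter_moves(moves, eval_0ply, max_moves, threshold):
    """Keep at most max_moves plays within threshold of the 0-ply best."""
    scored = sorted(moves, key=eval_0ply, reverse=True)
    best = eval_0ply(scored[0])
    kept = [m for m in scored if best - eval_0ply(m) <= threshold]
    return kept[:max_moves]

def best_play(moves, eval_0ply, eval_deep, max_moves=8, threshold=0.16):
    """World-class-style defaults: top 8 within 0.16 (supremo: 16 / 0.32).
    Only the surviving candidates get the expensive deep evaluation."""
    candidates = filter_moves(moves, eval_0ply, max_moves, threshold)
    return max(candidates, key=eval_deep)
```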

  • From Timothy Chow@21:1/5 to All on Mon Jan 8 08:55:34 2024
    On 1/6/2024 7:50 PM, MK wrote:
    On December 27, 2023 at 5:22:06 AM UTC-7, Timothy Chow wrote:

    On 12/27/2023 2:16 AM, MK wrote:

    Now that I do, my immediate reaction is that it
    sounds really bad. Shouldn't it be the other way
    around? That is, evaluate at a higher ply first?

    It's done for speed. Each additional ply slows
    things down by a factor of (about) 21.

    Ah, that magic number 21 again. :) The number
    of possible dice rolls at every turn... ;)

    But why is the factor imprecise, i.e. "about 21"?
    Can't you give us the exact math...?

    The speed at which a complex piece of code runs depends on many
    factors beyond the simple math of how many different rolls there
    are.
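    A quick enumeration shows where the "about 21" branching factor comes
    from, and why a loop over 21 distinct rolls can legitimately divide by
    36 when each roll is weighted by its probability:

```python
from itertools import combinations_with_replacement

# The 36 ordered outcomes of two dice collapse into 21 distinct rolls.
rolls = list(combinations_with_replacement(range(1, 7), 2))
print(len(rolls))  # 21

# Doubles occur 1 way in 36, non-doubles 2 ways, so the weights sum to 36.
# A loop over the 21 distinct rolls can therefore still divide by 36,
# provided each roll's contribution is weighted accordingly.
weights = {r: (1 if r[0] == r[1] else 2) for r in rolls}
print(sum(weights.values()))  # 36
```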

    ---
    Tim Chow

  • From Timothy Chow@21:1/5 to All on Mon Jan 8 21:26:04 2024
    On 1/8/2024 1:14 PM, MK wrote:
    On January 8, 2024 at 6:55:38 AM UTC-7, Timothy Chow wrote:
    The speed at which a complex piece of code

    You mean like this one?:

    =======================================
    GNU Backgammon Manual V1.00.0
    10.4.5.4 n-ply Cubeful equities
    ..... so how so GNU Backgammon calculate cubeful
    2-ply equities? The answer is: by simple recursion:
    Equity=0
    Loop over 21 dice rolls
    Find best move for given roll
    Equity = Equity + Evaluate n-1 ply equity for resulting position
    End Loop
    Equity = Equity/36
    =======================================

    That's pseudocode, not code.

    ---
    Tim Chow

  • From Timothy Chow@21:1/5 to All on Tue Jan 9 08:48:16 2024
    On 1/9/2024 1:28 AM, MK wrote:
    So, you should be able to explain the reason based
    on the above pseudocode.

    Finding the best move for a given roll isn't necessarily going
    to take the same amount of time for every roll. To find the
    best move, one must first generate all the legal moves and
    evaluate them. The number of legal ways to play 11 is not
    necessarily going to be the same as the number of legal ways
    to play 66. It will depend on the position.

    ---
    Tim Chow

  • From Timothy Chow@21:1/5 to All on Thu Jan 11 17:44:13 2024
    On 1/11/2024 4:17 AM, MK wrote:
    This is not it. Just like dice rolls even out (or can
    be forced to artificially even out faster), number
    of legal ways to play for given dice rolls at given
    positions will also average out.

    Of course. That's what "approximately" means. Check your
    dictionary.

    ---
    Tim Chow

  • From MK@21:1/5 to Timothy Chow on Fri Jan 12 03:17:33 2024
    On 1/11/2024 3:44 PM, Timothy Chow wrote:

    On 1/11/2024 4:17 AM, MK wrote:

    This is not it. Just like dice rolls even out
    (or can be forced to artificially even out
    faster), number of legal ways to play for given
    dice rolls at given positions will average out.

    Of course. That's what "approximately" means.

    Absolutely not!

    Check your dictionary.

    I would prefer to check your dictionary instead.
    Please tell us what dictionary you have checked?

    Noo-BG manual says "on average there are about
    20 legal moves", but that's because those
    chimpanzees are incapable of human language either.

    An average is just a single number result, like
    the average winning/losing PR in your contrived
    example.

    Once you compute an average, you treat it as a
    constant in your later calculations. There is
    no such thing as an "approximate average".

    Indeed, the following paragraph in the Noo-BG
    manual says: "GNU Backgammon needs to consider
    21 rolls by the opponent, 20 and possible legal
    moves per roll) = 420 positions to evaluate."

    Do you understand why it doesn't say *about*
    420 positions to evaluate? Because neither the
    21 possible combinations of rolls, nor the
    average 20 possible legal moves, nor their
    product is approximate..!

    Thus, the reason for each additional ply being
    approximately 21 times slower has to do with
    something other than the number of possible
    legal moves.

    Ask someone who knows math. Axel, Paul, et al.
    are looking up to you but maybe Bob Coca can
    help you with this on bgonline... ;)

    MK

  • From MK@21:1/5 to All on Fri Jan 19 18:14:24 2024
    On 1/8/2024 11:28 PM, MK wrote:

    =======================================
    GNU Backgammon Manual V1.00.0
    10.4.5.4 n-ply Cubeful equities
    ..... so how so GNU Backgammon calculate cubeful
    2-ply equities? The answer is: by simple recursion:
    Equity=0
    Loop over 21 dice rolls
    Find best move for given roll
    Equity = Equity + Evaluate n-1 ply equity for resulting position
    End Loop
    Equity = Equity/36
    =======================================

    Oh, I almost forgot. There is a kind of rotten easter
    egg in the above pseudocode. Let's see how long it
    will take for you whizzes to find it...? :)

    Bzzzt! Time's up.

    Loop over 21 dice rolls and divide by 36...?

    I keep telling you folks that your venerated
    bots are garbage... :(

    MK
