• Re: Idea for spin-wait loops

    From Scott Lurndal@21:1/5 to Chris M. Thomasson on Sun Mar 24 20:43:37 2024
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until

    A processor which doesn't own (or have a shared copy) of the
    cacheline which would contain that word in memory will never know
    if it was modified, as it won't see the invalidate messages in
    a directory-based cache subsystem (leaving aside noncachable
    accesses to the word in memory, of course).

    This sounds like a solution to a problem that doesn't exist,
    and there would be no incentive for a processor designer
    to include the substantial additional complexity required
    to support your feature.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael S@21:1/5 to Scott Lurndal on Mon Mar 25 14:34:50 2024
    On Sun, 24 Mar 2024 20:43:37 GMT
    scott@slp53.sl.home (Scott Lurndal) wrote:

    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until

    A processor which doesn't own (or have a shared copy) of the
    cacheline which would contain that word in memory will never know
    if it was modified, as it won't see the invalidate messages in
    a directory-based cache subsystem (leaving aside noncachable
    accesses to the word in memory, of course).


    It seems, I didn't understand the idea.
    Of course, the waiting thread/core has the word in question in its
    L1D cache when it enters the wait loop.
    Of course, it is awakened if/when the word is evicted from the cache
    for an unrelated reason, i.e. in practice because of capacity
    conflicts caused by the activity of other threads that are running on
    the same core. There is nothing wrong with spurious awakenings as long
    as they are rare.
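
    A minimal C sketch of why rare spurious awakenings are harmless: the
    waiter re-checks the word after every wakeup. hw_wait_hint() is a
    hypothetical stand-in for the hardware wait, not a real instruction
    or library call; here it returns at once, so every wakeup is
    spurious and the loop is still correct.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a MONITOR/MWAIT-like "sleep until this cache
 * line is touched" primitive. Returning immediately models the worst
 * case, in which every wakeup is spurious. */
static void hw_wait_hint(_Atomic uint32_t *addr)
{
    (void)addr;
}

/* Wait while *addr == expected; return the first different value seen.
 * The word is re-read after every wakeup, so a wakeup caused by an
 * unrelated eviction only costs one extra trip around the loop. */
static uint32_t wait_while_equal(_Atomic uint32_t *addr, uint32_t expected)
{
    assert(addr != NULL);
    uint32_t seen;
    while ((seen = atomic_load_explicit(addr, memory_order_acquire))
           == expected) {
        hw_wait_hint(addr);
    }
    return seen;
}
```

    Correctness rests only on the re-check; how often the hint returns
    early affects efficiency, not the result.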

    This sounds like a solution to a problem that doesn't exist,
    and there would be no incentive for a processor designer
    to include the substantial additional complexity required
    to support your feature.

    The problem does exist, and the primitive proposed by Bonita is not
    new. It is a minor modification of Monitor/Mwait.
    For current Intel and AMD processors this sort of thing is
    relatively unattractive because, at 2 threads per core and with rather
    measurable throughput gains from running 2 threads instead of one
    (up to 30% for AMD, a little less for Intel, but often measurable),
    each thread is a valuable resource, so you don't really want to keep it
    paused for too long. And the whole point of Bonita's amendment of the
    existing mechanism is that software gets more control over long waits.
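
    To make the semantics under discussion concrete, here is a software
    emulation in C of "wait while the word equals a value, but no longer
    than a timeout". It is only a sketch: the function names are
    hypothetical, and it busy-polls where the proposed instruction would
    sleep. It uses C11 timespec_get(); a monotonic clock would be
    preferable in real code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Current time in nanoseconds (wall clock, via C11 timespec_get). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Hypothetical emulation of the proposed primitive.
 * Returns true if *addr was seen to differ from expected,
 * false if timeout_ns elapsed first. */
static bool wait_while_equal_timeout(_Atomic uint32_t *addr,
                                     uint32_t expected,
                                     uint64_t timeout_ns)
{
    assert(addr != NULL);
    const uint64_t deadline = now_ns() + timeout_ns;
    for (;;) {
        if (atomic_load_explicit(addr, memory_order_acquire) != expected)
            return true;        /* the word changed */
        if (now_ns() >= deadline)
            return false;       /* timed out; the caller decides what next */
    }
}
```

    On a false return the caller can choose to wait again, yield, or
    block in the OS, which is the extra control over long waits
    mentioned above.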

    On IBM POWER and on a few Sun/Oracle chips there are up to 8 threads
    per core, so each thread is not that valuable. This means that a longer
    uninterrupted wait makes more sense, and control over the duration of
    the timeout is more desirable. But maybe IBM's and Oracle's variants of
    MWAIT already have it?

  • From Michael S@21:1/5 to Michael S on Mon Mar 25 19:11:22 2024
    On Mon, 25 Mar 2024 14:34:50 +0200
    Michael S <already5chosen@yahoo.com> wrote:

    On Sun, 24 Mar 2024 20:43:37 GMT
    scott@slp53.sl.home (Scott Lurndal) wrote:

    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> writes:
    On 3/23/2024 9:53 AM, Bonita Montero wrote:
    I've got a nice idea for a new processor extension for spin-wait
    loops. The idea is that a thread of a processor enters a sleep
    state if a word in memory is equal to a certain register until


    A processor which doesn't own (or have a shared copy) of the
    cacheline which would contain that word in memory will never know
    if it was modified, as it won't see the invalidate messages in
    a directory-based cache subsystem (leaving aside noncachable
    accesses to the word in memory, of course).


    It seems, I didn't understand the idea.

    I meant to say 'you' instead of 'I'.

  • From Michael S@21:1/5 to Chris M. Thomasson on Wed Mar 27 17:09:57 2024
    On Tue, 26 Mar 2024 13:02:47 -0700
    "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> wrote:

    On 3/26/2024 3:12 AM, Bonita Montero wrote:
    Am 26.03.2024 um 03:48 schrieb Chris M. Thomasson:
    On 3/23/2024 11:38 PM, Bonita Montero wrote:
    Am 23.03.2024 um 21:58 schrieb Chris M. Thomasson:

    MWAIT?

    MWAIT has no timeout.


    Not sure how important it would be for MWAIT to have a timeout...
    You are referring to user space, right?

    MWAIT could be used for limited spinning, as glibc's pthread_mutex
    is capable of doing. The advantage of an MWAIT with timeout would be
    much less interconnect traffic compared to polling.
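
    For readers unfamiliar with "limited spinning", a minimal C sketch
    of the pattern: try to take a lock a bounded number of times, then
    report failure so the caller can fall back to blocking. This is not
    glibc's implementation; try_lock_bounded(), unlock_bounded() and the
    attempts parameter are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Try to acquire a simple flag lock, giving up after `attempts` tries.
 * Returns true if the lock was acquired, false if the spin budget ran
 * out, in which case the caller would normally block in the OS. */
static bool try_lock_bounded(atomic_bool *lock, unsigned attempts)
{
    assert(lock != NULL);
    for (unsigned i = 0; i < attempts; i++) {
        /* Read first, and attempt the atomic exchange only when the
         * lock looks free, so that waiting is done with plain loads. */
        if (!atomic_load_explicit(lock, memory_order_relaxed) &&
            !atomic_exchange_explicit(lock, true, memory_order_acquire))
            return true;
    }
    return false;
}

/* Release the lock taken by try_lock_bounded(). */
static void unlock_bounded(atomic_bool *lock)
{
    atomic_store_explicit(lock, false, memory_order_release);
}
```

    A wait instruction with a timeout would take the place of the
    bounded loop: sleep on the lock word until it changes or the budget
    expires.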


    MWAIT is meant to get around polling?

    I don't know what you mean by 'get around'.
    The main point of the original Monitor/MWAIT is to allow one SMT
    thread to poll a memory address in a way that consumes almost none of
    the core's execution resources, thus allowing the other SMT thread(s)
    of the same core to run faster. A sort of more intelligent PAUSE.
    In the absence of other SMT threads, the main advantage of a polling
    loop with Monitor/MWAIT vs a simple tight polling loop (STPL) is
    reduced power consumption.
    As far as cache coherence traffic (CCT) is concerned, a Monitor/MWAIT
    polling loop provides virtually no advantage relative to STPL. Both
    are quite efficient from the CCT perspective, at least as long as the
    programmer does not do anything stupid.
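
    A minimal C sketch of a simple tight polling loop that behaves well
    with respect to coherence traffic, under the assumption that "not
    doing anything stupid" means polling with plain loads rather than
    atomic read-modify-write operations. The line can then stay in the
    poller's cache in shared state until a writer invalidates it.
    poll_until_changed() and cpu_relax() are hypothetical names;
    _mm_pause() is the x86 intrinsic for PAUSE.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__)
#include <immintrin.h>
#define cpu_relax() _mm_pause()   /* PAUSE: be polite to the sibling thread */
#else
#define cpu_relax() ((void)0)     /* no hint on other targets in this sketch */
#endif

/* Spin until *addr differs from old, then return the new value.
 * Only loads are issued while waiting, so the loop hits in the local
 * cache and generates no coherence traffic until the writer's store
 * invalidates the line. */
static uint32_t poll_until_changed(_Atomic uint32_t *addr, uint32_t old)
{
    assert(addr != NULL);
    uint32_t v;
    while ((v = atomic_load_explicit(addr, memory_order_relaxed)) == old)
        cpu_relax();
    /* Order later reads after the observed change. */
    atomic_thread_fence(memory_order_acquire);
    return v;
}
```

    Replacing the load with an exchange or compare-and-swap in the loop
    condition would request the line in exclusive state on every
    iteration, which is the kind of mistake that does cost traffic.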

    Later on Intel invented 'MWAIT for Power Management', which has
    slightly different objectives. But that is O.T.
