• Re: My older asm...

    From MitchAlsup@21:1/5 to All on Sun Dec 3 20:20:02 2023
    Do you have a HLL version of these ??

    I would like to try esm on them.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From MitchAlsup@21:1/5 to Chris M. Thomasson on Tue Dec 5 19:09:17 2023
    Chris M. Thomasson wrote:

    On 12/3/2023 12:20 PM, MitchAlsup wrote:
    Do you have a HLL version of these ??

    I would like to try esm on them.

    Take note of:
    ___________________
    .align 16
    .globl ac_i686_lfgc_smr_activate
    ac_i686_lfgc_smr_activate:
    movl 4(%esp), %edx
    movl 8(%esp), %ecx

    ac_i686_lfgc_smr_activate_reload:
    movl (%ecx), %eax
    movl %eax, (%edx)
    mfence
    cmpl (%ecx), %eax
    jne ac_i686_lfgc_smr_activate_reload
    ret
    ___________________

    This is an example of where a #StoreLoad style membar is required on an
    x86. SMR is Safe Memory Reclamation, or aka Hazard Pointers.
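
For readers who do not read AT&T syntax, the load / store / fence / reload loop above can be sketched in C11 roughly as follows. This is an illustration only: `struct node`, `smr_activate`, and the parameter names are made up here, and the fence is the portable stand-in for MFENCE.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node type; the thread does not show the real one. */
struct node { int value; };

/*
 * Publish a hazard pointer for whatever *shared currently references.
 * hazard: this thread's hazard-pointer slot (the store target).
 * shared: the shared location being protected (the load source).
 */
struct node *smr_activate(_Atomic(struct node *) *hazard,
                          _Atomic(struct node *) *shared)
{
    struct node *seen;
    struct node *again;

    do {
        seen = atomic_load_explicit(shared, memory_order_relaxed);
        atomic_store_explicit(hazard, seen, memory_order_relaxed);
        /* StoreLoad barrier: the hazard store must be globally visible
           before the reload below is performed. */
        atomic_thread_fence(memory_order_seq_cst);
        again = atomic_load_explicit(shared, memory_order_acquire);
    } while (again != seen);

    return seen;
}
```

The two relaxed accesses on their own would allow the reload to pass the store on x86; the seq_cst fence is what forbids that.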


    esm performs a switch into 1) sequentially consistent at the beginning
    of an ATOMIC event, 2) treats each memory reference in the event as
    SC, and 3) reverts back to causal consistency after all the memory
    references become visible instantaneously. So my ISA covers the
    MemBar requirements automagically.

    {
    1) HW is in a position to know if a ST/LD or LD/LD MemBar is required
    at the beginning of the event.
    2) Uncacheable STs in the atomic event are performed in processor-order ==
    memory-order so that cacheable locks covering uncacheable memory bring
    no surprises
    3) HW is in a position to know if ST/LD or ST/ST MemBar is required after
    leaving an event.
    }
    So software does not have to concern itself with the idiosyncrasies
    of the memory model.

  • From MitchAlsup@21:1/5 to Chris M. Thomasson on Tue Dec 5 23:25:29 2023
    Chris M. Thomasson wrote:

    On 12/5/2023 11:09 AM, MitchAlsup wrote:
    Chris M. Thomasson wrote:

    On 12/3/2023 12:20 PM, MitchAlsup wrote:
    Do you have a HLL version of these ??

    I would like to try esm on them.

    Take note of:
    ___________________
    .align 16
    .globl ac_i686_lfgc_smr_activate
    ac_i686_lfgc_smr_activate:
       movl 4(%esp), %edx
       movl 8(%esp), %ecx

    I cannot tell if the above is 2 LDs or 2 STs (one of the downsides of using MOV rather than LD or ST).
    Since the SP has not been altered at this point you should not be able to do STs to the stack, so I
    assume LDs (for whatever reason). If these are arguments (as they appear), these are passed in registers
    in My 66000, so I assume these 2 instructions are unnecessary.


    ac_i686_lfgc_smr_activate_reload:
       movl (%ecx), %eax
       movl %eax, (%edx)
       mfence
       cmpl (%ecx), %eax

    Here, it looks like you are checking that the value you just stored is the same as the value of
    the memory container it was loaded from. This is checked by HW in My 66000. {{But I suggest this
    sequence is prone to ABA failures since it is based on the bit pattern stored rather than the
    fact the memory address was not written.}}

       jne ac_i686_lfgc_smr_activate_reload
    ret

    My attempt--based on the above realizations.

    ac_i686_lfgc_smr_activate:
    LD R4,[R2].lock
    ST R4,[R1].lock
    RET

    The .lock on the LD begins the ATOMIC event and initializes the failure point to
    ac_i686_lfgc_smr_activate_reload, which does not need a label or a branch; the core makes sure
    the LD access is sequentially consistent with all previously issued memory references before
    checking any deeper than the DCache and TLB, and does not deliver LD.data until this state has
    been achieved. It is this waiting that opens up a window for a SNOOP to interfere with this sequence.

    The .lock on the ST ends the ATOMIC event. If no interference has been detected, the event
    succeeds, the ST is performed, and the core reverts to causal consistency. If interference has
    been detected, the ST is cancelled, control passes back to the initiator (LD.lock), and the
    event begins anew and afresh.


    ___________________

    This is an example of where a #StoreLoad style membar is required on
    an x86. SMR is Safe Memory Reclamation, or aka Hazard Pointers.


    esm performs a switch into 1) sequentially consistent at the beginning
    of an ATOMIC event, 2) treats each memory reference in the event as
    SC, and 3) reverts back to causal consistency after all the memory
    references become visible instantaneously. So my ISA covers the
    MemBar requirements automagically.

    Fwiw, the only reason I needed to use mfence in my
    ac_i686_lfgc_smr_activate function is to _honor_ ordering wrt the store followed by a load to another location on i686. Now, fwiw, my friend Joe Seigh created an interesting algorithm called SMR-RCU, a really neat
    hybrid. This would allow me to elude the explicit #StoreLoad membar on
    an x86 aka MFENCE or even a dummy LOCK RMW. Fwiw, loading a hazard
    pointer does not require any atomic RMW logic...


    {
    1) HW is in a position to know if a ST/LD or LD/LD MemBar is required
    at the beginning of the event.
    2) Uncacheable STs in the atomic event are performed in processor-order
    ==memory-order so that cacheable locks covering uncacheable memory bring
    no surprises
    3) HW is in a position to know if ST/LD or ST/ST MemBar is required
    after leaving an event.
    }
    So software does not have to concern itself with the idiosyncrasies of
    the memory model.

    So, when you get some _really_ free time to burn and you are bored, can
    you show me what ac_i686_lfgc_smr_activate would look like in your
    system? Can I just get rid of the MFENCE? If I can, well, that implies sequential consistency.

    You get rid of a lot more than just the mfence. See Above.

    Do you have a special compiler that can turn std C++11 code into asm
    that works wrt your system? Is that why you asked me if I had a HLL
    version of it?

    I have a 99% functional C compiler that runs many Fortran programs, but
    C++ is a way bigger language {constructors, destructors, try-throw-catch,
    their version of ATOMICs, threading, .....}

    Thanks.

  • From MitchAlsup@21:1/5 to Chris M. Thomasson on Tue Dec 5 23:50:26 2023
    Chris M. Thomasson wrote:

    On 12/5/2023 3:25 PM, MitchAlsup wrote:

    ac_i686_lfgc_smr_activate_reload:
       movl (%ecx), %eax
       movl %eax, (%edx)
       mfence
       cmpl (%ecx), %eax

    Here, it looks like you are checking that the value you just stored is
    the same as the value of
    the memory container it was loaded from.

    It's a store followed by a load to another location. SMR needs this to
    be honored.

    This means I cannot read x86 anymore, so we need a different communication means.

  • From MitchAlsup@21:1/5 to Chris M. Thomasson on Wed Dec 6 02:21:04 2023
    Chris M. Thomasson wrote:

    On 12/5/2023 3:25 PM, MitchAlsup wrote:
    Chris M. Thomasson wrote:
    [...]
    I have a 99% functional C compiler that runs many Fortran programs, but
    C++ is a way bigger language {constructors, destructors, try-throw-catch,
    their version of ATOMICs, threading, .....}

    A C11 compiler that knows about membars and atomics? Fwiw, check this out:

    Where does it mention a My 66000 ISA target ?? That is the only ISA I am spending time in.........

    http://www.smorgasbordet.com/pellesc

    [...]

  • From MitchAlsup@21:1/5 to Chris M. Thomasson on Wed Dec 6 02:18:58 2023
    Chris M. Thomasson wrote:

    Basically, SMR needs to load something from location A, store it in
    location B, and reload from A and compare it to B. This needs a
    StoreLoad relationship. Basically, location B would be on a per-thread
    stack in TLS. So, iirc off the top of my head:

    {{I still contend this is sensitive to the ABA problem}}

    <pseudo-code>
    ______________
    smr_reload:
    a = atomic_load(&loc_a);

    // critical!
    atomic_store(&loc_b, a);
    membar_storeload();
    b = atomic_load(&loc_a);

    if (a != b) goto smr_reload;
    ______________

    Where loc_b usually resides in TLS. This is a key aspect of why SMR can
    work at all.


    smr_reload:
    LD R4,[R2].lock
    ST R4,[R1].lock
    -----------

    As before: the LD waits until all older memory references are sequentially ordered,
    a HW monitor is initialized, and if there is a write, coherent invalidate, (or a couple of other)
    accesses to the cache line touched by R2, the ST will not be performed
    and control reverts to the LD.lock instruction, creating a tight loop.

    By the time control arrives at the instruction following the ST.lock a == b is guaranteed.

    You can insert code to check this, but it is dead code under esm execution model.

    Should you want to go somewhere other than smr_reload::

    smr_reload:
    LD R4,[R2].lock
    BI somewhereelse
    ST R4,[R1].lock
    -----------
    somewhereelse::
    // control arrives here upon detection of interference.

    Using other parts of ISA one could::

    smr_reload:
    MM R1,R2,#8

    and have the guarantee that the memory-to-memory move was ATOMIC, so you don't need
    the .lock at all, but you can't go somewhereelse.

    None of the esm solutions are sensitive to the ABA problem.
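
For contrast, on ISAs that can only compare bit patterns, a common software workaround for ABA is to pair the value with a write counter so that "was written" is detectable even when the value returns to its old pattern. The sketch below is not part of esm or of the code in this thread; the 32/32 packing and all function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Pack a 32-bit payload with a 32-bit write counter (hypothetical layout). */
static inline uint64_t pack(uint32_t value, uint32_t version)
{
    return ((uint64_t)version << 32) | value;
}

static inline uint32_t value_of(uint64_t cell)   { return (uint32_t)cell; }
static inline uint32_t version_of(uint64_t cell) { return (uint32_t)(cell >> 32); }

/* Unconditional write that also bumps the version counter. */
void tagged_store(_Atomic uint64_t *cell, uint32_t value)
{
    uint64_t old = atomic_load(cell);
    while (!atomic_compare_exchange_weak(cell, &old,
                                         pack(value, version_of(old) + 1))) {
        /* a failed CAS refreshed `old`; retry with the new snapshot */
    }
}

/* Succeeds only if no write at all has happened since `snapshot` was read,
   even if the payload has meanwhile returned to the same bit pattern. */
bool tagged_update(_Atomic uint64_t *cell, uint64_t snapshot, uint32_t value)
{
    return atomic_compare_exchange_strong(
        cell, &snapshot, pack(value, version_of(snapshot) + 1));
}
```

With a plain value compare, an A -> B -> A sequence between the snapshot and the update goes unnoticed; with the counter it does not, at the cost of halving the payload width and of eventual counter wrap-around.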

  • From MitchAlsup@21:1/5 to Chris M. Thomasson on Wed Dec 6 15:01:10 2023
    Chris M. Thomasson wrote:

    On 12/5/2023 6:21 PM, MitchAlsup wrote:
    Chris M. Thomasson wrote:

    On 12/5/2023 3:25 PM, MitchAlsup wrote:
    Chris M. Thomasson wrote:
    [...]
    I have a 99% functional C compiler that runs many Fortran programs, but
    C++ is a way bigger language {constructors, destructors,
    try-throw-catch,
    their version of ATOMICs, threading, .....}

    A C11 compiler that knows about membars and atomics? Fwiw, check this
    out:

    Where does it mention a My 66000 ISA target ?? That is the only ISA I am
    spending time in.........

    http://www.smorgasbordet.com/pellesc

    [...]

    I was just wondering if your C compiler handles C11? If so, that would
    be great!!!!


    It handles whatever the current CLANG front end handles.

  • From MitchAlsup@21:1/5 to Chris M. Thomasson on Thu Dec 7 03:29:26 2023
    Chris M. Thomasson wrote:

    On 12/6/2023 7:01 AM, MitchAlsup wrote:
    Chris M. Thomasson wrote:

    On 12/5/2023 6:21 PM, MitchAlsup wrote:
    Chris M. Thomasson wrote:

    On 12/5/2023 3:25 PM, MitchAlsup wrote:
    Chris M. Thomasson wrote:
    [...]
    I have a 99% functional C compiler that runs many Fortran programs,
    but
    C++ is a way bigger language {constructors, destructors,
    try-throw-catch,
    their version of ATOMICs, threading, .....}

    A C11 compiler that knows about membars and atomics? Fwiw, check
    this out:

    Where does it mention a My 66000 ISA target ?? That is the only ISA I am
    spending time in.........

    http://www.smorgasbordet.com/pellesc

    [...]

    I was just wondering if your C compiler handles C11? If so, that would
    be great!!!!


    It handles whatever the current CLANG front end handles.

    A proposal? https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0233r5.pdf :^)

    Under 2.3. Pros and Cons
    <snip>
    The main disadvantage of the hazard pointers method is that each traversal incurs a store-load
    memory order fence, when using the method's basic form (without blocking or using system
    support such as sys_membarrier()).

    The transition from {sequential to causal*} consistency appears to take place at the
    subsequent memory reference. There are 2 cases to consider::
    a) ST.lock remains in the execution window
    b) ST.lock has retired

    (*) or stronger {MMI/O, config}

    The stage by stage rules seem to be::

    No-Address:: younger memory references are not allowed to access CACHE;
    after ST.lock AGENs, those younger memory references can access the cache.
    {{The younger memory references can pass through AGEN, and optimistically
    read tag, TLB, data but not change any state or deliver (LD) or accept
    (ST) a value.}}

    {Address:
    Write-Permission:
    No-Data-No-Line::
    No-Data-Line::
    Data-No-Line::} younger cacheable memory references with a different line
    index than ST.lock are allowed to be seen externally, less than cacheable
    are not allowed...

    Data-Line:: When this store is sequentially consistent with the rest of processor memory order; ST.data has been performed. Less than cacheable
    younger accesses are allowed.

    Notice that ST.lock is the only timing point:: but all participating
    STs remain processor consistent so all participating STs are performed
    before or during ST.lock.

  • From MitchAlsup@21:1/5 to Chris M. Thomasson on Fri Dec 8 22:59:56 2023
    Chris M. Thomasson wrote:

    On 12/6/2023 7:01 AM, MitchAlsup wrote:

    How about this, C11:
    ____________________________
    #include <stdio.h>
    #include <stdlib.h>
    #include <assert.h>
    #include <stdatomic.h>


    struct ct_node
    {
    struct ct_node* m_next;
    };


    int
    main(void)
    {
    printf("ct_c_atomic_test...\n\n");
    fflush(stdout);

    {
    _Atomic(struct ct_node*) shared = NULL;

    struct ct_node local = { NULL };

    struct ct_node* result_0 = atomic_exchange(&shared, &local);

    assert(!result_0);

    struct ct_node* result_1 = atomic_exchange(&shared, NULL);

    assert(result_1 == &local);
    }

    printf("completed!\n\n");

    return 0;
    }
    ____________________________

    ?


    It should be approximately::

    main:
    ENTER R0,R0,#16
    LEA R1,#"ct_c_atomic_test...\n\n" // printf("ct_c_atomic_test...\n\n");
    CALL printf
    MOV R1,#1 // fflush(stdout);
    CALL fflush

    MOV R2,#0 // shared = NULL; // pointer = 0; ?!?
    ST R2,[SP,8] // local.m_next = NULL;
    // for the life of me I can't see why the
    // below code does not just SIGSEGV.
    //..............................................// But I ignore that.....

    ADD R5,SP,#8 // &local;
    LD R3,[R2].lock // atomic_exchange(&shared
    ST R5,[R2].lock // atomic_exchange(&shared = &local);
    // ST R3,[SP,8] // local = atomic_exchange(); // dead

    BEQ0 R3,assert1 // assert(!result_0);

    // NULL = atomic_exchange(); is dead
    LD R3,[R2].lock // atomic_exchange(&shared
    ST #0,[R2].lock // atomic_exchange(&shared = NULL);
    // R4 = result_1

    // R5 already has &[sp+8]
    CMP R4,R3,R5 // assert(result_1 == &local);
    // last use R5, R3, R2
    BEQ R4,assert2

    LEA R1,#"completed!\n\n" // printf("completed!\n\n");
    CALL printf
    MOV R1,#0 // return 0;
    EXIT R0,R0,#16

  • From MitchAlsup@21:1/5 to Chris M. Thomasson on Sat Dec 9 04:25:22 2023
    Chris M. Thomasson wrote:

    On 12/8/2023 2:59 PM, MitchAlsup wrote:
    Chris M. Thomasson wrote:

    On 12/6/2023 7:01 AM, MitchAlsup wrote:

    How about this, C11:
    ____________________________
    #include <stdio.h>
    #include <stdlib.h>
    #include <assert.h>
    #include <stdatomic.h>


    struct ct_node
    {
         struct ct_node* m_next;
    };


    int
    main(void)
    {
         printf("ct_c_atomic_test...\n\n");
         fflush(stdout);

         {
             _Atomic(struct ct_node*) shared = NULL;

             struct ct_node local = { NULL };

             struct ct_node* result_0 = atomic_exchange(&shared, &local);

             assert(!result_0);

             struct ct_node* result_1 = atomic_exchange(&shared, NULL);

             assert(result_1 == &local);
         }

         printf("completed!\n\n");

         return 0;
    }
    ____________________________

    ?


    It should be approximately::

    main:
        ENTER        R0,R0,#16
        LEA        R1,#"ct_c_atomic_test...\n\n"    // printf("ct_c_atomic_test...\n\n");
        CALL        printf
        MOV        R1,#1            // fflush(stdout);
        CALL        fflush

        MOV        R2,#0            // shared       = NULL;    // pointer = 0; ?!?
        ST        R2,[SP,8]        // local.m_next = NULL;
                            // for the life of me I can't see why the
                            // below code does not just SIGSEGV.
    //..............................................// But I ignore that.....

    Actually, I am wondering why you "seem" think that it would have any
    chance of SIGSEGV? The atomic exchanges are legit, all the memory
    references are legit, no problem. Akin to, pseudo-code:
    _________________
    atomic<word*> shared = nullptr;
    word local = 123;
    word* x = shared.exchange(&local);
    assert(x == nullptr);
    word* y = shared.exchange(nullptr);
    assert(y == &local);
    _________________

    Why does _Atomic(struct ct_node*) shared = NULL; not set the shared
    pointer to zero (NULL) ?? Apparently you are setting something at where
    it is pointing to NULL; so how does shared get to be a pointer to something ??

    Iirc, keep in mind that default membar is seq_cst in C/C++11. Unless I foobar'ed it, it looks fine to me. :^)


        ADD        R5,SP,#8        // &local;
        LD        R3,[R2].lock        // atomic_exchange(&shared
        ST        R5,[R2].lock        // atomic_exchange(&shared  =  &local);
    //    ST        R3,[SP,8]        // local = atomic_exchange();    // dead

        BEQ0        R3,assert1        // assert(!result_0);

                            // NULL = atomic_exchange(); is dead
        LD        R3,[R2].lock        // atomic_exchange(&shared
        ST        #0,[R2].lock        // atomic_exchange(&shared  =  NULL);
                            // R4 = result_1

                            // R5 already has &[sp+8]
        CMP        R4,R3,R5        // assert(result_1 == &local);
                            // last use R5, R3, R2
        BEQ        R4,assert2

        LEA        R1,#"completed!\n\n"    // printf("completed!\n\n");
        CALL        printf
        MOV        R1,#0            // return 0;
        EXIT        R0,R0,#16


    Ahhh! I need to examine this. Fwiw, MSVC has C11 atomic, but no threads.
    What fun! ;^o

    You will find My 66000 ATOMICs to be a lot thinner than competing ISAs.

    I looked at ARM ASM for this and ARM has converted the LD.lock;ST.lock
    into a test-and-test-and-set loop. esm has automagic looping if there
    is no interference check (BI), so::

        LD        R3,[R2].lock
    // long number of cycles achieving sequential consistency and the value
    // to be delivered to a register
        ST        R5,[R2].lock ----\    // must fail
                                   |    // automagically
        LD        R3,[R2].lock <---/    // try again
    // fewer cycles
        ST        R5,[R2].lock          // greater chance of success

    If interference is detected making ST.lock unperformable, then control
    reverts to the LD.lock. Now, we are already sequentially consistent
    so the LD is performed and made visible externally with intent to write.

    Oh, and BTW: this adds no state that must be carried across context switches--all
    context switches cause the event to fail and control reverts to the
    control point prior to context switching.

    If you really do want test-and-test-and-set functionality::

    Label:
        LD        R3,[R2]
        BC some_condition,R3,Label
        LD        R3,[R2].lock
        ST        R5,[R2].lock


    Afaict, PellesC has full C11 atomics, threads and membars.

  • From MitchAlsup@21:1/5 to MitchAlsup on Sat Dec 9 22:12:14 2023
    MitchAlsup wrote:

    Chris M. Thomasson wrote:

    Having written::

        LD        R3,[R2].lock
    // long number of cycles achieving sequential consistency and the value
    // to be delivered to a register
        ST        R5,[R2].lock ----\    // must fail
                                   |    // automagically
        LD        R3,[R2].lock <---/    // try again
    // fewer cycles
        ST        R5,[R2].lock          // greater chance of success

    It occurs to me that the magic control transfer to LD.lock arrives
    with the notion the previous ATOMIC event has failed and that this
    second execution of LD.lock should simply wait for the cache line
    {as if performing a LD without the .lock and without the attempt to
    obtain write permission, and when the cache line arrives, chase it
    with a Coherent Invalidate:: performing the test-and-test-and-set
    paradigm without actually encoding the paradigm in instructions}.

    This should eliminate most of the desire for test-and-test-and-set
    explicitly in the instruction stream.
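
For comparison, this is roughly what the explicit software form of test-and-test-and-set looks like in portable C11. It is a sketch of the general paradigm, not of anything My 66000-specific; `ttas_lock` and the function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spinlock word: false = free, true = held. */
typedef struct { atomic_bool held; } ttas_lock;

bool ttas_try_lock(ttas_lock *l)
{
    /* Test: a plain read that can be satisfied from a shared cache line. */
    if (atomic_load_explicit(&l->held, memory_order_relaxed))
        return false;
    /* Test-and-set: the atomic RMW, attempted only when the lock looked free. */
    return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

void ttas_lock_acquire(ttas_lock *l)
{
    while (!ttas_try_lock(l)) {
        /* spin; a real implementation would add a pause or backoff here */
    }
}

void ttas_unlock(ttas_lock *l)
{
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

The point of the first read is to avoid requesting write permission on the line while the lock is visibly held, which is the same effect the retried LD.lock is described as achieving in hardware.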

    If you really do want test-and-test-and-set functionality::

    Label:
        LD        R3,[R2]
        BC some_condition,R3,Label
        LD        R3,[R2].lock
        ST        R5,[R2].lock

  • From MitchAlsup@21:1/5 to Chris M. Thomasson on Sun Dec 10 17:04:48 2023
    Chris M. Thomasson wrote:



    How about this, C11:
    ____________________________
    #include <stdio.h>
    #include <stdlib.h>
    #include <assert.h>
    #include <stdatomic.h>

    Can you provide a reference to stdatomic.h that discusses how the functions are supposed to work at the HW level rather than at the SW level.

  • From Scott Lurndal@21:1/5 to MitchAlsup on Sun Dec 10 20:08:39 2023
    mitchalsup@aol.com (MitchAlsup) writes:
    Chris M. Thomasson wrote:



    How about this, C11:
    ____________________________
    #include <stdio.h>
    #include <stdlib.h>
    #include <assert.h>
    #include <stdatomic.h>

    Can you provide a reference to stdatomic.h that discusses how the functions are
    supposed to work at the HW level rather than at the SW level.

    As I understand it, such a reference doesn't exist. The C++ standard
    simply defines the guarantees the application can expect from the implementation (compiler + OS).

    The C11/C++11 Standard language version is here:

    https://en.cppreference.com/w/cpp/header/stdatomic.h

    GCC's version:

    https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

  • From MitchAlsup@21:1/5 to Scott Lurndal on Sun Dec 10 22:58:28 2023
    Scott Lurndal wrote:

    mitchalsup@aol.com (MitchAlsup) writes:
    Chris M. Thomasson wrote:



    How about this, C11:
    ____________________________
    #include <stdio.h>
    #include <stdlib.h>
    #include <assert.h>
    #include <stdatomic.h>

    Can you provide a reference to stdatomic.h that discusses how the functions are
    supposed to work at the HW level rather than at the SW level.

    As I understand it, such a reference doesn't exist. The C++ standard
    simply defines the guarantees the application can expect from the implementation (compiler + OS).

    The C11/C++11 Standard language version is here:

    https://en.cppreference.com/w/cpp/header/stdatomic.h

    GCC's version:

    https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

    This one is much better.

  • From MitchAlsup@21:1/5 to Chris M. Thomasson on Mon Dec 11 00:30:57 2023
    Chris M. Thomasson wrote:

    On 12/10/2023 12:08 PM, Scott Lurndal wrote:
    mitchalsup@aol.com (MitchAlsup) writes:
    Chris M. Thomasson wrote:



    How about this, C11:
    ____________________________
    #include <stdio.h>
    #include <stdlib.h>
    #include <assert.h>
    #include <stdatomic.h>

    Can you provide a reference to stdatomic.h that discusses how the functions are
    supposed to work at the HW level rather than at the SW level.

    As I understand it, such a reference doesn't exist.

    I think so. An atomic exchange can be implemented with an atomic RMW
    exchange (LOCK XCHG), CAS (cmpxchg), or even LL/SC.

    More like::

    An Atomic Exchange has to reach a point of sequential consistency (which
    may entail a MEMBAR on machines with relaxed memory ordering) before
    the address of the exchanged container is made visible to the system.
    The exchange is performed in such a way that the stored value is visible
    to the system prior to any other access to that container is made visible
    to the system (this may also require an MEMBAR on systems with relaxed
    memory orderings.)
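
As a software-level illustration of the "can be implemented with CAS" remark quoted above, an exchange built from a compare-and-swap loop might look like the sketch below. `exchange_via_cas` is a hypothetical name, and the seq_cst success ordering is chosen here only to match the C11 default for `atomic_exchange`.

```c
#include <stdatomic.h>

/* Atomically replace *target with `desired` and return the previous value,
   using only compare-and-swap. */
void *exchange_via_cas(_Atomic(void *) *target, void *desired)
{
    void *expected = atomic_load_explicit(target, memory_order_relaxed);

    /* On failure the CAS writes the current value into `expected`,
       so each retry uses a fresh snapshot. */
    while (!atomic_compare_exchange_weak_explicit(
               target, &expected, desired,
               memory_order_seq_cst, memory_order_relaxed)) {
        /* retry */
    }
    return expected;
}
```

The weak form is used because it may fail spuriously on LL/SC machines, which a retry loop tolerates anyway.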

    The C++ standard
    simply defines the guarantees the application can expect from the
    implementation (compiler + OS).

    The C11/C++11 Standard language version is here:

    https://en.cppreference.com/w/cpp/header/stdatomic.h

    GCC's version:

    https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

  • From MitchAlsup@21:1/5 to Chris M. Thomasson on Mon Dec 11 01:20:49 2023
    Chris M. Thomasson wrote:

    On 12/10/2023 4:30 PM, MitchAlsup wrote:
    Chris M. Thomasson wrote:

    On 12/10/2023 12:08 PM, Scott Lurndal wrote:
    mitchalsup@aol.com (MitchAlsup) writes:
    Chris M. Thomasson wrote:



    How about this, C11:
    ____________________________
    #include <stdio.h>
    #include <stdlib.h>
    #include <assert.h>
    #include <stdatomic.h>

    Can you provide a reference to stdatomic.h that discusses how the
    functions are
    supposed to work at the HW level rather than at the SW level.

    As I understand it, such a reference doesn't exist.

    I think so. An atomic exchange can be implemented with an atomic RMW
    exchange (LOCK XCHG), CAS (cmpxchg), or even LL/SC.

    More like::

    An Atomic Exchange has to reach a point of sequential consistency (which
    may entail a MEMBAR on machines with relaxed memory ordering) before
    the address of the exchanged container is made visible to the system.

    Not really... Atomic exchange can be implemented in relaxed form wrt no memory barriers in sight. Just as long as it does its job, an atomic
    swap. On the SPARC I had to decorate atomic exchange with the correct
    membars to get the job done. The weakest I could get away with...
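
In C11 terms, the distinction being argued here is the one between the default exchange and the explicitly ordered one. A minimal sketch, reusing the `ct_node` type from the earlier test program (the two wrapper names are hypothetical):

```c
#include <stdatomic.h>
#include <stddef.h>

struct ct_node { struct ct_node *m_next; };

/* Atomic swap with no ordering constraint on surrounding accesses:
   only the exchange itself is indivisible. */
struct ct_node *swap_relaxed(_Atomic(struct ct_node *) *slot,
                             struct ct_node *n)
{
    return atomic_exchange_explicit(slot, n, memory_order_relaxed);
}

/* Same swap with the C11 default ordering, memory_order_seq_cst. */
struct ct_node *swap_seq_cst(_Atomic(struct ct_node *) *slot,
                             struct ct_node *n)
{
    return atomic_exchange(slot, n);
}
```

Both return the previous value; which barriers, if any, the compiler emits around the exchange depends on the target and on the requested order.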

    Ok, then write the paragraph I tried to write with sufficient detail that
    a CPU designer, who knows nothing of software, can read but cannot misunderstand.
    Fit all necessary sequencing, barriers, timing, ... needed such that any CPU sequence designer will achieve a successful atomic_exchange over all his
    (or her !) designs.

    The exchange is performed in such a way that the stored value is visible
    to the system prior to any other access to that container is made visible
    to the system (this may also require an MEMBAR on systems with relaxed
    memory orderings.)





    The C++ standard
    simply defines the guarantees the application can expect from the
    implementation (compiler + OS).

    The C11/C++11 Standard language version is here:

    https://en.cppreference.com/w/cpp/header/stdatomic.h

    GCC's version:

    https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

  • From David Brown@21:1/5 to MitchAlsup on Mon Dec 11 10:09:18 2023
    On 10/12/2023 23:58, MitchAlsup wrote:
    Scott Lurndal wrote:

    mitchalsup@aol.com (MitchAlsup) writes:
    Chris M. Thomasson wrote:



    How about this, C11:
    ____________________________
    #include <stdio.h>
    #include <stdlib.h>
    #include <assert.h>
    #include <stdatomic.h>

    Can you provide a reference to stdatomic.h that discusses how the
    functions are
    supposed to work at the HW level rather than at the SW level.

    As I understand it, such a reference doesn't exist.   The C++ standard
    simply defines the guarantees the application can expect from the
    implementation (compiler + OS).

    The C11/C++11 Standard language version is here:

    https://en.cppreference.com/w/cpp/header/stdatomic.h

    GCC's version:

    https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

    This one is much better.

    Other links that might be of use or interest from gcc include:

    <https://gcc.gnu.org/wiki/MemoryModel>
    <https://gcc.gnu.org/wiki/Atomic/GCCMM>
    <https://gcc.gnu.org/wiki/Atomic/C11>
    <https://gcc.gnu.org/wiki/Atomic>
    <https://gcc.gnu.org/wiki/Atomic/GCCMM/Optimizations>

    <https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html>

  • From MitchAlsup@21:1/5 to Chris M. Thomasson on Thu Dec 14 20:55:35 2023
    Chris M. Thomasson wrote:

    On 12/10/2023 5:20 PM, MitchAlsup wrote:

    ______________________________
    word*
    atomic_exchange(
    word** origin,
    word* xchg
    ){
    hash_lock(origin);

    // RMW
    word* original = *origin;
    *origin = xchg;

    hash_unlock(origin);

    return original;
    }
    ______________________________

    As written, and assuming hash_lock() and hash_unlock() are function calls
    that guarantee the atomicity of the exchange::

    atomic_exchange:
    ENTER R28,R0,#24
    MOV R30,R1
    MOV R29,R2

    CALL hash_lock

    LD R28,[R30]
    ST R29,[R30]

    MOV R1,R30
    CALL hash_unlock

    MOV R1,R28
    EXIT R28,R0,#24

    But I suspect that hash_{[un]lock} is not a function call but a macro
    that offsets into the structure, where hash_lock performs an LL and
    hash_unlock performs the SC. Here is an example where we do not even use
    the value in the addressed memory container, but simply monitor the cache
    line for interference, in which case we get::

    atomic_exchange:
    PRE #19,[R1+lock].lock

    LD R3,[R1]
    ST R2,[R1]

    PUSH #3,[R1+lock].lock

    MOV R1,R3
    RET

    If interference is detected, control automagically reverts to PRE.
    #19 specifies Write permission and L1 cache.

    The PUSH #3 does not change the value in the location, just releases
    the lock and leaves the line in L1 cache. Interference is detected
    when some other resource requests write permission {and is at higher
    priority}.

    Let us see how interference plays out::

    atomic_exchange:

    PRE #19,[R1+lock].lock // control-point = .
    LD R3,[R1]
    ST R2,[R1]
    { repeat
    PUSH #3,[R1+lock].lock -fails----
    | // IP = control-point
    PRE #19,[R1+lock].lock -arrives-/

    LD R3,[R1]
    ST R2,[R1]
    } any numbers of times
    PUSH #3,[R1+lock].lock -succ----
    |
    MOV R1,R3 -arrives-/
    RET
    _____________________________________________________________________

    By majority vote, the .lock notation is being retired and the Lock
    designation is obtained by concatenating an L on the end of the
    memory reference mnemonic::

    LDmem Rd,[Rb...].lock
    becomes:
    LDmemL Rd,[Rb...]
    etcetera.

  • From MitchAlsup@21:1/5 to Chris M. Thomasson on Fri Dec 15 04:15:21 2023
    Chris M. Thomasson wrote:

    On 12/14/2023 12:55 PM, MitchAlsup wrote:
    Chris M. Thomasson wrote:

    On 12/10/2023 5:20 PM, MitchAlsup wrote:

    ______________________________
    word*
    atomic_exchange(
       word** origin,
       word* xchg
    ){
        hash_lock(origin);

          // RMW
          word* original = *origin;
          *origin = xchg;

        hash_unlock(origin);

        return original;
    }
    ______________________________

    As written, and assuming hash_lock() and hash_unlock() are function calls
    that guarantee the atomicity of the exchange::
    [...]

    Fwiw, my example of atomic exchange using locking is taking into account
    one of my previous experiments that hashes addresses into indexes into a mutex table. The mutex table is completely separated from the user logic.

    https://groups.google.com/g/comp.lang.c++/c/sV4WC_cBb9Q/m/5JRwvhpVCAAJ
    (read all)

    This is one way to implement C++ atomics using locks! This would require
    me to report that the impl is not lock-free via is_lock_free and some
    other places.
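
A minimal sketch of the address-hashed lock table idea, assuming a small table of spin flags rather than real mutexes. The table size, the hash, the `word` typedef, and the name `locked_exchange` are all hypothetical; the linked post has the actual experiment.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LOCK_TABLE_SIZE 64u   /* hypothetical; must be a power of two */

typedef int word;             /* stand-in for the thread's `word` type */

/* Zero-initialized static storage leaves every lock in the free state. */
static atomic_bool lock_table[LOCK_TABLE_SIZE];

/* Map an address to a table index; the lock lives apart from the data. */
static size_t slot_for(const void *addr)
{
    uintptr_t bits = (uintptr_t)addr;
    return (size_t)((bits >> 4) ^ (bits >> 12)) & (LOCK_TABLE_SIZE - 1u);
}

void hash_lock(const void *addr)
{
    atomic_bool *l = &lock_table[slot_for(addr)];
    while (atomic_exchange_explicit(l, true, memory_order_acquire)) {
        /* spin until the previous holder releases the slot */
    }
}

void hash_unlock(const void *addr)
{
    atomic_store_explicit(&lock_table[slot_for(addr)], false,
                          memory_order_release);
}

/* The exchange from the thread, with the RMW protected by the hashed lock. */
word *locked_exchange(word **origin, word *xchg)
{
    hash_lock(origin);
    word *original = *origin;
    *origin = xchg;
    hash_unlock(origin);
    return original;
}
```

Distinct addresses can hash to the same slot, which is harmless for correctness but serializes unrelated exchanges; this is one reason such an implementation cannot claim to be lock-free.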

    So would you assign the # define ATOMIC_BOOL_LOCK_FREE a 1 (sometimes lock free) or 2 (always lock free) or 0 (not lock free)
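
For reference, the value an implementation reports can be read back from the standard macros; per C11, 0 means never lock-free, 1 sometimes, and 2 always. The wrapper names below are hypothetical, and the numbers returned are whatever the local toolchain defines, not a statement about any particular design.

```c
#include <stdatomic.h>

/* Lock-free level the toolchain claims for atomic_bool: 0, 1 or 2. */
int bool_lock_free_level(void)
{
    return ATOMIC_BOOL_LOCK_FREE;
}

/* Lock-free level the toolchain claims for atomic pointers: 0, 1 or 2. */
int pointer_lock_free_level(void)
{
    return ATOMIC_POINTER_LOCK_FREE;
}
```

An implementation built on a lock table like the one discussed above would be expected to define these as 0.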

    Now, this is a locked version. The wait free version on x86 would be
    LOCK XCHG.

  • From MitchAlsup@21:1/5 to Chris M. Thomasson on Fri Dec 15 17:15:54 2023
    Chris M. Thomasson wrote:

    On 12/6/2023 7:29 PM, MitchAlsup wrote:

    The main disadvantage of the hazard pointers method is that each
    traversal incurs a store-load
    memory order fence, when using the method's basic form (without blocking
    or using system
    support such as sys_membarrier()).

    The transition from {sequential to causal*} consistency appears to take
    place at the
    subsequent memory reference . There are 2 cases to
    consider::
    a) ST.lock remains in the execution window
    b) ST.lock has retired
    [...]

    It's that damn Store to Load memory order requirement. There are ways
    around it wrt current arch's, dec alpha aside for a moment... I
    mentioned one of them in this thread.


    I learned very early, that getting ATOMICs right is vastly more important
    than making them fast. And getting memory order done properly is paramount.
