On 12/3/2023 12:20 PM, MitchAlsup wrote:
Do you have a HLL version of these ??
I would like to try esm on them.
Take note of:
___________________
.align 16
.globl ac_i686_lfgc_smr_activate
ac_i686_lfgc_smr_activate:
movl 4(%esp), %edx
movl 8(%esp), %ecx
ac_i686_lfgc_smr_activate_reload:
movl (%ecx), %eax
movl %eax, (%edx)
mfence
cmpl (%ecx), %eax
jne ac_i686_lfgc_smr_activate_reload
ret
___________________
This is an example of where a #StoreLoad style membar is required on
x86. SMR is Safe Memory Reclamation, aka Hazard Pointers.
On 12/5/2023 11:09 AM, MitchAlsup wrote:
Chris M. Thomasson wrote:
On 12/3/2023 12:20 PM, MitchAlsup wrote:
Do you have a HLL version of these ??
I would like to try esm on them.
Take note of:
___________________
.align 16
.globl ac_i686_lfgc_smr_activate
ac_i686_lfgc_smr_activate:
movl 4(%esp), %edx
movl 8(%esp), %ecx
ac_i686_lfgc_smr_activate_reload:
movl (%ecx), %eax
movl %eax, (%edx)
mfence
cmpl (%ecx), %eax
jne ac_i686_lfgc_smr_activate_reload
ret
___________________
This is an example of where a #StoreLoad style membar is required on
an x86. SMR is Safe Memory Reclamation, or aka Hazard Pointers.
esm performs a switch into 1) sequentially consistent at the beginning
of an ATOMIC event, 2) treats each memory reference in the event as
SC, and 3) reverts back to causal consistency after all the memory
references become visible instantaneously. So my ISA covers the
MemBar requirements automagically.
Fwiw, the only reason I needed to use mfence in my
ac_i686_lfgc_smr_activate function is to _honor_ the ordering of a store
followed by a load from another location on i686. Now, fwiw, my friend
Joe Seigh created an interesting algorithm called SMR-RCU, a really neat
hybrid. This would allow me to avoid the explicit #StoreLoad membar on
x86, aka MFENCE or even a dummy LOCK RMW. Fwiw, loading a hazard
pointer does not require any atomic RMW logic...
{
1) HW is in a position to know if a ST/LD or LD/LD MemBar is required
at the beginning of the event.
2) Uncacheable STs in the atomic event are performed in processor-order
==memory-order so that cacheable locks covering uncacheable memory bring
no surprises
3) HW is in a position to know if ST/LD or ST/ST MemBar is required
after leaving an event.
}
So software does not have to concern itself with the idiosyncrasies of
the memory model.
So, when you get some _really_ free time to burn and you are bored, can
you show me what ac_i686_lfgc_smr_activate would look like in your
system? Can I just get rid of the MFENCE? If I can, well, that implies sequential consistency.
Do you have a special compiler that can turn std C++11 code into asm
that works wrt your system? Is that why you asked me if I had a HLL
version of it?
Thanks.
On 12/5/2023 3:25 PM, MitchAlsup wrote:
ac_i686_lfgc_smr_activate_reload:
movl (%ecx), %eax
movl %eax, (%edx)
mfence
cmpl (%ecx), %eax
Here, it looks like you are checking that the value you just stored is
the same as the value of the memory container it was loaded from.
It's a store followed by a load to another location. SMR needs this to
be honored.
On 12/5/2023 3:25 PM, MitchAlsup wrote:
Chris M. Thomasson wrote:[...]
I have a 99% functional C compiler that runs many Fortran programs, but
C++ is a way bigger language {constructors, destructors, try-throw-catch,
their version of ATOMICs, threading, .....}
A C11 compiler that knows about membars and atomics? Fwiw, check this out:
http://www.smorgasbordet.com/pellesc
[...]
Basically, SMR needs to load something from location A, store it in
location B, and reload from A and compare it to B. This needs a
StoreLoad relationship. Basically, location B would be on a per-thread
stack in TLS. So, iirc off the top of my head:
<pseudo-code>
______________
smr_reload:
a = atomic_load(&loc_a);
// critical!
atomic_store(&loc_b, a);
membar_storeload();
b = atomic_load(&loc_a);
if (a != b) goto smr_reload;
______________
Where loc_b usually resides in TLS. This is a key aspect of why SMR can
work at all.
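For reference, the pseudo-code above could be written in portable C11 roughly as follows. This is a minimal sketch, not the original ac_i686 code: the names smr_acquire, struct node, src and hazard are made up for illustration, and the hazard slot is passed in as a parameter rather than looked up in TLS. The seq_cst fence stands where MFENCE sits in the x86 version. The reclaiming side, which has to scan the hazard slots before freeing anything, is not shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct node { int value; };  /* hypothetical payload type */

/* Publish the pointer read from *src into the hazard slot *hazard and
   return it once a re-read of *src confirms it did not change. */
static struct node *
smr_acquire(_Atomic(struct node *) *src, _Atomic(struct node *) *hazard)
{
    assert(src != NULL && hazard != NULL);
    struct node *a, *b;
    do {
        a = atomic_load_explicit(src, memory_order_relaxed);
        atomic_store_explicit(hazard, a, memory_order_relaxed);
        /* StoreLoad barrier: the hazard store has to be ordered before
           the re-load below. */
        atomic_thread_fence(memory_order_seq_cst);
        b = atomic_load_explicit(src, memory_order_acquire);
    } while (a != b);
    return a;
}
```

If the fence is dropped, the store to the hazard slot may become visible after the re-load, which is exactly the reordering SMR cannot tolerate.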
On 12/5/2023 6:21 PM, MitchAlsup wrote:
Chris M. Thomasson wrote:
On 12/5/2023 3:25 PM, MitchAlsup wrote:
Chris M. Thomasson wrote:[...]
I have a 99% functional C compiler that runs many Fortran programs, but
C++ is a way bigger language {constructors, destructors, try-throw-catch,
their version of ATOMICs, threading, .....}
A C11 compiler that knows about membars and atomics? Fwiw, check this
out:
Where does it mention a My 66000 ISA target ?? That is the only ISA I am
spending time in.........
http://www.smorgasbordet.com/pellesc
[...]
I was just wondering if your C compiler handles C11? If so, that would
be great!!!!
On 12/6/2023 7:01 AM, MitchAlsup wrote:
Chris M. Thomasson wrote:
On 12/5/2023 6:21 PM, MitchAlsup wrote:
Chris M. Thomasson wrote:
On 12/5/2023 3:25 PM, MitchAlsup wrote:
Chris M. Thomasson wrote:[...]
I have a 99% functional C compiler that runs many Fortran programs, but
C++ is a way bigger language {constructors, destructors, try-throw-catch,
their version of ATOMICs, threading, .....}
A C11 compiler that knows about membars and atomics? Fwiw, check
this out:
Where does it mention a My 66000 ISA target ?? That is the only ISA I am
spending time in.........
http://www.smorgasbordet.com/pellesc
[...]
I was just wondering if your C compiler handles C11? If so, that would
be great!!!!
It handles whatever the current CLANG front end handles.
A proposal? https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0233r5.pdf :^)
On 12/6/2023 7:01 AM, MitchAlsup wrote:
How about this, C11:
____________________________
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <stdatomic.h>

struct ct_node
{
    struct ct_node* m_next;
};

int
main(void)
{
    printf("ct_c_atomic_test...\n\n");
    fflush(stdout);
    {
        _Atomic(struct ct_node*) shared = NULL;
        struct ct_node local = { NULL };
        struct ct_node* result_0 = atomic_exchange(&shared, &local);
        assert(!result_0);
        struct ct_node* result_1 = atomic_exchange(&shared, NULL);
        assert(result_1 == &local);
    }
    printf("completed!\n\n");
    return 0;
}
____________________________
?
On 12/8/2023 2:59 PM, MitchAlsup wrote:
Chris M. Thomasson wrote:
On 12/6/2023 7:01 AM, MitchAlsup wrote:
How about this, C11:
____________________________
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <stdatomic.h>

struct ct_node
{
    struct ct_node* m_next;
};

int
main(void)
{
    printf("ct_c_atomic_test...\n\n");
    fflush(stdout);
    {
        _Atomic(struct ct_node*) shared = NULL;
        struct ct_node local = { NULL };
        struct ct_node* result_0 = atomic_exchange(&shared, &local);
        assert(!result_0);
        struct ct_node* result_1 = atomic_exchange(&shared, NULL);
        assert(result_1 == &local);
    }
    printf("completed!\n\n");
    return 0;
}
____________________________
?
It should be approximately::
main:
ENTER R0,R0,#16
LEA R1,#"ct_c_atomic_test...\n\n" // printf("ct_c_atomic_test...\n\n");
CALL printf
MOV R1,#1 // fflush(stdout);
CALL fflush
MOV R2,#0 // shared = NULL; // pointer = 0; ?!?
ST R2,[SP,8] // local.m_next = NULL;
// for the life of me I can't see why the
// below code does not just SIGSEGV.
//..............................................// But I ignore that.....
Actually, I am wondering why you seem to think that it would have any
chance of SIGSEGV? The atomic exchanges are legit, all the memory
references are legit, no problem. Akin to, pseudo-code:
_________________
atomic<word*> shared = nullptr;
word local = 123;
word* x = shared.exchange(&local);
assert(x == nullptr);
word* y = shared.exchange(nullptr);
assert(y == &local);
_________________
Iirc, keep in mind that the default memory order is seq_cst in C11/C++11.
Unless I foobar'ed it, it looks fine to me. :^)
ADD R5,SP,#8 // &local;
LD R3,[R2].lock // atomic_exchange(&shared
ST R5,[R2].lock // atomic_exchange(&shared = &local);
// ST R3,[SP,8] // local = atomic_exchange(); // dead
BEQ0 R3,assert1 // assert(!result_0);
// NULL = atomic_exchange(); is dead
LD R3,[R2].lock // atomic_exchange(&shared
ST #0,[R2].lock // atomic_exchange(&shared = NULL);
// R4 = result_1
// R5 already has &[sp+8]
CMP R4,R3,R5 // assert(result_1 == &local);
// last use R5, R3, R2
BEQ R4,assert2
LEA R1,#"completed!\n\n" // printf("completed!\n\n");
CALL printf
MOV R1,#0 // return 0;
EXIT R0,R0,#16
Ahhh! I need to examine this. Fwiw, MSVC has C11 atomic, but no threads.
What fun! ;^o
Afaict, PellesC has full C11 atomics, threads and membars.
Chris M. Thomasson wrote:
LD R3,[R2].lock
// long number of cycles achieving sequential consistency and the value
// to be delivered to a register
ST R5,[R2].lock ---\    // must fail
                   |    // automagically
LD R3,[R2].lock <--/    // try again
                        // fewer cycles
ST R5,[R2].lock         // greater chance of success
If you really do want test-and-test-and-set functionality::
Label:
LD R3,[R2]
BC some_condition,R3,Label
LD R3,[R2].lock
ST R5,[R2].lock
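As a rough portable analogue of the test-and-test-and-set sequence above, here is a C11 sketch. It is not My 66000 code and says nothing about how esm would execute it. The names ttas_lock and ttas_unlock are hypothetical, and "some_condition" is assumed to mean "the flag is still held".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static void
ttas_lock(atomic_bool *flag)
{
    assert(flag != NULL);
    for (;;) {
        /* Test: spin on an ordinary load while the flag is held. */
        while (atomic_load_explicit(flag, memory_order_relaxed)) { }
        /* Test-and-set: only now attempt the atomic RMW. */
        if (!atomic_exchange_explicit(flag, true, memory_order_acquire))
            return;
    }
}

static void
ttas_unlock(atomic_bool *flag)
{
    assert(flag != NULL);
    atomic_store_explicit(flag, false, memory_order_release);
}
```

The inner loop only reads, so waiters do not fight over the line with RMW traffic until the flag looks free.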
Chris M. Thomasson wrote:
How about this, C11:
____________________________
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <stdatomic.h>
Can you provide a reference to stdatomic.h that discusses how the functions are
supposed to work at the HW level rather than at the SW level.
mitchalsup@aol.com (MitchAlsup) writes:
Chris M. Thomasson wrote:
How about this, C11:
____________________________
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <stdatomic.h>
Can you provide a reference to stdatomic.h that discusses how the functions are
supposed to work at the HW level rather than at the SW level.
As I understand it, such a reference doesn't exist. The C++ standard
simply defines the guarantees the application can expect from the implementation (compiler + OS).
The C11/C++11 Standard language version is here:
https://en.cppreference.com/w/cpp/header/stdatomic.h
GCC's version:
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
On 12/10/2023 12:08 PM, Scott Lurndal wrote:
mitchalsup@aol.com (MitchAlsup) writes:
Chris M. Thomasson wrote:
How about this, C11:
____________________________
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <stdatomic.h>
Can you provide a reference to stdatomic.h that discusses how the functions are
supposed to work at the HW level rather than at the SW level.
As I understand it, such a reference doesn't exist.
I think so. An atomic exchange can be implemented with an atomic RMW
exchange (LOCK XCHG), CAS (cmpxchg), or even LL/SC.
The C++ standard
simply defines the guarantees the application can expect from the
implementation (compiler + OS).
The C11/C++11 Standard language version is here:
https://en.cppreference.com/w/cpp/header/stdatomic.h
GCC's version:
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
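To illustrate the point that an atomic exchange can be built from CAS alone, here is a small C11 sketch. The function name exchange_via_cas is made up for illustration, and the default seq_cst ordering is used only to keep it short.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Atomic exchange built from compare-and-swap only. */
static uintptr_t
exchange_via_cas(_Atomic uintptr_t *target, uintptr_t desired)
{
    assert(target != NULL);
    uintptr_t expected = atomic_load_explicit(target, memory_order_relaxed);
    /* On failure, expected is refreshed with the current value, so the
       loop simply retries until the swap lands. */
    while (!atomic_compare_exchange_weak(target, &expected, desired)) { }
    return expected;
}
```

The weak form is allowed to fail spuriously, which is what makes it a natural fit for LL/SC machines as well.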
On 12/10/2023 4:30 PM, MitchAlsup wrote:
Chris M. Thomasson wrote:
On 12/10/2023 12:08 PM, Scott Lurndal wrote:
mitchalsup@aol.com (MitchAlsup) writes:
Chris M. Thomasson wrote:
How about this, C11:
____________________________
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <stdatomic.h>
Can you provide a reference to stdatomic.h that discusses how the
functions are
supposed to work at the HW level rather than at the SW level.
As I understand it, such a reference doesn't exist.
I think so. An atomic exchange can be implemented with an atomic RMW
exchange (LOCK XCHG), CAS (cmpxchg), or even LL/SC.
More like::
An Atomic Exchange has to reach a point of sequential consistency (which
may entail a MEMBAR on machines with relaxed memory ordering) before
the address of the exchanged container is made visible to the system.
Not really... Atomic exchange can be implemented in relaxed form, with no
memory barriers in sight. Just as long as it does its job, an atomic
swap. On the SPARC I had to decorate atomic exchange with the correct
membars to get the job done. The weakest I could get away with...
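In C11 terms, the difference looks roughly like this. A minimal sketch; the wrapper names swap_relaxed and swap_acq_rel are hypothetical and exist only to put the two orderings side by side.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Swap with no ordering beyond the atomicity of the swap itself. */
static void *
swap_relaxed(_Atomic(void *) *p, void *v)
{
    assert(p != NULL);
    return atomic_exchange_explicit(p, v, memory_order_relaxed);
}

/* Same swap, with acquire semantics on the load half and release
   semantics on the store half. */
static void *
swap_acq_rel(_Atomic(void *) *p, void *v)
{
    assert(p != NULL);
    return atomic_exchange_explicit(p, v, memory_order_acq_rel);
}
```

Both are atomic swaps; only the second constrains how surrounding loads and stores may be reordered around it, and the compiler picks whatever barriers the target needs for that.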
The exchange is performed in such a way that the stored value is visible
to the system prior to any other access to that container is made visible
to the system (this may also require a MEMBAR on systems with relaxed
memory orderings.)
The C++ standard
simply defines the guarantees the application can expect from the
implementation (compiler + OS).
The C11/C++11 Standard language version is here:
https://en.cppreference.com/w/cpp/header/stdatomic.h
GCC's version:
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
Scott Lurndal wrote:
mitchalsup@aol.com (MitchAlsup) writes:
Chris M. Thomasson wrote:
How about this, C11:
____________________________
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <stdatomic.h>
Can you provide a reference to stdatomic.h that discusses how the
functions are
supposed to work at the HW level rather than at the SW level.
As I understand it, such a reference doesn't exist. The C++ standard
simply defines the guarantees the application can expect from the
implementation (compiler + OS).
The C11/C++11 Standard language version is here:
https://en.cppreference.com/w/cpp/header/stdatomic.h
GCC's version:
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
This one is much better.
On 12/10/2023 5:20 PM, MitchAlsup wrote:
______________________________
word*
atomic_exchange(
word** origin,
word* xchg
){
hash_lock(origin);
// RMW
word* original = *origin;
*origin = xchg;
hash_unlock(origin);
return original;
}
______________________________
On 12/14/2023 12:55 PM, MitchAlsup wrote:
Chris M. Thomasson wrote:[...]
On 12/10/2023 5:20 PM, MitchAlsup wrote:
______________________________
word*
atomic_exchange(
word** origin,
word* xchg
){
hash_lock(origin);
// RMW
word* original = *origin;
*origin = xchg;
hash_unlock(origin);
return original;
}
______________________________
As written, and assuming hash_lock() and hash_unlock() are function calls
that guarantee the atomicity of the exchange::
Fwiw, my example of atomic exchange using locking is taking into account
one of my previous experiments that hashes addresses into indexes into a mutex table. The mutex table is completely separated from the user logic.
https://groups.google.com/g/comp.lang.c++/c/sV4WC_cBb9Q/m/5JRwvhpVCAAJ
(read all)
This is one way to implement C++ atomics using locks! This would require
me to report that the impl is not lock-free via is_lock_free and some
other places.
Now, this is a locked version. The wait free version on x86 would be
LOCK XCHG.
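A self-contained sketch of the hashed lock table idea, under stated assumptions: LOCK_COUNT, g_locks, hash_index and locked_exchange are hypothetical names, the shift by 4 is an arbitrary choice, and a simple spin flag stands in for a real mutex just to keep the example free of OS dependencies. It is not the code from the linked post.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LOCK_COUNT 64u  /* hypothetical table size, power of two */

/* Static storage, so every entry starts out false (unlocked). */
static atomic_bool g_locks[LOCK_COUNT];

static size_t
hash_index(const void *addr)
{
    /* Drop low bits that tend to be zero from alignment, then mask. */
    return (size_t)(((uintptr_t)addr >> 4) & (LOCK_COUNT - 1u));
}

static void
hash_lock(const void *addr)
{
    atomic_bool *l = &g_locks[hash_index(addr)];
    while (atomic_exchange_explicit(l, true, memory_order_acquire)) { }
}

static void
hash_unlock(const void *addr)
{
    atomic_store_explicit(&g_locks[hash_index(addr)], false,
                          memory_order_release);
}

/* Exchange on a plain pointer slot, made atomic by the hashed lock. */
static void *
locked_exchange(void **origin, void *xchg)
{
    assert(origin != NULL);
    hash_lock(origin);
    void *original = *origin;  /* read */
    *origin = xchg;            /* write, still under the lock */
    hash_unlock(origin);
    return original;
}
```

Two unrelated addresses can hash to the same entry, which costs some contention but never correctness, since each operation takes only one lock.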
On 12/6/2023 7:29 PM, MitchAlsup wrote:
[...]
The main disadvantage of the hazard pointers method is that each
traversal incurs a store-load memory order fence, when using the
method's basic form (without blocking or using system support such as
sys_membarrier()).
The transition from {sequential to causal*} consistency appears to take
place at the subsequent memory reference. There are 2 cases to consider::
a) ST.lock remains in the execution window
b) ST.lock has retired
It's that damn Store to Load memory order requirement. There are ways
around it wrt current archs, DEC Alpha aside for a moment... I
mentioned one of them in this thread.