As you have noticed on the above webpage, you have to be careful with
compilers in parallel programming in order to avoid data races: a
compiler can optimize accesses to a global variable (for example by
keeping it in a register) if it is not volatile, and this can cause
data races in parallel programming. I understand this issue; for
example, in FreePascal and Delphi I use dynamic memory for global
variables that are written by the main thread and read by many other
threads. I also know how to use memory fences to force visibility, and
I know roughly how much time the store buffer takes to drain, etc.
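
Here is a minimal sketch of the pattern i am describing (main thread
writes, many threads read). It is written in C++ only as an
illustration, it is not my FreePascal or Delphi code, and the names
g_payload, g_ready and publish_and_read are hypothetical. It assumes a
C++11 compiler with thread support. The atomic flag stops the compiler
from hoisting the read out of the loop, and the release/acquire pair
makes the payload visible to the readers:

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical names, for illustration only.
static int g_payload = 0;                 // plain data written by the main thread
static std::atomic<bool> g_ready{false};  // flag that publishes the payload

// Starts `readers` threads that wait for the flag and then read the payload.
// Returns the sum of the values the readers observed.
int publish_and_read(int value, int readers) {
    g_ready.store(false, std::memory_order_relaxed);
    std::vector<int> seen(readers, 0);
    std::vector<std::thread> pool;
    for (int i = 0; i < readers; ++i) {
        pool.emplace_back([&seen, i] {
            // Atomic acquire load: the compiler must re-read the flag on
            // every iteration instead of caching it in a register.
            while (!g_ready.load(std::memory_order_acquire)) {
                std::this_thread::yield();
            }
            seen[i] = g_payload;  // ordered after the acquire load
        });
    }
    g_payload = value;                               // write the data first
    g_ready.store(true, std::memory_order_release);  // then publish it
    for (auto& t : pool) t.join();
    int sum = 0;
    for (int v : seen) sum += v;
    return sum;
}
```

With this sketch, calling publish_and_read(7, 4) should return 28,
because each of the 4 readers sees the value 7 once the flag is set.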
Read my following thoughts to understand more:
About the store buffer and memory visibility..
More about memory visibility..
I said before:
As you know, in parallel programming you have to take care not only
of memory ordering but also of memory visibility. Read this to see
why:
A store barrier, “sfence” instruction on x86, forces all store
instructions prior to the barrier to happen before the barrier and have
the store buffers flushed to cache for the CPU on which it is issued.
This will make the program state "visible" to other CPUs so they can act
on it if necessary.
Read more here to understand correctly:
"However under x86-TSO, the stores are cached in the store buffers,
a load consults only shared memory and the store buffer of the given
thread, which means it can load data from memory and ignore values from
the other thread."
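
To make the above quote concrete, here is a small sketch of the classic
store-buffering test, again in C++ and only as an illustration. The
function name count_both_zero and its parameters are hypothetical.
Each thread stores 1 to its own variable and then loads the other
variable. Without a fence both loads are allowed to return 0, because
each store can still be sitting in its thread's store buffer. With a
sequentially consistent fence between the store and the load, that
outcome is forbidden by the C++ memory model:

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: runs the store-buffering test `iterations` times
// and counts how often both threads read 0.
int count_both_zero(int iterations, bool use_fence) {
    int both_zero = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;
        std::thread a([&] {
            x.store(1, std::memory_order_relaxed);
            // Full fence: the store must become visible before the load runs.
            if (use_fence) std::atomic_thread_fence(std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread b([&] {
            y.store(1, std::memory_order_relaxed);
            if (use_fence) std::atomic_thread_fence(std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_relaxed);
        });
        a.join();
        b.join();
        if (r1 == 0 && r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

With use_fence set to true the function should always return 0. With
use_fence set to false it may return a nonzero count, but this is not
guaranteed: the two threads are started fresh on every iteration, so
they often do not overlap closely enough for the effect to show up.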