Hello...
Read this:
I am a white arab, and now about Lockfree and Waitfree and Locks..
I have just read the following thoughts of a PhD researcher, and he says
the following:
"Lock-based concurrency mechanism (locks) have several difficulties in practice:
1) if we use more than one lock, we're subject to having deadlock issues;
2) if we use priority locks, we're subject to having priority inversion
issues;
3) if we use a lock without starvation-freedom guarantees (such as a
spinlock), we're subject to starvation and live-lock;
4) using locks is prone to convoying effects;
5) mutual exclusion locks don't scale for read-only operations, it takes
a reader-writer lock to have some scalability for read-only operations
and even then, we either execute read-only operations or one write, but
never both at the same time. Until today, there is no known efficient reader-writer lock with starvation-freedom guarantees;"
Read more here:
http://concurrencyfreaks.blogspot.com/2019/04/onefile-and-tail-latency.html
But i think that he is not right by saying the following:
"Until today, there is no known efficient reader-writer lock with starvation-freedom guarantees"
Because i am an inventor of many scalable algorithms and there
implementations, and i have invented scalable and efficient
starvation-free reader-writer locks, read my following thoughts below
to notice it..
Also look at his following webpage:
OneFile - The world's first wait-free Software Transactional Memory
http://concurrencyfreaks.blogspot.com/2019/04/onefile-worlds-first-wait-free-software.html
But i think he is not right, because read the following thoughts that i
have just posted that applies to waitfree and lockfree:
https://groups.google.com/forum/#!topic/comp.programming.threads/F_cF4ft1Qic
And read all my following thoughts to understand:
About Lock elision and Transactional memory..
I have just read the following:
Lock elision in the GNU C library
https://lwn.net/Articles/534758/
So it says the following:
"Lock elision uses the same programming model as normal locks, so it can
be directly applied to existing programs. The programmer keeps using
locks, but the locks are faster as they can use hardware transactional
memory internally for more parallelism. Lock elision uses memory
transactions as a fast path, while the slow path is still a normal lock. Deadlocks and other classic locking problems are still possible, because
the transactions may fall back to a real lock at any time."
So i think this is not good, because one of the benefits of
Transactional memory is that it solves the deadlock problem, but
with Lock elision you bring back the deadlock problem.
More about Locks and Transactional memory..
I have just looked at the following webpage about understanding
Transactional memory performance:
https://www.cs.utexas.edu/users/witchel/pubs/porter10ispass-tm-slides.pdf
And as you are noticing, it says that in practice Transactional memory
is worse than Locks at high contention, and it says that in practice Transactional memory is 40% worse than Locks at 100% contention.
This is why i have invented scalable Locks and scalable RWLocks, read
my following thoughts to notice it:
About beating Moore's Law with software..
bmoore has responded to me the following:
https://groups.google.com/forum/#!topic/soc.culture.china/Uu15FIknU0s
So as you are noticing he is asking me the following:
"Are you talking about beating Moore's Law with software?"
But i think that there is some of the following constraints:
"Modern programing environments contribute to the problem of software
bloat by placing ease of development and portable code above speed or
memory usage. While this is a sound business model in a commercial
environment, it does not make sense where IT resources are constrained. Languages such as Java, C-Sharp, and Python have opted for code
portability and software development speed above execution speed and
memory usage, while modern data storage and transfer standards such as
XML and JSON place flexibility and readability above efficiency."
Read the following:
https://smallwarsjournal.com/jrnl/art/overcoming-death-moores-law-role-software-advances-and-non-semiconductor-technologies
Also there remains the following to also beat Moores's Law:
"Improved Algorithms
Hardware improvements mean little if software cannot effectively use the resources available to it. The Army should shape future software
algorithms by funding basic research on improved software algorithms to
meet its specific needs. The Army should also search for new algorithms
and techniques which can be applied to meet specific needs and develop a learning culture within its software community to disseminate this information."
And about scalable algorithms, as you know i am a white arab
that is an inventor of many scalable algorithms and there
implementations, read my following thoughts to notice it:
About my new invention that is a scalable algorithm..
I am a white arab, and i think i am more smart,
and i think i am like a genius, because i have again just invented
a new scalable algorithm, but i will briefly talk about the following
best scalable reader-writer lock inventions, the first one is the following:
Scalable Read-mostly Synchronization Using Passive Reader-Writer Locks
https://www.usenix.org/system/files/conference/atc14/atc14-paper-liu.pdf
You will notice that it has a first weakness that it is for TSO hardware
memory model and the second weakness is that the writers latency is very expensive when there is few readers.
And here is the other best scalable reader-writer lock invention of
Facebook:
SharedMutex is a reader-writer lock. It is small, very fast, scalable
on multi-core
Read here:
https://github.com/facebook/folly/blob/master/folly/SharedMutex.h
But you will notice that the weakness of this scalable reader-writer
lock is that the priority can only be configured as the following:
SharedMutexReadPriority gives priority to readers,
SharedMutexWritePriority gives priority to writers.
So the weakness of this scalable reader-writer lock is that
you can have starvation with it.
So this is why i have just invented a scalable algorithm that is
a scalable reader-writer lock that is better than the above and that is starvation-free and that is fair and that has a small writers latency.
So i think mine is the best and i will sell many of my scalable
algorithms to software companies such as Microsoft or Google or Embardero..
What is it to be an inventor of many scalable algorithms ?
The Holy Grail of parallel programming is to provide good speedup while
hiding or avoiding the pitfalls of concurrency. You have to understand
it to be able to understand what i am doing, i am an inventor of
many scalable algorithms and there implementations, but how can we
define the kind of inventor like me? i think there is the following
kinds of inventors, the ones that are PhD researchers and inventors like
Albert Einstein, and the ones that are engineers and inventors like
Nikola Tesla, and i think that i am of the kind of inventor of Nikola
Tesla, i am not a PhD researcher like Albert Einstein, i am like an
engineer who invented many scalable algorithms and there
implementations, so i am like the following inventor that we call Nikola
Tesla:
https://en.wikipedia.org/wiki/Nikola_Tesla
But i think that both those PhD researchers that are inventors and those Engineers that are inventors are powerful.
You have to understand deeply what is to invent my scalable algorithms
and there implementations so that to understand that it is powerful,
i give you an example: So i have invented a scalable algorithm that is a scalable Mutex that is remarkable and that is the Holy Grail of scalable
Locks, it has the following characteristics, read my following thoughts
to understand:
About fair and unfair locking..
I have just read the following lead engineer at Amazon:
Highly contended and fair locking in Java
https://brooker.co.za/blog/2012/09/10/locking.html
So as you are noticing that you can use unfair locking that can have
starvation or fair locking that is slower than unfair locking.
I think that Microsoft synchronization objects like the Windows critical section uses unfair locking, but they still can have starvation.
But i think that this not the good way to do, because i am an inventor
and i have invented a scalable Fast Mutex that is much more powerful ,
because with my Fast Mutex you are capable to tune the "fairness" of the
lock, and my Fast Mutex is capable of more than that, read about it on
my following thoughts:
More about research and software development..
I have just looked at the following new video:
Why is coding so hard...
https://www.youtube.com/watch?v=TAAXwrgd1U8
I am understanding this video, but i have to explain my work:
I am not like this techlead in the video above, because i am also an
"inventor" that has invented many scalable algorithms and there
implementions, i am also inventing effective abstractions, i give you an example:
Read the following of the senior research scientist that is called Dave
Dice:
Preemption tolerant MCS locks
https://blogs.oracle.com/dave/preemption-tolerant-mcs-locks
As you are noticing he is trying to invent a new lock that is preemption tolerant, but his lock lacks some important characteristics, this is why
i have just invented a new Fast Mutex that is adaptative and that is
much much better and i think mine is the "best", and i think you will
not find it anywhere, my new Fast Mutex has the following characteristics:
1- Starvation-free
2- Tunable fairness
3- It keeps efficiently and very low its cache coherence traffic
4- Very good fast path performance
5- And it has a good preemption tolerance.
6- It is faster than scalable MCS lock
this is how i am an "inventor", and i have also invented other scalable algorithms such as a scalable reference counting with efficient support
for weak references, and i have invented a fully scalable Threadpool,
and i have also invented a Fully scalable FIFO queue, and i have
also invented other scalable algorithms and there implementations, and i
think i will sell some of them to Microsoft or to Google or Embarcadero
or such software companies.
And here is my other previous new invention of a scalable algorithm:
I have just read the following PhD paper about the invention that we
call counting networks and they are better than Software combining trees:
Counting Networks
http://people.csail.mit.edu/shanir/publications/AHS.pdf
And i have read the following PhD paper:
http://people.csail.mit.edu/shanir/publications/HLS.pdf
So as you are noticing they are saying in the conclusion that:
"Software combining trees and counting networks which are the only
techniques we observed to be truly scalable"
But i just found that this counting networks algorithm is not generally scalable, and i have the logical proof here, this is why i have just
come with a new invention that enhance the counting networks algorithm
to be generally scalable. And i think i will sell my new algorithm
of a generally scalable counting networks to Microsoft or Google or
Embarcadero or such software companies.
So you have to be careful with the actual counting networks algorithm
that is not generally scalable.
My other new invention is my scalable reference counting and here it is:
https://sites.google.com/site/scalable68/scalable-reference-counting-with-efficient-support-for-weak-references
And here is my just new invention of a scalable algorithm:
My Scalable RWLock that works across processes and threads was updated
to version 4.62
Now i think it is working correctly in both Windows and Linux..
You can download it from my website here:
https://sites.google.com/site/scalable68/scalable-rwlock-that-works-accross-processes-and-threads
More about me as an inventor of many scalable algorithms..
I am a white arab and i think i am like a genius, because i have
invented many scalable algorithms and there implementations, and look
for example at my just new invention of a scalable algorithm here:
https://sites.google.com/site/scalable68/scalable-rwlock-that-works-accross-processes-and-threads
As you have noticed, you have to be like a genius to be able to invent
my above scalable algorithm of a scalable RWLock, because it has the
following characteristics:
1- It is Scalable
2- It is Starvation-free
3- It is fair
4- It can be used across processes and threads
5- It can be used as a scalable Lock across processes and threads
by using my scalable AMLock that is FIFO fair on the writers side,
or it can be
used as a scalable RWLock.
I am using my scalable Lock that is FIFO fair that is called scalable
AMLock on the writers side.
Here is why scalable Locks are really important:
https://queue.acm.org/detail.cfm?id=2698990
So all in all it is a really good invention of mine.
Read my previous thoughts:
Here is how to use my new invention that is my scalable RWLock
across processes:
Just create an scalable rwlock object by giving a name in one process by calling the constructor like this:
scalable_rwlock.create('amine');
And you can use the scalable rwlock object from another process by
calling the constructor by using the name like this:
scalable_rwlock.create('amine');
So as you are noticing i have abstracted it efficiently..
Read the rest of my previous thoughts:
My new invention of a Scalable RWLock that works across processes and
threads is here, and now it works on both Windows and Linux..
Please download my source code and take a look at how i am making it
work across processes by using FNV1a hash on both process ID and thread
ID, FNV1a has a good dispersion, and FNV1a hash permits also my RWLock
to be scalable.
You can download it from my website here:
https://sites.google.com/site/scalable68/scalable-rwlock-that-works-accross-processes-and-threads
Description:
This is my invention of a fast, and scalable and starvation-free and
fair and lightweight Multiple-Readers-Exclusive-Writer Lock called
LW_RWLockX, it works across processes and threads.
The parameters of the constructor are: first parameter is the name of
the scalable RWLock to be used across processes, if the name is empty,
it will only be used across threads. The second parameter is the size of
the array of the readers, so if the size of the array is equal to the
number of parallel readers, so it will be scalable, but if the number of readers are greater than the size of the array , you will start to have contention. The third parameter is the size of the array of my scalable
Lock that is called AMLock, the number of threads can go beyond the size
of the array of the scalable AMLock, please look at the source code of
my scalable algorithms to understand.
I have also used my following implementation of FNV1a hash function to
make my new variants of RWLocks scalable (since FNV1a is a hash
algorithm that has good dispersion):
function FNV1aHash(key:int64): UInt64;
var
i: Integer;
key1:uint64;
const
FNV_offset_basis: UInt64 = 14695981039346656037;
FNV_prime: UInt64 = 1099511628211;
begin
//FNV-1a hash
Result := FNV_offset_basis;
for i := 1 to 8 do
begin
key1:=(key shr ((i-1)*8)) and $00000000000000ff;
Result := (Result xor key1) * FNV_prime;
end;
end;
- Platform: Windows, Unix and Linux on x86
Required FPC switches: -O3 -Sd
-Sd for delphi mode....
Required Delphi switches: -$H+ -DDelphi
For Delphi XE-XE7 and Delphi tokyo use the -DXE switch
You can configure it as follows from inside defines.inc file:
{$DEFINE CPU32} and {$DEFINE Windows32} for 32 bit systems
{$DEFINE CPU64} and {$DEFINE Windows64} for 64 bit systems
--'
I am a white arab, and why have i invented scalable RWLocks and scalable
Locks ?
Because there is a disadvantage with Transactional memory and
here it is:
About Hardware Transactional Memory:
"As someone who has used TSX to optimize synchronization primitives, you
can expect to see a ~15-20% performance increase, if (big if) your
program is heavy on disjoint data access, i.e. a lock is needed for correctness, but conflicts are rare in practice. If you have a lot of
threads frequently writing the same cache lines, you are probably going
to see worse performance with TSX as opposed to traditional locking. It
helps to think about TSX as transparently performing optimistic
concurrency control, which is actually pretty much how it is implemented
under the hood."
Read more here:
https://news.ycombinator.com/item?id=8169697
So as you are noticing, HTM (hardware transactional memory) and TM can
not replace locks when doing IO and when we have a highly contended
critical section.
Read the rest:
I have just read the following article that appeared in C/C++ Users
Journal, 23(3), March 2005
The Trouble With Locks
http://gotw.ca/publications/mill36.htm
And here is my thoughts about how to avoid deadlocks and race conditions
in lock-based systems:
https://community.idera.com/developer-tools/general-development/f/getit-and-third-party/71464/about-turing-completeness-and-parallel-programming
Also i don't agree with him about composability of lock-based systems,
read the following to understand:
"About composability of lock-based systems now:
Design your systems to be composable. Among the more galling claims of
the detractors of lock-based systems is the notion that they are somehow uncomposable:
“Locks and condition variables do not support modular programming,”
reads one typically brazen claim, “building large programs by gluing
together smaller programs[:] locks make this impossible.”9 The claim, of course, is incorrect. For evidence one need only point at the
composition of lock-based systems such as databases and operating
systems into larger systems that remain entirely unaware of lower-level locking.
There are two ways to make lock-based systems completely composable, and
each has its own place. First (and most obviously), one can make locking entirely internal to the subsystem. For example, in concurrent operating systems, control never returns to user level with in-kernel locks held;
the locks used to implement the system itself are entirely behind the
system call interface that constitutes the interface to the system. More generally, this model can work whenever a crisp interface exists between software components: as long as control flow is never returned to the
caller with locks held, the subsystem will remain composable.
Second (and perhaps counterintuitively), one can achieve concurrency and composability by having no locks whatsoever. In this case, there must be
no global subsystem state—subsystem state must be captured in
per-instance state, and it must be up to consumers of the subsystem to
assure that they do not access their instance in parallel. By leaving
locking up to the client of the subsystem, the subsystem itself can be
used concurrently by different subsystems and in different contexts. A
concrete example of this is the AVL tree implementation used extensively
in the Solaris kernel. As with any balanced binary tree, the
implementation is sufficiently complex to merit componentization, but by
not having any global state, the implementation may be used concurrently
by disjoint subsystems—the only constraint is that manipulation of a
single AVL tree instance must be serialized."
Read more here:
https://queue.acm.org/detail.cfm?id=1454462
Thank you,
Amine Moulat Ramdane.
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)