• About Transactional memory and locks..

    From World90@21:1/5 to All on Thu Feb 4 10:56:16 2021

    About Transactional memory and locks..

    I have just read the following paper about Transactional memory and locks:


    I don't agree with the above paper, since read my following thoughts
    to understand:

    I have just invented a new powerful scalable fast mutex, and it has the following characteristics:

    1- Starvation-free
    2- Tunable fairness
    3- It keeps efficiently and very low its cache coherence traffic
    4- Very good fast path performance
    5- And it has a good preemption tolerance.
    6- It is faster than scalable MCS lock
    7- It solves the problem of lock convoying

    So my new invention solves the following problem:

    The convoy phenomenon


    And here is my other new invention of a Scalable RWLock that works
    across processes and threads that is starvation-free and fair and i will
    soon enhance it much more and it will become really powerful:


    And abuot Lock-free versus Lock, read my following post:


    So i think with my above scalable fast mutex and my scalable RWLocks
    that are starvation-free and fair and by reading the following about composability of lock-based systems, you will notice that lock-based
    systems are still useful.

    "About composability of lock-based systems..

    Design your systems to be composable. Among the more galling claims of
    the detractors of lock-based systems is the notion that they are somehow uncomposable: “Locks and condition variables do not support modular programming,” reads one typically brazen claim, “building large programs
    by gluing together smaller programs[:] locks make this impossible.”9 The claim, of course, is incorrect. For evidence one need only point at the composition of lock-based systems such as databases and operating
    systems into larger systems that remain entirely unaware of lower-level locking.

    There are two ways to make lock-based systems completely composable, and
    each has its own place. First (and most obviously), one can make locking entirely internal to the subsystem. For example, in concurrent operating systems, control never returns to user level with in-kernel locks held;
    the locks used to implement the system itself are entirely behind the
    system call interface that constitutes the interface to the system. More generally, this model can work whenever a crisp interface exists between software components: as long as control flow is never returned to the
    caller with locks held, the subsystem will remain composable.

    Second (and perhaps counterintuitively), one can achieve concurrency and composability by having no locks whatsoever. In this case, there must be
    no global subsystem state—subsystem state must be captured in
    per-instance state, and it must be up to consumers of the subsystem to
    assure that they do not access their instance in parallel. By leaving
    locking up to the client of the subsystem, the subsystem itself can be
    used concurrently by different subsystems and in different contexts. A
    concrete example of this is the AVL tree implementation used extensively
    in the Solaris kernel. As with any balanced binary tree, the
    implementation is sufficiently complex to merit componentization, but by
    not having any global state, the implementation may be used concurrently
    by disjoint subsystems—the only constraint is that manipulation of a
    single AVL tree instance must be serialized."

    Read more here:


    About mathematics and about abstraction..

    I think my specialization is also that i have invented many software
    algorithms and software scalable algorithms and i am still inventing
    other software scalable algorithms and algorithms, those scalable
    algorithms and algorithms that i have invented are like inventing
    mathematical theorems that you prove and present in a higher level
    abstraction, but not only that but those algorithms and scalable
    algorithms of mine are presented in a form of higher level software
    abstraction that abstract the complexity of my scalable algorithms and algorithms, it is the most important part that interests me, for example
    notice how i am constructing higher level abstraction in my following
    tutorial as methodology that, first, permits to model the
    synchronization objects of parallel programs with logic primitives with If-Then-OR-AND so that to make it easy to translate to Petri nets so
    that to detect deadlocks in parallel programs, please take a look at it
    in my following web link because this tutorial of mine is the way of
    learning by higher level abstraction:

    How to analyse parallel applications with Petri Nets


    So notice that my methodology is a generalization that solves
    communication deadlocks and resource deadlocks in parallel programs.

    1- Communication deadlocks that result from incorrect use of
    event objects or condition variables (i.e. wait-notify

    2- Resource deadlocks, a common kind of deadlock in which a set of
    threads blocks forever because each thread in the set is waiting to
    acquire a lock held by another thread in the set.

    This is what interests me in mathematics, i want to work efficiently in mathematics in a much higher level of abstraction, i give you
    an example of what i am doing in mathematics so that you understand,
    look at how i am implementing mathematics as a software parallel
    conjugate gradient system solvers that scale very well, and i am
    presenting them in a higher level of abstraction, this is how i am
    abstracting the mathematics of them, read the following about it to
    notice it:

    About SOR and Conjugate gradient mathematical methods..

    I have just looked at SOR(Successive Overrelaxation Method),
    and i think it is much less powerful than Conjugate gradient method,
    read the following to notice it:



    This is why i have implemented in both C++ and Delphi my Parallel
    Conjugate Gradient Linear System Solver Library that scales very well,
    read my following thoughts about it to understand more:

    About the convergence properties of the conjugate gradient method

    The conjugate gradient method can theoretically be viewed as a direct
    method, as it produces the exact solution after a finite number of
    iterations, which is not larger than the size of the matrix, in the
    absence of round-off error. However, the conjugate gradient method is
    unstable with respect to even small perturbations, e.g., most directions
    are not in practice conjugate, and the exact solution is never obtained. Fortunately, the conjugate gradient method can be used as an iterative
    method as it provides monotonically improving approximations to the
    exact solution, which may reach the required tolerance after a
    relatively small (compared to the problem size) number of iterations.
    The improvement is typically linear and its speed is determined by the condition number κ(A) of the system matrix A: the larger is κ(A), the
    slower the improvement.

    Read more here:


    So i think my Conjugate Gradient Linear System Solver Library
    that scales very well is still very useful, read about it
    in my writing below:

    Read the following interesting news:

    The finite element method finds its place in games

    Read more here:


    But you have to be aware that finite element method uses Conjugate
    Gradient Method for Solution of Finite Element Problems, read here to
    notice it:

    Conjugate Gradient Method for Solution of Large Finite Element Problems
    on CPU and GPU


    This is why i have also designed and implemented my Parallel Conjugate
    Gradient Linear System Solver library that scales very well,
    here it is:

    My Parallel C++ Conjugate Gradient Linear System Solver Library
    that scales very well version 1.76 is here..

    Author: Amine Moulay Ramdane


    This library contains a Parallel implementation of Conjugate Gradient
    Dense Linear System Solver library that is NUMA-aware and cache-aware
    that scales very well, and it contains also a Parallel implementation of Conjugate Gradient Sparse Linear System Solver library that is
    cache-aware that scales very well.

    Sparse linear system solvers are ubiquitous in high performance
    computing (HPC) and often are the most computational intensive parts in scientific computing codes. A few of the many applications relying on
    sparse linear solvers include fusion energy simulation, space weather simulation, climate modeling, and environmental modeling, and finite
    element method, and large-scale reservoir simulations to enhance oil
    recovery by the oil and gas industry.

    Conjugate Gradient is known to converge to the exact solution in n steps
    for a matrix of size n, and was historically first seen as a direct
    method because of this. However, after a while people figured out that
    it works really well if you just stop the iteration much earlier - often
    you will get a very good approximation after much fewer than n steps. In
    fact, we can analyze how fast Conjugate gradient converges. The end
    result is that Conjugate gradient is used as an iterative method for
    large linear systems today.

    Please download the zip file and read the readme file inside the zip to
    know how to use it.

    You can download it from:


    Language: GNU C++ and Visual C++ and C++Builder

    Operating Systems: Windows, Linux, Unix and Mac OS X on (x86)


    As you have noticed i have just written above about my Parallel C++
    Conjugate Gradient Linear System Solver Library that scales very well,
    but here is my Parallel Delphi and Freepascal Conjugate Gradient Linear
    System Solvers Libraries that scale very well:

    Parallel implementation of Conjugate Gradient Dense Linear System solver library that is NUMA-aware and cache-aware that scales very well




    Thank you,
    Amine Moulay Ramdane.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)