Hello..
About the strategy of "work depth-first; steal breadth-first"..
I have just read the following webpage:
Why Too Many Threads Hurts Performance, and What to do About It
https://www.codeguru.com/cpp/sample_chapter/article.php/c13533/Why-Too-Many-Threads-Hurts-Performance-and-What-to-do-About-It.htm
Also I have just looked at the following interesting video about Go
scheduler and Go concurrency:
Dmitry Vyukov — Go scheduler: Implementing language with lightweight concurrency
https://www.youtube.com/watch?v=-K11rY57K7k
And i have just read the following webpage about the Threadpool of
microsoft .NET 4.0:
https://blogs.msdn.microsoft.com/jennifer/2009/06/26/work-stealing-in-net-4-0/
And as you are noticing the first web link above is speaking about the
strategy of "work depth-first; steal breadth-first" , but we have to be
more smart because i think that this strategy, that is advantageous for
cache locality, works best for recursive algorithms, because a thread is
taking the first task and after that the algorithm is recursive, so it
will put the childs tasks inside the local work-stealing queue, and the
other threads will start to take from the work-stealing queue, so the
work will be distributed correctly, but as you will notice that this
strategy works best for recursive algorithms, but when you you
iteratively start many tasks, i think we will have much more contention
on the work-stealing queue and this is a weakness of this strategy,
other than that when it is not a recursive algorithm and the threads are receiving from the global queue so there will be high contention on the
global queue and this is not good. MIT's Cilk and Go scheduler and the Threadpool of Microsoft and Intel® C++ TBB are using this strategy of
"work depth-first; steal breadth-first". And as you are noticing that
they are giving more preference to cache locality than scalability.
But in my following invention of a Threadpool that scales very well i am
giving more preference to scalability than to cache locality:
https://sites.google.com/site/scalable68/an-efficient-threadpool-engine-with-priorities-that-scales-very-well
Other than that when you are doing IO with my Threadpool, you can
use asychronous IO by starting a dedicated thread to IO to be more
efficient, or you can start another of my Threadpool and use it for
tasks that uses IO, you can use the same method when threads of the my Threadpool are waiting or sleeping..
Other than that for recursion and the stack overflow problem
you can convert your function from a recursive to iterative
to solve the problem of stack overflow.
Other than that to be able to serve a great number of internet
connections or TCP/IP socket connections you can use my Threadpool
with my powerful Object oriented Stackful coroutines library for Delphi
and FreePascal here:
https://sites.google.com/site/scalable68/object-oriented-stackful-coroutines-library-for-delphi-and-freepascal
Thank you,
Amine Moulay Ramdane.
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)