[continued from previous message]
If you look at my scalable parallel algorithm, you will see that it divides each array of the matrix into parts of 250 elements. If you look carefully, you will also see that two functions, atsub() and asub(), consume the greater part of the CPU time. Inside those functions I am using a probabilistic mechanism to make my algorithm scalable on NUMA architecture, and it also makes it scale on the memory channels: I scramble the array parts using a probabilistic function, and I have noticed that this probabilistic mechanism is very efficient.

To show you what I mean, please look at the following simulation that I have done, which uses a variable that contains the number of NUMA nodes. It gives almost perfect scalability on NUMA architecture. For example, let us give the "NUMA_nodes" variable a value of 4 and our array a size of 250: the simulation below will then give a number of contention points of about a quarter of the array. So if I am using 16 cores, then treating that contended quarter as the serial part in Amdahl's law gives, in the worst case, about 3.4X throughput, with a limit of 4X on four NUMA nodes as more cores are added. Since the contended fraction is about one over the number of NUMA nodes, this gives almost perfect scalability on more and more NUMA nodes. So my parallel algorithm is scalable on NUMA architecture, and it also scales well on the memory channels.
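To make the Amdahl's law step concrete, here is a small worked calculation. It is only a sketch under an assumption of mine: that the contended accesses behave as the serial fraction s and that everything else parallelizes perfectly over N cores. The symbols s, N and S are just names chosen for this sketch.

```latex
S(N) = \frac{1}{s + \frac{1 - s}{N}}, \qquad s \approx \frac{1}{\text{NUMA\_nodes}} = \frac{1}{4}

S(16) = \frac{1}{0.25 + 0.75/16} = \frac{1}{0.296875} \approx 3.37,
\qquad \lim_{N \to \infty} S(N) = \frac{1}{s} = 4
```

With s = 1/n for n NUMA nodes, the limit is n, which is the sense in which the bound keeps growing as NUMA nodes are added.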
Here is the simulation that I have done. Please run it and you will see for yourself that my parallel algorithm is scalable on NUMA architecture:
---
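program test;
uses math;
var
  tab, tab1, tab2: array of integer;
  a, n1, k, i, tmp, j, numa_nodes: integer;
begin
  a := 250;          { number of array parts }
  numa_nodes := 4;   { number of NUMA nodes }

  { tab2[i] is the NUMA node that holds part i (round-robin placement) }
  setlength(tab2, a);
  for i := 0 to a - 1 do
    tab2[i] := i mod numa_nodes;

  { Seed the generator once: seeding a second time from the clock
    can restart the same sequence and make both scrambles identical }
  randomize;

  { tab: first scrambled order of the parts }
  setlength(tab, a);
  for k := 0 to a - 1 do
    tab[k] := k;
  for k := 0 to a - 1 do
  begin
    n1 := random(a);   { random(a) returns 0..a-1 }
    tmp := tab[k];
    tab[k] := tab[n1];
    tab[n1] := tmp;
  end;

  { tab1: second scrambled order of the parts }
  setlength(tab1, a);
  for k := 0 to a - 1 do
    tab1[k] := k;
  for k := 0 to a - 1 do
  begin
    n1 := random(a);
    tmp := tab1[k];
    tab1[k] := tab1[n1];
    tab1[n1] := tmp;
  end;

  { A contention point is a position where both orders hit the same NUMA node }
  j := 0;
  for i := 0 to a - 1 do
    if tab2[tab[i]] = tab2[tab1[i]] then
    begin
      inc(j);
      writeln('A contention at: ', i);
    end;
  writeln('Number of contention points: ', j);

  setlength(tab, 0);
  setlength(tab1, 0);
  setlength(tab2, 0);
end.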
---
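A single run of the simulation gives one random count, so here is a rough sketch that averages the contention fraction over many runs and over several NUMA node counts. It is not part of my algorithm: the names PARTS, TRIALS, Shuffle and AverageContention are made up for this sketch, the number of trials is an arbitrary choice, and the scrambling is done with a standard Fisher-Yates shuffle rather than the swap loop used above.

```pascal
program contention_average;
{ Hypothetical sketch: average the contention fraction over many trials. }
const
  PARTS  = 250;    { array size, as in the simulation above }
  TRIALS = 1000;   { assumed number of repetitions }
type
  TIntArray = array of longint;

{ Fill arr with 0..n-1, then scramble it with a Fisher-Yates shuffle. }
procedure Shuffle(var arr: TIntArray; n: longint);
var
  k, r, tmp: longint;
begin
  setlength(arr, n);
  for k := 0 to n - 1 do
    arr[k] := k;
  for k := n - 1 downto 1 do
  begin
    r := random(k + 1);   { 0 <= r <= k }
    tmp := arr[k];
    arr[k] := arr[r];
    arr[r] := tmp;
  end;
end;

{ Average fraction of positions where both orders map to the same node. }
function AverageContention(nodes: longint): double;
var
  t, i: longint;
  total: int64;
  p, q: TIntArray;
begin
  total := 0;
  for t := 1 to TRIALS do
  begin
    Shuffle(p, PARTS);
    Shuffle(q, PARTS);
    for i := 0 to PARTS - 1 do
      if (p[i] mod nodes) = (q[i] mod nodes) then
        inc(total);
  end;
  AverageContention := total / (TRIALS * PARTS);
end;

var
  nodes: longint;
begin
  randomize;   { seed once for the whole run }
  nodes := 2;
  while nodes <= 16 do
  begin
    writeln(nodes, ' NUMA nodes: average contention fraction ',
            AverageContention(nodes):0:3, ' (1/nodes = ', (1 / nodes):0:3, ')');
    nodes := nodes * 2;
  end;
end.
```

If the reasoning above holds, each printed average should come out close to 1/nodes, so the contended fraction should shrink as NUMA nodes are added.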
And I invite you to read my thoughts about technology here:
https://groups.google.com/g/soc.culture.usa/c/N_UxX3OECX4
More of my philosophy about the problem with capacity planning of a website and