• More of my philosophy about technology and about Intel technology and m

    From Amine Moulay Ramdane@21:1/5 to All on Mon Dec 5 12:24:08 2022
    Hello,



    More of my philosophy about technology and about Intel technology and more of my thoughts..

    I am a white Arab from Morocco, and I think I am smart since I have also invented many scalable algorithms and other algorithms..


    Intel says it will squeeze 1 trillion transistors onto a chip package by 2030

    "Intel Corp. researchers this weekend revealed a number of technological innovations and concepts, including packaging improvements that could result in computer chips that are 10 times as powerful as today’s most advanced silicon."


    Read more here:

    https://siliconangle.com/2022/12/04/intel-says-will-squeeze-1-trillion-transistors-onto-chip-package-2030/


    More of my philosophy about the 12 memory channels of the new AMD Epyc Genoa CPU and more of my thoughts..



    So as I am saying below, I think that in order to use in parallel the 12 memory channels that the new AMD Genoa CPU supports, the GMI-Wide mode must be widened so that each CCD connects with more GMI links, and I think that this is what AMD is doing in its new 4-CCD configuration, even with the cost-optimized Epyc Genoa 9124, which has 16 cores, 64 MB of L3 cache and 4 Core Complex Dies (CCDs), and which costs around $1000 (look at it here: https://www.tomshardware.com/reviews/amd-4th-gen-epyc-genoa-9654-9554-and-9374f-review-96-cores-zen-4-and-5nm-disrupt-the-data-center ).

    As I am explaining more below, the Core Complex Dies (CCDs) connect to memory, I/O, and each other through the I/O Die (IOD). Each CCD connects to the IOD via a dedicated high-speed Global Memory Interconnect (GMI) link. The IOD also contains memory channels, PCIe Gen5 lanes, and Infinity Fabric links, and all dies, or chiplets, interconnect with each other via AMD’s Infinity Fabric Technology.

    And of course this will permit my new software project, the Parallel C++ Conjugate Gradient Linear System Solver Library that scales very well, to scale on the 12 memory channels. Read my following thoughts to understand more about it:

    More of my philosophy about the new Zen 4 AMD Ryzen™ 9 7950X and more of my thoughts..


    So I have just looked at the new Zen 4 AMD Ryzen™ 9 7950X CPU, and I invite you to look at it here:

    https://www.amd.com/en/products/cpu/amd-ryzen-9-7950x

    But notice carefully that the problem is the number of supported memory channels: it supports just two memory channels, which is not good. For example, my following open source software project, the Parallel C++ Conjugate Gradient Linear System Solver Library that scales very well, is scaling around 8X on my 16-core Intel Xeon with 2 NUMA nodes and 8 memory channels, but it will not scale correctly on the new Zen 4 AMD Ryzen™ 9 7950X CPU with just 2 memory channels, since it is also memory-bound. Here is my powerful open source software project of the Parallel C++ Conjugate Gradient Linear System Solver Library that scales very well, and I invite you to take a careful look at it:

    https://sites.google.com/site/scalable68/scalable-parallel-c-conjugate-gradient-linear-system-solver-library
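
    To show why this kind of solver tends to be memory-bound, here is a minimal serial sketch of the textbook conjugate gradient method on a dense matrix. It is not the code of the library linked above, and the names matvec(), dot() and conjugate_gradient() are assumptions made only for this illustration. Each iteration is dominated by one matrix-vector product that streams the whole matrix from memory while doing very little arithmetic per element loaded.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// y = A * x for a dense, row-major n x n matrix A (illustrative helper).
// Every entry of A is read once per call, so for large n this loop is
// limited by memory bandwidth rather than by arithmetic.
Vec matvec(const Vec& A, const Vec& x) {
    const std::size_t n = x.size();
    Vec y(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        double sum = 0.0;
        for (std::size_t j = 0; j < n; ++j) sum += A[i * n + j] * x[j];
        y[i] = sum;
    }
    return y;
}

// Inner product of two vectors of equal length (illustrative helper).
double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Textbook conjugate gradient for A x = b, with A symmetric positive-definite.
// Starts from x = 0 and stops when the residual norm drops below tol.
Vec conjugate_gradient(const Vec& A, const Vec& b,
                       double tol = 1e-10, std::size_t max_iter = 1000) {
    const std::size_t n = b.size();
    Vec x(n, 0.0);
    Vec r = b;  // residual b - A*x, which equals b because x starts at 0
    Vec p = r;  // first search direction
    double rs_old = dot(r, r);
    for (std::size_t k = 0; k < max_iter && std::sqrt(rs_old) > tol; ++k) {
        const Vec Ap = matvec(A, p);  // the bandwidth-heavy step
        const double alpha = rs_old / dot(p, Ap);
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        const double rs_new = dot(r, r);
        const double beta = rs_new / rs_old;
        for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
        rs_old = rs_new;
    }
    return x;
}
```

    For example, calling conjugate_gradient({4, 1, 1, 3}, {1, 2}) solves a small 2 x 2 system. A parallel version would split the rows of matvec() across threads, and at that point all the threads compete for the same memory channels.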

    So I advise you to buy an AMD Epyc CPU or an Intel Xeon CPU that supports 8 memory channels.

    ---


    And of course you can use the upcoming twelve DDR5 memory channels of the Zen 4 AMD EPYC CPUs to scale my above algorithm even more. Read about it here:

    https://www.tomshardware.com/news/amd-confirms-12-ddr5-memory-channels-on-genoa
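
    The reasoning about memory channels can be sketched with a rough first-order model. It assumes that a memory-bound kernel's speedup is capped both by the number of threads and by how many threads the memory channels can feed. The function name bandwidth_bound_speedup() and its parameters are hypothetical, and any figures passed to it are placeholders, not measurements of any CPU mentioned here.

```cpp
#include <algorithm>

// Rough first-order model (an assumption, not measured data):
// the speedup over one thread cannot exceed the thread count, and it also
// cannot exceed total memory bandwidth divided by what one thread consumes.
double bandwidth_bound_speedup(int threads, int channels,
                               double gb_per_s_per_channel,
                               double gb_per_s_per_thread) {
    const double total_bandwidth = channels * gb_per_s_per_channel;
    const double bandwidth_cap = total_bandwidth / gb_per_s_per_thread;
    return std::min(static_cast<double>(threads), bandwidth_cap);
}
```

    With placeholder figures of 20 GB/s per channel and 10 GB/s consumed per thread, 16 threads on 2 channels are capped at 4X in this model, while 8 or 12 channels lift the cap to the 16 threads themselves. Real scaling also depends on NUMA placement, caches and the interconnect, which this model ignores.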


    And here is the simulation program that uses the probabilistic mechanism that I have talked about, and that proves to you that the algorithm of my Parallel C++ Conjugate Gradient Linear System Solver Library is scalable:

    If you look at my scalable parallel algorithm, it divides each array of the matrix into parts of 250 elements, and if you look carefully, I am using two functions that consume the greater part of all the CPU time, atsub() and asub(). Inside those functions I am using a probabilistic mechanism to render my algorithm scalable on NUMA architectures, and it also makes it scale on the memory channels. What I am doing is scrambling the array parts using a probabilistic function, and what I have noticed is that this probabilistic mechanism i