Hello,
More of my philosophy about the AMD Epyc CPU and more of my thoughts..
I am a white arab from Morocco, and i think i am smart since i have also invented many scalable algorithms and algorithms..
I think i am highly smart since I have passed two certified IQ tests and i have scored "above" 115 IQ, if you want to be serious about buying
a CPU and motherboard, i advice you to buy the following AMD Epyc 7313p Milan 16 cores CPU that costs much less(around 1000 US dollars) and that is reliable and fast, since it is a 16 cores CPU and it supports standard ECC memory and it supports 8 memory
channels, here it is:
https://en.wikichip.org/wiki/amd/epyc/7313p
And the good Supermicro motherboard for it that supports the Epyc Milan 7003 is the following:
https://www.newegg.com/supermicro-mbd-h12ssl-nt-o-supports-single-amd-epyc-7003-7002-series-processor/p/1B4-005W-00911?Description=amd%20epyc%20motherboard&cm_re=amd_epyc%20motherboard-_-1B4-005W-00911-_-Product
And the above AMD Epyc 7313p Milan 16 cores CPU can be configured
as NUMA using the good Supermicro motherboard above as following:
This setting enables a trade-off between minimizing local memory latency for NUMAaware or highly parallelizable workloads vs. maximizing per-core memory bandwidth for non-NUMA-friendly workloads. The default configuration (one NUMA domain per socket) is
recommended for most workloads. NPS4 is recommended for HPC and other highly parallel workloads.Here is the detail introduction for such options:
• NPS0: Interleave memory accesses across all channels in both sockets (not recommended)
• NPS1: Interleave memory accesses across all eight channels in each socket, report one NUMA node per socket (unless L3 Cache as NUMA is enabled)
• NPS2: Interleave memory accesses across groups of four channels (ABCD and EFGH) in each socket, report two NUMA nodes per socket (unless L3 Cache as NUMA is enabled)
• NPS4: Interleave memory accesses across pairs of two channels (AB, CD, EF and GH) in each socket, report four NUMA nodes per socket (unless L3 Cache as NUMA is enabled)
And of course you have to read my following writing about DDR5 memory that is not a fully ECC memory:
"On-die ECC: The presence of on-die ECC on DDR5 memory has been the subject of many discussions and a lot of confusion among consumers and the press alike. Unlike standard ECC, on-die ECC primarily aims to improve yields at advanced process nodes,
thereby allowing for cheaper DRAM chips. On-die ECC only detects errors if they take place within a cell or row during refreshes. When the data is moved from the cell to the cache or the CPU, if there’s a bit-flip or data corruption, it won’t be
corrected by on-die ECC. Standard ECC corrects data corruption within the cell and as it is moved to another device or an ECC-supported SoC."
Read more here to notice it:
https://www.hardwaretimes.com/ddr5-vs-ddr4-ram-quad-channel-and-on-die-ecc-explained/
So if you want to get serious and professional you can buy the above
AMD Epyc 7313p Milan 16 cores CPU with the Supermicro motherboard that supports it and that i am advicing and that supports the fully ECC memory and that supports 8 memory channels.
And of course you can read my thoughts about technology in the following web link:
https://groups.google.com/g/soc.culture.usa/c/N_UxX3OECX4
And of course you have to read my following thoughts that also show how powerful is to use 8 memory channels:
I have just said the following:
--
More of my philosophy about the new Zen 4 AMD Ryzen™ 9 7950X and more of my thoughts..
So i have just looked at the new Zen 4 AMD Ryzen™ 9 7950X CPU, and i invite you to look at it here:
https://www.amd.com/en/products/cpu/amd-ryzen-9-7950x
But notice carefully that the problem is with the number of supported memory channels, since it just support two memory channels, so it is not good, since for example my following Open source software project of Parallel C++ Conjugate Gradient Linear
System Solver Library that scales very well is scaling around 8X on my 16 cores Intel Xeon with 2 NUMA nodes and with 8 memory channels, but it will not scale correctly on the
new Zen 4 AMD Ryzen™ 9 7950X CPU with just 2 memory channels since it is also memory-bound, and here is my Powerful Open source software project of Parallel C++ Conjugate Gradient Linear System Solver Library that scales very well and i invite you to
take carefully a look at it:
https://sites.google.com/site/scalable68/scalable-parallel-c-conjugate-gradient-linear-system-solver-library
So i advice you to buy an AMD Epyc CPU or an Intel Xeon CPU that supports 8 memory channels.
---
And of course you can use the next Twelve DDR5 Memory Channels for Zen 4 AMD EPYC CPUs so that to scale more my above algorithm, and read about it here:
https://www.tomshardware.com/news/amd-confirms-12-ddr5-memory-channels-on-genoa
And here is the simulation program that uses the probabilistic mechanism that i have talked about and that prove to you that my algorithm of my Parallel C++ Conjugate Gradient Linear System Solver Library is scalable:
If you look at my scalable parallel algorithm, it is dividing the each array of the matrix by 250 elements, and if you look carefully i am using two functions that consumes the greater part of all the CPU, it is the atsub() and asub(), and inside those
functions i am using a probabilistic mechanism so that to render my algorithm scalable on NUMA architecture , and it also make it scale on the memory channels, what i am doing is scrambling the array parts using a probabilistic function and what i have
noticed that this probabilistic mechanism i