Hello,
More of my philosophy about the next Epyc Zen 4 and Epyc Zen 5 CPUs and
more of my thoughts..
I am a white Arab from Morocco, and I think I am smart since I have also invented many scalable algorithms and other algorithms.
I have just read the following article, which reports that the next AMD
EPYC Turin Zen 5 CPUs are rumored to feature up to 256 cores, with a
192-core configuration as well:
https://wccftech.com/amd-epyc-turin-zen-5-cpus-rumored-to-feature-up-to-256-cores-192-core-configurations-max-600w-configurable-tdps/
Notice the data in the above article. Based on it, I can say the
following with my calculations:
DDR5 will arrive with a minimum speed of 4800 MT/s, which works out to
76.8 GB/s of bandwidth in a dual-channel configuration (4.8 GT/s x 8
bytes per channel x 2 channels). Each CCX in Epyc Zen 4 and Zen 4C can
be enabled as its own NUMA domain, so in the next AMD EPYC Genoa and AMD
EPYC Bergamo CPUs there will be 12 NUMA nodes per socket, with DDR5-5200
and DDR5-5600 support respectively on those CPUs.
So the AMD EPYC Genoa can support a peak memory bandwidth of 5.2 GT/s x
8 bytes per channel x 12 channels for one socket, which equals 499.2 GB
per second, or 998.4 GB per second for two sockets. The AMD EPYC Bergamo
can support a peak memory bandwidth of 5.6 GT/s x 8 bytes per channel x
12 channels for one socket, which equals 537.6 GB per second, or 1075.2
GB per second for two sockets.
So as you notice, the memory bandwidth will become powerful on these
Zen 4 and Zen 5 CPUs. The IPC gain from Zen 3 to Zen 4 is reported to be
around 20%, with an overall performance boost of around 40% for Zen 4
over Zen 3, and Zen 5 is rumored to have a 20-40% IPC increase over
Zen 4. As for the network topology in these next multicore CPUs, my
thoughts about it follow further below.
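The bandwidth arithmetic above can be checked with a small Python sketch. It assumes the theoretical peak is simply transfer rate x bytes per transfer x channels x sockets, which ignores real-world efficiency losses; the function name and parameters are hypothetical and only for illustration.

```python
def peak_bandwidth_gb_s(transfer_rate_gt_s, channels,
                        bytes_per_transfer=8, sockets=1):
    """Theoretical peak memory bandwidth in GB/s.

    Assumption: every channel moves `bytes_per_transfer` bytes on each
    transfer, and all channels on all sockets run at full rate.
    """
    return transfer_rate_gt_s * bytes_per_transfer * channels * sockets


if __name__ == "__main__":
    # Figures taken from the post: DDR5-4800 dual channel,
    # DDR5-5200 and DDR5-5600 with 12 channels per socket.
    cases = [
        ("DDR5-4800, 2 channels", 4.8, 2, 1),
        ("DDR5-5200, 12 channels, 1 socket", 5.2, 12, 1),
        ("DDR5-5200, 12 channels, 2 sockets", 5.2, 12, 2),
        ("DDR5-5600, 12 channels, 1 socket", 5.6, 12, 1),
        ("DDR5-5600, 12 channels, 2 sockets", 5.6, 12, 2),
    ]
    for label, rate, channels, sockets in cases:
        value = peak_bandwidth_gb_s(rate, channels, sockets=sockets)
        print(f"{label}: {value:.1f} GB/s")
```

Measured bandwidth is normally below these peak values, so the numbers are best read as upper bounds.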
More of my philosophy about the knee of an M/M/n queue and more..
Here is the mathematical equation of the knee of an M/M/n queue in
queuing theory in operational research:
knee(n) = 1 / (n+1)^(1/n)
n is the number of servers.
So an M/M/1 queue has its knee at 50% utilization, and an M/M/2 queue
has its knee at about 0.577 (that is 1 divided by the square root of 3).
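As a minimal sketch, the knee formula above can be evaluated in Python for a few server counts. The function name is hypothetical, and the code only evaluates the formula as stated in this post, with the exponent read as 1/n.

```python
def knee_utilization(n):
    """Knee of an M/M/n queue per the formula 1 / (n+1)^(1/n).

    n is the number of servers and must be at least 1.
    """
    if n < 1:
        raise ValueError("n must be a positive number of servers")
    return (n + 1) ** (-1.0 / n)


if __name__ == "__main__":
    for servers in (1, 2, 4, 8, 16):
        print(f"M/M/{servers}: knee at {knee_utilization(servers):.3f}")
```

The value grows with n, which is the point used later in this post: more servers let the system run at a higher utilization before response time degrades sharply.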
More of my philosophy about the network topology in multicores CPUs..
I invite you to look at the following video:
Ring or Mesh, or other? AMD's Future on CPU Connectivity
https://www.youtube.com/watch?v=8teWvMXK99I&t=904s
And I invite you to read the following article:
Does an AMD Chiplet Have a Core Count Limit?
Read more here:
https://www.anandtech.com/show/16930/does-an-amd-chiplet-have-a-core-count-limit
I think I am smart, and I say that the above video and the above article
are not so smart, so I will talk about a very important thing. Read the
following:
Performance Scalability of a Multi-core Web Server
https://www.researchgate.net/publication/221046211_Performance_scalability_of_a_multi-core_web_server
So notice carefully that it is saying the following:
"..we determined that performance scaling was limited by the capacity of
the address bus, which became saturated on all eight cores. If this key obstacle is addressed, commercial web server and systems software are well-positioned to scale to a large number of cores."
So as you notice, they were using an Intel Xeon system with 8 cores, and
the application was scalable to 8x, but the hardware was not scalable to
8x, since it scaled only to 4.8x. This was caused by bus saturation: the
address bus carries requests and responses for data, called snoops, and
more caches mean more sources and more destinations for snoops, which
causes the poor scaling. So a network topology of a ring bus or a bus
was not sufficient to scale to 8x on an Intel Xeon with 8 cores.
So I think that new architectures like the Epyc CPU and the Threadripper
CPU can use a faster bus and/or a different network topology that
ensures full scalability both locally in the same node and globally
between the nodes. A sophisticated mesh network topology not only
reduces the number of hops inside the CPU for good latency, it is also
good for reliability because of its redundancy, and it is faster than
previous topologies like the ring bus or the bus, since for example the
search on the address bus becomes parallelized. It looks like the
internet, which uses a mesh topology with routers, so it parallelizes.
I also think that using a more sophisticated topology like a mesh
network topology is related to queuing theory. In operational research
the mathematics says that we can make a queue like M/M/1 more efficient
by making the server more powerful, but the knee of an M/M/1 queue is
around 50% utilization. By parallelizing more in a mesh topology, on
the internet or inside a CPU, you can both raise the knee of the queue
and increase the speed of executing the transactions. It is like using
many servers in queuing theory, and it permits better scaling inside a
CPU or on the internet.
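To illustrate the queuing argument above, here is a minimal Python sketch that uses the standard Erlang C formula to compare the mean response time of an M/M/1 queue with M/M/n queues at the same per-server utilization. It is only a textbook model, not a model of any real CPU interconnect, and the function names are hypothetical.

```python
from math import factorial


def erlang_c(n, rho):
    """Probability that an arriving job must wait in an M/M/n queue.

    n is the number of servers, rho the per-server utilization (0 <= rho < 1).
    """
    offered_load = n * rho
    tail = (offered_load ** n / factorial(n)) / (1.0 - rho)
    head = sum(offered_load ** k / factorial(k) for k in range(n))
    return tail / (head + tail)


def mean_response_time(n, rho, service_time=1.0):
    """Mean response time (waiting plus service) of an M/M/n queue."""
    if n < 1 or not 0.0 <= rho < 1.0:
        raise ValueError("need n >= 1 and 0 <= rho < 1")
    mean_wait = erlang_c(n, rho) * service_time / (n * (1.0 - rho))
    return service_time + mean_wait


if __name__ == "__main__":
    for rho in (0.5, 0.7, 0.9):
        row = ", ".join(
            f"n={n}: {mean_response_time(n, rho):.2f}" for n in (1, 2, 4, 8)
        )
        print(f"utilization {rho:.1f} -> {row}")
```

At a fixed utilization the response time drops as n grows, which matches the claim that adding parallel servers lets the queue be driven harder before it degrades.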
More of my philosophy about DDR5 and the next Sapphire Rapids CPU of
Intel and more of my thoughts..
I will explain something very important:
I invite you to read the following about the next Sapphire Rapids CPU of
Intel here:
Intel Provides Details About Sapphire Rapids CPU and Ponte Vecchio GPU
https://www.hpcwire.com/off-the-wire/intel-unveils-details-about-sapphire-rapids-cpu-ponte-vecchio-gpu-ipu/
So notice carefully that it says the following:
"The processor is built to drive industry technology transitions with
advanced memory and next generation I/O, including PCIe 5.0, CXL 1.1,
DDR5 and HBM technologies."
And notice that it says the same here:
https://en.wikipedia.org/wiki/Sapphire_Rapids
So the next Sapphire Rapids CPU of Intel will support DDR5 and HBM
technologies for the memory subsystem. CPUs for computer servers have
implemented ECC in their caches for at least a decade or so. DDR5
memory subsystem implementations are useful for creating large
capacities with modest bandwidth compared to HBM, and HBM, on the other
hand, offers large bandwidth with low capacity. But I think that the
problem with the next Sapphire Rapids CPU of Intel is that the on-die
ECC of DDR5 is not full ECC. Read here to notice it:
"On-die ECC: The presence of on-die ECC on DDR5 memory has been the
subject of many discussions and a lot of confusion among consumers and
the press alike. Unlike standard ECC, on-die ECC primarily aims to
improve yields at advanced process nodes, thereby allowing for cheaper
DRAM chips. On-die ECC only detects errors if they take place within a
cell or row during refreshes. When the data is moved from the cell to
the cache or the CPU, if there’s a bit-flip or data corruption, it won’t
be corrected by on-die ECC. Standard ECC corrects data corruption within
the cell and as it is moved to another device or an ECC-supported SoC."
Read more here to notice it:
https://www.hardwaretimes.com/ddr5-vs-ddr4-ram-quad-channel-and-on-die-ecc-explained/
More of my philosophy about HP NonStop to x86 Server Platform
fault-tolerant computer systems and more..
Now HP to Extend HP NonStop to x86 Server Platform
HP announced in 2013 plans to extend its mission-critical HP NonStop
technology to x86 server architecture, providing the 24/7 availability
required in an always-on, globally connected world, and increasing
customer choice.
Read the following to notice it:
https://www8.hp.com/us/en/hp-news/press-release.html?id=1519347#.YHSXT-hKiM8
And today HP provides HP NonStop on the x86 server platform. Here is an
example:
https://www.hpe.com/ca/en/pdfViewer.html?docId=4aa5-7443&parentPage=/ca/en/products/servers/mission-critical-servers/integrity-nonstop-systems&resourceTitle=HPE+NonStop+X+NS7+%E2%80%93+Redefining+continuous+availability+and+scalability+for+x86+data+sheet
So I think programming the HP NonStop for x86 is now compatible with x86
programming.
Thank you,
Amine Moulay Ramdane.