• More of my philosophy about AVX-512 and about Delphi 11.1 and more of

    From World-News2100@21:1/5 to All on Mon Apr 4 15:08:17 2022
    Hello,



    More of my philosophy about AVX-512 and about Delphi 11.1 and more
    of my thoughts..

    I am a white Arab from Morocco, and I think I am smart, since I have
    also invented many scalable algorithms and other algorithms.


    I also use the Delphi and Free Pascal compilers, and the new Delphi
    11.1 compiler provides inline assembler (asm code) support for newer
    instruction sets, including AVX2 (ymm registers) and AVX-512 (zmm
    registers). You can read about it here:

    https://lecturepress.com/tech-journal/dev-tools/delphi-11-is-released/

    And the new Delphi 11.1 is here..

    Build Native Apps 5x Faster With One Codebase
    For Windows, Android, iOS, macOS, and Linux

    You can read more about it here:

    https://www.embarcadero.com/products/delphi

    More of my philosophy about AMD Zen 4 and more of my thoughts..

    I forgot to talk about AVX-512 support in Zen 4. Zen 4 is due out in
    2022, and according to the leak below it will support AVX-512, which
    is good news. You can read about it here:

    Gigabyte Leaks AMD Zen 4 Details: 5nm, AVX-512, 96 Cores, 12-Channel DDR5

    Read more here:

    https://www.extremetech.com/computing/325888-gigabyte-leaks-amd-zen-4-details-5nm-avx-512-96-cores-12-channel-ddr5


    And read my previous thoughts below:


    More of my philosophy about Intel's Alder Lake and about ARM and x86
    memory models and more of my thoughts..

    I think I am smart, and as you have just noticed, I have just talked
    about Epyc Zen 4 and Zen 5 and about the network topology inside
    multicore CPUs; read my thoughts on them below. I think that what I
    say about the network topology of multicore CPUs will still be valid
    if Intel's new Alder Lake also becomes a server CPU like a Xeon or an
    Epyc, so here are my thoughts about Intel's Alder Lake and about the
    ARM and x86 memory models:

    I think that Intel's new Alder Lake is a winner, and I think that its
    performance/efficiency core design could find its way into servers,
    workstations, or embedded IT systems, as you can notice by reading
    the following article:

    https://www.networkworld.com/article/3631072/will-intels-new-desktop-cpu-design-come-to-its-xeon-server-chips.html

    More of my philosophy about the ARM and x86 memory models and more
    of my thoughts..

    I think I am smart, and as you have just noticed, I have just said
    that x86 is the future (read my thoughts below to understand why).
    But I think that the ARM architecture has another big defect: its
    weak hardware memory model does not correctly balance safety or
    security against performance. Read carefully the following article
    about the x86-TSO memory model:

    https://research.swtch.com/hwmm

    So notice that the article suggests that the x86-TSO model balances
    well between safety or security and performance, and that Intel and
    AMD adopted it, by saying the following:

    "To address these problems, Owens et al. proposed the x86-TSO model,
    based on the earlier SPARCv8 TSO model. At the time they claimed that
    “To the best of our knowledge, x86-TSO is sound, is strong enough to
    program above, and is broadly in line with the vendors’ intentions.” A
    few months later Intel and AMD released new manuals broadly adopting
    this model."

    And read more here to understand why the x86-TSO memory model is
    very good:

    https://jakob.engbloms.se/archives/1435


    So I think that ARM has a big defect: it should provide a TSO memory
    model, as RISC-V does with its optional Ztso extension, since it is
    very important for safety or security.

    More of my philosophy about the fight between x86 and ARM architectures
    and more of my thoughts..

    I invite you to read the following interesting article about the
    fight between the x86 (or x64) and ARM architectures:

    ARM Servers on AWS: How to Save up to 30%

    Read more here:

    https://opsworks.co/arm-servers-on-aws-how-to-save-up-to-30/

    So notice that it says the following about ARM CPU architecture compared
    to x86 CPU architecture:

    "Running in a standard setting, Graviton2 performs 20% better, and the
    power consumption of the Arm core is about half that of other types of
    cores. Since the cost savings are also about 20%, performance-cost
    improvements reach 40%."

    But I think that Intel's new Alder Lake is a new winner, as you can
    notice by reading the following article:

    Intel's Alder Lake chip could speed PCs by 30% while saving battery power

    https://www.cnet.com/tech/computing/intels-alder-lake-chip-could-speed-pcs-by-30-while-saving-battery-power/

    Also, here is the other way that Intel is using to fight ARM:

    Intel CEO says co-designed x86 chips will fend off Arm threat

    Read more here:

    https://www.pcgamer.com/intel-x86-vs-arm-gelsinger/

    So I think that the x86 architecture is the future.

    And you can read my thoughts about 3D stacking in CPUs, about EUV
    (extreme ultraviolet), about scalability and more at the following
    web link:

    https://groups.google.com/g/alt.culture.morocco/c/USMMhMB9WIE

    More of my philosophy about the next Epyc Zen 4 and Epyc Zen 5 CPUs and
    more of my thoughts..

    I have just read the following article, which says that the next AMD
    EPYC Turin Zen 5 CPUs are rumored to feature up to 256-core and
    192-core configurations:

    https://wccftech.com/amd-epyc-turin-zen-5-cpus-rumored-to-feature-up-to-256-cores-192-core-configurations-max-600w-configurable-tdps/


    Notice the data in the above article. From it, I can say the
    following with my calculations:

    DDR5 will arrive with a minimum speed of 4800 MT/s (DDR5-4800), which
    works out to 76.8 GB/s of bandwidth in a dual-channel configuration
    (4.8 GT/s x 8 bytes per channel x 2 channels). Each CCX in Epyc Zen 4
    and Zen 4C can be enabled as its own NUMA domain, so in the next AMD
    EPYC Genoa and AMD EPYC Bergamo CPUs there will be 12 NUMA nodes per
    socket, with DDR5-5200 and DDR5-5600 support respectively.

    So AMD EPYC Genoa can support a memory bandwidth of 5.2 GT/s x 8
    bytes per channel x 12 channels for one socket, which equals 499.2 GB
    per second, or 998.4 GB per second for two sockets. AMD EPYC Bergamo
    can support a memory bandwidth of 5.6 GT/s x 8 bytes per channel x 12
    channels for one socket, which equals 537.6 GB per second, or 1075.2
    GB per second for two sockets. So as you notice, the memory bandwidth
    will become powerful on those kinds of Zen 4 and Zen 5 CPUs.

    The IPC gain from Zen 3 to Zen 4 is rumored to be around 20%, with an
    overall performance boost of about 40% for Zen 4 over Zen 3, and Zen
    5 is rumored to have a 20-40% IPC increase over Zen 4. As for the
    network topology in those next multicore CPUs, you can read my
    following thoughts about it:
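
    The memory bandwidth calculations above can be sketched in Python as
    follows. The function name and parameter names are only illustrative
    (they are not from any library), and the result is a theoretical
    peak, assuming 8 bytes per transfer per channel as in the text:

```python
def peak_bandwidth_gb_s(transfer_rate_gt_s, channels, sockets=1, bytes_per_transfer=8):
    """Theoretical peak DRAM bandwidth in GB/s.

    transfer_rate_gt_s: transfers per second in GT/s (DDR5-5200 -> 5.2)
    channels:           memory channels per socket
    sockets:            number of CPU sockets
    bytes_per_transfer: assumed channel width in bytes (8, as in the text)
    """
    return transfer_rate_gt_s * bytes_per_transfer * channels * sockets


# Configurations taken from the calculations in the post.
configs = [
    ("DDR5-4800, dual channel", 4.8, 2, 1),
    ("Genoa, DDR5-5200, 12 channels, 1 socket", 5.2, 12, 1),
    ("Genoa, DDR5-5200, 12 channels, 2 sockets", 5.2, 12, 2),
    ("Bergamo, DDR5-5600, 12 channels, 1 socket", 5.6, 12, 1),
    ("Bergamo, DDR5-5600, 12 channels, 2 sockets", 5.6, 12, 2),
]

for label, rate, channels, sockets in configs:
    bandwidth = peak_bandwidth_gb_s(rate, channels, sockets)
    print(f"{label}: {bandwidth:.1f} GB/s")
```

    Real sustained bandwidth will be lower than these peak numbers.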

    More of my philosophy about the knee of an M/M/n queue and more..

    Here is the mathematical equation of the knee of an M/M/n queue in
    queuing theory in operational research:

    1/(n+1)^(1/n)

    where n is the number of servers.

    So an M/M/1 queue has a knee at 50% utilization, and the knee of an
    M/M/2 queue is at about 0.577.
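
    As a quick check, the knee formula above can be evaluated with a few
    lines of Python. This is only a sketch that reads the formula as
    1/(n+1) raised to the power 1/n; the function name knee_utilization
    is an illustrative choice, not a standard one:

```python
def knee_utilization(n):
    """Knee of an M/M/n queue according to the formula in the post:
    1 / (n + 1) ** (1 / n), where n is the number of servers."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / (n + 1) ** (1.0 / n)


# The knee moves to a higher utilization as servers are added.
for n in (1, 2, 4, 8, 16):
    print(f"M/M/{n}: knee at utilization {knee_utilization(n):.3f}")
```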

    More of my philosophy about the network topology in multicores CPUs..

    I invite you to look at the following video:

    Ring or Mesh, or other? AMD's Future on CPU Connectivity

    https://www.youtube.com/watch?v=8teWvMXK99I&t=904s

    And i invite you to read the following article:

    Does an AMD Chiplet Have a Core Count Limit?

    Read more here:

    https://www.anandtech.com/show/16930/does-an-amd-chiplet-have-a-core-count-limit

    I think I am smart, and I say that the above video and the above
    article are not so smart, so I will talk about a very important
    thing. Read the following:

    Performance Scalability of a Multi-core Web Server

    https://www.researchgate.net/publication/221046211_Performance_scalability_of_a_multi-core_web_server

    So notice carefully that it is saying the following:

    "..we determined that performance scaling was limited by the capacity of
    the address bus, which became saturated on all eight cores. If this key obstacle is addressed, commercial web server and systems software are well-positioned to scale to a large number of cores."

    So as you notice, they were using an Intel Xeon with 8 cores. The
    application was scalable to 8x, but the hardware was not: it scaled
    only to 4.8x, and this was caused by the saturation of the address
    bus. The address bus carries requests and responses for data, called
    snoops, and more caches mean more sources and more destinations for
    snoops, which causes the poor scaling. So as you notice, a network
    topology of a ring bus or a bus was not sufficient to scale to 8x on
    an Intel Xeon with 8 cores.

    So I think that the new architectures like the Epyc and Threadripper
    CPUs can use a faster bus and/or a different network topology that
    ensures full scalability both locally in the same node and globally
    between the nodes. A sophisticated mesh network topology not only
    reduces the number of hops inside the CPU for good latency, but it is
    also good for reliability because of its redundancy, and it is faster
    than previous topologies like the ring bus or the bus, since for
    example the search on the address bus becomes parallelized. It looks
    like the internet, which uses a mesh topology with routers, so it
    parallelizes.

    I also think that using a more sophisticated topology like a mesh is
    related to queuing theory. In operational research, the mathematics
    says that we can make a queue like M/M/1 more efficient by making the
    server more powerful, but the knee of an M/M/1 queue is around 50%
    utilization. By using a mesh topology, on the internet or inside a
    CPU, you parallelize more, and so you can both raise the knee of the
    queue and increase the speed of executing the transactions. It is
    like using many servers in queuing theory, and it permits better
    scaling inside a CPU or on the internet.
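
    The queuing argument above can be illustrated with the standard
    Erlang C formula for an M/M/n queue. The following Python sketch is
    only an idealized model, not a model of a real CPU interconnect, and
    the function names and the chosen rates are illustrative assumptions.
    It compares the mean response time at the same utilization per server
    as the number of servers grows:

```python
from math import factorial


def erlang_c(n, offered_load):
    """Probability that an arriving job has to wait in an M/M/n queue.

    offered_load is arrival_rate / service_rate and must be below n,
    otherwise the queue is unstable.
    """
    a = offered_load
    if not 0 <= a < n:
        raise ValueError("offered load must be in [0, n) for a stable queue")
    top = a ** n / factorial(n) * n / (n - a)
    bottom = sum(a ** k / factorial(k) for k in range(n)) + top
    return top / bottom


def mean_response_time(n, arrival_rate, service_rate):
    """Mean time in system (waiting plus service) for an M/M/n queue."""
    a = arrival_rate / service_rate
    mean_wait = erlang_c(n, a) / (n * service_rate - arrival_rate)
    return mean_wait + 1.0 / service_rate


# Same utilization per server (80%), more and more servers in parallel.
service_rate = 1.0
utilization = 0.8
for n in (1, 2, 4, 8):
    arrival_rate = utilization * n * service_rate
    t = mean_response_time(n, arrival_rate, service_rate)
    print(f"M/M/{n} at {utilization:.0%} utilization: mean response time {t:.2f}")
```

    In this model, at a fixed utilization per server, the mean response
    time goes down as servers are added, which is the sense in which
    parallelizing pushes the knee to a higher utilization.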

    More of my philosophy about DDR5 and the next Sapphire Rapids CPU of
    Intel and more of my thoughts..

    I will explain something very important:

    I invite you to read the following about the next Sapphire Rapids CPU of
    Intel here:

    Intel Provides Details About Sapphire Rapids CPU and Ponte Vecchio GPU

    https://www.hpcwire.com/off-the-wire/intel-unveils-details-about-sapphire-rapids-cpu-ponte-vecchio-gpu-ipu/

    So notice carefully that it says the following:

    "The processor is built to drive industry technology transitions with
    advanced memory and next generation I/O, including PCIe 5.0, CXL 1.1,
    DDR5 and HBM technologies."

    And notice that it says the same here:

    https://en.wikipedia.org/wiki/Sapphire_Rapids

    So the next Sapphire Rapids CPU of Intel will support DDR5 and HBM
    technologies for the memory subsystem. CPUs of the kind used in
    computer servers have implemented ECC in their caches for at least a
    decade or so. DDR5 memory subsystems are useful for creating large
    capacities with modest bandwidth compared to HBM, while HBM, on the
    other hand, offers large bandwidth with low capacity. But I think
    that the problem with the next Sapphire Rapids CPU of Intel is that
    the on-die ECC of DDR5 is not full ECC. Read here to notice it:

    "On-die ECC: The presence of on-die ECC on DDR5 memory has been the
    subject of many discussions and a lot of confusion among consumers and
    the press alike. Unlike standard ECC, on-die ECC primarily aims to
    improve yields at advanced process nodes, thereby allowing for cheaper
    DRAM chips. On-die ECC only detects errors if they take place within a
    cell or row during refreshes. When the data is moved from the cell to
    the cache or the CPU, if there’s a bit-flip or data corruption, it won’t
    be corrected by on-die ECC. Standard ECC corrects data corruption within
    the cell and as it is moved to another device or an ECC-supported SoC."

    Read more here to notice it:

    https://www.hardwaretimes.com/ddr5-vs-ddr4-ram-quad-channel-and-on-die-ecc-explained/

    More of my philosophy about HP NonStop to x86 Server Platform
    fault-tolerant computer systems and more..

    Now HP to Extend HP NonStop to x86 Server Platform

    HP announced in 2013 plans to extend its mission-critical HP NonStop
    technology to x86 server architecture, providing the 24/7 availability
    required in an always-on, globally connected world, and increasing
    customer choice.

    Read the following to notice it:

    https://www8.hp.com/us/en/hp-news/press-release.html?id=1519347#.YHSXT-hKiM8

    And today HPE provides HP NonStop on the x86 server platform. Here
    is an example:

    https://www.hpe.com/ca/en/pdfViewer.html?docId=4aa5-7443&parentPage=/ca/en/products/servers/mission-critical-servers/integrity-nonstop-systems&resourceTitle=HPE+NonStop+X+NS7+%E2%80%93+Redefining+continuous+availability+and+scalability+for+x86+data+sheet

    So I think that programming the HP NonStop for x86 is now compatible
    with x86 programming.



    Thank you,
    Amine Moulay Ramdane.
