• 64-bit embedded computing is here and now

    From James Brakefield@21:1/5 to All on Mon Jun 7 07:47:51 2021
    Sometimes things move faster than expected.
    As someone with an embedded background this caught me by surprise:

    Tera-Byte microSD cards are readily available and getting cheaper.
    Heck, you can carry ten of them in a credit card pouch.
    Likely to move to the same price range as hard disks ($20/TB).

    That means that a 2+ square inch PCB can hold a 64-bit processor and enough storage for memory mapped files larger than 4GB.

    Is the 32-bit embedded processor cost vulnerable to 64-bit 7nm devices as the FABs mature? Will video data move to the IOT edge? Will AI move to the edge? Will every embedded CPU have a built-in radio?

    Wait a few years and find out.

  • From Don Y@21:1/5 to James Brakefield on Mon Jun 7 14:13:26 2021
    On 6/7/2021 7:47 AM, James Brakefield wrote:

    Sometimes things move faster than expected. As someone with an embedded background this caught me by surprise:

    Tera-Byte microSD cards are readily available and getting cheaper. Heck, you can carry ten of them in a credit card pouch. Likely to move to the same price range as hard disks ($20/TB).

    That means that a 2+ square inch PCB can hold a 64-bit processor and enough storage for memory mapped files larger than 4GB.

    Kind of old news. I've been developing on a SAMA5D36 platform with 256M of FLASH and 256M of DDR2 for 5 or 6 years, now. PCB is just over 2 sq in
    (but most of that being off-board connectors). Granted, it's a 32b processor but I'll be upgrading that to something "wider" before release (software and
    OS have been written for a 64b world -- previously waiting for costs to fall
    to make it as economical as the 32b was years ago; now waiting to see if I
    can leverage even MORE hardware-per-dollar!).

    Once you have any sort of connectivity, it becomes practical to support
    files larger than your physical memory -- just fault the appropriate
    page in over whatever interface(s) you have available (assuming you
    have other boxes that you can talk to/with)
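    A minimal sketch of that "fault the page in over the network" idea,
    assuming a hypothetical request/reply protocol and an already-connected
    stream socket (the names and wire format are invented; a real design
    would serialize the request explicitly and hook this into the MMU
    fault path):

        /* Fetch one 4 KB page of a remote object over a stream socket.
         * The page_req layout and request/reply framing are assumptions. */
        #include <stdint.h>
        #include <unistd.h>

        #define PAGE_SIZE 4096u

        struct page_req {
            uint32_t file_id;   /* which remote object        */
            uint64_t page_no;   /* which PAGE_SIZE page of it */
        };

        /* Read exactly len bytes from a stream socket. */
        static int read_all(int fd, void *buf, size_t len)
        {
            uint8_t *p = buf;
            while (len) {
                ssize_t n = read(fd, p, len);
                if (n <= 0)
                    return -1;
                p += n;
                len -= (size_t)n;
            }
            return 0;
        }

        /* Fault one page into 'dest' (PAGE_SIZE bytes). */
        int fetch_remote_page(int sock, uint32_t file_id,
                              uint64_t page_no, void *dest)
        {
            struct page_req req = { file_id, page_no };

            if (write(sock, &req, sizeof req) != (ssize_t)sizeof req)
                return -1;
            return read_all(sock, dest, PAGE_SIZE);
        }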

    Is the 32-bit embedded processor cost vulnerable to 64-bit 7nm devices as
    the FABs mature? Will video data move to the IOT edge? Will AI move to the edge? Will every embedded CPU have a built-in radio?

    In my case, video is already *at* the edge. The idea of needing a
    "bigger host" or "the cloud" is already obsolescent. Even the need
    for bulk storage -- whether on-board (removable flash, as you suggest)
    or remotely served -- is dubious. How much persistent store do you
    really need, beyond your executables, in a typical application?

    I've decided that RAM is the bottleneck as you can't XIP out of
    an SD card...

    Radios? <shrug> Possibly as wireless is *so* much easier to
    interconnect than wired. But, you're still left with the power
    problem; even at a couple of watts, wall warts are unsightly
    and low voltage DC isn't readily available *everywhere* that
    you may want to site a device. (how many devices do you
    want tethered to a USB host before it starts to look a mess?)

    The bigger challenge is moving developers to think in terms of
    the capabilities that the hardware will afford. E.g., can
    you exploit *true* concurrency in your application? Or, will
    you "waste" a second core/thread context on some largely
    decoupled activity? How much capability will you be willing
    to sacrifice to your hosting OS -- and what NEW capabilities
    will it provide you?

    Wait a few years and find out.

    The wait won't even be *that* long...

  • From David Brown@21:1/5 to Paul Rubin on Tue Jun 8 07:59:53 2021
    On 08/06/2021 07:31, Paul Rubin wrote:
    James Brakefield <jim.brakefield@ieee.org> writes:
    Is the 32-bit embedded processor cost vulnerable to 64-bit 7nm devices
    as the FABs mature? Will video data move to the IOT edge? Will AI move
    to the edge? Will every embedded CPU have a built-in radio?

    I don't care what the people say--
    32 bits are here to stay.


    8-bit microcontrollers are still far more common than 32-bit devices in
    the embedded world (and 4-bit devices are not gone yet). At the other
    end, 64-bit devices have been used for a decade or two in some kinds of embedded systems.

    We'll see 64-bit take a greater proportion of the embedded systems that
    demand high throughput or processing power (network devices, hard cores
    in expensive FPGAs, etc.) where the extra cost in dollars, power,
    complexity, board design are not a problem. They will probably become
    more common in embedded Linux systems as the core itself is not usually
    the biggest part of the cost. And such systems are definitely on the
    increase.

    But for microcontrollers - which dominate embedded systems - there has
    been a lot to gain by going from 8-bit and 16-bit to 32-bit for little
    cost. There is almost nothing to gain from a move to 64-bit, but the
    cost would be a good deal higher. So it is not going to happen - at
    least not more than a very small and very gradual change.

    The OP sounds more like a salesman than someone who actually works with embedded development in reality.

  • From Paul Rubin@21:1/5 to James Brakefield on Mon Jun 7 22:31:54 2021
    James Brakefield <jim.brakefield@ieee.org> writes:
    Is the 32-bit embedded processor cost vulnerable to 64-bit 7nm devices
    as the FABs mature? Will video data move to the IOT edge? Will AI move
    to the edge? Will every embedded CPU have a built-in radio?

    I don't care what the people say--
    32 bits are here to stay.

  • From Don Y@21:1/5 to David Brown on Tue Jun 8 00:39:01 2021
    On 6/7/2021 10:59 PM, David Brown wrote:
    8-bit microcontrollers are still far more common than 32-bit devices in
    the embedded world (and 4-bit devices are not gone yet). At the other
    end, 64-bit devices have been used for a decade or two in some kinds of embedded systems.

    I contend that a good many "32b" implementations are really glorified
    8/16b applications that exhausted their memory space. I still see lots
    of designs built on a small platform (8/16b) and augment it -- either
    with some "memory enhancement" technology or additional "slave"
    processors to split the binaries. Code increases in complexity but
    there doesn't seem to be a need for the "work-per-unit-time" to.

    [This has actually been the case for a long time. The appeal of
    newer CPUs is often in the set of peripherals that accompany the
    processor, not the processor itself.]

    We'll see 64-bit take a greater proportion of the embedded systems that demand high throughput or processing power (network devices, hard cores
    in expensive FPGAs, etc.) where the extra cost in dollars, power,
    complexity, board design are not a problem. They will probably become
    more common in embedded Linux systems as the core itself is not usually
    the biggest part of the cost. And such systems are definitely on the increase.

    But for microcontrollers - which dominate embedded systems - there has
    been a lot to gain by going from 8-bit and 16-bit to 32-bit for little

    I disagree. The "cost" (barrier) that I see clients facing is the
    added complexity of a 32b platform and how it often implies (or even *requires*) a more formal OS underpinning the application. Where you
    could hack together something on bare metal in the 8/16b worlds,
    moving to 32 often requires additional complexity in managing
    mechanisms that aren't usually present in smaller CPUs (caches,
    MMU/MPU, DMA, etc.) Developers (and their organizations) can't just
    play "coder cowboy" and coerce the hardware to behaving as they
    would like. Existing staff (hired with the "bare metal" mindset)
    are often not equipped to move into a more structured environment.

    [I can hack together a device to meet some particular purpose
    much easier on "development hardware" than I can on a "PC" -- simply
    because there's too much I have to "work around" on a PC that isn't
    present on development hardware.]

    Not every product needs a filesystem, network stack, protected
    execution domains, etc. Those come with additional costs -- often
    in the form of a lack of understanding as to what the ACTUAL
    code in your product is doing at any given time. (this isn't the
    case in the smaller MCU world; it's possible for a developer to
    have written EVERY line of code in a smaller platform)

    cost. There is almost nothing to gain from a move to 64-bit, but the
    cost would be a good deal higher.

    Why is the cost "a good deal higher"? Code/data footprints don't
    uniformly "double" in size. The CPU doesn't slow down to handle
    bigger data.

    The cost is driven by where the market goes. Note how many 68Ks found design-ins vs. the T11, F11, 16032, etc. My first 32b design was
    physically large, consumed a boatload of power and ran at only a modest improvement (in terms of system clock) over 8b processors of its day.
    Now, I can buy two orders of magnitude more horsepower PLUS a
    bunch of built-in peripherals for two cups of coffee (at QTY 1)

    So it is not going to happen - at
    least not more than a very small and very gradual change.

    We got 32b processors NOT because the embedded world cried out for
    them but, rather, because of the influence of the 32b desktop world.
    We've had 32b processors since the early 80's. But, we've only had
    PCs since about the same timeframe! One assumes ubiquity in the
    desktop world would need to happen before any real spillover to embedded.
    (When the "desktop" was an '11 sitting in a back room, it wasn't seen
    as ubiquitous.)

    In the future, we'll see the 64b *phone* world drive the evolution
    of embedded designs, similarly. (do you really need 32b/64b to
    make a phone? how much code is actually executing at any given
    time and in how many different containers?)

    [The OP suggests MCUs with radios -- maybe they'll be cell phone
    radios and *not* wifi/BLE as I assume he's thinking! Why add the
    need for some sort of access point to a product's deployment if
    the product *itself* can make a direct connection??]

    My current design can't fill a 32b address space (but, that's because
    I've decomposed apps to the point that they can be relatively small).
    OTOH, designing a system with a 32b limitation seems like an invitation
    to do it over when 64b is "cost effective". The extra "baggage" has
    proven to be relatively insignificant (I have ports of my codebase
    to SPARC as well as Atom running alongside a 32b ARM)

    The OP sounds more like a salesman than someone who actually works with embedded development in reality.

    Possibly. Or, just someone that wanted to stir up discussion...

  • From David Brown@21:1/5 to Don Y on Tue Jun 8 13:04:00 2021
    On 08/06/2021 09:39, Don Y wrote:
    On 6/7/2021 10:59 PM, David Brown wrote:
    8-bit microcontrollers are still far more common than 32-bit devices in
    the embedded world (and 4-bit devices are not gone yet).  At the other
    end, 64-bit devices have been used for a decade or two in some kinds of
    embedded systems.

    I contend that a good many "32b" implementations are really glorified
    8/16b applications that exhausted their memory space.

    Sure. Previously you might have used 32 kB flash on an 8-bit device,
    now you can use 64 kB flash on a 32-bit device. The point is, you are
    /not/ going to find yourself hitting GB limits any time soon. The step
    from 8-bit or 16-bit to 32-bit is useful to get a bit more out of the
    system - the step from 32-bit to 64-bit is totally pointless for 99.99%
    of embedded systems. (Even for most embedded Linux systems, you usually
    only have a 64-bit cpu because you want bigger and faster, not because
    of memory limitations. It is only when you have a big gui with fast
    graphics that 32-bit address space becomes a limitation.)

    A 32-bit microcontroller is simply much easier to work with than an
    8-bit or 16-bit with "extended" or banked memory to get beyond 64 K
    address space limits.
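    For readers who have not suffered it, a hypothetical sketch of what
    "banked" access looks like from C -- BANK_SELECT_REG and WINDOW_BASE
    are made-up addresses, but the shape of the code is typical:

        #include <stdint.h>

        #define BANK_SELECT_REG (*(volatile uint8_t *)0x00FFu) /* hypothetical */
        #define WINDOW_BASE     ((volatile uint8_t *)0x8000u)  /* 32 K window  */
        #define WINDOW_SIZE     0x8000u

        /* Read one byte at a "large" linear address by splitting it into
         * a bank number and an offset within the visible window. */
        static uint8_t banked_read(uint32_t linear_addr)
        {
            uint8_t  saved = BANK_SELECT_REG;
            uint8_t  bank  = (uint8_t)(linear_addr / WINDOW_SIZE);
            uint16_t off   = (uint16_t)(linear_addr % WINDOW_SIZE);
            uint8_t  value;

            BANK_SELECT_REG = bank;        /* swap the window in        */
            value = WINDOW_BASE[off];      /* read through the window   */
            BANK_SELECT_REG = saved;       /* restore the prior mapping */
            return value;
        }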


    We'll see 64-bit take a greater proportion of the embedded systems that
    demand high throughput or processing power (network devices, hard cores
    in expensive FPGAs, etc.) where the extra cost in dollars, power,
    complexity, board design are not a problem.  They will probably become
    more common in embedded Linux systems as the core itself is not usually
    the biggest part of the cost.  And such systems are definitely on the
    increase.

    But for microcontrollers - which dominate embedded systems - there has
    been a lot to gain by going from 8-bit and 16-bit to 32-bit for little

    I disagree.  The "cost" (barrier) that I see clients facing is the
    added complexity of a 32b platform and how it often implies (or even *requires*) a more formal OS underpinning the application.

    Yes, that is definitely a cost in some cases - 32-bit microcontrollers
    are usually noticeably more complicated than 8-bit ones. How
    significant the cost is depends on the project's balance between development costs and production costs, and how beneficial the extra functionality can be (like moving from bare metal to an RTOS, or supporting networking).


    cost.  There is almost nothing to gain from a move to 64-bit, but the
    cost would be a good deal higher.

    Why is the cost "a good deal higher"?  Code/data footprints don't
    uniformly "double" in size.  The CPU doesn't slow down to handle
    bigger data.

    Some parts of code and data /do/ double in size - but not uniformly, of
    course. But your chip is bigger, faster, requires more power, has wider
    buses, needs more advanced memories, has more balls on the package,
    requires finer pitched pcb layouts, etc.
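    A concrete (if contrived) illustration of the "some data doubles"
    point: pointer-heavy structures roughly double under a 64-bit ABI,
    while arrays of small integers do not. Sizes below assume typical
    ILP32 vs LP64 conventions.

        #include <stdio.h>

        struct list_node {
            struct list_node *next;     /* 4 bytes on ILP32, 8 on LP64 */
            struct list_node *prev;
            void             *payload;
            unsigned short    flags;    /* same size on both           */
        };

        int main(void)
        {
            /* Typically 16 bytes on a 32-bit ABI, 32 on a 64-bit ABI
             * (three pointers plus alignment padding). */
            printf("sizeof(struct list_node) = %zu\n",
                   sizeof(struct list_node));
            return 0;
        }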

    In theory, you /could/ make a microcontroller in a 64-pin LQFP and
    replace the 72 MHz Cortex-M4 with a 64-bit ARM core at the same clock
    speed. The die would only cost two or three times more, and take
    perhaps less than 10 times the power for the core. But it would be so
    utterly pointless that no manufacturer would make such a device.

    So a move to 64-bit in practice means moving from a small, cheap, self-contained microcontroller to an embedded PC. Lots of new
    possibilities, lots of new costs of all kinds.

    Oh, and the cpu /could/ be slower for some tasks - bigger cpus that are optimised for throughput often have poorer latency and more jitter for interrupts and other time-critical features.


     So it is not going to happen - at
    least not more than a very small and very gradual change.

    We got 32b processors NOT because the embedded world cried out for
    them but, rather, because of the influence of the 32b desktop world.
    We've had 32b processors since the early 80's.  But, we've only had
    PCs since about the same timeframe!  One assumes ubiquity in the
    desktop world would need to happen before any real spillover to embedded. (When the "desktop" was an '11 sitting in a back room, it wasn't seen
    as ubiquitous.)

    I don't assume there is any direct connection between the desktop world
    and the embedded world - the needs are usually very different. There is
    a small overlap in the area of embedded devices with good networking and
    a gui, where similarity to the desktop world is useful.

    We have had 32-bit microcontrollers for decades. I used a 16-bit
    Windows system when working with my first 32-bit microcontroller. But
    at that time, 32-bit microcontrollers cost a lot more and required more
    from the board (external memories, more power, etc.) than 8-bit or
    16-bit devices. That has gradually changed with an almost total
    disregard for what has happened in the desktop world.

    Yes, the embedded world /did/ cry out for 32-bit microcontrollers for an increasing proportion of tasks. We cried many tears when the
    microcontroller manufacturers offered to give more flash space to their
    8-bit devices by having different memory models, banking, far jumps, and
    all the other shit that goes with not having a big enough address space.
    We cried out when we wanted to have Ethernet and the microcontroller
    only had a few KB of ram. I have used maybe 6 or 8 different 32-bit microcontroller processor architectures, and I used them because I
    needed them for the task. It's only in the past 5+ years that I have
    been using 32-bit microcontrollers for tasks that could be done fine
    with 8-bit devices, but the 32-bit devices are smaller, cheaper and
    easier to work with than the corresponding 8-bit parts.


    In the future, we'll see the 64b *phone* world drive the evolution
    of embedded designs, similarly.  (do you really need 32b/64b to
    make a phone?  how much code is actually executing at any given
    time and in how many different containers?)


    We will see that on devices that are, roughly speaking, tablets -
    embedded systems with a good gui, a touchscreen, networking. And that's
    fine. But these are a tiny proportion of the embedded devices made.


    The OP sounds more like a salesman than someone who actually works with
    embedded development in reality.

    Possibly.  Or, just someone that wanted to stir up discussion...


    Could be. And there's no harm in that!

  • From Theo@21:1/5 to David Brown on Tue Jun 8 15:46:22 2021
    David Brown <david.brown@hesbynett.no> wrote:
    But for microcontrollers - which dominate embedded systems - there has
    been a lot to gain by going from 8-bit and 16-bit to 32-bit for little
    cost. There is almost nothing to gain from a move to 64-bit, but the
    cost would be a good deal higher. So it is not going to happen - at
    least not more than a very small and very gradual change.

    I think there will be divergence about what people mean by an N-bit system:

    Register size
    Unit of logical/arithmetical processing
    Memory address/pointer size
    Memory bus/cache width

    I think we will increasingly see parts which have different sizes on one
    area but not the other.

    For example, for doing some kinds of logical operations (eg crypto), having 64-bit registers and ALU makes sense, but you might only need kilobytes of memory so only have <32 address bits.
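    A rough illustration of that point: a SipHash-style mixing round leans
    entirely on 64-bit adds, xors and rotates, yet the whole state is four
    64-bit words -- wide ALU, tiny address space.

        #include <stdint.h>

        static inline uint64_t rotl64(uint64_t x, unsigned r)
        {
            return (x << r) | (x >> (64u - r));
        }

        /* One SipHash-style round over four 64-bit words of state. */
        static void sipround(uint64_t v[4])
        {
            v[0] += v[1]; v[1] = rotl64(v[1], 13); v[1] ^= v[0];
            v[0] = rotl64(v[0], 32);
            v[2] += v[3]; v[3] = rotl64(v[3], 16); v[3] ^= v[2];
            v[0] += v[3]; v[3] = rotl64(v[3], 21); v[3] ^= v[0];
            v[2] += v[1]; v[1] = rotl64(v[1], 17); v[1] ^= v[2];
            v[2] = rotl64(v[2], 32);
        }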

    For something else, like a microcontroller that's hung off the side of a
    bigger system (eg the MCU on a PCIe card) you might want the ability to
    handle 64 bit addresses but don't need to pay the price for 64-bit
    registers.

    Or you might operate with 16 or 32 bit wide external RAM chip, but your
    cache could extend that to a wider word width.

    There are many permutations, and I think people will pay the cost where it benefits them and not where it doesn't.

    This is not a new phenomenon, of course. But for a time all these numbers
    were in the range between 16 and 32 bits, which made 32 simplest all round. Just like we previously had various 8/16 hybrids (eg 8 bit datapath, 16 bit address) I think we're going to see more 32/64 hybrids.

    Theo

  • From James Brakefield@21:1/5 to Don Y on Tue Jun 8 12:38:44 2021
    On Tuesday, June 8, 2021 at 2:39:29 AM UTC-5, Don Y wrote:

    I contend that a good many "32b" implementations are really glorified
    8/16b applications that exhausted their memory space.

    The only thing that will take more than 4GB is video or a day's worth of photos.
    So there are likely to be some embedded apps that need a > 32-bit address space. Cost, size or storage capacity are no longer limiting factors.

    Am trying to puzzle out what a 64-bit embedded processor should look like.
    At the low end, yeah, a simple RISC processor. And support for complex arithmetic
    using 32-bit floats? And support for pixel alpha blending using quad 16-bit numbers?
    32-bit pointers into the software?

  • From David Brown@21:1/5 to All on Tue Jun 8 22:11:18 2021
    On 08/06/2021 21:38, James Brakefield wrote:

    Could you explain your background here, and what you are trying to get
    at? That would make it easier to give you better answers.

    The only thing that will take more than 4GB is video or a day's worth of photos.

    No, video is not the only thing that takes 4GB or more. But it is,
    perhaps, one of the more common cases. Most embedded systems don't need anything remotely like that much memory - to the nearest percent, 100%
    of embedded devices don't even need close to 4MB of memory (ram and
    flash put together).

    So there are likely to be some embedded apps that need a > 32-bit address space.

    Some, yes. Many, no.

    Cost, size or storage capacity are no longer limiting factors.

    Cost and size (and power) are /always/ limiting factors in embedded systems.


    Am trying to puzzle out what a 64-bit embedded processor should look like.

    There are plenty to look at. There are ARMs, PowerPC, MIPS, RISC-V.
    And of course there are some x86 processors used in embedded systems.

    At the low end, yeah, a simple RISC processor.

    Pretty much all processors except x86 and brain-dead old-fashioned 8-bit
    CISC devices are RISC. Not all are simple.

    And support for complex arithmetic
    using 32-bit floats?

    A 64-bit processor will certainly support 64-bit doubles as well as
    32-bit floats. Complex arithmetic is rarely needed, except perhaps for
    FFT's, but is easily done using real arithmetic. You can happily do
    32-bit complex arithmetic on an 8-bit AVR, albeit taking significant
    code space and run time. I believe the latest gcc for the AVR will do
    64-bit doubles as well - using exactly the same C code you would on any
    other processor.
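    For instance, a complex multiply (the workhorse of an FFT butterfly)
    is just four real multiplies and a couple of adds -- nothing that
    requires a wide core, as a quick sketch shows:

        typedef struct { float re, im; } cplx32;

        /* (a.re + i*a.im) * (b.re + i*b.im) using only real arithmetic. */
        static cplx32 cmul(cplx32 a, cplx32 b)
        {
            cplx32 r;
            r.re = a.re * b.re - a.im * b.im;
            r.im = a.re * b.im + a.im * b.re;
            return r;
        }

        /* One radix-2 FFT butterfly, the usual consumer of cmul(). */
        static void butterfly(cplx32 *x, cplx32 *y, cplx32 w)
        {
            cplx32 t = cmul(*y, w);
            cplx32 a = { x->re + t.re, x->im + t.im };
            cplx32 b = { x->re - t.re, x->im - t.im };
            *x = a;
            *y = b;
        }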

    And support for pixel alpha blending using quad 16-bit numbers?

    You would use a hardware 2D graphics accelerator for that, not the
    processor.

    32-bit pointers into the software?


    With 64-bit processors you usually use 64-bit pointers.

  • From Dimiter_Popoff@21:1/5 to David Brown on Tue Jun 8 23:39:24 2021
    On 6/8/2021 23:18, David Brown wrote:
    On 08/06/2021 16:46, Theo wrote:
    ......

    Memory bus/cache width

    No, that is not a common way to measure cpu "width", for many reasons.
    A chip is likely to have many buses outside the cpu core itself (and the cache(s) may or may not be considered part of the core). It's common to
    have 64-bit wide buses on 32-bit processors, it's also common to have
    16-bit external databuses on a microcontroller. And the cache might be
    128 bits wide.

    I agree with your points and those of Theo, but isn't the cache
    basically as wide as the registers, logically speaking? A cache line
    is several times that; probably that is what you refer to.
    Not that it makes much of a difference to the fact that 64 bit data
    buses/registers in an MCU (apart from FPU registers, 32 bit FPUs are
    useless to me) are unlikely to attract much interest; as you said,
    there is nothing of significance to be gained.
    To me 64 bit CPUs are of interest of course and thankfully there are
    some available, but this goes somewhat past what we call "embedded".
    Not long ago, in a chat with a guy who knew some of the 64 bit ARM
    world, I gathered there is some real mess with their out-of-order
    execution: one needs to do... hmmm... "sync", whatever they call it,
    all the time, and there is a huge performance cost because of that.
    Anybody heard anything about it? (I only know what I was told.)
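    (If the "sync" in question is memory barriers -- a guess, but that is
    what a weakly-ordered, out-of-order AArch64 core requires when sharing
    data between contexts -- the portable C11 idiom is sketched below; it
    maps to LDAR/STLR or DMB-based sequences on ARM.)

        #include <stdatomic.h>
        #include <stdint.h>

        static uint32_t    shared_payload;
        static atomic_uint payload_ready;

        void producer(uint32_t value)
        {
            shared_payload = value;
            /* release: the payload store cannot be reordered after this */
            atomic_store_explicit(&payload_ready, 1, memory_order_release);
        }

        uint32_t consumer(void)
        {
            /* acquire: once the flag is seen, the payload is visible too */
            while (!atomic_load_explicit(&payload_ready, memory_order_acquire))
                ;
            return shared_payload;
        }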

    Dimiter

  • From David Brown@21:1/5 to Theo on Tue Jun 8 22:18:23 2021
    On 08/06/2021 16:46, Theo wrote:
    David Brown <david.brown@hesbynett.no> wrote:
    But for microcontrollers - which dominate embedded systems - there has
    been a lot to gain by going from 8-bit and 16-bit to 32-bit for little
    cost. There is almost nothing to gain from a move to 64-bit, but the
    cost would be a good deal higher. So it is not going to happen - at
    least not more than a very small and very gradual change.

    I think there will be divergence about what people mean by an N-bit system:

    There have always been different ways to measure the width of a cpu, and different people have different preferences.


    Register size

    Yes, that is common.

    Unit of logical/arithmetical processing

    As is that. Sometimes the width supported by general instructions
    differs from the ALU width, however, resulting in classifications like
    8/16-bit for the Z80 and 16/32-bit for the 68000.

    Memory address/pointer size

    Yes, also common.

    Memory bus/cache width

    No, that is not a common way to measure cpu "width", for many reasons.
    A chip is likely to have many buses outside the cpu core itself (and the cache(s) may or may not be considered part of the core). It's common to
    have 64-bit wide buses on 32-bit processors, it's also common to have
    16-bit external databuses on a microcontroller. And the cache might be
    128 bits wide.


    I think we will increasingly see parts which have different sizes on one
    area but not the other.


    That has always been the case.

    For example, for doing some kinds of logical operations (eg crypto), having 64-bit registers and ALU makes sense, but you might only need kilobytes of memory so only have <32 address bits.

    You need quite a few KB of ram for more serious cryptography. But it
    sounds more like you are talking about SIMD or vector operations here,
    which are not considered part of the "normal" width of the cpu. Modern
    x86 cpus might have 512 bit SIMD registers - but they are still 64-bit processors.

    But you are right that you might want some parts of the system to be
    wider and other parts thinner.


    For something else, like a microcontroller that's hung off the side of a bigger system (eg the MCU on a PCIe card) you might want the ability to handle 64 bit addresses but don't need to pay the price for 64-bit
    registers.

    Or you might operate with 16 or 32 bit wide external RAM chip, but your
    cache could extend that to a wider word width.

    There are many permutations, and I think people will pay the cost where it benefits them and not where it doesn't.


    Agreed.

    This is not a new phenomenon, of course. But for a time all these numbers were in the range between 16 and 32 bits, which made 32 simplest all round. Just like we previously had various 8/16 hybrids (eg 8 bit datapath, 16 bit address) I think we're going to see more 32/64 hybrids.


    32-bit processors have often had 64-bit registers for floating point,
    and 64-bit operations of various sorts. It is not new.

  • From James Brakefield@21:1/5 to David Brown on Tue Jun 8 14:25:21 2021
    On Tuesday, June 8, 2021 at 3:11:24 PM UTC-5, David Brown wrote:

    Could you explain your background here, and what you are trying to get
    at?

    Am familiar with embedded systems, image processing and scientific applications.
    Have used a number of 8, 16, 32 and ~64bit processors. Have also done work in
    FPGAs. Am semi-retired and when working was always trying to stay ahead of
    new opportunities and challenges.

    Some of my questions/comments belong over at comp.arch

  • From Dimiter_Popoff@21:1/5 to James Brakefield on Wed Jun 9 01:01:21 2021
    On 6/8/2021 22:38, James Brakefield wrote:

    I contend that a good many "32b" implementations are really glorified 8/16b applications that exhausted their memory space.

    The only thing that will take more than 4GB is video or a day's worth of photos.
    So there are likely to be some embedded apps that need a > 32-bit address space.
    Cost, size or storage capacity are no longer limiting factors.

    Am trying to puzzle out what a 64-bit embedded processor should look like.
    At the low end, yeah, a simple RISC processor. And support for complex arithmetic
    using 32-bit floats? And support for pixel alpha blending using quad 16-bit numbers?
    32-bit pointers into the software?


    The real value in 64 bit integer registers and 64 bit address space is
    just that, having an orthogonal "endless" space (well I remember some
    30 years ago 32 bits seemed sort of "endless" to me...).

    Not needing to assign overlapping logical addresses to anything
    can make a big difference to how the OS is done.
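    A toy illustration of that point (not Dimiter's actual mechanism):
    with 64 bits to play with, the OS can hand every object a unique,
    never-reused logical range, so mappings never overlap and an address
    doubles as a system-wide identifier.

        #include <stdatomic.h>
        #include <stdint.h>

        #define REGION_ALIGN (1ull << 20)   /* carve in 1 MB granules */

        /* Arbitrary base; the constant is only for illustration. */
        static _Atomic uint64_t next_region = 0x0000100000000000ull;

        /* Hand out a unique logical range; nothing else ever gets it. */
        uint64_t alloc_unique_region(uint64_t size)
        {
            uint64_t granules = (size + REGION_ALIGN - 1) / REGION_ALIGN;
            return atomic_fetch_add(&next_region, granules * REGION_ALIGN);
        }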

    A 32 bit FPU seems useless to me; 64 bit is OK. Although 32 bit FP
    *numbers* can be quite useful for storing/passing data.

    Dimiter

    ======================================================
    Dimiter Popoff, TGI             http://www.tgi-sci.com
    ======================================================
    http://www.flickr.com/photos/didi_tgi/

  • From Don Y@21:1/5 to Theo on Tue Jun 8 16:00:54 2021
    On 6/8/2021 7:46 AM, Theo wrote:
    David Brown <david.brown@hesbynett.no> wrote:
    But for microcontrollers - which dominate embedded systems - there has
    been a lot to gain by going from 8-bit and 16-bit to 32-bit for little
    cost. There is almost nothing to gain from a move to 64-bit, but the
    cost would be a good deal higher. So it is not going to happen - at
    least not more than a very small and very gradual change.

    I think there will be divergence about what people mean by an N-bit system:

    Register size
    Unit of logical/arithmetical processing
    Memory address/pointer size
    Memory bus/cache width

    (General) Register size is the primary driver.

    A processor can have very different "size" subcomponents.
    E.g., a Z80 is an 8b processor -- registers are nominally 8b.
    However, it supports 16b operations -- on register PAIRs
    (an implicit acknowledgement that the REGISTER is smaller
    than the register pair). This is common on many smaller
    processors. The address space is 16b -- with a separate 16b
    address space for I/Os. The Z180 extends the PHYSICAL
    address space to 20b but the logical address space
    remains unchanged at 16b (if you want to specify a physical
    address, you must use 20+ bits to represent it -- and invoke
    a separate mechanism to access it!). The ALU is *4* bits.
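    For the curious, the arithmetic the Z180 MMU performs is roughly the
    following (a sketch only -- the real part also splits the logical
    space into common/banked regions via CBAR):

        #include <stdint.h>

        /* 16-bit logical address + (8-bit bank base << 12) = 20-bit
         * physical address, e.g. 0x4000 with base 0x38 -> 0x3C000. */
        static uint32_t z180_physical(uint16_t logical, uint8_t bank_base)
        {
            return (uint32_t)logical + ((uint32_t)bank_base << 12);
        }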

    Cache? Which one? I or D? L1/2/3/?

    What about the oddballs -- 12b? 1b?

    I think we will increasingly see parts which have different sizes on one
    area but not the other.

    For example, for doing some kinds of logical operations (eg crypto), having 64-bit registers and ALU makes sense, but you might only need kilobytes of memory so only have <32 address bits.

    That depends on the algorithm chosen and the hardware support available.

    For something else, like a microcontroller that's hung off the side of a bigger system (eg the MCU on a PCIe card) you might want the ability to handle 64 bit addresses but don't need to pay the price for 64-bit
    registers.

    Or you might operate with 16 or 32 bit wide external RAM chip, but your
    cache could extend that to a wider word width.

    There are many permutations, and I think people will pay the cost where it benefits them and not where it doesn't.

    But you don't buy MCUs with a-la-carte pricing. How much does an extra
    timer cost me? What if I want it to also serve as a *counter*? What
    cost for 100K of internal ROM? 200K?

    [It would be an interesting exercise to try to do a linear analysis of
    product prices with an idea of trying to tease out the "costs" (to
    the developer) for each feature in EXISTING products!]

    Instead, you see a *price* that is reflective of how widely used the
    device happens to be, today. You are reliant on the preferences of others
    to determine which is the most cost effective product -- for *you*.

    E.g., most of my devices have no "display" -- yet, the MCU I've chosen
    has hardware support for same. It would obviously cost me more to
    select a device WITHOUT that added capability -- because most
    purchasers *want* a display (and *they* drive the production economies).

    I could, potentially, use a 2A03 for some applications. But, the "TCO"
    of such an approach would exceed that of a 32b (or larger) processor!

    [What a crazy world!]

    This is not a new phenomenon, of course. But for a time all these numbers were in the range between 16 and 32 bits, which made 32 simplest all round. Just like we previously had various 8/16 hybrids (eg 8 bit datapath, 16 bit address) I think we're going to see more 32/64 hybrids.

    Theo


  • From Don Y@21:1/5 to James Brakefield on Tue Jun 8 16:51:56 2021
    On 6/8/2021 12:38 PM, James Brakefield wrote:

    I contend that a good many "32b" implementations are really glorified 8/16b applications that exhausted their memory space.

    The only thing that will take more than 4GB is video or a day's worth of photos.

    That's not true. For example, I rely on a "PC" in my current design
    to support the RDBMS. Otherwise, I would have to design a "special
    node" (I have a distributed system) that had the resources necessary
    to process multiple concurrent queries in a timely fashion; I can
    put 100GB of RAM in a PC (whereas my current nodes only have 256MB).

    The alternative is to rely on secondary (disk) storage -- which is
    even worse!

    And "video" is incredibly nondescript. It conjures ideas of STBs.
    Instead, I see a wider range of applications in terms of *vision*.

    E.g., let your doorbell camera "notice motion", recognize that
    motion as indicative of someone/thing approaching it (e.g.,
    a visitor), recognize the face/features of the visitor and
    alert you to its presence (if desired). No need to involve a
    cloud service to do this.

    [My "doorbell" is a camera/microphone/speaker. *If* I want to
    know that you are present, *it* will tell me. Or, if told to
    do so, will grant you access to the house (even in my absence).
    For "undesirables", I'm mounting a coin mechanism adjacent to
    the entryway (our front door is protected by a gated porch area):
    "Deposit 25c to ring bell. If we want to talk to you, your
    deposit will be refunded. If *not*, consider that the *cost* of
    pestering us!"]

    There are surveillance cameras discreetly placed around the exterior
    of the house (don't want the place to look like a frigging *bank*!).
    One of them has a clear view of the mailbox (our mail is delivered
    via lettercarriers riding in mail trucks). Same front door camera
    hardware. But, now: detect motion; detect motion STOPPING
    proximate to mailbox (for a few seconds or more); detect motion
    resuming; signal "mail available". Again, no need to involve a
    cloud service to accomplish this. And, when not watching for mail
    delivery, it's performing "general" surveillance -- mail detection
    is a "free bonus"!

    Imagine designing a vision-based inspection system where you "train"
    the CAMERA -- instead of some box that the camera connects to. And,
    the CAMERA signals accept/reject directly.

    [I use a boatload of cameras, here; they are cheap sensors -- the
    "cost" lies in the signal processing!]

    So there are likely to be some embedded apps that need a > 32-bit address space.
    Cost, size or storage capacity are no longer limiting factors.

    No, cost, size, and storage are ALWAYS limiting factors!

    E.g., each of my nodes derive power from the wired network connection.
    That puts a practical limit of ~12W on what a node can dissipate.
    That has to support the processing core plus any local I/Os! Note
    that dissipated power == heat. So, one also has to be conscious of
    how that heat will affect the devices' environs.

    (Yes, there are schemes to increase this to ~100W but now the cost
    of providing power -- and BACKUP power -- to a remote device starts
    to be a sizeable portion of the product's cost and complexity).

    My devices are intended to be "invisible" to the user -- so, they
    have to hide *inside* something (most commonly, the walls or
    ceiling -- in standard Jboxes for accessibility and Code compliance).
    So, that limits their size/volume (mine are about the volume of a
    standard duplex receptacle -- 3 cu in -- so fit in even the smallest
    of 1G boxes... even pancake boxes!)

    They have to be inexpensive so I can justify using LOTS of them
    (I will have 240 deployed, here; my industrial beta site will have
    over 1000; commercial beta site almost a similar number). Not only
    is the cost of initial acquisition of concern, but also the *perceived*
    cost of maintaining the hardware in a functional state (customer
    doesn't want to have $10K of spares on hand for rapid incident response
    and staff to be able to diagnose and repair/replace "on demand")

    In my case, I sidestep the PERSISTENT storage issue by relegating that
    to the RDBMS. In *that* domain, I can freely add spinning rust or
    an SSD without complicating the design of the rest of the nodes.
    So, "storage" becomes:
    - how much do I need for a secure bootstrap
    - how much do I need to contain a downloaded (from the RDBMS!) binary
    - how much do I need to keep "local runtime resources"
    - how much can I exploit surplus capacity *elsewhere* in the system
    to address transient needs

    Imagine what it would be like having to replace "worn" SD cards
    at some frequency in hundreds of devices scattered around hundreds
    of "invisible" places! Almost as bad as replacing *batteries* in
    those devices!

    [Have you ever had an SD card suddenly write protect itself?]

    Am trying to puzzle out what a 64-bit embedded processor should look like.

    "Should"? That depends on what you expect it to do for you.
    The nonrecurring cost of development will become an ever-increasing
    portion of the device's "cost". If you sell 10K units but spend
    500K on development (over its lifetime), you've justification for
    spending a few more dollars on recurring costs *if* you can realize
    a reduction in development/maintenance costs (because the development
    is easier, bugs are fewer/easier to find, etc.)

    Developers (and silicon vendors, as Good Business Practice)
    will look at their code and see what's "hard" to do, efficiently.
    Then, consider mechanisms that could make that easier or
    more effective.

    I see the addition of hardware features that enhance the robustness
    of the software development *process*. E.g., allowing for compartmentalizing applications and subsystems more effectively and *efficiently*.

    [I put individual objects into their own address space containers
    to ensure Object A can't be mangled by Client B (or Object C). As
    a result, talking to an object is expensive because I have to hop
    back and forth across that protection boundary. It's even worse
    when the targeted object is located on some other physical node
    (as now I have the transport cost to contend with).]

    Similarly, making communications more robust. We already see that
    with crypto accelerators. The idea of device "islands" is
    obsolescent. Increasingly, devices will interact with other
    devices to solve problems. More processing will move to the
    edge simply because of scaling issues (I can add more CPUs
    far more effectively than I can increase the performance of
    a "centralized" CPU; add another sense/control point? let *it*
    bring some processing abilities along with it!).

    And, securing the product from tampering/counterfeiting; it seems
    like most approaches, to date, have some hidden weakness. It's hard
    to believe hardware can't ameliorate that. The fact that "obscurity"
    is still relied upon by silicon vendors suggests an acknowledgement
    of their weaknesses.

    Beyond that? Likely more DSP-related support in the "native"
    instruction set (so you can blend operations between conventional
    computing needs and signal processing related issues).

    And, graphics acceleration as many applications implement user
    interfaces in the appliance.

    There may be some other optimizations that help with hashing
    or managing large "datasets" (without them being considered
    formal datasets).

    Power management (and measurement) will become increasingly
    important (I spend almost as much on the "power supply"
    as I do on the compute engine). Developers will want to be
    able to easily ascertain what they are consuming as well
    as why -- so they can (dynamically) alter their strategies.
    In addition to varying CPU clock frequency, there may be
    mechanisms to automatically (!) power down sections of
    the die based on observed instruction sequences (instead
    of me having to explicitly do so).

    [E.g., I shed load when I'm running off backup power.
    This involves powering down nodes as well as the "fields"
    on selective nodes. How do I decide *which* load to shed to
    gain the greatest benefit?]

    Memory management (in the conventional sense) will likely
    see more innovation. Instead of just "settling" for a couple
    of page sizes, we might see "adjustable" page sizes.
    Or, the ability to specify some PORTION of a *particular*
    page as being "valid" -- instead of treating the entire
    page as such.

    Scheduling algorithms will hopefully get additional
    hardware support. E.g., everything is deadline driven
    in my design ("real-time"). So, schedulers are concerned
    with evaluating the deadlines of "ready" tasks -- which
    can vary, over time, as well as may need further qualification
    based on other criteria (e.g., least-slack-time scheduling)
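    The core of such a scheduler is a comparison like the following
    sketch (field names invented): slack is time-to-deadline minus
    remaining work, and the ready task with the least slack runs first.

        #include <stdint.h>

        struct task {
            int64_t deadline_us;    /* absolute deadline   */
            int64_t remaining_us;   /* estimated work left */
        };

        static int64_t slack(const struct task *t, int64_t now_us)
        {
            return (t->deadline_us - now_us) - t->remaining_us;
        }

        /* Pick the task to run next under least-slack-time-first. */
        static const struct task *pick_next(const struct task *a,
                                            const struct task *b,
                                            int64_t now_us)
        {
            return (slack(a, now_us) <= slack(b, now_us)) ? a : b;
        }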

    Everything in my system is an *opaque* object on which a
    set of POSSIBLE methods can be invoked. But, each *Client*
    of that object (an Actor may be multiple Clients if it possesses
    multiple different Handles to the Object) is constrained as to
    which methods can be invoked via a particular Handle.

    So, I can (e.g.) create an Authenticator object that has methods like "set_passphrase" and "test_passphrase" and "invalidate_passphrase".
    Yet, no "disclose_passphrase" method (for obvious reasons).
    I can create an Interface to one privileged Client that
    allows it to *set* a new passphrase. And, all other Interfaces
    (to that Client as well as others!) may all be restricted to
    only *testing* the passphrase ("Is it 'foobar'?"). And, I can
    limit the number of attempts that you can invoke a particular
    method over a particular interface so the OS does the enforcement
    instead of relying on the Server to do so.
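    A sketch of the flavor of that enforcement (not the actual kernel;
    the names, bitmask layout and per-method budgets are invented): the
    OS checks the Handle before the Object ever sees the call.

        #include <stdbool.h>
        #include <stdint.h>

        enum auth_method {                /* methods the Object exposes  */
            M_SET_PASSPHRASE  = 1u << 0,
            M_TEST_PASSPHRASE = 1u << 1,
            M_INVALIDATE      = 1u << 2,
            /* note: no "disclose_passphrase" exists at all */
        };

        struct handle {
            void     *object;             /* the opaque target           */
            uint32_t  allowed_methods;    /* per-Interface method mask   */
            uint32_t  budget[3];          /* invocations left per method */
        };

        /* Kernel-side gate: refuse before the Server is ever bothered. */
        bool may_invoke(struct handle *h, enum auth_method m)
        {
            unsigned idx = (m == M_SET_PASSPHRASE)  ? 0u :
                           (m == M_TEST_PASSPHRASE) ? 1u : 2u;

            if (!(h->allowed_methods & (uint32_t)m))
                return false;             /* method not on this Interface */
            if (h->budget[idx] == 0u)
                return false;             /* Client exhausted its quota   */
            h->budget[idx]--;
            return true;
        }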

    [What's to stop a Client from hammering on the Server (Authenticator
    Object) repeatedly -- invoking test_passphrase with full knowledge
    that it doesn't know the correct passphrase: "Is it 'foobar'?"
    "Is it 'foobar'?" "Is it 'foobar'?" "Is it 'foobar'?" "Is it 'foobar'?"
    The Client has been enabled to do this; that doesn't mean he can't or
    won't abuse it!

    Note that unlimited access means the server has to respond to each of
    those method invocations. By contrast, putting a limit on them
    means the OS can block the invocation from ever reaching the Object
    (and from needlessly tying up the Object's resources). A capabilities
    based system that relies on encrypted tokens means the Server has
    to decrypt a token in order to determine that it is invalid;
    the Server's resources are consumed instead of the Client's]
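
    (Roughly, the per-Handle method mask and invocation budget look like
    this -- a sketch only, with invented names and types, not an actual
    kernel API:)

        #include <stdbool.h>
        #include <stdint.h>

        /* Each Handle carries a bitmask of methods its holder may invoke,
           plus a remaining-invocation budget that the kernel decrements
           *before* the Server ever sees the call. */
        enum { M_SET_PASSPHRASE  = 1u << 0,
               M_TEST_PASSPHRASE = 1u << 1,
               M_INVALIDATE      = 1u << 2 };

        typedef struct {
            uint32_t object_id;  /* which Object this Handle binds to       */
            uint32_t allowed;    /* bitmask of permitted methods            */
            uint32_t budget;     /* invocations left before kernel refuses  */
        } handle_t;

        /* Kernel-side gate: true iff the invocation may be delivered. */
        bool gate_invocation(handle_t *h, uint32_t method)
        {
            if ((h->allowed & method) == 0) return false; /* not on this Interface */
            if (h->budget == 0)             return false; /* hammered too often    */
            h->budget--;
            return true;                                  /* deliver to the Object */
        }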

    It takes effort (in the kernel) to verify that a Client *can* access a particular Object (i.e., has a Handle to it) AND that the Client can
    invoke THAT particular Method on that Object via this Handle (bound to
    a particular Object *Interface*) as well as verifying the format of
    the data, converting to a format suitable for the targeted Object
    (which may use a different representational structure) for a
    particular Version of the Interface...

    I can either skimp on performing some of these checks (and rely
    on other mechanisms to ensure the security and reliability of
    the codebase -- in the presence of unvetted Actors) or hope
    that some hardware mechanism in the processor makes these a bit
    easier.

    At the low end, yeah, a simple RISC processor. And support for complex arithmetic
    using 32-bit floats? And support for pixel alpha blending using quad 16-bit numbers?
    32-bit pointers into the software?

    I doubt complex arithmetic will have much play. There might be support for *building* larger data types (e.g., I use BigRationals which are incredibly inefficient). But, the bigger bang will be for operators that allow tedious/iterative solutions to be implemented in constant time. This,
    for example, is why a hardware multiply (or other FPU capabilities)
    is such a win -- consider the amount of code that is replaced by a single op-code! Ditto things like "find first set bit", etc.
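
    (For instance -- illustrative C only -- the O(width) loop versus the
    builtin that GCC/Clang map onto a native instruction, or a couple of
    op-codes, where the hardware provides one:)

        #include <stdint.h>

        /* The "iterative" way: shift until the lowest set bit falls out. */
        static int ffs_loop(uint32_t x)
        {
            if (x == 0) return 0;
            int n = 1;
            while ((x & 1u) == 0) { x >>= 1; n++; }
            return n;
        }

        /* With hardware support, the whole loop collapses. */
        static int ffs_hw(uint32_t x)
        {
            return __builtin_ffs((int)x);   /* same 1-based result, 0 if x==0 */
        }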

    Why stick with 32b floats when you can likely implement doubles with a bit
    more microcode (surely faster than trying to do wider operations built from narrower ones)?

    There's an entirely different mindset when you start thinking in
    terms of "bigger processors". I.e., the folks who see 32b processors as
    just *wider* 8/16b processors have typically not made this adjustment.
    It's like trying to "sample the carry" in a HLL (common in ASM)
    instead of concentrating on what you REALLY want to do and letting
    the language make it easier for you to express that.

    Expect to see people making leaps forward in terms of what they
    expect from the solutions they put forth. Anything that you could
    do with a PC, before, can now be done *in* a handheld flashlight!

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to David Brown on Tue Jun 8 17:30:53 2021
    On 6/8/2021 4:04 AM, David Brown wrote:
    On 08/06/2021 09:39, Don Y wrote:
    On 6/7/2021 10:59 PM, David Brown wrote:
    8-bit microcontrollers are still far more common than 32-bit devices in
    the embedded world (and 4-bit devices are not gone yet). At the other
    end, 64-bit devices have been used for a decade or two in some kinds of
    embedded systems.

    I contend that a good many "32b" implementations are really glorified
    8/16b applications that exhausted their memory space.

    Sure. Previously you might have used 32 kB flash on an 8-bit device,
    now you can use 64 kB flash on a 32-bit device. The point is, you are
    /not/ going to find yourself hitting GB limits any time soon. The step

    I don't see the "problem" with 32b devices as one of address space limits (except devices utilizing VMM with insanely large page sizes). As I said,
    in my application, task address spaces are really just a handful of pages.

    I *do* see (flat) address spaces that find themselves filling up with stack-and-heap-per-task, big chunks set aside for "onboard" I/Os,
    *partial* address decoding for offboard I/Os, etc. (i.e., you're
    not likely going to fully decode a single address to access a set
    of DIP switches as the decode logic is disproportionately high
    relative to the functionality it adds)

    How often do you see a high-order address line used for kernel/user?
    (gee, now your "user" space has been halved)

    from 8-bit or 16-bit to 32-bit is useful to get a bit more out of the
    system - the step from 32-bit to 64-bit is totally pointless for 99.99%
    of embedded systems. (Even for most embedded Linux systems, you usually
    only have a 64-bit cpu because you want bigger and faster, not because
    of memory limitations. It is only when you have a big gui with fast
    graphics that 32-bit address space becomes a limitation.)

    You're assuming there has to be some "capacity" value to the 64b move.

    You might discover that the ultralow power devices (for phones!)
    are being offered in the process geometries targeted for the 64b
    devices. Or, that some integrated peripheral "makes sense" for
    phones (but not MCUs targeting motor control applications). Or,
    that there are additional power management strategies supported
    in the hardware.

    In my mind, the distinction brought about by "32b" was more advanced
    memory protection/management -- even if not used in a particular
    application. You simply didn't see these sorts of mechanisms
    in 8/16b offerings. Likewise, floating point accelerators. Working
    in smaller processors meant you had to spend extra effort to
    bullet-proof your code, economize on math operators, etc.

    So, if you wanted the advantages of those (hardware) mechanisms,
    you "upgraded" your design to 32b -- even if it didn't need
    gobs of address space or generic MIPS. It just wasn't economical
    to bolt on an AM9511 or practical to build a homebrew MMU.

    A 32-bit microcontroller is simply much easier to work with than an
    8-bit or 16-bit with "extended" or banked memory to get beyond 64 K
    address space limits.

    There have been some 8b processors that could seamlessly (in HLL)
    handle extended address spaces. The Z180s were delightfully easy
    to use, thusly. You just had to keep in mind that a "call" to
    a different bank was more expensive than a "local" call (though
    there were no syntactic differences; the linkage editor and runtime
    package made this invisible to the developer).

    We were selling products with 128K of DRAM on Z80's back in 1981.
    Because it was easier to design THAT hardware than to step up to
    a 68K, for example. (as well as leveraging our existing codebase)
    The "video game era" was built on hybridized 8b systems -- even though
    you could buy 32b hardware, at the time. You would be surprised at
    the ingenuity of many of those systems in offloading the processor
    of costly (time consuming) operations to make the device appear more
    powerful than it actually was.

    We'll see 64-bit take a greater proportion of the embedded systems that
    demand high throughput or processing power (network devices, hard cores
    in expensive FPGAs, etc.) where the extra cost in dollars, power,
    complexity, board design are not a problem. They will probably become
    more common in embedded Linux systems as the core itself is not usually
    the biggest part of the cost. And such systems are definitely on the
    increase.

    But for microcontrollers - which dominate embedded systems - there has
    been a lot to gain by going from 8-bit and 16-bit to 32-bit for little

    I disagree. The "cost" (barrier) that I see clients facing is the
    added complexity of a 32b platform and how it often implies (or even
    *requires*) a more formal OS underpinning the application.

    Yes, that is definitely a cost in some cases - 32-bit microcontrollers
    are usually noticeably more complicated than 8-bit ones. How
    significant the cost is depends on the balance of the project between
    development costs and production costs, and how beneficial the extra
    functionality can be (like moving from bare metal to RTOS, or
    supporting networking).

    I see most 32b designs operating without the benefits that a VMM system
    can apply (even if you discount demand paging). They just want to have
    a big address space and not have to dick with "segment registers", etc.
    They plow through the learning effort required to configure the device
    to move the "extra capabilities" out of the way. Then, just treat it
    like a bigger 8/16 processor.

    You can "bolt on" a simple network stack even with a rudimentary RTOS/MTOS. Likewise, a web server. Now, you remove the need for graphics and other UI activities hosted *in* the device. And, you likely don't need to support multiple concurrent clients. If you want to provide those capabilities, do that *outside* the device (let it be someone else's problem). And, you gain "remote access" for free.

    Few such devices *need* (or even WANT!) ARP caches, inetd, high performance stack, file systems, etc.

    Given the obvious (coming) push for enhanced security in devices, anything running on your box that you don't need (or UNDERSTAND!) is likely going to
    be pruned off as a way to reduce the attack surface. "Why is this port open? What is this process doing? How robust is the XXX subsystem implementation
    to hostile actors in an *unsupervised* setting?"

    cost. There is almost nothing to gain from a move to 64-bit, but the
    cost would be a good deal higher.

    Why is the cost "a good deal higher"? Code/data footprints don't
    uniformly "double" in size. The CPU doesn't slow down to handle
    bigger data.

    Some parts of code and data /do/ double in size - but not uniformly, of course. But your chip is bigger, faster, requires more power, has wider buses, needs more advanced memories, has more balls on the package,
    requires finer pitched pcb layouts, etc.

    And has been targeted to a market that is EXTREMELY power sensitive
    (phones!).

    It is increasingly common for manufacturing technologies to be moving away
    from "casual development". The days of owning your own wave and doing
    in-house manufacturing at a small startup are gone. If you want to
    limit yourself to the kinds of products that you CAN (easily) assemble, you will find yourself operating with a much poorer selection of components available. I could fab a PCB in-house and build small runs of prototypes
    using the wave and shake-and-bake facilities that we had on hand. Harder
    to do so, nowadays.

    This has always been the case. When thru-hole met SMT, folks had to
    either retool to support SMT, or limit themselves to components that
    were available in thru-hole packages. As the trend has always been
    for MORE devices to move to newer packaging technologies, anyone
    who spent any time thinking about it could read the writing on the wall!
    (I bought my Leister in 1988? Now, I prefer begging favors from
    colleagues to get my prototypes assembled!)

    I suspect this is why we now see designs built on COTS "modules"
    increasingly. Just like designs using wall warts (so they don't
    have to do the testing on their own, internally designed supplies).
    It's one of the reasons FOSH is hampered (unlike FOSS, you can't roll
    your own copy of a hardware design!)

    In theory, you /could/ make a microcontroller in a 64-pin LQFP and
    replace the 72 MHz Cortex-M4 with a 64-bit ARM core at the same clock
    speed. The die would only cost two or three times more, and take
    perhaps less than 10 times the power for the core. But it would be so utterly pointless that no manufacturer would make such a device.

    This is specious reasoning: "You could take the die out of a 68K and
    replace it with a 64 bit ARM." Would THAT core cost two or three times more (do you recall how BIG 68K die were?) and consume 10 times the power?
    (it would consume considerably LESS).

    The market will drive the cost (power, size, $$$, etc.) of 64b cores
    down as they will find increasing use in devices that are size and
    power constrained. There's far more incentive to make a cheap,
    low power 64b ARM than there is to make a cheap, low power i686
    (or 68K) -- you don't see x86 devices in phones (laptops have bigger
    power budgets so less pressure on efficiency).

    There's no incentive to making thru-hole versions of any "serious"
    processor, today. Just like you can't find any fabs for DTL devices.
    Or 10 & 12" vinyl. (yeah, you can buy vinyl, today -- at a premium.
    And, I suspect you can find someone to package an ARM on a DIP
    carrier. But, each of those are niche markets, not where the
    "money lies")

    So a move to 64-bit in practice means moving from a small, cheap, self-contained microcontroller to an embedded PC. Lots of new
    possibilities, lots of new costs of all kinds.

    How do you come to that conclusion? I have a 32b MCU on a board.
    And some FLASH and DRAM. How is that going to change when I
    move to a 64b processor? The 64b devices are also SoCs so
    it's not like you suddenly have to add address decoding logic,
    a clock generator, interrupt controller, etc.

    Will phones suddenly become FATTER to accommodate the extra
    hardware needed? Will they all need bolt on battery boosters?

    Oh, and the cpu /could/ be slower for some tasks - bigger cpus that are optimised for throughput often have poorer latency and more jitter for interrupts and other time-critical features.

    You're cherry picking. They can also be FASTER for other tasks
    and likely will be optimized to justify/exploit those added abilities;
    a vendor isn't going to offer a product that is LESS desirable than
    his existing products. An IPv6 stack on a 64b processor is a bit
    easier to implement than on 32b.
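
    (A trivial illustration -- sketch only, struct and helper invented:
    comparing two 128-bit IPv6 addresses is two word compares on a 64b
    core versus four on a 32b core:)

        #include <stdbool.h>
        #include <stdint.h>
        #include <string.h>

        typedef struct { uint8_t b[16]; } ip6_addr;   /* raw 128-bit address */

        /* memcpy sidesteps alignment/aliasing issues; compilers reduce it
           to plain register loads. */
        bool ip6_equal(const ip6_addr *a, const ip6_addr *b)
        {
            uint64_t a0, a1, b0, b1;
            memcpy(&a0, &a->b[0], 8);  memcpy(&a1, &a->b[8], 8);
            memcpy(&b0, &b->b[0], 8);  memcpy(&b1, &b->b[8], 8);
            return (a0 == b0) && (a1 == b1);
        }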

    (remember, ARM is in a LOT of fabs! That speaks to how ubiquitous
    it is!)

    So it is not going to happen - at
    least not more than a very small and very gradual change.

    We got 32b processors NOT because the embedded world cried out for
    them but, rather, because of the influence of the 32b desktop world.
    We've had 32b processors since the early 80's. But, we've only had
    PCs since about the same timeframe! One assumes ubiquity in the
    desktop world would need to happen before any real spillover to embedded.
    (When the "desktop" was an '11 sitting in a back room, it wasn't seen
    as ubiquitous.)

    I don't assume there is any direct connection between the desktop world
    and the embedded world - the needs are usually very different. There is
    a small overlap in the area of embedded devices with good networking and
    a gui, where similarity to the desktop world is useful.

    The desktop world inspires the embedded world. You see what CAN be done
    for "reasonable money".

    In the 70's, we put i4004's into products because we knew the processing
    that was required was "affordable" (at several kilobucks) -- because
    we had our own '11 on site. We leveraged the in-house '11 to compute "initialization constants" for the needs of specific users (operating
    the i4004-based products). We didn't hesitate to migrate to i8080/85
    when they became available -- because the price point was largely
    unchanged (from where it had been with the i4004) AND we could skip the involvement of the '11 in computing those initialization constants!

    I watch the prices of the original 32b ARM I chose fall and see that
    as an opportunity -- to UPGRADE the capabilities (and future-safeness
    of the design). If I'd assumed $X was a tolerable price, before,
    then it likely still is!

    We have had 32-bit microcontrollers for decades. I used a 16-bit
    Windows system when working with my first 32-bit microcontroller. But
    at that time, 32-bit microcontrollers cost a lot more and required more
    from the board (external memories, more power, etc.) than 8-bit or
    16-bit devices. That has gradually changed with an almost total
    disregard for what has happened in the desktop world.

    I disagree. I recall having to put lots of "peripherals" into
    an 8/16b system, external address decoding logic, clock generators,
    DRAM controllers, etc.

    And, the cost of entry was considerably higher. Development systems
    used to cost tens of kilodollars (Intellec MDS, Zilog ZRDS, Moto
    EXORmacs, etc.) I shared a development system with several other
    developers in the 70's -- because the idea of giving each of us our
    own was anathema, at the time.

    For 35+ years, you could put one on YOUR desk for a few kilobucks.
    Now, it's considerably less than that.

    You'd have to be blind to NOT think that the components that
    are "embedded" in products haven't -- and won't continue -- to
    see similar reductions in price and increases in performance.

    Do you think the folks making the components didn't anticipate
    the potential demand for smaller/faster/cheaper chips?

    We've had TCP/IP for decades. Why is it "suddenly" more ubiquitous
    in product offerings? People *see* what they can do with a technology
    in one application domain (e.g., desktop) and extrapolate that to
    other, similar application domains (embedded).

    I did my first full custom 30+ years ago. Now, I can buy an off-the-shelf component and "program" it to get similar functionality (without
    involving a service bureau). Ideas that previously were "gee, if only..."
    are now commonplace.

    Yes, the embedded world /did/ cry out for 32-bit microcontrollers for
    an increasing proportion of tasks. We cried many tears when the
    microcontroller manufacturers offered to give more flash space to their
    8-bit devices by having different memory models, banking, far jumps, and
    all the other shit that goes with not having a big enough address space.
    We cried out when we wanted to have Ethernet and the microcontroller
    only had a few KB of ram. I have used maybe 6 or 8 different 32-bit microcontroller processor architectures, and I used them because I
    needed them for the task. It's only in the past 5+ years that I have
    been using 32-bit microcontrollers for tasks that could be done fine
    with 8-bit devices, but the 32-bit devices are smaller, cheaper and
    easier to work with than the corresponding 8-bit parts.

    But that's because your needs evolve and the tools you choose to
    use have, as well.

    I wanted to build a little line frequency clock to see how well it
    could discipline my NTPd. I've got all these PCs, single board PCs,
    etc. lying around. It was *easier* to hack together a small 8b
    processor to do the job -- less hardware to understand, no OS
    to get in the way, really simple to put a number on the interrupt
    latency that I could expect, no uncertainties about the hardware
    that's on the PC, etc.

    OTOH, I have a network stack that I wrote for the Z180 decades
    ago. Despite being written in a HLL, it is a bear to deploy and
    maintain owing to the tools and resources available in that
    platform. My 32b stack was a piece of cake to write, by comparison!

    In the future, we'll see the 64b *phone* world drive the evolution
    of embedded designs, similarly. (do you really need 32b/64b to
    make a phone? how much code is actually executing at any given
    time and in how many different containers?)

    We will see that on devices that are, roughly speaking, tablets -
    embedded systems with a good gui, a touchscreen, networking. And that's fine. But these are a tiny proportion of the embedded devices made.

    Again, I disagree. You've already admitted to using 32b processors
    where 8b could suffice. What makes you think you won't be using 64b
    processors when 32b could suffice?

    It's just as hard for me to prototype a 64b SoC as it is a 32b SoC.
    The boards are essentially the same size. "System" power consumption
    is almost identical. Cost is the sole differentiating factor, today.
    History tells us it will be less so, tomorrow. And, the innovations
    that will likely come in that offering will likely exceed the
    capabilities (or perceived market needs) of smaller processors.
    To say nothing of the *imagined* uses that future developers will
    envision!

    I can make a camera that "reports to google/amazon" to do motion detection, remote access, etc. Or, for virtually the same (customer) dollars, I
    can provide that functionality locally. Would a customer want to add
    an "unnecessary" dependency to a solution? "Tired of being dependant
    on Big Brother for your home security needs? ..." Imagine a 64b SoC
    with a cellular radio: "I'll *call* you when someone comes to the door..."
    (or SMS)

    I have cameras INSIDE my garage that assist with my parking and
    tell me if I've forgotten to close the garage door. Should I have google/amazon perform those value-added tasks for me? Will they
    tell me if I've left something in the car's path before I run over it?
    Will they turn on the light to make it easier for me to see?
    Should I, instead, tether all of those cameras to some "big box"
    that does all of that signal processing? What happens to those
    resources when the garage is "empty"??

    The "electric eye" (interrupter) that guards against closing the
    garage door on a toddler/pet/item in its path does nothing to
    protect me if I leave some portion of the vehicle in the path of
    the door (but ABOVE the detection range of the interrupter).
    Locating a *camera* on the side of the doorway lets me detect
    if ANYTHING is in the path of the door, regardless of how high
    above the old interrupter's position it may be located.

    How *many* camera interfaces should the SoC *directly* support?

    The number (and type) of applications that can be addressed with
    ADDITIONAL *local* smarts/resources is almost boundless. And, folks
    don't have to wait for a cloud supplier (off-site processing) to
    decide to offer them.

    "Build it and they will come."

    [Does your thermostat REALLY need all of that horsepower -- two
    processors! -- AND google's server in order to control the HVAC
    in your home? My god, how did that simple bimetallic strip
    ever do it??!]

    If you move into the commercial/industrial domains, the opportunities
    are even more diverse! (e.g., build a camera that does component inspection *in* the camera and interfaces to a go/nogo gate or labeller)

    Note that none of these applications need a display, touch panel, etc.
    What they likely need is low power, small size, connectivity, MIPS and
    memory. The same sorts of things that are common in phones.

    The OP sounds more like a salesman than someone who actually works with
    embedded development in reality.

    Possibly. Or, just someone that wanted to stir up discussion...

    Could be. And there's no harm in that!

    On that, we agree.

    Time for ice cream (easiest -- and most enjoyable -- way to lose weight)!

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to All on Tue Jun 8 18:29:24 2021
    On 6/8/2021 3:01 PM, Dimiter_Popoff wrote:

    Am trying to puzzle out what a 64-bit embedded processor should look like.
    At the low end, yeah, a simple RISC processor. And support for complex
    arithmetic using 32-bit floats? And support for pixel alpha blending
    using quad 16-bit numbers? 32-bit pointers into the software?

    The real value in 64 bit integer registers and 64 bit address space is
    just that, having an orthogonal "endless" space (well I remember some
    30 years ago 32 bits seemed sort of "endless" to me...).

    Not needing to assign overlapping logical addresses to anything
    can make a big difference to how the OS is done.

    That depends on what you expect from the OS. If you are
    comfortable with the possibility of bugs propagating between
    different subsystems, then you can live with a logical address
    space that exactly coincides with a physical address space.

    But, consider how life was before Windows used compartmentalized
    applications (and OS). How easily it is for one "application"
    (or subsystem) to cause a reboot -- unceremoniously.

    The general direction (in software development, and, by
    association, hardware) seems to be to move away from unrestrained
    access to the underlying hardware in an attempt to limit the
    amount of damage that a "misbehaving" application can cause.

    You see this in languages designed to eliminate dereferencing
    pointers, pointer arithmetic, etc. Languages that claim to
    ensure your code can't misbehave because it can only do
    exactly what the language allows (no more injecting ASM
    into your HLL code).

    I think that because you are the sole developer in your
    application, you see a distorted vision of what the rest
    of the development world encounters. Imagine handing your
    codebase to a third party. And, *then* having to come
    back to it and fix the things that "got broken".

    Or, in my case, allowing a developer to install software
    that I have to "tolerate" (for some definition of "tolerate")
    without impacting the software that I've already got running.
    (i.e., it's ok to kill off his application if it is broken; but
    he can't cause *my* portion of the system to misbehave!)

    32 bit FPU seems useless to me, 64 bit is OK. Although 32 FP
    *numbers* can be quite useful for storing/passing data.

    32 bit numbers have appeal if your registers are 32b;
    they "fit nicely". Ditto 64b in 64b registers.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to All on Tue Jun 8 18:33:03 2021
    On 6/8/2021 1:39 PM, Dimiter_Popoff wrote:

    Not long ago in a chat with a guy who knew some of ARM 64 bit I gathered there is some real mess with their out of order execution, one needs to
    do... hmmmm.. "sync", whatever they call it, all the time and there is
    a huge performance cost because of that. Anybody heard anything about
    it? (I only know what I was told).

    Many processors support instruction reordering (and many compilers
    will reorder the code they generate). In each case, the reordering
    is supposed to preserve semantics.

    If the code "just runs" (and is never interrupted nor synchronized
    with something else), the result should be the same.

    If you want to be able to arbitrarily interrupt an instruction
    sequence, then you need to take special measures. This is why
    we have barriers, the ability to flush caches, etc.

    For "generic" code, the developer isn't involved with any of this.
    Inside the kernel (or device drivers), it's often a different
    story...
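
    (A typical spot where it bites: filling in a DMA descriptor and then
    ringing a doorbell register. A hedged sketch in C -- the addresses
    and names are invented; on ARM the fence compiles to a DMB-style
    barrier, and a real driver may need a stronger DSB depending on the
    memory types involved:)

        #include <stdatomic.h>
        #include <stdint.h>

        #define DOORBELL ((volatile uint32_t *)0x40001000u)  /* made-up address */

        struct dma_desc { uint32_t addr, len, flags; };

        void kick_dma(volatile struct dma_desc *d, uint32_t buf, uint32_t len)
        {
            d->addr  = buf;
            d->len   = len;
            d->flags = 1u;                               /* mark "ready"        */

            atomic_thread_fence(memory_order_release);   /* order the above     */
            *DOORBELL = 1u;                              /* now tell the engine */
        }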

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Paul Rubin@21:1/5 to James Brakefield on Tue Jun 8 18:20:07 2021
    James Brakefield <jim.brakefield@ieee.org> writes:
    Am trying to puzzle out what a 64-bit embedded processor should look like.

    Buy yourself a Raspberry Pi 4 and set it up to run your fish tank via a
    remote web browser. There's your 64 bit embedded system.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From George Neuner@21:1/5 to david.brown@hesbynett.no on Wed Jun 9 00:16:35 2021
    On Tue, 8 Jun 2021 22:11:18 +0200, David Brown
    <david.brown@hesbynett.no> wrote:


    Pretty much all processors except x86 and brain-dead old-fashioned 8-bit
    CISC devices are RISC...

    It certainly is correct to say of the x86 that its legacy, programmer
    visible, instruction set is CISC ... but it is no longer correct to
    say that the chip design is CISC.

    Since (at least) the Pentium 4 x86 really are a CISC decoder bolted
    onto the front of what essentially is a load/store RISC.

    "Complex" x86 instructions (in RAM and/or $I cache) are dynamically
    translated into equivalent short sequences[*] of RISC-like wide-format
    instructions, which are what actually get executed. Those sequences
    also are stored into a special trace cache in case they will be used
    again soon - e.g., in a loop - so they (hopefully) will not have to be translated again.


    [*] Actually, a great many x86 instructions map 1:1 to internal RISC instructions - only a small percentage of complex x86 instructions
    require "emulation" via a sequence of RISC instructions.


    ... Not all [RISC] are simple.

    Correct. Every successful RISC CPU has supported a suite of complex instructions.


    Of course, YMMV.
    George

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Don Y on Wed Jun 9 09:17:57 2021
    On 09/06/2021 02:30, Don Y wrote:
    On 6/8/2021 4:04 AM, David Brown wrote:
    On 08/06/2021 09:39, Don Y wrote:
    On 6/7/2021 10:59 PM, David Brown wrote:
    8-bit microcontrollers are still far more common than 32-bit devices in
    the embedded world (and 4-bit devices are not gone yet). At the other
    end, 64-bit devices have been used for a decade or two in some kinds of
    embedded systems.

    I contend that a good many "32b" implementations are really glorified
    8/16b applications that exhausted their memory space.

    Sure.  Previously you might have used 32 kB flash on an 8-bit device,
    now you can use 64 kB flash on a 32-bit device.  The point is, you are
    /not/ going to find yourself hitting GB limits any time soon.  The step

    I don't see the "problem" with 32b devices as one of address space limits (except devices utilizing VMM with insanely large page sizes).  As I said, in my application, task address spaces are really just a handful of pages.


    32 bit address space is not typically a problem or limitation.

    (One other use of 64-bit address space is for debug tools like valgrind
    or "sanitizers" that use large address spaces along with MMU protection
    and specialised memory allocation to help catch memory errors. But
    these also need sophisticated MMU's and a lot of other resources not
    often found on small embedded systems.)

    I *do* see (flat) address spaces that find themselves filling up with stack-and-heap-per-task, big chunks set aside for "onboard" I/Os,
    *partial* address decoding for offboard I/Os, etc.  (i.e., you're
    not likely going to fully decode a single address to access a set
    of DIP switches as the decode logic is disproportionately high
    relative to the functionality it adds)

    How often do you see a high-order address line used for kernel/user?
    (gee, now your "user" space has been halved)

    Unless you are talking about embedded Linux and particularly demanding
    (or inefficient!) tasks, halving your address space is not going to be a problem.


    from 8-bit or 16-bit to 32-bit is useful to get a bit more out of the
    system - the step from 32-bit to 64-bit is totally pointless for 99.99%
    of embedded systems.  (Even for most embedded Linux systems, you usually
    only have a 64-bit cpu because you want bigger and faster, not because
    of memory limitations.  It is only when you have a big gui with fast
    graphics that 32-bit address space becomes a limitation.)

    You're assuming there has to be some "capacity" value to the 64b move.


    I'm trying to establish if there is any value at all in moving to
    64-bit. And I have no doubt that for the /great/ majority of embedded
    systems, it would not.

    I don't even see it as having noticeable added value in the solid
    majority of embedded Linux systems produced. But in those systems, the
    cost is minor or irrelevant once you have a big enough processor.

    You might discover that the ultralow power devices (for phones!)
    are being offered in the process geometries targeted for the 64b
    devices.

    Process geometries are not targeted at 64-bit. They are targeted at
    smaller, faster and lower dynamic power. In order to produce such a big
    design as a 64-bit cpu, you'll aim for a minimum level of process sophistication - but that same process can be used for twice as many
    32-bit cores, or bigger sram, or graphics accelerators, or whatever else
    suits the needs of the device.

    A major reason you see 64-bit cores in big SOC's is that the die space
    is primarily taken up by caches, graphics units, on-board ram,
    networking, interfaces, and everything else. Moving the cpu core from
    32-bit to 64-bit only increases the die size by a few percent, and for
    some tasks it will also increase the the performance of the code by a
    small but helpful amount. So it is not uncommon, even if you don't need
    the additional address space.

    (The other major reason is that for some systems, you want to work with
    more than about 2 GB ram, and then life is much easier with 64-bit cores.)

    On microcontrollers - say, a random Cortex-M4 or M7 device - changing to
    a 64-bit core will increase the die by maybe 30% and give roughly /zero/ performance increase. You don't use 64-bit unless you really need it.



      Or, that some integrated peripheral "makes sense" for
    phones (but not MCUs targeting motor control applications).  Or,
    that there are additional power management strategies supported
    in the hardware.

    In my mind, the distinction brought about by "32b" was more advanced
    memory protection/management -- even if not used in a particular application.  You simply didn't see these sorts of mechanisms
    in 8/16b offerings.  Likewise, floating point accelerators.  Working
    in smaller processors meant you had to spend extra effort to
    bullet-proof your code, economize on math operators, etc.

    You need to write correct code regardless of the size of the device. I disagree entirely about memory protection being useful there. This is comp.arch.embedded, not comp.programs.windows (or whatever). An MPU
    might make it easier to catch and fix bugs while developing and testing,
    but code that hits MPU traps should not leave your workbench.

    But you are absolutely right about maths (floating point or integer) -
    having 32-bit gives you a lot more freedom and less messing around with
    scaling back and forth to make things fit and work efficiently in 8-bit
    or 16-bit. And if you have floating point hardware (and know how to use
    it properly), that opens up new possibilities.

    64-bit cores will extend that, but the step is almost negligible in
    comparison. It would be wrong to say "int32_t is enough for anyone",
    but it is /almost/ true. It is certainly true enough that it is not a
    problem that using "int64_t" takes two instructions instead of one.

    Some parts of code and data /do/ double in size - but not uniformly, of
    course.  But your chip is bigger, faster, requires more power, has wider
    buses, needs more advanced memories, has more balls on the package,
    requires finer pitched pcb layouts, etc.

    And has been targeted to a market that is EXTREMELY power sensitive (phones!).

    A phone cpu takes orders of magnitude more power to do the kinds of
    tasks that might be typical for a microcontroller cpu - reading sensors, controlling outputs, handling UARTs, SPI and I²C buses, etc. Phone cpus
    are optimised for doing the "big phone stuff" efficiently - because
    that's what takes the time, and therefore the power.

    (I'm snipping because there is far too much here - I have read your
    comments, but I'm trying to limit the ones I reply to.)


    We will see that on devices that are, roughly speaking, tablets -
    embedded systems with a good gui, a touchscreen, networking.  And that's
    fine.  But these are a tiny proportion of the embedded devices made.

    Again, I disagree.

    I assume you are disagreeing about seeing 64-bit cpus only on devices
    that need a lot of memory or processing power, rather than disagreeing
    that such devices are only a tiny proportion of embedded devices.

    You've already admitted to using 32b processors
    where 8b could suffice.  What makes you think you won't be using 64b processors when 32b could suffice?

    As I have said, I think there will be an increase in the proportion of
    64-bit embedded devices - but I think it will be very slow and gradual.
    Perhaps in 20 years time 64-bit will be in the place that 32-bit is
    now. But it won't happen for a long time.

    Why do I use 32-bit microcontrollers where an 8-bit one could do the
    job? Well, we mentioned above that you can be freer with the maths.
    You can, in general, be freer in the code - and you can use better tools
    and languages. With ARM microcontrollers I can use the latest gcc and
    C++ standards - I don't have to program in a weird almost-C dialect
    using extensions to get data in flash, or pay thousands for a limited
    C++ compiler with last century's standards. I don't have to try and
    squeeze things into 8-bit scaled integers, or limit my use of pointers
    due to cpu limitations.

    And manufacturers make the devices smaller, cheaper, lower power and
    faster than 8-bit devices in many cases.

    If manufactures made 64-bit devices that are smaller, cheaper and lower
    power than the 32-bit ones today, I'd use them. But they would not be
    better for the job, or better to work with and better for development in
    the way 32-bit devices are better than 8-bit and 16-bit.


    It's just as hard for me to prototype a 64b SoC as it is a 32b SoC.
    The boards are essentially the same size.  "System" power consumption
    is almost identical.  Cost is the sole differentiating factor, today.

    For you, perhaps. Not necessarily for others.

    We design, program and manufacture electronics. Production and testing
    of simpler cards is cheaper. The pcbs are cheaper. The chips are
    cheaper. The mounting is faster. The programming and testing is
    faster. You don't mix big, thick tracks and high power on the same
    board as tight-packed BGA with blind/buried vias - but you /can/ happily
    work with less dense packages on the same board.

    If you are talking about replacing one 400-ball SOC with another
    400-ball SOC with a 64-bit core instead of a 32-bit core, then it will
    make no difference in manufacturing. But if you are talking about
    replacing a Cortex-M4 microcontroller with a Cortex-A53 SOC, it /will/
    be a lot more expensive in most volumes.

    I can't really tell what kinds of designs you are discussing here. When
    I talk about embedded systems in general, I mean microcontrollers
    running specific programs - not general-purpose computers in embedded
    formats (such as phones).

    (For very small volumes, the actual physical production costs are a
    small proportion of the price, and for very large volumes you have
    dedicated machines for the particular board.)

    Possibly.  Or, just someone that wanted to stir up discussion...

    Could be.  And there's no harm in that!

    On that, we agree.

    Time for ice cream (easiest -- and most enjoyable -- way to lose weight)!

    I've not heard of that as a dieting method, but I shall give it a try :-)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to George Neuner on Wed Jun 9 10:40:37 2021
    On 09/06/2021 06:16, George Neuner wrote:
    On Tue, 8 Jun 2021 22:11:18 +0200, David Brown
    <david.brown@hesbynett.no> wrote:


    Pretty much all processors except x86 and brain-dead old-fashioned 8-bit
    CISC devices are RISC...

    It certainly is correct to say of the x86 that its legacy, programmer visible, instruction set is CISC ... but it is no longer correct to
    say that the chip design is CISC.

    Since (at least) the Pentium 4 x86 really are a CISC decoder bolted
    onto the front of what essentially is a load/store RISC.


    Absolutely. But from the user viewpoint, it is the ISA that matters -
    it is a CISC ISA. The implementation details are mostly hidden (though sometimes it is useful to know about timings).

    "Complex" x86 instructions (in RAM and/or $I cache) are dynamically translated into equivalent short sequences[*] of RISC-like wide format instructions which are what actually is executed. Those sequences
    also are stored into a special trace cache in case they will be used
    again soon - e.g., in a loop - so they (hopefully) will not have to be translated again.


    [*] Actually, a great many x86 instructions map 1:1 to internal RISC instructions - only a small percentage of complex x86 instructions
    require "emulation" via a sequence of RISC instructions.


    And also, some sequences of several x86 instructions map to single RISC instructions, or to no instructions at all.

    It is, of course, a horrendously complex mess - and is a major reason
    for x86 cores taking more power and costing more than RISC cores for the
    same performance.


    ... Not all [RISC] are simple.

    Correct. Every successful RISC CPU has supported a suite of complex instructions.


    Yes. People often parse RISC as R(IS)C - i.e., they think it means the
    ISA has a small instruction set. It should be parsed (RI)SC - the
    instructions are limited compared to those on a (CI)SC cpu.


    Of course, YMMV.
    George


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to All on Wed Jun 9 10:59:29 2021
    On 08/06/2021 22:39, Dimiter_Popoff wrote:
    On 6/8/2021 23:18, David Brown wrote:
    On 08/06/2021 16:46, Theo wrote:
    ......

    Memory bus/cache width

    No, that is not a common way to measure cpu "width", for many reasons.
    A chip is likely to have many buses outside the cpu core itself (and the
    cache(s) may or may not be considered part of the core).  It's common to
    have 64-bit wide buses on 32-bit processors, it's also common to have
    16-bit external databuses on a microcontroller.  And the cache might be
    128 bits wide.

    I agree with your points and those of Theo, but the cache is basically
    as wide as the registers? Logically, that is; a cacheline is several
    times that, probably you refer to that.
    Not that it makes much of a difference to the fact that 64 bit data
    buses/registers in an MCU (apart from FPU registers, 32 bit FPUs are
    useless to me) are unlikely to attract much interest; nothing of
    significance to be gained, as you said.
    To me 64 bit CPUs are of interest of course and thankfully there are
    some available, but this goes somewhat past what we call  "embedded".
    Not long ago in a chat with a guy who knew some of ARM 64 bit I gathered there is some real mess with their out of order execution, one needs to
    do... hmmmm.. "sync", whatever they call it, all the time and there is
    a huge performance cost because of that. Anybody heard anything about
    it? (I only know what I was told).


    sync instructions of various types can be needed to handle
    thread/process synchronisation, atomic accesses, and coordination
    between software and hardware registers. Software normally runs with
    the idea that it is the only thing running, and the cpu can re-order and re-arrange the instructions and execution as long as it maintains the
    illusion that the assembly instructions in the current thread are
    executed one after the other. These re-arrangements and parallel
    execution can give very large performance benefits.

    But it also means that when you need to coordinate with other things,
    you need syncs, perhaps cache flushes, etc. Full syncs can take
    hundreds of cycles to execute on large processors. So you need to
    distinguish between reads and writes, acquires and releases, syncs on
    single addresses or general memory syncs. Big processors are optimised
    for throughput, not latency or quick reaction to hardware events.
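
    A minimal illustration of the acquire/release distinction, using C11
    atomics (sketch only -- a one-shot hand-off where just the flag needs
    ordered access, so no full memory barrier is required):

        #include <stdatomic.h>
        #include <stdbool.h>

        static int         payload;          /* plain data                  */
        static atomic_bool ready = false;    /* only this needs ordering    */

        void producer(int value)
        {
            payload = value;                                   /* plain store */
            atomic_store_explicit(&ready, true,
                                  memory_order_release);       /* publish     */
        }

        bool consumer(int *out)
        {
            if (!atomic_load_explicit(&ready, memory_order_acquire))
                return false;                                  /* nothing yet */
            *out = payload;             /* happens-after the producer's store */
            return true;
        }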

    There are good reasons why big cpus are often paired with a Cortex-M
    core in SOCs.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to David Brown on Wed Jun 9 03:12:12 2021
    On 6/9/2021 12:17 AM, David Brown wrote:

    from 8-bit or 16-bit to 32-bit is useful to get a bit more out of the
    system - the step from 32-bit to 64-bit is totally pointless for 99.99%
    of embedded systems. (Even for most embedded Linux systems, you usually
    only have a 64-bit cpu because you want bigger and faster, not because
    of memory limitations. It is only when you have a big gui with fast
    graphics that 32-bit address space becomes a limitation.)

    You're assuming there has to be some "capacity" value to the 64b move.

    I'm trying to establish if there is any value at all in moving to
    64-bit. And I have no doubt that for the /great/ majority of embedded systems, it would not.

    That's a no-brainer -- most embedded systems are small MCUs.
    Consider that the PC I'm sitting at has an MCU in the keyboard;
    another in the mouse; one in the optical disk drive; one in
    the rust disk drive; one in the printer; two in the UPS;
    one in the wireless "modem"; one in the router; one in
    the thumb drive; etc. All offsetting the "big" CPU in
    the computer, itself.

    I don't even see it as having noticeable added value in the solid
    majority of embedded Linux systems produced. But in those systems, the
    cost is minor or irrelevant once you have a big enough processor.

    My point is that the market can distort the "price/value"
    relationship in ways that might not, otherwise, make sense.
    A "better" device may end up costing less than a "worse"
    device -- simply because of the volumes that the population
    of customers favor.

    You might discover that the ultralow power devices (for phones!)
    are being offered in the process geometries targeted for the 64b
    devices.

    Process geometries are not targeted at 64-bit. They are targeted at
    smaller, faster and lower dynamic power. In order to produce such a big design as a 64-bit cpu, you'll aim for a minimum level of process sophistication - but that same process can be used for twice as many
    32-bit cores, or bigger sram, or graphics accelerators, or whatever else suits the needs of the device.

    They will apply newer process geometries to newer devices.
    No one is going to retool an existing design -- unless doing
    so will result in a significant market enhancement.

    Why don't we have 100MHz MC6800's?

    A major reason you see 64-bit cores in big SOC's is that the die space
    is primarily taken up by caches, graphics units, on-board ram,
    networking, interfaces, and everything else. Moving the cpu core from
    32-bit to 64-bit only increases the die size by a few percent, and for
    some tasks it will also increase the the performance of the code by a
    small but helpful amount. So it is not uncommon, even if you don't need
    the additional address space.

    (The other major reason is that for some systems, you want to work with
    more than about 2 GB ram, and then life is much easier with 64-bit cores.)

    On microcontrollers - say, a random Cortex-M4 or M7 device - changing to
    a 64-bit core will increase the die by maybe 30% and give roughly /zero/ performance increase. You don't use 64-bit unless you really need it.

    Again, "... unless the market has made those devices cheaper than
    their previous choices" People don't necessarily "fit" their
    applications to the devices they choose; they consider other
    factors (cost, package type, availability, etc.) in deciding
    what to actual design into the product.

    You might "need" X MB of RAM but will "tolerate" 4X -- if the
    price is better than for the X MB *or* the X MB devices are
    not available. If the PCB layout can directly accommodate
    such a solution, then great! But, even if not, a PCB
    revision is a cheap expenditure if it lets you take advantage of
    a different component.

    I've made very deliberate efforts NOT to use many of the
    "I/Os" on the MCUs that I'm designing around so I can
    have more leeway in making that selection when released
    to production (every capability used represents a
    constraint that OTHER selections must satisfy)

    Or, that some integrated peripheral "makes sense" for
    phones (but not MCUs targeting motor control applications). Or,
    that there are additional power management strategies supported
    in the hardware.

    In my mind, the distinction brought about by "32b" was more advanced
    memory protection/management -- even if not used in a particular
    application. You simply didn't see these sorts of mechanisms
    in 8/16b offerings. Likewise, floating point accelerators. Working
    in smaller processors meant you had to spend extra effort to
    bullet-proof your code, economize on math operators, etc.

    You need to write correct code regardless of the size of the device. I disagree entirely about memory protection being useful there. This is comp.arch.embedded, not comp.programs.windows (or whatever). An MPU
    might make it easier to catch and fix bugs while developing and testing,
    but code that hits MPU traps should not leave your workbench.

    You're assuming you (or I) have control over all of the code that
    executes on a product/platform. And, that every potential bug
    manifests *in* testing. (If that were the case, we'd never
    see bugs in the wild!)

    In my case, "third parties" (who the hell is the SECOND party??)
    can install code that I've no control over. That code could
    be buggy -- or malevolent. Being able to isolate "actors"
    from each other means the OS can detect "can't happens"
    at run time and shut down the offender -- instead of letting
    it corrupt some part of the system.

    But you are absolutely right about maths (floating point or integer) -
    having 32-bit gives you a lot more freedom and less messing around with scaling back and forth to make things fit and work efficiently in 8-bit
    or 16-bit. And if you have floating point hardware (and know how to use
    it properly), that opens up new possibilities.

    64-bit cores will extend that, but the step is almost negligible in comparison. It would be wrong to say "int32_t is enough for anyone",
    but it is /almost/ true. It is certainly true enough that it is not a problem that using "int64_t" takes two instructions instead of one.

    Except that int64_t can take *four* instead of one (add/sub/mul two
    int64_t's with 32b hardware).
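
    Spelled out by hand (illustrative C only; the compiler emits
    essentially ADD + ADC for this, and several 32x32 partial products
    plus adds for a multiply):

        #include <stdint.h>

        typedef struct { uint32_t lo, hi; } u64_pair;   /* 64b value on a 32b core */

        static u64_pair add64_on_32b(u64_pair a, u64_pair b)
        {
            u64_pair r;
            r.lo = a.lo + b.lo;
            r.hi = a.hi + b.hi + (r.lo < a.lo);   /* carry out of the low word */
            return r;
        }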

    Some parts of code and data /do/ double in size - but not uniformly, of
    course. But your chip is bigger, faster, requires more power, has wider
    buses, needs more advanced memories, has more balls on the package,
    requires finer pitched pcb layouts, etc.

    And has been targeted to a market that is EXTREMELY power sensitive
    (phones!).

    A phone cpu takes orders of magnitude more power to do the kinds of
    tasks that might be typical for a microcontroller cpu - reading sensors, controlling outputs, handling UARTs, SPI and I²C buses, etc. Phone cpus
    are optimised for doing the "big phone stuff" efficiently - because
    that's what takes the time, and therefore the power.

    But you're making assumptions about what the "embedded microcontroller"
    will actually be called upon to do!

    Most of my embedded devices have "done more" than the PCs on which
    they were designed -- despite the fact that the PC can defrost bagels!

    (I'm snipping because there is far too much here - I have read your
    comments, but I'm trying to limit the ones I reply to.)


    We will see that on devices that are, roughly speaking, tablets -
    embedded systems with a good gui, a touchscreen, networking. And that's
    fine. But these are a tiny proportion of the embedded devices made.

    Again, I disagree.

    I assume you are disagreeing about seeing 64-bit cpus only on devices
    that need a lot of memory or processing power, rather than disagreeing
    that such devices are only a tiny proportion of embedded devices.

    I'm disagreeing with the assumption that 64bit CPUs are solely used
    on "tablets, devices with good GUIs, touchscreens, networking"
    (in the embedded domain).

    You've already admitted to using 32b processors
    where 8b could suffice. What makes you think you won't be using 64b
    processors when 32b could suffice?

    As I have said, I think there will be an increase in the proportion of
    64-bit embedded devices - but I think it will be very slow and gradual.
    Perhaps in 20 years time 64-bit will be in the place that 32-bit is
    now. But it won't happen for a long time.

    And how is that any different from 32b processors introduced in 1980
    only NOW seeing any sort of "widespread" use?

    The adoption of new technologies accelerates, over time. People
    (not "everyone") are more willing to try new things -- esp if
    it is relatively easy to do so. I can buy a 64b evaluation kit
    for a few hundred dollars -- I paid more than that for my first
    8" floppy drive. I can run/install some demo software and
    get a feel for the level of performance, how much power
    is consumed, etc. I don't need to convince my employer to
    make that investment (so *I* can explore).

    In a group environment, if such a solution is *suggested*,
    I can then lend my support -- instead of shying away out of
    fear of the unknown risks.

    Why do I use 32-bit microcontrollers where an 8-bit one could do the
    job? Well, we mentioned above that you can be freer with the maths.
    You can, in general, be freer in the code - and you can use better tools
    and languages.

    Exactly. It's "easier" and you're less concerned with sorting
    out (later) what might not fit or be fast enough, etc.

    I could have done my current project with a bunch of PICs
    talking to a "big machine" over EIA485 links (I'd done an
    industrial automation project like that, before). But,
    unless you can predict how many sensors/actuators ("motes")
    there will EVER be, it's hard to determine how "big" that
    computer needs to be!

    Given that the cost of the PIC is only partially reflective
    of the cost of the DEPLOYED mote (run cable, attach and
    calibrate sensors/actuators, etc.) the added cost of
    moving to a bigger device on that mote disappears.
    Especially when you consider the flexibility it affords
    (in terms of scaling)

    With ARM microcontrollers I can use the latest gcc and
    C++ standards - I don't have to program in a weird almost-C dialect
    using extensions to get data in flash, or pay thousands for a limited
    C++ compiler with last century's standards. I don't have to try and
    squeeze things into 8-bit scaled integers, or limit my use of pointers
    due to cpu limitations.

    And manufacturers make the devices smaller, cheaper, lower power and
    faster than 8-bit devices in many cases.

    If manufactures made 64-bit devices that are smaller, cheaper and lower
    power than the 32-bit ones today, I'd use them. But they would not be
    better for the job, or better to work with and better for development in
    the way 32-bit devices are better than 8-bit and 16-bit.

    Again, you're making predictions about what those devices will be.

    Imagine 64b devices ARE equipped with radios. You can ADD a radio
    to your "better suited" 32b design. Or, *buy* the radio already
    integrated into the 64b solution. Are you going to stick with
    32b devices because they are "better suited" to the application?
    Or, will you "suffer" the pains of embracing the 64b device?

    It's not *just* a CPU core that you're dealing with. Just like
    the 8/16 vs 32b decision isn't JUST about the width of the registers
    in the device or size of the address space.

    I mentioned my little experimental LFC device to discipline my
    NTPd. It would have been *nice* if it had an 8P8C onboard
    so I could talk to it "over the wire". But, that's not the
    appropriate sort of connectivity for an 8b device -- a serial
    port is. If I didn't have a means of connecting to it thusly,
    the 8b solution -- despite being a TINY development effort -- would
    have been impractical; bolting on a network stack and NIC would
    greatly magnify the cost (development time) of that platform.

    It's just as hard for me to prototype a 64b SoC as it is a 32b SoC.
    The boards are essentially the same size. "System" power consumption
    is almost identical. Cost is the sole differentiating factor, today.

    For you, perhaps. Not necessarily for others.

    We design, program and manufacture electronics. Production and testing
    of simpler cards is cheaper. The pcbs are cheaper. The chips are
    cheaper. The mounting is faster. The programming and testing is
    faster. You don't mix big, thick tracks and high power on the same
    board as tight-packed BGA with blind/buried vias - but you /can/ happily
    work with less dense packages on the same board.

    If you are talking about replacing one 400-ball SOC with another
    400-ball SOC with a 64-bit core instead of a 32-bit core, then it will
    make no difference in manufacturing. But if you are talking about
    replacing a Cortex-M4 microcontroller with a Cortex-A53 SOC, it /will/
    be a lot more expensive in most volumes.

    I can't really tell what kinds of designs you are discussing here. When
    I talk about embedded systems in general, I mean microcontrollers
    running specific programs - not general-purpose computers in embedded
    formats (such as phones).

    I cite phones as an example of a "big market" that will severely
    impact the devices (MCUs) that are actually manufactured and sold.

    I increasingly see "applications" growing in complexity -- beyond
    "single use" devices in the past. Devices talk to more things
    (devices) than they had, previously. Interfaces grow in
    complexity (markets often want to exercise some sort of control
    or configuration over a device -- remotely -- instead of just
    letting it do its ONE thing).

    In the past, additional functionality was an infrequent upgrade.
    Now, designs accommodate it "in the field" -- because they
    are expected to (no one wants to mail a device back to the factory
    for a software upgrade -- or have a visit from a service tech
    for that purpose).

    Rarely does a product become LESS complex, with updates. I've
    often found myself updating a design only to discover I've
    run out of some resource ("ROM", RAM, real-time, etc.). This
    never causes the update to be aborted; rather, it forces
    an unexpected diversion into shoehorning the "new REQUIREMENTS"
    into the old "5 pound sack".

    In *my* case, there are fixed applications (MANY) running on
    the hardware. But, the system is designed to allow for
    new applications to be added, old ones replaced (or retired),
    augmented with additional hardware, etc. It's not the "closed
    unless updated" systems previously common.

    We made LORAN-C position plotters, ages ago. Conceptually,
    cut a portion of a commercially available map and adhere it
    to the plotter bed. Position the pen at your current location
    on the map. Turn on. Start driving ("sailing"). The pen
    will move to indicate your NEW current position as well as
    a track indicating your path TO that (from wherever you
    were a moment ago).

    [This uses 100% of an 8b processor's real-time to keep up
    with the updates from the navigation receiver.]

    "Gee, what if the user doesn't have a commercial map,
    handy? Can't we *draw* one for him?"

    [Hmmm... if we concentrate on JUST drawing a map, then
    we can spend 100% of the CPU on THAT activity! We'll just
    need to find some extra space to store the code required
    and RAM to hold the variables we'll need...]

    "Gee, when the fisherman drops a lobster pot over the
    side, he has to run over to the plotter to mark the
    current location -- so he can return to it at some later
    date. Why can't we give him a button (on a long cable)
    that automatically draws an 'X' on the plot each time
    he depresses it?"

    You can see where this is going...

    Devices grow in features and complexity. If that plotter
    was designed today, it would likely have a graphic display
    (instead of pen and ink). And the 'X' would want to be
    displayed in RED (or, some user-configured color). And
    another color for the map to distinguish it from the "track".
    And updates would want to be distributed via a phone
    or thumbdrive or other "user accessible" medium.

    This because the needs of such a device will undoubtedly
    evolve. How often have you updated the firmware in
    your disk drives? Optical drives? Mice? Keyboard?
    Microwave oven? TV?

    We designed medical instruments where the firmware resided
    in a big, bulky "module" that could easily be removed
    (expensive ZIF connector!) -- so that medtechs could
    perform the updates in minutes (instead of taking the device
    out of service). But, as long as we didn't overly tax the
    real-time demands of the "base hardware", we were free
    (subject to pricing issues) to enhance that "module" to
    accommodate whatever new features were required. The product
    could "remain current".

    Like adding RAM to a PC to extend its utility (why can't I add
    RAM to my SmartTVs? Why can't I update their codecs?)

    The upgradeable products are designed for longer service lives
    than the nonupgradable examples, here. So, they have to be
    able to accommodate (in their "base designs") a wider variety
    of unforeseeable changes.

    If you expect a short service life, then you can rationalize NOT upgrading/updating and simply expecting the user to REPLACE the
    device at some interval that your marketeers consider appropriate.

    (For very small volumes, the actual physical production costs are a
    small proportion of the price, and for very large volumes you have
    dedicated machines for the particular board.)

    Possibly. Or, just someone that wanted to stir up discussion...

    Could be. And there's no harm in that!

    On that, we agree.

    Time for ice cream (easiest -- and most enjoyable -- way to lose weight)!

    I've not heard of that as a dieting method, but I shall give it a try :-)

    It's not recommended. I suspect it is evidence of some sort of
    food allergy that causes my body not to process calories properly
    (a tablespoon is 200+ calories; an enviable "scoop" is well over a
    thousand!). It annoys my other half to no end cuz she gains weight
    just by LOOKING at the stuff! :> So, it's best for me to "sneak"
    it when she can't set eyes on it. Or, for me to make flavors
    that she's not keen on (this was butter pecan so she is REALLY
    annoyed!)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Theo@21:1/5 to Don Y on Wed Jun 9 13:10:25 2021
    Don Y <blockedofcourse@foo.invalid> wrote:
    On 6/8/2021 7:46 AM, Theo wrote:
    I think there will be divergence about what people mean by an N-bit system:

    Register size
    Unit of logical/arithmetical processing
    Memory address/pointer size
    Memory bus/cache width

    (General) Register size is the primary driver.

    Is it, though? What's driving that?
    Why do you want larger registers without a larger ALU width?

    I don't think register size is of itself a primary pressure. On larger CPUs with lots of rename or vector registers, they have kilobytes of SRAM to hold the registers, and increasing the size is a cost. On a basic in-order MCU
    with 16 or 32 registers, is the register width an issue? We aren't
    designing them on 10 micron technology any more.

    I would expect datapath width to be more critical, but again that's
    relatively small on an in-order CPU, especially compared with on-chip SRAM.

    However, it supports 16b operations -- on register PAIRs
    (an implicit acknowledgement that the REGISTER is smaller
    than the register pair). This is common on many smaller
    processors. The address space is 16b -- with a separate 16b
    address space for I/Os. The Z180 extends the PHYSICAL
    address space to 20b but the logical address space
    remains unchanged at 16b (if you want to specify a physical
    address, you must use 20+ bits to represent it -- and invoke
    a separate mechanism to access it!). The ALU is *4* bits.

    This is not really the world of a current 32-bit MCU, which has a 32 bit datapath and 32 bit registers. Maybe it does 64 bit arithmetic in 32 bit chunks, which then leads to the question of which MCU workloads require 64
    bit arithmetic?

    But you don't buy MCUs with a-la-carte pricing. How much does an extra
    timer cost me? What if I want it to also serve as a *counter*? What
    cost for 100K of internal ROM? 200K?

    [It would be an interesting exercise to try to do a linear analysis of product prices with an idea of trying to tease out the "costs" (to
    the developer) for each feature in EXISTING products!]

    Instead, you see a *price* that is reflective of how widely used the
    device happens to be, today. You are reliant on the preferences of others
    to determine which is the most cost effective product -- for *you*.

    Sure, what you buy is a 'highest common denominator' - you get things you
    don't use, but that other people do. But it still depends on a significant chunk of the market demanding those features. It's then a cost function of
    how much the market wants a feature against how much it'll cost to implement (and at runtime). If the cost is tiny, it may well get implemented even if almost nobody asked for it.

    If there's a use case, people will pay for it.
    (although maybe not enough)

    Theo

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to Theo on Wed Jun 9 06:19:58 2021
    On 6/9/2021 5:10 AM, Theo wrote:
    Don Y <blockedofcourse@foo.invalid> wrote:
    On 6/8/2021 7:46 AM, Theo wrote:
    I think there will be divergence about what people mean by an N-bit system:
    Register size
    Unit of logical/arithmetical processing
    Memory address/pointer size
    Memory bus/cache width

    (General) Register size is the primary driver.

    Is it, though? What's driving that?
    Why do you want larger registers without a larger ALU width?

    You can use a smaller ALU (in the days when silicon was expensive)
    to do the work of a larger one -- if you spread the operation over
    time.

    I don't think register size is of itself a primary pressure. On larger CPUs with lots of rename or vector registers, they have kilobytes of SRAM to hold the registers, and increasing the size is a cost. On a basic in-order MCU with 16 or 32 registers, is the register width an issue? We aren't
    designing them on 10 micron technology any more.

    It's just how people think of CPU widths. If there's no cost to
    register width, then why didn't 8b CPUs have 64 bit accumulators
    (and register files)?

    I would expect datapath width to be more critical, but again that's relatively small on an in-order CPU, especially compared with on-chip SRAM.

    However, it supports 16b operations -- on register PAIRs
    (an implicit acknowledgement that the REGISTER is smaller
    than the register pair). This is common on many smaller
    processors. The address space is 16b -- with a separate 16b
    address space for I/Os. The Z180 extends the PHYSICAL
    address space to 20b but the logical address space
    remains unchanged at 16b (if you want to specify a physical
    address, you must use 20+ bits to represent it -- and invoke
    a separate mechanism to access it!). The ALU is *4* bits.

    This is not really the world of a current 32-bit MCU, which has a 32 bit datapath and 32 bit registers.

    Correct. I was just illustrating how you can have different
    "widths" in a single architecture; yet a single "CPU width"
    has to be used to describe it.
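    Roughly what that Z180 translation looks like, written out in C (from
    memory of the datasheet, so treat the register layout as approximate --
    CBAR/BBR/CBR are the MMU's I/O-mapped base registers):

        #include <stdint.h>

        /* 16b logical -> 20b physical: CBAR's two nibbles split the 64K
           logical space into Common Area 0, Bank Area and Common Area 1;
           BBR/CBR add a 4K-granular physical base to the latter two. */
        uint32_t z180_phys(uint16_t logical, uint8_t cbar, uint8_t bbr, uint8_t cbr)
        {
            uint8_t page    = logical >> 12;    /* top nibble of logical addr */
            uint8_t bank    = cbar & 0x0F;      /* start of Bank Area         */
            uint8_t common1 = cbar >> 4;        /* start of Common Area 1     */
            uint32_t base;

            if (page >= common1)    base = (uint32_t)cbr << 12;
            else if (page >= bank)  base = (uint32_t)bbr << 12;
            else                    base = 0;   /* Common Area 0: identity    */

            return ((uint32_t)logical + base) & 0xFFFFF;    /* 20-bit bus */
        }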

    Maybe it does 64 bit arithmetic in 32 bit
    chunks, which then leads to the question of which MCU workloads require 64 bit arithmetic?

    I treat time as a 64b entity (32b being inadequate).
    IPv6 addresses won't fit in 32b.
    There are also algorithms that can benefit from processing
    data in wider chunks (e.g., count the number of set bits
    in a 64b array goes faster in a 64b register than on a 32)
    My BigRationals would be noticeably faster if I could process
    64b at a time, instead of 32.

    [This, of course, assumes D cache can hold "as much data" in each
    case.]
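    The bit-counting case is easy to picture (gcc/clang builtin shown;
    the same loop over 32b words does twice the loads and adds):

        #include <stddef.h>
        #include <stdint.h>

        /* Count set bits across an array: one 64b word per iteration. */
        unsigned count_bits(const uint64_t *words, size_t n)
        {
            unsigned total = 0;
            for (size_t i = 0; i < n; i++)
                total += (unsigned)__builtin_popcountll(words[i]);
            return total;
        }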

    And you don't always need the full width of a register -- do you use
    all 32b of a register when you use it to keep track of the remaining
    number of iterations of a loop? Or, the index into an array? Or the
    time remaining until an upcoming deadline? Or processing characters
    in a string?

    But you don't buy MCUs with a-la-carte pricing. How much does an extra
    timer cost me? What if I want it to also serve as a *counter*? What
    cost for 100K of internal ROM? 200K?

    [It would be an interesting exercise to try to do a linear analysis of
    product prices with an idea of trying to tease out the "costs" (to
    the developer) for each feature in EXISTING products!]

    Instead, you see a *price* that is reflective of how widely used the
    device happens to be, today. You are reliant on the preferences of others
    to determine which is the most cost effective product -- for *you*.

    Sure, what you buy is a 'highest common denominator' - you get things you don't use, but that other people do. But it still depends on a significant chunk of the market demanding those features.

    Yes. Or, an application domain that consumes lots of parts.

    It's then a cost function of
    how much the market wants a feature against how much it'll cost to implement (and at runtime). If the cost is tiny, it may well get implemented even if almost nobody asked for it.

    You also have to remember that the seller isn't the sole actor in that negotiation. Charge too much and the customer can opt for a different (possibly "second choice") implementation.

    So, it is in the seller's interest to make his product as cost-effectively
    as possible. *Or*, have something that can't be obtained elsewhere.

    Nowadays, there are no second sources as there were in decades past.
    OTOH, I can find *another* ARM (for example) that may be "close enough"
    to what I need and largely compatible with my existing codebase.
    So, try to "hold me up" (overcharge) and I may find myself motivated
    to visit one of your competitors.

    [As HLLs are increasingly used, it's considerably easier to port a
    design to a different processor family entirely! Not so when you had
    100K of ASM to leverage]

    I worked in a Motogorilla shop, years ago. When I started my design,
    I brought in folks from other vendors. The Motogorilla rep got spooked;
    to lose a design to another house would require answering some serious questions from his superiors ("How did you lose the account?"). He
    was especially nervous that the only Moto offering that I was considering
    was second sourced by 7 or 8 other vendors... so, even if the device
    got the design, he would likely have competitors keeping his pricing
    in line.

    If there's a use case, people will pay for it.
    (although maybe not enough)

    Designers often have somewhat arbitrary criteria for their decisions.
    Maybe you're looking for something that will be available for at
    least a decade. Or, have alternate sources that could be called upon
    in case your fab was compromised or oversold (nothing worse than
    hearing parts are "on allocation"!)

    So, a vendor can't assume he has the "right" solution (or price) for a
    given application. Maybe the designer has a "history" with a particular
    vendor or product line and can leverage that experience in ways that
    wouldn't apply to a different vendor.

    A vendor's goal should always be to produce the best device for his perceived/targeted audience at the best price point. Then, get it
    into their hands so they are ready to embrace it when the opportunity
    presents.

    Microchip took an interesting approach trying to buy into "hobbyists"
    with cheap evaluation boards and tools. I'm sure these were loss leaders.
    But, if they ended up winning a design (or two) because the "hobbyist"
    was in a position to influence a purchasing decision...

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Paul Rubin@21:1/5 to David Brown on Wed Jun 9 09:41:25 2021
    David Brown <david.brown@hesbynett.no> writes:
    I can't really tell what kinds of designs you are discussing here. When
    I talk about embedded systems in general, I mean microcontrollers
    running specific programs - not general-purpose computers in embedded
    formats (such as phones).

    Philip Munts made a comment a while back that stayed with me: that these
    days, in anything mains powered, there is usually little reason to use
    an MCU instead of a Linux board.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Theo@21:1/5 to Paul Rubin on Wed Jun 9 18:07:58 2021
    Paul Rubin <no.email@nospam.invalid> wrote:
    James Brakefield <jim.brakefield@ieee.org> writes:
    Am trying to puzzle out what a 64-bit embedded processor should look like.

    Buy yourself a Raspberry Pi 4 and set it up to run your fish tank via a remote web browser. There's your 64 bit embedded system.

    I suppose there's a question of what embedded tasks intrinsically require
    4GiB RAM, and those that do so because it makes programmers' lives easier?

    In other words, you /can/ write a function to detect if your fish tank is
    hot or cold in Javascript that runs in a web app on top of Chromium on top
    of Linux. Or you could make it out of a 6502, or a pair of logic gates.

    That's complexity that's not fundamental to the application. OTOH
    maintaining a database that's larger than 4GB physically won't work without that amount of memory (or storage, etc).

    There are obviously plenty of computer systems doing that, but the question
    I don't know is what applications can be said to be 'embedded' but need that kind of RAM.

    Theo

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to Paul Rubin on Wed Jun 9 10:12:20 2021
    On 6/9/2021 9:41 AM, Paul Rubin wrote:
    David Brown <david.brown@hesbynett.no> writes:
    I can't really tell what kinds of designs you are discussing here. When
    I talk about embedded systems in general, I mean microcontrollers
    running specific programs - not general-purpose computers in embedded
    formats (such as phones).

    Philip Munts made a comment a while back that stayed with me: that these days, in anything mains powered, there is usually little reason to use
    an MCU instead of a Linux board.

    I note that anytime you use a COTS "module" of any kind, you're still
    stuck having to design and layout some sort of "add-on" card that
    handles your specific I/O needs; few real world devices can be
    controlled with just serial ports, NICs and "storage interfaces".

    And, you're now dependent on a board supplier as well as having
    to understand what's on (and in) that board as they are now
    critical components of YOUR product. The same applies to any firmware
    or software that it runs.

    I'm sure the FAA, FDA, etc. will gladly allow you to formally
    validate some other party's software and assume responsibility
    for its proper operation!

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From DJ Delorie@21:1/5 to Paul Rubin on Wed Jun 9 13:16:13 2021
    Paul Rubin <no.email@nospam.invalid> writes:
    Philip Munts made a comment a while back that stayed with me: that these days, in anything mains powered, there is usually little reason to use
    an MCU instead of a Linux board.

    I have a friend who has a ceiling fan with a raspberry pi in it, because
    that was the easiest solution to turning it on and off remotely...

    So yeah, I agree, "with a computer" is becoming a default answer.

    On the other hand, my furnace (now geothermal) has been controlled by a
    linux board since 2005 or so... maybe I'm not the typical user ;-)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Paul Rubin@21:1/5 to Theo on Wed Jun 9 10:34:43 2021
    Theo <theom+news@chiark.greenend.org.uk> writes:
    Buy yourself a Raspberry Pi 4 and set it up to run your fish tank via a
    remote web browser. There's your 64 bit embedded system.
    I suppose there's a question of what embedded tasks intrinsically require
    4GiB RAM, and those that do so because it makes programmers' lives easier?

    You can buy a Raspberry Pi 4 with up to 8gb of ram, but the most common configuration is 2gb. The cpu is 64 bit anyway because why not?

    There are obviously plenty of computer systems doing that, but the
    question I don't know is what applications can be said to be
    'embedded' but need that kind of RAM.

    Lots of stuff is using 32 bit cpus with a few KB of ram these days. 32
    bits is displacing 8 bits in the MCU world.

    Is 64 bit displacing 32 bit in application processors like the Raspberry
    Pi, even when less than 4GB of ram is involved? I think yes, at least
    to some extent, and it will continue. My fairly low end mobile phone
    has 2GB of ram and a 64 bit 4-core processor, I think.

    Will 64 bit MCU's displace 32 bit MCUs? I don't know, maybe not.

    Are application processors displacing MCU's in embedded systems? Not
    much in portable and wearable stuff (other than phones) at least for
    now, but in larger devices I think yes, at least somewhat for now, and
    probably more going forward. Even if you're not using networking, it
    makes software and UI development a heck of a lot easier.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dimiter_Popoff@21:1/5 to Don Y on Wed Jun 9 20:56:23 2021
    On 6/9/2021 4:29, Don Y wrote:
    On 6/8/2021 3:01 PM, Dimiter_Popoff wrote:

    Am trying to puzzle out what a 64-bit embedded processor should look
    like.
    At the low end, yeah, a simple RISC processor.  And support for
    complex arithmetic
    using 32-bit floats?  And support for pixel alpha blending using quad
    16-bit numbers?
    32-bit pointers into the software?

    The real value in 64 bit integer registers and 64 bit address space is
    just that, having an orthogonal "endless" space (well I remember some
    30 years ago 32 bits seemed sort of "endless" to me...).

    Not needing to assign overlapping logical addresses to anything
    can make a big difference to how the OS is done.

    That depends on what you expect from the OS.  If you are
    comfortable with the possibility of bugs propagating between
    different subsystems, then you can live with a logical address
    space that exactly coincides with a physical address space.

    So how does the linear 64 bit address space get in the way of
    any protection you want to implement? Pages are still 4 k and
    each has its own protection attributes governed by the OS,
    it is like that with 32 bit processors as well (I talk power, I am
    not interested in half baked stuff like ARM, risc-v etc., I don't
    know if there could be a problem like that with one of these).

    There is *nothing* to gain on a 64 bit machine from segmentation,
    assigning overlapping address spaces to tasks etc.

    Notice I am talking *logical* addresses, I was explicit about
    that.

    Dimiter

    ======================================================
    Dimiter Popoff, TGI http://www.tgi-sci.com ====================================================== http://www.flickr.com/photos/didi_tgi/

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Phil Hobbs@21:1/5 to Paul Rubin on Wed Jun 9 13:44:11 2021
    Paul Rubin wrote:
    David Brown <david.brown@hesbynett.no> writes:
    I can't really tell what kinds of designs you are discussing here. When
    I talk about embedded systems in general, I mean microcontrollers
    running specific programs - not general-purpose computers in embedded
    formats (such as phones).

    Philip Munts made a comment a while back that stayed with me: that these days, in anything mains powered, there is usually little reason to use
    an MCU instead of a Linux board.


    Except that if it has a network connection, you have to patch it
    unendingly or suffer the common-as-dirt IoT security nightmares.

    Cheers

    Phil Hobbs

    --
    Dr Philip C D Hobbs
    Principal Consultant
    ElectroOptical Innovations LLC / Hobbs ElectroOptics
    Optics, Electro-optics, Photonics, Analog Electronics
    Briarcliff Manor NY 10510

    http://electrooptical.net
    http://hobbs-eo.com

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dimiter_Popoff@21:1/5 to David Brown on Wed Jun 9 21:00:17 2021
    On 6/9/2021 11:59, David Brown wrote:
    On 08/06/2021 22:39, Dimiter_Popoff wrote:
    On 6/8/2021 23:18, David Brown wrote:
    On 08/06/2021 16:46, Theo wrote:
    ......

    Memory bus/cache width

    No, that is not a common way to measure cpu "width", for many reasons.
    A chip is likely to have many buses outside the cpu core itself (and the
    cache(s) may or may not be considered part of the core).  It's common to
    have 64-bit wide buses on 32-bit processors, it's also common to have
    16-bit external databuses on a microcontroller.  And the cache might be
    128 bits wide.

    I agree with your points and those of Theo, but the cache is basically
    as wide as the registers? Logically, that is; a cacheline is several
    times that, probably you refer to that.
    Not that it makes much of a difference to the fact that 64 bit data
    buses/registers in an MCU (apart from FPU registers, 32 bit FPUs are
    useless to me) are unlikely to attract much interest, nothing of
    significance to be gained as you said.
    To me 64 bit CPUs are of interest of course and thankfully there are
    some available, but this goes somewhat past what we call  "embedded".
    Not long ago in a chat with a guy who knew some of ARM 64 bit I gathered
    there is some real mess with their out of order execution, one needs to
    do... hmmmm.. "sync", whatever they call it, all the time and there is
    a huge performance cost because of that. Anybody heard anything about
    it? (I only know what I was told).


    sync instructions of various types can be needed to handle
    thread/process synchronisation, atomic accesses, and coordination
    between software and hardware registers. Software normally runs with
    the idea that it is the only thing running, and the cpu can re-order and re-arrange the instructions and execution as long as it maintains the illusion that the assembly instructions in the current thread are
    executed one after the other. These re-arrangements and parallel
    execution can give very large performance benefits.

    But it also means that when you need to coordinate with other things,
    you need syncs, perhaps cache flushes, etc. Full syncs can take
    hundreds of cycles to execute on large processors. So you need to distinguish between reads and writes, acquires and releases, syncs on
    single addresses or general memory syncs. Big processors are optimised
    for throughput, not latency or quick reaction to hardware events.

    There are good reasons why big cpus are often paired with a Cortex-M
    core in SOCs.
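    In C11 terms the distinction looks roughly like this (a minimal
    producer/consumer sketch: a release store pairing with an acquire
    load, rather than a full seq_cst fence):

        #include <stdatomic.h>
        #include <stdbool.h>

        static int payload;
        static atomic_bool ready;

        void producer(int value)
        {
            payload = value;                              /* plain store  */
            atomic_store_explicit(&ready, true,
                                  memory_order_release);  /* publish      */
        }

        bool consumer(int *out)
        {
            if (atomic_load_explicit(&ready, memory_order_acquire)) {
                *out = payload;                           /* safe to read */
                return true;
            }
            return false;
        }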



    Of course I know all that David, I have been using power processors
    which do things out of order for over 20 years now.
    What I was told was something about a real mess, like system memory
    accesses getting wrong because of out of order execution hence
    plenty of syncs needed to keep the thing working. I have not
    even tried to verify that, only someone with experience with 64 bit
    ARM can do that - so far none here seems to have that.

    Dimiter

    ======================================================
    Dimiter Popoff, TGI http://www.tgi-sci.com ====================================================== http://www.flickr.com/photos/didi_tgi/

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to All on Wed Jun 9 20:55:13 2021
    On 09/06/2021 20:00, Dimiter_Popoff wrote:
    On 6/9/2021 11:59, David Brown wrote:
    On 08/06/2021 22:39, Dimiter_Popoff wrote:
    On 6/8/2021 23:18, David Brown wrote:
    On 08/06/2021 16:46, Theo wrote:
    ......

    Memory bus/cache width

    No, that is not a common way to measure cpu "width", for many reasons.
    A chip is likely to have many buses outside the cpu core itself (and
    the
    cache(s) may or may not be considered part of the core).  It's
    common to
    have 64-bit wide buses on 32-bit processors, it's also common to have
    16-bit external databuses on a microcontroller.  And the cache might be
    128 bits wide.

    I agree with your points and those of Theo, but the cache is basically
    as wide as the registers? Logically, that is; a cacheline is several
    times that, probably you refer to that.
    Not that it makes much of a difference to the fact that 64 bit data
    buses/registers in an MCU (apart from FPU registers, 32 bit FPUs are
    useless to me) are unlikely to attract much interest, nothing of
    significance to be gained as you said.
    To me 64 bit CPUs are of interest of course and thankfully there are
    some available, but this goes somewhat past what we call  "embedded".
    Not long ago in a chat with a guy who knew some of ARM 64 bit I gathered
    there is some real mess with their out of order execution, one needs to
    do... hmmmm.. "sync", whatever they call it, all the time and there is
    a huge performance cost because of that. Anybody heard anything about
    it? (I only know what I was told).


    sync instructions of various types can be needed to handle
    thread/process synchronisation, atomic accesses, and coordination
    between software and hardware registers.  Software normally runs with
    the idea that it is the only thing running, and the cpu can re-order and
    re-arrange the instructions and execution as long as it maintains the
    illusion that the assembly instructions in the current thread are
    executed one after the other.  These re-arrangements and parallel
    execution can give very large performance benefits.

    But it also means that when you need to coordinate with other things,
    you need syncs, perhaps cache flushes, etc.  Full syncs can take
    hundreds of cycles to execute on large processors.  So you need to
    distinguish between reads and writes, acquires and releases, syncs on
    single addresses or general memory syncs.  Big processors are optimised
    for throughput, not latency or quick reaction to hardware events.

    There are good reasons why big cpus are often paired with a Cortex-M
    core in SOCs.



    Of course I know all that David, I have been using power processors
    which do things out of order for over 20 years now.

    It depends on the actual PPC's in question - with single core devices
    targeted for embedded systems, you don't need much of that at all.
    Perhaps an occasional sync of some sort in connection with using DMA,
    but that's about it. Key to this is, of course, having your MPU set up
    right to make sure hardware register accesses are in-order and not cached.

    What I was told was something about a real mess, like system memory
    accesses getting wrong because of out of order execution hence
    plenty of syncs needed to keep the thing working. I have not
    even tried to verify that, only someone with experience with 64 bit
    ARM can do that - so far none here seems to have that.


    If the person programming the device has made incorrect assumptions, or incorrect setup, then yes, things can go wrong if something other than
    the current core is affected by the reads or writes.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dimiter_Popoff@21:1/5 to Phil Hobbs on Wed Jun 9 22:03:15 2021
    On 6/9/2021 20:44, Phil Hobbs wrote:
    Paul Rubin wrote:
    David Brown <david.brown@hesbynett.no> writes:
    I can't really tell what kinds of designs you are discussing here.  When
    I talk about embedded systems in general, I mean microcontrollers
    running specific programs - not general-purpose computers in embedded
    formats (such as phones).

    Philip Munts made a comment a while back that stayed with me: that these
    days, in anything mains powered, there is usually little reason to use
    an MCU instead of a Linux board.


    Except that if it has a network connection, you have to patch it
    unendingly or suffer the common-as-dirt IoT security nightmares.

    Cheers

    Phil Hobbs


    Those nightmares do not apply if you are in complete control of your
    firmware - which few people are nowadays indeed.

    I have had netMCA devices on the net for over 10 years now in many
    countries, the worst problem I have seen was some Chinese IP hanging
    on port 80 to no consequences.

    Dimiter

    ======================================================
    Dimiter Popoff, TGI http://www.tgi-sci.com ====================================================== http://www.flickr.com/photos/didi_tgi/

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dimiter_Popoff@21:1/5 to David Brown on Wed Jun 9 22:06:51 2021
    On 6/9/2021 21:55, David Brown wrote:
    On 09/06/2021 20:00, Dimiter_Popoff wrote:
    On 6/9/2021 11:59, David Brown wrote:
    On 08/06/2021 22:39, Dimiter_Popoff wrote:
    On 6/8/2021 23:18, David Brown wrote:
    On 08/06/2021 16:46, Theo wrote:
    ......

    Memory bus/cache width

    No, that is not a common way to measure cpu "width", for many reasons.
    A chip is likely to have many buses outside the cpu core itself (and the
    cache(s) may or may not be considered part of the core).  It's
    common to
    have 64-bit wide buses on 32-bit processors, it's also common to have
    16-bit external databuses on a microcontroller.  And the cache might be
    128 bits wide.

    I agree with your points and those of Theo, but the cache is basically
    as wide as the registers? Logically, that is; a cacheline is several
    times that, probably you refer to that.
    Not that it makes much of a difference to the fact that 64 bit data
    buses/registers in an MCU (apart from FPU registers, 32 bit FPUs are
    useless to me) are unlikely to attract much interest, nothing of
    significance to be gained as you said.
    To me 64 bit CPUs are of interest of course and thankfully there are
    some available, but this goes somewhat past what we call "embedded".
    Not long ago in a chat with a guy who knew some of ARM 64 bit I gathered
    there is some real mess with their out of order execution, one needs to
    do... hmmmm.. "sync", whatever they call it, all the time and there is
    a huge performance cost because of that. Anybody heard anything about
    it? (I only know what I was told).


    sync instructions of various types can be needed to handle
    thread/process synchronisation, atomic accesses, and coordination
    between software and hardware registers.  Software normally runs with
    the idea that it is the only thing running, and the cpu can re-order and
    re-arrange the instructions and execution as long as it maintains the
    illusion that the assembly instructions in the current thread are
    executed one after the other.  These re-arrangements and parallel
    execution can give very large performance benefits.

    But it also means that when you need to coordinate with other things,
    you need syncs, perhaps cache flushes, etc.  Full syncs can take
    hundreds of cycles to execute on large processors.  So you need to
    distinguish between reads and writes, acquires and releases, syncs on
    single addresses or general memory syncs.  Big processors are optimised
    for throughput, not latency or quick reaction to hardware events.

    There are good reasons why big cpus are often paired with a Cortex-M
    core in SOCs.



    Of course I know all that David, I have been using power processors
    which do things out of order for over 20 years now.

    It depends on the actual PPC's in question - with single core devices targeted for embedded systems, you don't need much of that at all.

    You *do* need it enough to know what is there to know about it, I have
    been through it all. How big a latency there is is irrelevant to the
    point.

    What I was told was something about a real mess, like system memory
    accesses getting wrong because of out of order execution hence
    plenty of syncs needed to keep the thing working. I have not
    even tried to verify that, only someone with experience with 64 bit
    ARM can do that - so far none here seems to have that.


    If the person programming the device has made incorrect assumptions, or incorrect setup, then yes, things can go wrong if something other than
    the current core is affected by the reads or writes.


    May be the assumptions of the person were wrong. Or may be your
    assumption that their assumptions were wrong is wrong.
    Neither of us knows which it is.

    Dimiter

    ======================================================
    Dimiter Popoff, TGI http://www.tgi-sci.com ====================================================== http://www.flickr.com/photos/didi_tgi/

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Paul Rubin@21:1/5 to Phil Hobbs on Wed Jun 9 12:58:43 2021
    Phil Hobbs <pcdhSpamMeSenseless@electrooptical.net> writes:
    But if you're using a RasPi or Beaglebone or something like that, you
    need a reasonably well-upholstered Linux distro, which has to be
    patched regularly. At very least it'll need a kernel, and kernel
    patches affecting security are not exactly rare.

    You're in the same situation with almost anything else connected to the internet. Think of the notorious "smart light bulbs".

    On the other hand, you are in reasonable shape if the raspberry pi
    running your fish tank is only reachable through a LAN or VPN.
    Non-networked low end linux boards are also a thing.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dimiter_Popoff@21:1/5 to Phil Hobbs on Wed Jun 9 23:12:48 2021
    On 6/9/2021 22:22, Phil Hobbs wrote:
    Dimiter_Popoff wrote:
    On 6/9/2021 20:44, Phil Hobbs wrote:
    Paul Rubin wrote:
    David Brown <david.brown@hesbynett.no> writes:
    I can't really tell what kinds of designs you are discussing here.
    When
    I talk about embedded systems in general, I mean microcontrollers
    running specific programs - not general-purpose computers in embedded
    formats (such as phones).

    Philip Munts made a comment a while back that stayed with me: that
    these
    days, in anything mains powered, there is usually little reason to use
    an MCU instead of a Linux board.


    Except that if it has a network connection, you have to patch it
    unendingly or suffer the common-as-dirt IoT security nightmares.


    Those nightmares do not apply if you are in complete control of your
    firmware - which few people are nowadays indeed.

    I have had netMCA devices on the net for over 10 years now in many
    countries, the worst problem I have seen was some Chinese IP hanging
    on port 80 to no consequences.

    But if you're using a RasPi or Beaglebone or something like that, you
    need a reasonably well-upholstered Linux distro, which has to be patched regularly.  At very least it'll need a kernel, and kernel patches
    affecting security are not exactly rare.

    Cheers

    Phil Hobbs




    Oh if you use one of these all you can rely on is prayer, I don't
    think there is *one* person knowing everything which goes on within
    such a system. Basically it is impossible to know, even if you have
    all the manpower to dissect all the code you can still be taken by
    surprise by something a compiler has inserted somewhere etc., your
    initial point is well taken here.
    If you ask *me* if I am 100% sure what my devices might do - and I
    have written every single bit of code running on them, which has
    been compiled by a compiler I have written every single bit of - I
    might still be scratching my head. We buy our silicon, you know...

    Dimiter

    ======================================================
    Dimiter Popoff, TGI http://www.tgi-sci.com ====================================================== http://www.flickr.com/photos/didi_tgi/

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Phil Hobbs@21:1/5 to All on Wed Jun 9 15:22:32 2021
    Dimiter_Popoff wrote:
    On 6/9/2021 20:44, Phil Hobbs wrote:
    Paul Rubin wrote:
    David Brown <david.brown@hesbynett.no> writes:
    I can't really tell what kinds of designs you are discussing here.
    When
    I talk about embedded systems in general, I mean microcontrollers
    running specific programs - not general-purpose computers in embedded
    formats (such as phones).

    Philip Munts made a comment a while back that stayed with me: that these
    days, in anything mains powered, there is usually little reason to use
    an MCU instead of a Linux board.


    Except that if it has a network connection, you have to patch it
    unendingly or suffer the common-as-dirt IoT security nightmares.


    Those nightmares do not apply if you are in complete control of your
    firmware - which few people are nowadays indeed.

    I have had netMCA devices on the net for over 10 years now in many
    countries, the worst problem I have seen was some Chinese IP hanging
    on port 80 to no consequences.

    But if you're using a RasPi or Beaglebone or something like that, you
    need a reasonably well-upholstered Linux distro, which has to be patched regularly. At very least it'll need a kernel, and kernel patches
    affecting security are not exactly rare.

    Cheers

    Phil Hobbs



    --
    Dr Philip C D Hobbs
    Principal Consultant
    ElectroOptical Innovations LLC / Hobbs ElectroOptics
    Optics, Electro-optics, Photonics, Analog Electronics
    Briarcliff Manor NY 10510

    http://electrooptical.net
    http://hobbs-eo.com

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From =?UTF-8?Q?Hans-Bernhard_Br=c3=b6ker@21:1/5 to All on Wed Jun 9 22:52:06 2021
    Am 09.06.2021 um 10:40 schrieb David Brown:
    On 09/06/2021 06:16, George Neuner wrote:

    Since (at least) the Pentium 4 x86 really are a CISC decoder bolted
    onto the front of what essentially is a load/store RISC.

    ... and at about that time they also abandoned the last traces of their original von-Neumann architecture. The actual core is quite strictly
    Harvard now, treating the external RAM banks more like mass storage
    devices than an actual combined code+data memory.

    Absolutely. But from the user viewpoint, it is the ISA that matters -

    That depends rather a lot on who gets to be called the "user".

    x86 are quite strictly limited to the PC ecosystem these days: boxes and laptops built for Mac OS or Windows, some of them running Linux instead.
    There the "user" is somebody buying hardware and software from
    completely unrelated suppliers. I.e. unlike in the embedded world we
    discuss here, the persons writing software for those things had no say
    at all what type of CPU is used. They're thus not really the "user."
    If they were, they probably wouldn't be using an x86. ;-)

    The actual x86 users couldn't care less about the ISA --- the
    overwhelming majority of them haven't the slightest idea what an ISA
    even is. Some of them used to have a vague idea that there was some
    32bit vs. a 64bit whatchamacallit somewhere in there, but even that has
    surely faded away by now, as users no longer even face the decision
    between them.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to Paul Rubin on Wed Jun 9 16:37:24 2021
    On 6/9/2021 10:34 AM, Paul Rubin wrote:
    Theo <theom+news@chiark.greenend.org.uk> writes:
    Buy yourself a Raspberry Pi 4 and set it up to run your fish tank via a
    remote web browser. There's your 64 bit embedded system.
    I suppose there's a question of what embedded tasks intrinsically require
    4GiB RAM, and those that do so because it makes programmers' lives easier?

    You can buy a Raspberry Pi 4 with up to 8gb of ram, but the most common configuration is 2gb. The cpu is 64 bit anyway because why not?

    Exactly. Are they going to give you a *discount* for a 32b version?

    (Here, you can have this one for half of 'FREE'...)

    There are obviously plenty of computer systems doing that, but the
    question I don't know is what applications can be said to be
    'embedded' but need that kind of RAM.

    Lots of stuff is using 32 bit cpus with a few KB of ram these days. 32
    bits is displacing 8 bits in the MCU world.

    Is 64 bit displacing 32 bit in application processors like the Raspberry
    Pi, even when less than 4GB of ram is involved? I think yes, at least
    to some extent, and it will continue. My fairly low end mobile phone
    has 2GB of ram and a 64 bit 4-core processor, I think.

    Will 64 bit MCU's displace 32 bit MCUs? I don't know, maybe not.

    Some due to need but, I suspect, most due to pricing or other
    features not available in the 32b world. Just like you don't
    find PMMUs on 8/16b devices nor in-built NICs.

    Are application processors displacing MCU's in embedded systems? Not
    much in portable and wearable stuff (other than phones) at least for
    now, but in larger devices I think yes, at least somewhat for now, and probably more going forward. Even if you're not using networking, it
    makes software and UI development a heck of a lot easier.

    This -------------------------------^^^^^^^^^^^^^^^^^^^^^^

    Elbow room always takes some of the stress out of design. You
    don't worry (as much) about bumping into limits and, instead,
    concentrate on solving the problem at hand. The idea of
    packing 8 'bools' into a byte (cuz I only had a hundred or
    so of them available) is SO behind me, now! Just use something
    "more convenient"... eight of them!

    I pass pages between processes as an efficiency hack -- even if
    I'm only using a fraction of the page. In smaller processors,
    I'd be "upset" by this blatant "waste". Instead, I shrug it off
    and note that it gives me a uniform way of moving data around
    (instead of having to tweek interfaces to LIMIT the amount
    of data that I move; or "massage" the data JUST for transport).

    My "calculator service" uses BigRationals -- because its easier than
    trying to explain to users writing scripts that arithmetic can overflow,
    suffer rounding errors, that order of operations is important, etc.
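    (The flavor of it, using GMP's mpq_t as a stand-in for the BigRationals:
    1/3 + 1/6 comes out as exactly 1/2, no rounding or overflow to explain
    away; build with -lgmp:)

        #include <gmp.h>
        #include <stdio.h>

        int main(void)
        {
            mpq_t a, b, sum;
            mpq_inits(a, b, sum, NULL);

            mpq_set_ui(a, 1, 3);   mpq_canonicalize(a);   /* 1/3 */
            mpq_set_ui(b, 1, 6);   mpq_canonicalize(b);   /* 1/6 */
            mpq_add(sum, a, b);                           /* exactly 1/2 */

            gmp_printf("%Qd\n", sum);                     /* prints 1/2 */

            mpq_clears(a, b, sum, NULL);
            return 0;
        }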

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to Paul Rubin on Wed Jun 9 16:26:06 2021
    On 6/9/2021 12:58 PM, Paul Rubin wrote:
    Phil Hobbs <pcdhSpamMeSenseless@electrooptical.net> writes:
    But if you're using a RasPi or Beaglebone or something like that, you
    need a reasonably well-upholstered Linux distro, which has to be
    patched regularly. At very least it'll need a kernel, and kernel
    patches affecting security are not exactly rare.

    You're in the same situation with almost anything else connected to the internet. Think of the notorious "smart light bulbs".

    No, that's only if you didn't adequately prepare for such "exposure".

    How many Linux/Windows boxes are running un-NEEDED services? Have
    ports open that shouldn't be? How much emphasis was spent on eking
    out a few percent extra performance from the network stack that
    could have, instead, been spent on making it more robust?

    How many folks RUNNING something like Linux/Windows in their product
    actually know much of anything about what's under the hood? Do they
    even know how to BUILD a kernel, let alone sort out what it's
    doing (wrong)?

    Exposed to the 'net you always are at the mercy of DoS attacks
    consuming your inbound bandwidth (assuming you have no control
    of upstream traffic/routing). But, even a saturated network
    connection doesn't have to crash your device.

    OTOH, if your box is dutifully trying to respond to incoming packets
    that may be malicious, then you'd better hope that response is
    "correct" (or at least SAFE) in EVERY case.

    For any of these mainstream OS's, an adversary can play with an
    exact copy of yours 24/7/365 to determine its vulnerabilities
    before ever approaching your device. And, even dig through
    the sources (of some) to see how a potential attack could unfold.
    Your device will likely advertise exactly what version of the
    kernel (and network stack) it is running.

    [An adversary can also BUY one of YOUR devices and do the same
    off-line analysis -- but the analysis will only apply to YOUR
    device (if you have a proprietary OS/stack) and not a
    multitude of other exposed devices]

    On the other hand, you are in reasonable shape if the raspberry pi
    running your fish tank is only reachable through a LAN or VPN.
    Non-networked low end linux boards are also a thing.

    Exactly. But that limits utility/accessibility.

    If you only need moderate/occasional access, you can implement
    a "stealth mode" that lets the server hide, "unprotected".
    Or, require all accesses to be initiated from that server
    (*to* the remote client) -- similar to a call-back modem.
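    The call-back idea in socket terms (a sketch only; the address and
    port are placeholders, and the real thing would authenticate before
    accepting any commands):

        #include <arpa/inet.h>
        #include <netinet/in.h>
        #include <string.h>
        #include <sys/socket.h>
        #include <unistd.h>

        /* The device never listens; it dials out to the known management
           host and only then speaks its command protocol. */
        int dial_management_host(void)
        {
            struct sockaddr_in peer;
            int s = socket(AF_INET, SOCK_STREAM, 0);
            if (s < 0)
                return -1;

            memset(&peer, 0, sizeof peer);
            peer.sin_family = AF_INET;
            peer.sin_port   = htons(4443);                    /* placeholder */
            inet_pton(AF_INET, "192.0.2.10", &peer.sin_addr); /* placeholder */

            if (connect(s, (struct sockaddr *)&peer, sizeof peer) < 0) {
                close(s);
                return -1;                                    /* retry later */
            }
            return s;
        }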

    And, of course, you can place constraints on what can be done
    over that connection instead of just treating it as "God Mode".
    [No, you can't set the heat to 105 degrees in the summer time;
    I don't care if you happen to have appropriate credentials!
    And, no, you can't install an update without my verifying
    you and the update through other mechanisms...]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to Theo on Wed Jun 9 16:43:25 2021
    On 6/9/2021 10:07 AM, Theo wrote:
    Paul Rubin <no.email@nospam.invalid> wrote:
    James Brakefield <jim.brakefield@ieee.org> writes:
    Am trying to puzzle out what a 64-bit embedded processor should look like.
    Buy yourself a Raspberry Pi 4 and set it up to run your fish tank via a
    remote web browser. There's your 64 bit embedded system.

    I suppose there's a question of what embedded tasks intrinsically require
    4GiB RAM, and those that do so because it makes programmers' lives easier?

    In other words, you /can/ write a function to detect if your fish tank is
    hot or cold in Javascript that runs in a web app on top of Chromium on top
    of Linux. Or you could make it out of a 6502, or a pair of logic gates.

    That's complexity that's not fundamental to the application. OTOH maintaining a database that's larger than 4GB physically won't work without that amount of memory (or storage, etc).

    There are obviously plenty of computer systems doing that, but the question
    I don't know is what applications can be said to be 'embedded' but need that kind of RAM.

    Transcoding multiple video sources (for concurrent clients) in a single appliance?

    I have ~30 cameras, here. Had I naively designed with them all connected
    to a "camera processor", I suspect memory would be the least of my
    concerns (motion and scene recognition in 30 places simultaneously?)
    Instead, it was "easier" to give each camera its own processor. And,
    gain extended "remotability" as part of the process.

    Remember, the 32b address space has to simultaneously hold EVERYTHING that
    will need to be accessible to your application -- the OS, its memory requirements, the application(s) tasks, the stacks/heaps for the threads
    they contain, the data to be processed (in and out), the memory-mapped
    I/Os consumed by the SoC itself, etc.

    When you HAVE a capability/resource, it somehow ALWAYS gets used! ;-)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to All on Wed Jun 9 17:12:57 2021
    On 6/9/2021 10:56 AM, Dimiter_Popoff wrote:
    On 6/9/2021 4:29, Don Y wrote:
    On 6/8/2021 3:01 PM, Dimiter_Popoff wrote:

    Am trying to puzzle out what a 64-bit embedded processor should look like.
    At the low end, yeah, a simple RISC processor. And support for complex arithmetic
    using 32-bit floats? And support for pixel alpha blending using quad
    16-bit numbers?
    32-bit pointers into the software?

    The real value in 64 bit integer registers and 64 bit address space is
    just that, having an orthogonal "endless" space (well I remember some
    30 years ago 32 bits seemed sort of "endless" to me...).

    Not needing to assign overlapping logical addresses to anything
    can make a big difference to how the OS is done.

    That depends on what you expect from the OS. If you are
    comfortable with the possibility of bugs propagating between
    different subsystems, then you can live with a logical address
    space that exactly coincides with a physical address space.

    So how does the linear 64 bit address space get in the way of
    any protection you want to implement? Pages are still 4 k and
    each has its own protection attributes governed by the OS,
    it is like that with 32 bit processors as well (I talk power, I am
    not interested in half baked stuff like ARM, risc-v etc., I don't
    know if there could be a problem like that with one of these).

    With a linear address space, you typically have to link EVERYTHING
    as a single image to place each thing in its own piece of memory
    (or use segment based addressing).

    I can share code between tasks without conflicting addressing;
    the "data" for one instance of the app is isolated from other
    instances while the code is untouched -- the code doesn't even
    need to know that it is being invoked on different "data"
    from one timeslice to the next. In a flat address space,
    you'd need the equivalent of a "context pointer" that you'd
    have to pass to the "shared code". And, have to hope that
    all of your context could be represented in a single such
    reference! (I can rearrange physical pages so they each
    appear "where expected" to a bit of const CODE).

    Similarly, the data passed (or shared) from one task (process) to
    another can "appear" at entirely different logical addresses
    "at the same time" as befitting the needs of each task WITHOUT
    CONCERN (or awareness) of the existence of the other task.
    Again, I don't need to pass a pointer to the data; the address
    space has been manipulated to make sure it's where it should be.

    The needs of a task can be met by resources "harvested" from
    some other task. E.g., where is the stack for your TaskA?
    How large is it? How much of it is in-use *now*? How much
    can it GROW before it bumps into something (because that something
    occupies space in "its" address space).

    I start a task (thread) with a single page of stack. And, a
    limit on how much it is allowed to consume during its execution.
    Then, when it pushes something "off the end" of that page,
    I fault a new page in and map it at the faulting address.
    This continues as the task's stack needs grow.

    When I run out of available pages, I do a GC cycle to
    reclaim pages from (other?) tasks that are no longer using
    them.

    In this way, I can effectively SHARE a stack (or heap)
    between multiple tasks -- without having to give any
    consideration for where, in memory, they (or the stacks!)
    reside.
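    The mechanism, sketched (alloc_phys_page/reclaim_pages/map_page are
    hypothetical OS internals named only for this sketch, not a real API):

        #include <stdbool.h>
        #include <stddef.h>
        #include <stdint.h>

        #define PAGE_SIZE 4096u

        struct task {
            uintptr_t stack_top;     /* highest stack address           */
            size_t    stack_mapped;  /* bytes currently backed by pages */
            size_t    stack_limit;   /* how far it is allowed to grow   */
        };

        void *alloc_phys_page(void);
        void  reclaim_pages(void);       /* GC pages no longer in use   */
        void  map_page(struct task *t, uintptr_t at, void *frame);

        /* Called on a fault just below the mapped stack: extend by one page. */
        bool on_stack_fault(struct task *t, uintptr_t fault_addr)
        {
            uintptr_t lowest = t->stack_top - t->stack_mapped;

            if (fault_addr >= lowest ||                   /* already mapped */
                fault_addr <  lowest - PAGE_SIZE ||       /* not adjacent   */
                t->stack_mapped + PAGE_SIZE > t->stack_limit)
                return false;                             /* genuine fault  */

            void *frame = alloc_phys_page();
            if (frame == NULL) {
                reclaim_pages();                          /* then try again */
                frame = alloc_phys_page();
                if (frame == NULL)
                    return false;
            }

            map_page(t, lowest - PAGE_SIZE, frame);       /* grow downward  */
            t->stack_mapped += PAGE_SIZE;
            return true;                                  /* retry access   */
        }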

    I can move a page from one task (full of data) to another
    task at some place that the destination task finds "convenient".
    I can import a page from another network device or export
    one *to* another device.

    Because each task's address space is effectively empty/sparse,
    mapping a page doesn't require much effort to find a "free"
    place for it.

    I can put constraints on each such mapping -- and then runtime
    checks to ensure "things are as I expect": "Why is this NIC
    buffer residing in this particular portion of the address space?"

    With a task bound to a semicontiguous portion of memory, it can
    deal with that region as if it was a smaller virtual region.
    I can store 32b pointers to things if I know that my addresses
    are based from 0x000 and the task never extends beyond a 4GB
    region. If available, I can exploit "shorter" addressing modes.
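
    A small, hedged illustration of that "32b pointers inside a <=4GB task
    region" point -- everything here is invented for the example: store
    32-bit offsets from the region base instead of full 64-bit pointers,
    halving the size of pointer-heavy structures:

        #include <stdint.h>
        #include <stdio.h>

        typedef uint32_t ofs32;                 /* "short pointer": offset from base */
        struct node { ofs32 next; int value; }; /* 8 bytes instead of 16             */

        /* Stand-in for the task's (sub-4GB) region; 1 MiB here. */
        static _Alignas(struct node) char region[1u << 20];

        static inline void *from_ofs(ofs32 o)     { return region + o; }
        static inline ofs32 to_ofs(const void *p) { return (ofs32)((const char *)p - region); }

        int main(void)
        {
            struct node *a = (struct node *)region;
            struct node *b = (struct node *)(region + sizeof *a);
            b->value = 42; b->next = 0;
            a->value = 1;  a->next = to_ofs(b);

            struct node *n = from_ofs(a->next);
            printf("%d\n", n->value);           /* prints 42 */
            return 0;
        }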

    There is *nothing* to gain on a 64 bit machine from segmentation, assigning overlapping address spaces to tasks etc.

    What do you gain by NOT using it? You're still dicking with the MMU.
    (if you aren't then what value the MMU in your "logical" space? map
    each physical page to a corresponding logical page and never talk to
    the MMU again; store const page tables and let your OS just tweak the
    base pointer for the TLBs to use for THIS task)

    You still have to "position" physical resources in particular places
    (and you have to deal with the constraints of all tasks, simultaneously, instead of just those constraints imposed by the "current task")

    Notice I am talking *logical* addresses, I was explicit about
    that.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to All on Thu Jun 10 09:37:40 2021
    On 09/06/2021 22:52, Hans-Bernhard Bröker wrote:
    Am 09.06.2021 um 10:40 schrieb David Brown:
    On 09/06/2021 06:16, George Neuner wrote:

    Since (at least) the Pentium 4 x86 really are a CISC decoder bolted
    onto the front of what essentially is a load/store RISC.

    ... and at about that time they also abandoned the last traces of their original von-Neumann architecture.  The actual core is quite strictly Harvard now, treating the external RAM banks more like mass storage
    devices than an actual combined code+data memory.

    Absolutely.  But from the user viewpoint, it is the ISA that matters -

    That depends rather a lot on who gets to be called the "user".


    I meant "the person using the ISA" - i.e., the programmer. And even
    then, I meant low-level programmers who have to understand things like
    memory models, cache thrashing, coding for vectors and SIMD, etc. These
    are the people who see the ISA. I was not talking about the person
    wiggling the mouse and watching youtube!

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dimiter_Popoff@21:1/5 to Don Y on Thu Jun 10 13:45:49 2021
    On 6/10/2021 3:12, Don Y wrote:
    On 6/9/2021 10:56 AM, Dimiter_Popoff wrote:
    On 6/9/2021 4:29, Don Y wrote:
    On 6/8/2021 3:01 PM, Dimiter_Popoff wrote:

    Am trying to puzzle out what a 64-bit embedded processor should
    look like.
    At the low end, yeah, a simple RISC processor.  And support for
    complex arithmetic
    using 32-bit floats?  And support for pixel alpha blending using
    quad 16-bit numbers?
    32-bit pointers into the software?

    The real value in 64 bit integer registers and 64 bit address space is
    just that, having an orthogonal "endless" space (well I remember some
    30 years ago 32 bits seemed sort of "endless" to me...).

    Not needing to assign overlapping logical addresses to anything
    can make a big difference to how the OS is done.

    That depends on what you expect from the OS.  If you are
    comfortable with the possibility of bugs propagating between
    different subsystems, then you can live with a logical address
    space that exactly coincides with a physical address space.

    So how does the linear 64 bit address space get in the way of
    any protection you want to implement? Pages are still 4 k and
    each has its own protection attributes governed by the OS,
    it is like that with 32 bit processors as well (I talk power, I am
    not interested in half baked stuff like ARM, risc-v etc., I don't
    know if there could be a problem like that with one of these).

    With a linear address space, you typically have to link EVERYTHING
    as a single image to place each thing in its own piece of memory
    (or use segment based addressing).

    Nothing could be further from the truth. What kind of crippled
    environment can make you think that? Code can be position
    independent on processors which are not dead by design nowadays.
    When I started dps some 27 years ago I allowed program modules
    to demand a fixed address on which they would reside. This exists
    to this day and has been used 0 (zero) times. Same about object
    descriptors, program library modules etc., the first system call
    I wrote is called "allocm$", allocate memory. You request a number
    of bytes and you get back an address and the actual number of
    bytes you were given (it comes rounded up to the memory cluster
    size, typically 4k, i.e. a page). This was the *first* thing I did.
    And yes, all allocation is done using worst fit strategy, sometimes
    enhanced worst fit - things the now popular OS-s have yet to get to,
    they still have to defragment their disks, LOL.
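
    For the shape of such a call: a guess, in hosted C, at what the above
    description of "allocm$" implies -- request a byte count, get back a
    base address plus the actual size granted, rounded up to the cluster
    (page) size.  malloc() stands in for the real worst-fit allocator;
    none of this is dps source:

        #include <stdio.h>
        #include <stdlib.h>

        #define CLUSTER 4096u   /* memory cluster size assumed: one 4k page */

        static void *alloc_mem(size_t requested, size_t *granted)
        {
            size_t rounded = (requested + CLUSTER - 1) & ~(size_t)(CLUSTER - 1);
            void *base = malloc(rounded);    /* real thing: worst-fit search */
            if (base) *granted = rounded;
            return base;
        }

        int main(void)
        {
            size_t got = 0;
            void *p = alloc_mem(100, &got);
            printf("asked 100, granted %zu bytes at %p\n", got, p);  /* 4096 */
            free(p);
            return 0;
        }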


    I can share code between tasks without conflicting addressing;
    the "data" for one instance of the app is isolated from other
    instances while the code is untouched -- the code doesn't even
    need to know that it is being invoked on different "data"
    from one timeslice to the next.  In a flat address space,
    you'd need the equivalent of a "context pointer" that you'd
    have to pass to the "shared code".  And, have to hope that
    all of your context could be represented in a single such
    reference!  (I can rearrange physical pages so they each
    appear "where expected" to a bit of const CODE).

    Similarly, the data passed (or shared) from one task (process) to
    another can "appear" at entirely different logical addresses
    "at the same time" as befitting the needs of each task WITHOUT
    CONCERN (or awareness) of the existence of the other task.
    Again, I don't need to pass a pointer to the data; the address
    space has been manipulated to make sure it's where it should be.

    So how do you pass the offset from the page beginning if you do
    not pass an address?
    And how is page manipulation simpler and/or safer than just passing
    an address, sounds like a recipe for quite a mess to me.
    In a 64 bit address space there is nothing stopping you from
    passing addresses or not passing them, allowing access to the areas
    you want to and disallowing it elsewhere.
    Other than that there is nothing to be gained by a 64 bit architecture
    really, on 32 bit machines you do have FPUs, vector units etc.
    doing calculation probably faster than the integer unit of a
    64 bit processor.
    The *whole point* of a 64 bit core is the 64 bit address space.



    The needs of a task can be met by resources "harvested" from
    some other task.  E.g., where is the stack for your TaskA?
    How large is it?  How much of it is in-use *now*?  How much
    can it GROW before it bumps into something (because that something
    occupies space in "its" address space).

    This is the beauty of 64 bit logical address space. You allocate
    enough logical memory and then you allocate physical on demand,
    this is what MMUs are there for. If you want to grow your stack
    indefinitely - the messy C style - you can just allocate it
    a few gigabytes of logical memory and use the first few kilobytes
    of it with no waste of resources. Of course there are much slicker
    ways to deal with memory allocation.
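
    On a hosted 64-bit system the same "reserve a huge logical range,
    commit physical pages only as they are touched" pattern can be
    demonstrated with POSIX mmap() -- purely an analogy for what a kernel
    does internally, and MAP_ANONYMOUS/MAP_NORESERVE are Linux/BSD-flavoured
    flags:

        #include <stdio.h>
        #include <sys/mman.h>

        int main(void)
        {
            size_t reserve = (size_t)4 << 30;    /* 4 GiB of logical space */

            char *stack = mmap(NULL, reserve, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE,
                               -1, 0);
            if (stack == MAP_FAILED) { perror("mmap"); return 1; }

            /* Touch a few kilobytes: only those pages get physical frames. */
            for (size_t i = 0; i < 8192; i++)
                stack[i] = 0;

            printf("reserved %zu bytes, touched 8 KiB\n", reserve);
            munmap(stack, reserve);
            return 0;
        }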



    I start a task (thread) with a single page of stack.  And, a
    limit on how much it is allowed to consume during its execution.
    Then, when it pushes something "off the end" of that page,
    I fault a new page in and map it at the faulting address.
    This continues as the task's stack needs grow.

    This is called "allocate on demand" and has been around
    for times immemorial, check my former paragraph.


    When I run out of available pages, I do a GC cycle to
    reclaim pages from (other?) tasks that are no longer using
    them.

    This is called "memory swapping", also for times immemorial.
    For the case when there is no physical memory to reclaim, that
    is.
    The first version of dps - some decades ago - ran on a CPU32
    (a 68340). It had no MMU so I implemented "memory blocks",
    a task can declare a piece of memory a swappable block and allow/disallow
    its swapping. Those blocks would then be shared or written to disk when
    more memory was needed etc., memory swapping without an MMU.
    Worked fine, must be still working for code I have not
    touched since on my power machines, all those decades later.


    In this way, I can effectively SHARE a stack (or heap)
    between multiple tasks -- without having to give any
    consideration for where, in memory, they (or the stacks!)
    reside.

    You can do this in a linear address space, too - this is what
    the MMU is for.



    I can move a page from one task (full of data) to another
    task at some place that the destination task finds "convenient".
    I can import a page from another network device or export
    one *to* another device.

    So instead of simply passing an address you have to switch page
    translation entries, adjust them on each task switch, flush and
    sync whatever it takes - does not sound very efficient to me.


    Because each task's address space is effectively empty/sparse,
    mapping a page doesn't require much effort to find a "free"
    place for it.

    This is the beauty of having the 64 bit address space, you always
    have enough logical memory. The "64 bit address space per task"
    buys you *nothing*.

    Dimiter

    ======================================================
    Dimiter Popoff, TGI http://www.tgi-sci.com
    ======================================================
    http://www.flickr.com/photos/didi_tgi/

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to contain on Thu Jun 10 06:55:16 2021
    On 6/10/2021 3:45 AM, Dimiter_Popoff wrote:

    [attrs elided]

    Not needing to assign overlapping logical addresses to anything
    can make a big difference to how the OS is done.

    That depends on what you expect from the OS. If you are
    comfortable with the possibility of bugs propagating between
    different subsystems, then you can live with a logical address
    space that exactly coincides with a physical address space.

    So how does the linear 64 bit address space get in the way of
    any protection you want to implement? Pages are still 4 k and
    each has its own protection attributes governed by the OS,
    it is like that with 32 bit processors as well (I talk power, I am
    not interested in half baked stuff like ARM, risc-v etc., I don't
    know if there could be a problem like that with one of these).

    With a linear address space, you typically have to link EVERYTHING
    as a single image to place each thing in its own piece of memory
    (or use segment based addressing).

    Nothing could be further from the truth. What kind of crippled
    environment can make you think that? Code can be position
    independent on processors which are not dead by design nowadays.
    When I started dps some 27 years ago I allowed program modules
    to demand a fixed address on which they would reside. This exists
    to this day and has been used 0 (zero) times. Same about object
    descriptors, program library modules etc., the first system call
    I wrote is called "allocm$", allocate memory. You request a number
    of bytes and you get back an address and the actual number of
    bytes you were given (it comes rounded up to the memory cluster
    size, typically 4k, i.e. a page). This was the *first* thing I did.
    And yes, all allocation is done using worst fit strategy, sometimes
    enhanced worst fit - things the now popular OS-s have yet to get to,
    they still have to defragment their disks, LOL.

    You missed my point -- possibly because this issue was raised
    BEFORE pointing out how much DYNAMIC management of the MMU
    (typically an OS delegated activity) "buys you":
    "That depends on what you expect from the OS."

    If you can ignore the MMU *completely*, then the OS is greatly
    simplified. YOU (developer) take on the responsibilities of remembering
    what is where, etc. EVERYTHING is visible to EVERYONE and at
    EVERYTIME. The OS doesn't have to get involved in the management
    of objects/tasks/etc. That's YOUR responsibility to ensure
    your taskA doesn't go dicking around with taskB's resources.

    Welcome to the 8/16b world!

    The next step up is to statically deploy the MMU. You build
    a SINGLE logical address space to suit your liking. Then, map
    the underlying physical resources to it as best fits. And,
    this never needs to change -- memory doesn't "move around",
    it doesn't change characteristics (readable, writeable,
    executable, accessible-by-X, etc.)!

    But, you can't then change permissions based on which task is
    executing -- unless you want to dick with the MMU dynamically
    (or swap between N discrete sets of STATIC page tables that
    define the many different ways M tasks can share permissions)

    So, you *just* use the MMU as a Memory Protection Unit; you mark
    sections of memory that have CODE in them as no-write, you mark
    regions with DATA as no-execute, and everything else as no-access.

    And that's the way it stays for EVERY task!

    This lets you convert RAM to ROM and prevents "fetches" from "DATA"
    memory. It ensures your code is never overwritten and that the
    processor never tries to execute out of "data memory" and NOTHING
    tries to access address regions that are "empty"!
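
    A sketch of that "set it up once and never touch it again" scheme --
    one const table of regions programmed at boot.  write_region() is an
    invented hardware hook, not any particular chip's register interface:

        #include <stddef.h>
        #include <stdint.h>

        enum { R = 1u << 0, W = 1u << 1, X = 1u << 2 };

        struct region { uintptr_t base; size_t length; unsigned attrs; };

        /* CODE is read+execute, DATA is read+write, all else unmapped. */
        static const struct region boot_map[] = {
            { 0x00000000u, 0x00100000u, R | X },  /* flash: never writable  */
            { 0x20000000u, 0x00040000u, R | W },  /* RAM: never executable  */
            /* anything not listed: no access at all                        */
        };

        static void write_region(const struct region *r) { (void)r; }

        void protection_init(void)  /* called once at boot; MMU left alone after */
        {
            for (size_t i = 0; i < sizeof boot_map / sizeof boot_map[0]; i++)
                write_region(&boot_map[i]);
        }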

    You've implemented a 1980's vintage protection scheme (this is how
    we designed arcade pieces, back then, as you wanted your CODE
    and FRAME BUFFER to occupy the same limited range of addresses)

    <yawn>

    Once you start using the MMU to dynamically *manage* memory (which
    includes altering protections and re-mapping), then the cost of the
    OS increases -- because these are typically things that are delegated
    *to* the OS.

    Whether or not you have overlapping address spaces or a single
    flat address space is immaterial -- you need to dynamically manage
    separate page tables for each task in either scheme. You can't
    argue that the OS doesn't need to dick with the MMU "because it's
    a flat address space" -- unless you forfeit those abilities
    (that I illustrated in my post).

    If you want to compare a less-able OS to one that is more featured,
    then it's disingenuous to blame that on overlapping address spaces;
    the real "blame" lies in the support of more advanced features.

    The goal of an OS should be to make writing *correct* code easier
    by providing features as enhancements. It's why the OS typically
    reads disk files instead of replicating that file system and driver
    code into each task that needs to do so. Or, why it implements
    delays/timers -- so each task doesn't reinvent the wheel (with its
    own unique set of bugs).

    You can live without an OS. But, typically only for a trivial
    application. And, you're not likely to use a 64b processor just
    to count characters received on a serial port! Or as an egg timer!

    I can share code between tasks without conflicting addressing;
    the "data" for one instance of the app is isolated from other
    instances while the code is untouched -- the code doesn't even
    need to know that it is being invoked on different "data"
    from one timeslice to the next. In a flat address space,
    you'd need the equivalent of a "context pointer" that you'd
    have to pass to the "shared code". And, have to hope that
    all of your context could be represented in a single such
    reference! (I can rearrange physical pages so they each
    appear "where expected" to a bit of const CODE).

    Similarly, the data passed (or shared) from one task (process) to
    another can "appear" at entirely different logical addresses
    "at the same time" as befitting the needs of each task WITHOUT
    CONCERN (or awareness) of the existence of the other task.
    Again, I don't need to pass a pointer to the data; the address
    space has been manipulated to make sure it's where it should be.

    So how do you pass the offset from the page beginning if you do
    not pass an address?

    YOU pass an object to the OS and let the OS map it where *it*
    wants, with possible hints from the targeted task (logical address
    space).

    I routinely pass multiple-page-sized objects around the system.

    "Here's a 20MB telephone recording, memory mapped (to wherever YOU,
    its recipient, want it). Because it is memory mapped and has its
    own pager, the actual amount of physical memory that is in use
    at any given time can vary -- based on the resource allocation
    you've been granted and the current resource availability in the
    system. E.g., there may be as little as one page of physical
    data present at any given time -- and that page may "move" to
    back a different logical address based on WHERE you are presently
    looking!

    Go through and sort out when Bob is speaking and when Tom is speaking.
    "Return" an object of UNKNOWN length that lists each of these time
    intervals along with the speaker assumed to be talking in each. Tell
    me where you (the OS) decided it would best fit into my logical address
    space, after consulting the hint I provided (but that you may not have
    been able to honor because the result ended up *bigger* than the "hole"
    I had imagined it fitting into). No need to tell me how big it really
    is as I will be able to parse it (cuz I know how you will have built that
    list) and the OS will track the memory that it uses so all I have to do
    is free() it (it may be built out of 1K pages, 4K pages, 16MB pages)!"
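
    The shape of that exchange, reduced to a toy C sketch (invented names,
    malloc standing in for the OS; the point is only that the *reply* tells
    the client where the object actually landed, the hint being just a
    hint):

        #include <stdio.h>
        #include <stdlib.h>

        typedef struct { void *base; size_t length; } mapping;

        /* Pretend OS call: map 'length' bytes of some object into the
         * caller's address space, preferably near 'hint'; the return value
         * says where it really ended up.                                   */
        static mapping os_map_object(size_t length, void *hint)
        {
            (void)hint;              /* the kernel may or may not honor it */
            mapping m = { malloc(length), length };
            return m;
        }

        int main(void)
        {
            mapping reply = os_map_object(20u * 1024 * 1024,
                                          (void *)0x70000000u);
            if (!reply.base) return 1;
            printf("object mapped at %p, %zu bytes\n", reply.base, reply.length);
            free(reply.base);
            return 0;
        }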

    How is this HARDER to do when a single task has an entire 64b address
    space instead of when it has to SHARE *a* single address space among
    all tasks/objects?

    And how is page manipulation simpler and/or safer than just passing
    an address, sounds like a recipe for quite a mess to me.

    The MMU has made that mapping a "permanent" part of THIS task's
    address space. It isn't visible to any other task -- why *should*
    it be? Why does the pointer need to indirectly reflect the fact
    that portions of that SINGLE address space are ineligible to
    contain said object because of OTHER unrelated (to this task) objects??

    In a 64 bit address space there is nothing stopping you from
    passing addresses or not passing them, allowing access to the areas
    you want to and disallowing it elsewhere.

    And I can't do that in N overlapping 64b address spaces?

    The only "win" you get is by exposing everything to everyone.
    That's not the way software is evolving. Compartmentalization
    (to protect from other actors), opacity (to hide implementation
    details), accessors (instead of exposing actual data), etc.

    This comes at a cost -- in performance as well as OS design.
    But, *seems* to be worth the effort, given how "mainstream"
    development is heading.

    Other than that there is nothing to be gained by a 64 bit architecture really, on 32 bit machines you do have FPUs, vector units etc.
    doing calculation probably faster than the integer unit of a
    64 bit processor.
    The *whole point* of a 64 bit core is the 64 bit address space.

    No, the whole point of a 64b core is the 64b registers.
    You can package a 64b CPU so that only 20(!) address lines
    are bonded out. This limits the physical address space
    to 20b. What value to making the logical address
    space bigger -- so you can leave gaps for expansion
    between objects??

    The needs of a task can be met by resources "harvested" from
    some other task. E.g., where is the stack for your TaskA?
    How large is it? How much of it is in-use *now*? How much
    can it GROW before it bumps into something (because that something
    occupies space in "its" address space).

    This is the beauty of 64 bit logical address space. You allocate
    enough logical memory and then you allocate physical on demand,
    this is what MMUs are there for. If you want to grow your stack
    indefinitely - the messy C style - you can just allocate it
    a few gigabytes of logical memory and use the first few kilobytes
    of it with no waste of resources. Of course there are much slicker
    ways to deal with memory allocation.

    Again, how is this any harder with "overlapping" 64b address spaces?
    Or, how is it EASIER with nonoverlap?

    I start a task (thread) with a single page of stack. And, a
    limit on how much it is allowed to consume during its execution.
    Then, when it pushes something "off the end" of that page,
    I fault a new page in and map it at the faulting address.
    This continues as the task's stack needs grow.

    This is called "allocate on demand" and has been around
    for times immemorial, check my former paragraph.

    I'm not trying to be "novel". Rather, showing that these
    features come from the MMU -- not a "nonoverlapping"
    (or overlapping!) address space.

    I.e., the take away from all this is the MMU is the win
    AND the cost for the OS. Without it, the OS gets simpler...
    and less capable!

    When I run out of available pages, I do a GC cycle to
    reclaim pages from (other?) tasks that are no longer using
    them.

    This is called "memory swapping", also for times immemorial.
    For the case when there is no physical memory to reclaim, that
    is.
    The first version of dps - some decades ago - ran on a CPU32
    (a 68340). It had no MMU so I implemented "memory blocks",
    a task can declare a piece of memory a swappable block and allow/disallow
    its swapping. Those blocks would then be shared or written to disk when
    more memory was needed etc., memory swapping without an MMU.
    Worked fine, must be still working for code I have not
    touched since on my power machines, all those decades later.

    There's no disk involved. The amount of physical memory
    is limited to what's on-board (unless I try to move resources
    to another node or -- *gack* -- use a scratch table in the RDBMS
    as a backing store).

    Recovering "no longer in use" portions of stack is "low hanging fruit";
    look at the task's stack pointer and you know how much allocated stack
    is no longer in use. Try to recover it (of course, the task
    may immediately fault another page back into play but that's
    an optimization issue).
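
    That "low hanging fruit" pass is simple enough to sketch in C: for a
    downward-growing stack, any fully mapped page lying entirely below the
    saved stack pointer is idle and can be unmapped (unmap_page() and the
    task record are invented names):

        #include <stddef.h>
        #include <stdint.h>

        #define PAGE_SIZE 4096u

        typedef struct {
            uintptr_t stack_top;     /* highest stack address (exclusive)     */
            size_t    pages_mapped;  /* contiguous pages mapped below the top */
            uintptr_t sp;            /* task's saved stack pointer            */
        } task_stack;

        static void unmap_page(uintptr_t page) { (void)page; }  /* MMU stub */

        size_t reclaim_unused_stack(task_stack *t)
        {
            uintptr_t lowest  = t->stack_top - t->pages_mapped * PAGE_SIZE;
            uintptr_t sp_page = t->sp & ~(uintptr_t)(PAGE_SIZE - 1);
            size_t reclaimed  = 0;

            /* Pages strictly below the page the SP sits in are unused. */
            while (lowest < sp_page) {
                unmap_page(lowest);
                lowest += PAGE_SIZE;
                t->pages_mapped--;
                reclaimed++;
            }
            return reclaimed;
        }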

    If there is no "low hanging fruit", then I ask tasks to voluntarily
    relinquish memory. Some tasks may have requested "extra" memory
    in order to precompute results for future requests/activities.
    If it was available -- and if the task wanted to "pay" for it -- then
    the OS would grant the allocation (knowing that it could eventually
    revoke it!) They could relinquish those resources at the expense of
    having to recompute those things at a later date ("on demand" *or* when
    memory is again available).

    If I can't recover enough resources "voluntarily", then I
    *take* memory away from a (selected) task and inform it
    (raise an exception that it will handle as soon as it gets
    a timeslice) of that "theft". It will either recover from
    the loss (because it was being greedy and didn't elect
    to forfeit excess memory that it had allocated when I asked,
    earlier) *or* it will crash. <shrug> When you run out
    of resources, SOMETHING has to give (and the OS is better
    suited to determining WHAT than the individual tasks are...
    they ALL think *they* are important!)
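
    The reclamation ladder just described, as a compilable C sketch
    (invented types; no claim that this is the real kernel's interface):
    ask nicely first, then revoke from a victim and notify it:

        #include <stddef.h>

        typedef struct task task_t;
        struct task {
            size_t (*relinquish)(task_t *self, size_t wanted); /* pages freed */
            void   (*notify_revoked)(task_t *self, size_t pages);
            size_t  pages_held;
        };

        static size_t revoke_pages(task_t *victim, size_t pages) /* forced theft */
        {
            if (pages > victim->pages_held) pages = victim->pages_held;
            victim->pages_held -= pages;
            victim->notify_revoked(victim, pages);  /* exception at next slice */
            return pages;
        }

        size_t reclaim(task_t **tasks, size_t ntasks, size_t wanted)
        {
            size_t freed = 0;

            /* Phase 1: polite request -- tasks that precomputed "extra"
             * state can drop it now and recompute it later.                */
            for (size_t i = 0; i < ntasks && freed < wanted; i++)
                freed += tasks[i]->relinquish(tasks[i], wanted - freed);

            /* Phase 2: the OS decides -- take from selected victims.       */
            for (size_t i = 0; i < ntasks && freed < wanted; i++)
                freed += revoke_pages(tasks[i], wanted - freed);

            return freed;
        }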

    Again, "what do you expect from your OS?"

    In this way, I can effectively SHARE a stack (or heap)
    between multiple tasks -- without having to give any
    consideration for where, in memory, they (or the stacks!)
    reside.

    You can do this in a linear address space, too - this is what
    the MMU is for.

    Yes, see? There's nothing special about a flat address space!

    I can move a page from one task (full of data) to another
    task at some place that the destination task finds "convenient".
    I can import a page from another network device or export
    one *to* another device.

    So instead of simply passing an address you have to switch page
    translation entries, adjust them on each task switch, flush and
    sync whatever it takes - does not sound very efficient to me.

    It's not intended to be fast/efficient. It's intended to ensure
    that the recipient -- AND ONLY THE RECIPIENT -- is *now*
    granted access to that page's contents. Depending on semantics,
    it can create a copy of an object or "move" the object, leaving
    a "hole" in the original location.

    [I.e., if move semantics, then the original owner shouldn't be
    trying to access something that he's "given away"! Any access,
    by him, to that memory region should signal a fatal exception!]

    If you don't care who sees what, then you don't need the MMU!
    And we're back to my initial paragraph of this reply! :>

    Because each task's address space is effectively empty/sparse,
    mapping a page doesn't require much effort to find a "free"
    place for it.

    This is the beauty of having the 64 bit address space, you always
    have enough logical memory. The "64 bit address space per task"
    buys you *nothing*.

    If "always having enough logical memory" is such a great thing,
    isn't having MORE logical memory (because you've moved other
    things into OVERLAPPING portions of that memory space) an
    EVEN BETTER thing?

    Again, what does your flat addressing BUY the OS in terms of
    complexity reduction? (your initial assumption)
    "...a big difference to how the OS is done"

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dimiter_Popoff@21:1/5 to Don Y on Thu Jun 10 18:32:23 2021
    On 6/10/2021 16:55, Don Y wrote:
    On 6/10/2021 3:45 AM, Dimiter_Popoff wrote:

    [attrs elided]

    Don, this becomes way too lengthy and repeating itself.

    You keep on saying that a linear 64 bit address space means exposing
    everything to everybody after I explained this is not true at all.

    You keep on claiming this or that about how I do things without
    bothering to understand what I said - like your claim that I use the MMU
    for "protection only".
    NO, this is not true either. On 32 bit machines - as mine in
    production are - mapping 4G logical space into say 128M of physical
    memory goes all the way through page translation, block translation
    for regions where page translation would be impractical etc.
    You sound the way I would have sounded before I had written and
    built on for years what is now dps. The devil is in the detail :-).

    You pass "objects", pages etc. Well guess what, it *always* boils
    down to an *address* for the CPU. The rest is generic talk.
    And if you choose to have overlapping address spaces when you
    pass a pointer from one task to another the OS has to deal with this
    at a significant cost.
    In a linear address space, you pass the pointer *as is* so the OS does
    not have to deal with anything except access restrictions.
    In dps, you can send a message to another task - the message being
    data the OS will copy into that task's memory, the data being
    perfectly able to be an address of something in another task's
    memory. If a task accesses an address it is not supposed to
    the user is notified and allowed to press CR to kill that task.
    Then there are common data sections for groups of tasks etc.,
    it is pretty huge really.

    The concept "one entire address space to all tasks" is from the 60-s
    if not earlier (I just don't know and don't care to check now) and it
    has done a good job while it was necessary, mostly on 16 bit CPUs.
    For today's processors this means just making them run with the
    handbrake on, *nothing* is gained because of that - no more security
    (please don't repeat that "expose everything" nonsense), just
    burning more CPU power, constantly having to remap addresses etc.

    Dimiter

    ======================================================
    Dimiter Popoff, TGI http://www.tgi-sci.com
    ======================================================
    http://www.flickr.com/photos/didi_tgi/

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to All on Thu Jun 10 14:09:22 2021
    On 6/10/2021 8:32 AM, Dimiter_Popoff wrote:
    On 6/10/2021 16:55, Don Y wrote:
    On 6/10/2021 3:45 AM, Dimiter_Popoff wrote:

    [attrs elided]

    Don, this becomes way too lengthy and repeating itself.

    You keep on saying that a linear 64 bit address space means exposing everything to everybody after I explained this is not true at all.

    Task A has built a structure -- a page worth of data residing
    at 0x123456. It wants to pass this to TaskB so that TaskB can perform
    some operations on it.

    Can TaskB access the data at 0x123456 *before* TaskA has told it
    to do so?

    Can TaskB access the data at 0x123456 WHILE TaskA is manipulating it?

    Can TaskA alter the data at 0x123456 *after* it has "passed it along"
    to TaskB -- possibly while TaskB is still using it?

    You keep on claiming this or that about how I do things without
    bothering to understand what I said - like your claim that I use the MMU
    for "protection only".

    I didn't say that YOU did that. I said that to be able to ignore
    the MMU after setting it up, you can ONLY use it to protect
    code from alteration, data from execution, etc. The "permissions"
    that it applies have to be invariant over the execution time of
    ALL of the code.

    So, if you DON'T use it "for protection only", then you are admitting
    to having to dynamically tweak it.

    *THIS* is the cost that the OS incurs -- and having a flat address
    space doesn't make it any easier! If you aren't incurring that cost,
    then you're not protecting something.

    NO, this is not true either. On 32 bit machines - as mine in
    production are - mapping 4G logical space into say 128M of physical
    memory goes all the way through page translation, block translation
    for regions where page translation would be impractical etc.
    You sound the way I would have sounded before I had written and
    built on for years what is now dps. The devil is in the detail :-).

    You pass "objects", pages etc. Well guess what, it *always* boils
    down to an *address* for the CPU. The rest is generic talk.

    Yes, the question is "who manages the protocol for sharing".
    Since forever, you could pass pointers around and let anyone
    access anything they wanted. You could impose -- but not
    ENFORCE -- schemes that ensured data was shared properly
    (e.g., so YOU wouldn't be altering data that *I* was using).

    [Monitors can provide some structure to that sharing but
    are costly when you consider the number of things that may
    potentially need to be shared. And, you can still poke
    directly at the data being shared, bypassing the monitor,
    if you want to (or have a bug)]

    But, you had to rely on programming discipline to ensure this
    worked. Just like you have to rely on discipline to ensure
    code is "bugfree" (how's that worked for the industry?)

    And if you choose to have overlapping address spaces when you
    pass a pointer from one task to another the OS has to deal with this
    at a significant cost.

    How does your system handle the above example? How do you "pass" the
    pointer from TaskA to TaskB -- if not via the OS? Do you expose a
    shared memory region that both tasks can use to exchange data
    and hope they follow some rules? Always use synchronization
    primitives for each data exchange? RELY on the developer to
    get it right? ALWAYS?

    Once you've passed the pointer, how does TaskB access that data
    WITHOUT having to update the MMU? Or, has TaskB had access to
    the data all along?

    What happens when B wants to pass the modified data to C?
    Does the MMU have to be updated (C's tables) to grant that
    access? Or, like B, has C had access all along? And, has
    C had to remain disciplined enough not to go mucking around
    with that region of memory until A *and* B have done modifying
    it?

    I don't allow anyone to see anything -- until the owner of that thing explicitly grants access. If you try to access something before it's
    been made available for your access, the OS traps and aborts your
    process -- you've violated the discipline and the OS is going to
    enforce it! In an orderly manner that doesn't penalize other
    tasks that have behaved properly.

    In a linear address space, you pass the pointer *as is* so the OS does
    not have to deal with anything except access restrictions.
    In dps, you can send a message to another task - the message being
    data the OS will copy into that task's memory, the data being
    perfectly able to be an address of something in another task's

    So, you don't use the MMU to protect TaskA's resources from TaskB
    (or TaskC!) access. You expect LESS from your OS.

    memory. If a task accesses an address it is not supposed to
    the user is notified and allowed to press CR to kill that task.

    What are the addresses "it's not supposed to?" Some *subset* of
    the addresses that "belong" to other tasks? Perhaps I can
    access a buffer that belongs to TaskB but not TaskB's code?
    Or, some OTHER buffer that TaskB doesn't want me to see? Do
    you explicitly have to locate ("org") each buffer so that you
    can place SOME in protected portions of the address space and
    others in shared areas? How do you change these distinctions
    dynamically -- or, do you do a lot of data copying from
    "protected" space to "shared" space?

    Then there are common data sections for groups of tasks etc.,
    it is pretty huge really.

    Again, you expose things by default -- even if only a subset
    of things. You create shared memory regions where there are
    no protections and then rely on your application to behave and
    not access data (that has been exposed for its access) until
    it *should*.

    Everybody does this. And everyone has bugs as a result. You
    are relying on the developer to *repeatedly* implement the sharing
    protocol -- instead of relying on the OS to enforce that for you.

    It's like putting tons of globals in your application -- to
    make data sharing easier (and, thus, more prone to bugs).

    You expect less of your OS.

    My tasks are free to do whatever they want in their own protection domain.
    They KNOW that nothing can SEE the data they are manipulating *or*
    observe HOW they are manipulating it or *influence* their manipulation
    of it.

    Until they want to expose that data. And, then, only to those entities
    that they think SHOULD see it.

    They can give (hand-off) data to another entity -- much like call-by-value semantics -- and have the other entity know that NOTHING that the
    original "donor" can do AFTER that handoff will affect the data that
    has been "passed" to them.

    Yet, they can still manipulate that data -- update it or reuse that
    memory region -- for the next "client".

    The OS enforces these guarantees. Much more than just passing along
    a pointer to the data! Trying to track down the donor's alteration
    of data while the recipient is concurrently accessing it (multiple
    tasks, multiple cores, multiple CPUs) is a nightmare proposition.
    And, making an *unnecessary* copy of it is a waste of resources
    (esp if the two parties actually ARE well-behaved)

    The concept "one entire address space to all tasks" is from the 60-s
    if not earlier (I just don't know and don't care to check now) and it
    has done a good job while it was necessary, mostly on 16 bit CPUs.
    For today's processors this means just making them run with the
    handbrake on, *nothing* is gained because of that - no more security
    (please don't repeat that "expose everything" nonsense), just
    burning more CPU power, constantly having to remap addresses etc.

    Remapping is done in hardware. The protection overhead is a
    matter of updating page table entries. *You* gain nothing by creating
    a flat address space because *you* aren't trying to compartmentalize
    different tasks and subsystems. You likely protect the kernel's
    code/data from direct interference from "userland" (U/S bit) but
    want the costs of sharing between tasks to be low -- at the expense
    of forfeiting protections between them.

    *Most* of the world consists of imperfect coders. *Most* of us have
    to deal with colleagues (of varying abilities) before, after and
    during our tenure running code on the same CPU as our applications.

    "The bug is (never!) in my code! So, it MUST be in YOURS!"

    You can either stare at each other, confident in the correctness
    of your own code. Or, find the bug IN THE OTHER GUY'S CODE
    (you can't prove yours is correct anymore than he can; so you have to
    find the bug SOMEWHERE to make your point), effectively doing his
    debugging *for* him.

    Why do you think desktop OS's go to such lengths to compartmentalize applications? Aren't the coders of application A just as competent
    as those who coded application B? Why would you think application
    A might stomp on some resource belonging to application B? Wouldn't
    that be a violation of DISCIPLINE (and outright RUDE)?

    You've been isolated from this for far too long. So, don't see
    what it's like to have to deal with another(s)' code impacting
    the same product that *you* are working on.

    Encapsulation and opacity are the best ways to ensure all interactions
    to your code/data are through permitted interfaces.
    "Who overwrote my location 0x123456? I know *I* didn't..."
    "Who turned on power to the motor? I'm the only one who should do so!"
    "Who deleted the log file?"
    There's a reason we eschew globals!

    I can ensure TaskB can't delete the log file -- by simply denying him
    access to logfile.delete(). But, letting him use logfile.append()
    as much as he wants! At the same time, allowing TaskA to logfile.delete()
    or logfile.rollover() as it sees fit -- because I've verified that
    TaskA does this appropriately as part of its contract. And, there's
    no NEED for TaskB to ever do so -- it's not B's responsibility
    (so why allow him the opportunity to ERRONEOUSLY do so -- and then
    have to chase down how this happened?)

    If TaskB *tries* to access logfile.delete(), I can trap to make his
    violation obvious: "Reason for process termination: illegal access"

    And, I don't need to do this with pointers or hardware protection
    of the pages in which logfile.delete() resides! I just don't let
    him invoke *that* method! I *expect* my OS to provide these mechanisms
    to the developer to make his job easier AND the resulting code more robust.
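
    One hypothetical way to get that "append but never delete" split in C
    (a sketch, not the poster's implementation): hand each client a
    capability structure containing only the methods it was granted, so
    the forbidden operation simply isn't reachable:

        #include <stdio.h>

        typedef struct {
            void (*append)(const char *line);
            void (*delete_log)(void);   /* NULL if deletion was not granted */
        } logfile_cap;

        static void real_append(const char *line) { printf("append: %s\n", line); }
        static void real_delete(void)             { printf("log deleted\n"); }

        int main(void)
        {
            logfile_cap for_taskA = { real_append, real_delete };  /* full rights */
            logfile_cap for_taskB = { real_append, NULL };         /* append only */

            for_taskB.append("taskB event");
            for_taskA.delete_log();
            /* for_taskB.delete_log() has nothing to call; in the real system
             * the attempted access would trap and terminate the offender.   */
            return 0;
        }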

    There is a cost to all this. But, *if* something misbehaves, it leaves
    visible evidence of its DIRECT actions; you don't have to wonder WHEN
    (in the past) some datum was corrupted that NOW manifests as an error
    in some, possibly unrelated, manner.

    Of course, you don't need any of this if you're a perfect coder.

    You don't expose the internals of your OS to your tasks, do you?
    Why? Don't you TRUST them to observe proper discipline in their
    interactions with it? You trust them to observe those same
    disciplines when interacting with each other... Why can't TaskA
    see the preserved state for TaskB? Don't you TRUST it to
    only modify it if it truly knows what it's doing? Not the result
    of resolving some errant pointer?

    Welcome to the 70's!

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dimiter_Popoff@21:1/5 to Don Y on Fri Jun 11 01:13:24 2021
    On 6/11/2021 0:09, Don Y wrote:
    On 6/10/2021 8:32 AM, Dimiter_Popoff wrote:
    On 6/10/2021 16:55, Don Y wrote:
    On 6/10/2021 3:45 AM, Dimiter_Popoff wrote:

    [attrs elided]
    Don, this becomes way too lengthy and repeating itself.

    You keep on saying that a linear 64 bit address space means exposing
    everything to everybody after I explained this is not true at all.

    Task A has built a structure -- a page worth of data residing
    at 0x123456.  It wants to pass this to TaskB so that TaskB can perform
    some operations on it.

    Can TaskB access the data at 0x123456 *before* TaskA has told it
    to do so?

    Can TaskB access the data at 0x123456 WHILE TaskA is manipulating it?

    Can TaskA alter the data at 0x123456 *after* it has "passed it along"
    to TaskB -- possibly while TaskB is still using it?

    If task A does not want any of the above it just places them in a
    page to which only it has access. Or it can allow read access only.
    *Why* do you confuse this with linear address space? What does the
    one have to do with the other?


    You keep on claiming this or that about how I do things without
    bothering to understand what I said - like your claim that I use the MMU
    for "protection only".

    I didn't say that YOU did that.  I said that to be able to ignore
    the MMU after setting it up, you can ONLY use it to protect
    code from alteration, data from execution, etc.  The "permissions"
    that it applies have to be invariant over the execution time of
    ALL of the code.

    So, if you DON'T use it "for protection only", then you are admitting
    to having to dynamically tweek it.

    Of course dps is dealing with it, all the time. The purpose of the
    linear *logical* address space is just orthogonality and simplicity,
    like not having to remap passed addresses (which can have a lot
    of further implications, like inability to use addresses in another
    task's structure).


    *THIS* is the cost that the OS incurs -- and having a flat address
    space doesn't make it any easier!  If you aren't incurring that cost,
    then you're not protecting something.

    Oh but it does - see my former paragraph.



    NO, this is not true either. On 32 bit machines - as mine in
    production are - mapping 4G logical space into say 128M of physical
    memory goes all the way through page translation, block translation
    for regions where page translation would be impractical etc.
    You sound the way I would have sounded before I had written and
    built on for years what is now dps. The devil is in the detail :-).

    You pass "objects", pages etc. Well guess what, it *always* boils
    down to an *address* for the CPU. The rest is generic talk.

    Yes, the question is "who manages the protocol for sharing".
    Since forever, you could pass pointers around and let anyone
    access anything they wanted.  You could impose -- but not
    ENFORCE -- schemes that ensured data was shared properly
    (e.g., so YOU wouldn't be altering data that *I* was using).

    [Monitors can provide some structure to that sharing but
    are costly when you consider the number of things that may
    potentially need to be shared.  And, you can still poke
    directly at the data being shared, bypassing the monitor,
    if you want to (or have a bug)]

    But, you had to rely on programming discipline to ensure this
    worked.  Just like you have to rely on discipline to ensure
    code is "bugfree" (how's that worked for the industry?)

    And if you choose to have overlapping address spaces when you
    pass a pointer from one task to another the OS has to deal with this
    at a significant cost.

    How does your system handle the above example?  How do you "pass" the pointer from TaskA to TaskB -- if not via the OS?  Do you expose a
    shared memory region that both tasks can use to exchange data
    and hope they follow some rules?  Always use synchronization
    primitives for each data exchange?  RELY on the developer to
    get it right?  ALWAYS?

    I already explained that. If task A wants to leave a message
    into task B memory it goes through a call (signd7$ or whatever,
    there are variations) and the message is left there.
    If task A did not want to receive messages it won't even be
    attempted by the OS, will return a straight error (task does not
    support... whatever). If the message is illegal the result is
    similar. And if it happens that task A tries to access directly
    memory of task B which it is not supposed to it will just go to
    the "task A memory access violation. Press CR to kill it".
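
    The path described: sender traps into the OS, the OS checks that the
    receiver accepts messages, then the OS (not the sender) copies the
    bytes into the receiver's memory.  A guessed C rendering -- the names,
    signature and error codes below are invented for illustration, not the
    actual dps "signd7$" interface:

        #include <errno.h>
        #include <stddef.h>
        #include <string.h>

        typedef struct {
            int    accepts_messages;
            char   mailbox[256];       /* lives in the *receiver's* memory */
            size_t mailbox_used;
        } task_ctl;

        int send_message(task_ctl *receiver, const void *msg, size_t len)
        {
            if (!receiver->accepts_messages)
                return -EPERM;                 /* "task does not support..." */
            if (len > sizeof receiver->mailbox - receiver->mailbox_used)
                return -ENOSPC;                /* illegal/oversized message  */

            /* The OS, not the sender, writes into the receiver's memory. */
            memcpy(receiver->mailbox + receiver->mailbox_used, msg, len);
            receiver->mailbox_used += len;
            return 0;
        }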

    You have to rely on the developer to get it right if they
    write supervisor code. Otherwise you need not.
    The signalling system works in user mode though you can
    write supervisor level code which uses it, but if you
    are allowed to write at that level you can mess up pretty
    much everything, I hope you are not trying to wrestle
    *that* one.


    Once you've passed the pointer, how does TaskB access that data
    WITHOUT having to update the MMU?  Or, has TaskB had access to
    the data all along?


    By just writing to the address task A has listed for the
    purpose. It is not in a protected area so the only thing
    the MMU may have to do is a tablewalk.

    *THIS* demonstrates the advantage of the linear logical
    address space very well.


    What happens when B wants to pass the modified data to C?
    Does the MMU have to be updated (C's tables) to grant that
    access?  Or, like B, has C had access all along?  And, has
    C had to remain disciplined enough not to go mucking around
    with that region of memory until A *and* B have done modifying
    it?

    Either of these has its area which allows messaging. I don't
    see what you want to achieve by making it only more cumbersome
    (but not less possible) to do.

    I don't allow anyone to see anything -- until the owner of that thing explicitly grants access.  If you try to access something before it's
    been made available for your access, the OS traps and aborts your
    process -- you've violated the discipline and the OS is going to
    enforce it!  In an orderly manner that doesn't penalize other
    tasks that have behaved properly.


    So well, how is the linear address space in your way of doing that?
    It certainly is not in my way when I do it.


    In a linear address space, you pass the pointer *as is* so the OS does
    not have to deal with anything except access restrictions.
    In dps, you can send a message to another task - the message being
    data the OS will copy into that tasks memory, the data being
    perfectly able to be an address of something in another task's

    So, you don't use the MMU to protect TaskA's resources from TaskB
    (or TaskC!) access.  You expect LESS from your OS.

    Why on Earth do you think that? And what does the linear address space
    have to do with *any* of it?
    Pages can be as small as 4k, so why not just have them properly
    set up upon task start, or at some later time, with the page which
    can receive messages open to access and the rest closed?
    And again, how on Earth do you see any relevance between a linear
    logical address space and all this.


    memory. If a task accesses an address it is not supposed to
    the user is notified and allowed to press CR to kill that task.

    What are the addresses "it's not supposed to?"  Some *subset* of
    the addresses that "belong" to other tasks?  Perhaps I can
    access a buffer that belongs to TaskB but not TaskB's code?
    Or, some OTHER buffer that TaskB doesn't want me to see?  Do
    you explicitly have to locate ("org") each buffer so that you
    can place SOME in protected portions of the address space and
    others in shared areas?  How do you change these distinctions
    dynamically -- or, do you do a lot of data copying from
    "protected" space to "shared" space?

    This is up to the tasks, they can make system calls to mark
    pages non-swappable, write protected etc., you name it.
    And again, ***this has nothing to do with orthogonality
    of the logical address space***.


    Then there are common data sections for groups of tasks etc.,
    it is pretty huge really.

    Again, you expose things by default -- even if only a subset
    of things.  You create shared memory regions where there are
    no protections and then rely on your application to behave and
    not access data (that has been exposed for its access) until
    it *should*.

    Why would you want to protect regions you don't want protected?
    The common data sections are quite useful when you write a largish
    piece of software which runs as multiple tasks in multiple
    windows - e.g. nuvi, the spectrometry software - it has
    multiple "display" windows, a command window into which
    one can also run dps scripts etc., why would you want to
    deprive them of that common section? They are all part of the
    same software package.
    But I suppose you are not that far yet since you still wrestle with
    scheduling and memory protection.



    Everybody does this.  And everyone has bugs as a result.  You
    are relying on the developer to *repeatedly* implement the sharing
    protocol -- instead of relying on the OS to enforce that for you.

    Not at all. And for I don't know which time, this has 0% to do
    with the linearity of the logical address space which is what
    you objected to.

    Please let us just get back to it and just agree with the obvious,
    which is that linear logical address space has *nothing* to do
    with security.


    Leave DPS alone. DPS is a large thing and even I could not
    tell you everything about it even if I had the weeks it would
    take simply because there are things I have to look at to
    remember. Please don't try to tell the world how the OS you want
    to write is better than what you simply do not know.
    Tell me about the filesystem you have
    implemented for it (I'd say you have none by the way you
    sound), how you implemented your tcp/ip stack, how your
    distributed file system works (in dps, I have dfs - a device
    driver which allows access to remote files just as if they
    are local provided the dfs server has allowed access to that
    user/path etc.). Then tell me how you implemented windowing,
    how do you deal with offscreen buffering, how do you refresh
    which part and how do you manipulate which gets pulled where
    etc. etc., it is a long way to go but once you have some
    screenshots it will be interesting to compare this or that.
    Mine are there to see and well, I have not stopped working
    either.

    Dimiter

    ======================================================
    Dimiter Popoff, TGI http://www.tgi-sci.com
    ======================================================
    http://www.flickr.com/photos/didi_tgi/

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to All on Thu Jun 10 21:55:23 2021
    On 6/10/2021 3:13 PM, Dimiter_Popoff wrote:
    On 6/11/2021 0:09, Don Y wrote:
    On 6/10/2021 8:32 AM, Dimiter_Popoff wrote:
    On 6/10/2021 16:55, Don Y wrote:
    On 6/10/2021 3:45 AM, Dimiter_Popoff wrote:

    [attrs elided]

    Don, this becomes way too lengthy and repeating itself.

    You keep on saying that a linear 64 bit address space means exposing
    everything to everybody after I explained this is not true at all.

    Task A has built a structure -- a page worth of data residing
    at 0x123456. It wants to pass this to TaskB so that TaskB can perform
    some operations on it.

    Can TaskB access the data at 0x123456 *before* TaskA has told it
    to do so?

    Can TaskB access the data at 0x123456 WHILE TaskA is manipulating it?

    Can TaskA alter the data at 0x123456 *after* it has "passed it along"
    to TaskB -- possibly while TaskB is still using it?

    If task A does not want any of the above it just places them in a
    page to which only it has access. Or it can allow read access only.
    *Why* do you confuse this with linear address space? What does the
    one have to do with the other?

    As I tease more of your design out of you, it becomes apparent why
    you "need" a flat address space. You push much of the responsibility
    for managing the environment into the developer's hands. *He* decides
    which regions of memory to share. He talks to the MMU (even if through
    an API). He directly retrieves values from other tasks. Etc.

    So, he must be able to get anywhere and do anything at any time
    (by altering permissions, if need be).

    By contrast, I remove all of that from the developer's shoulders.
    I only expect a developer to be able to read the IDL for the
    objects that he "needs" to access and understand the syntax required
    for each such access (RMI/RPC). The machine LOOKS like it is
    a simple uniprocessor with no synchronization issues that the
    developer has to contend with, no network addressing, no cache
    or memory management, etc.

    EVERYTHING is done *indirectly* in my world. Much like a file system
    interface (your developer doesn't directly write bytes onto the disk
    but, rather, lets the file system resolve a filename and create
    a file handle which is then used to route bytes to the media).

    The interface to EVERYTHING in my system is through such an
    extra layer of indirection. Because things exist in different
    address spaces, on different processors, etc. the OS mediates
    all such accesses. ALL of them! Yes, it's inefficient. But,
    the processor runs at 500MHz and I have 244 of them in my
    (small!) alpha site -- I figure I can *afford* to be a little
    inefficient (especially as you want to *minimize* interactions
    between objects just as a general design goal)

    Because of this, I can enforce fine-grained protection mechanisms;
    I can let you increment a counter -- but not decrement it
    (assuming a counter is an object). Or, let you read its contents
    but never alter them. Meanwhile, some other client (task)
    can reset it but never read it.
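
    A sketch (invented, not the actual RMI layer) of how an object server
    can enforce per-client, per-method rights on something as small as a
    counter -- each client's handle carries a permission mask that the
    dispatcher checks before the method runs:

        #include <stdint.h>
        #include <stdio.h>

        enum { CAN_READ = 1, CAN_INC = 2, CAN_DEC = 4, CAN_RESET = 8 };

        typedef struct { uint32_t value; } counter;
        typedef struct { counter *obj; unsigned rights; } counter_handle;

        static int counter_inc(counter_handle h)
        {
            if (!(h.rights & CAN_INC)) return -1;  /* trap/abort in the real system */
            h.obj->value++;
            return 0;
        }

        static int counter_read(counter_handle h, uint32_t *out)
        {
            if (!(h.rights & CAN_READ)) return -1;
            *out = h.obj->value;
            return 0;
        }

        int main(void)
        {
            counter c = { 0 };
            counter_handle clientA = { &c, CAN_INC };              /* bump, no read */
            counter_handle clientB = { &c, CAN_READ | CAN_RESET };

            counter_inc(clientA);
            uint32_t v;
            if (counter_read(clientB, &v) == 0) printf("value=%u\n", v);
            if (counter_read(clientA, &v) != 0) printf("clientA read denied\n");
            return 0;
        }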

    And, the OS can act as a bridge/proxy to an object residing on
    a different node -- what "address" do you access to reference
    Counter 34 on node 56? Who tells you that it resides on 56
    and hasn't been moved to node 29??

    Because the OS can provide that proxy interface, I can *move*
    an object between successive accesses -- without its clients
    knowing this has happened. As if the file server you
    are accessing had suddenly been replaced by another machine
    at a different IP address WHILE you were accessing files!

    Likewise, because the access is indirect, I can interpose
    an agency on selective objects to implement redundancy for that
    object without the client using THAT interface ever knowing.

    Or, support different versions of an interface simultaneously
    (which address do you access to see the value of the
    counter as an unsigned binary long? which address to see
    it as a floating point number? which address to see it
    as an ASCII string?)

    Note that I can do all of these things with a flat *or* overlapping
    address space. Because a task doesn't need to be able to DIRECTLY
    access anything -- other than the system trap!

    You, on the other hand, have to build a different mechanism (e.g.,
    your distributed filesystem) to access certain TYPES of objects
    (e.g., files) without concern for their location. That ability
    comes "free" for EVERY type of object in my world.

    It is essential as I expect to be interacting with other nodes
    continuously -- and those nodes can be powered up or down
    independent of my wishes. Can I pull a board out of your MCA
    and expect it to keep running? Unplug one of my nodes (or
    cut the cable, light it on fire, etc.) and there will be
    a hiccup while I respawn the services/objects that were
    running on that node to another node. But, clients of
    those services/objects will just see a prolonged RMI/RPC
    (if one was in progress when the node was killed)

    Note that I've not claimed it is "better". What I have claimed
    is that it "does more" (than <whatever>). And, because it does
    more (because I EXPECT it to), any perceived advantages of a
    flat address space are just down in the "noise floor". They
    don't factor into the implementation decisions. By the time I
    "bolted on" these features OUTSIDE your OS onto your implementation,
    I'd have a BIGGER solution to the same problem!

    ["Access this memory address directly -- unless the object you want
    has been moved to another node. In which case, access this OTHER
    address to figure out where it's gone to; then access yet another
    address to actually do what you initially set out to do, had the
    object remained 'local'"]

    This sums up our differences:

    Why would you want to protect regions you don't want protected?

    Why WOULDN"T you want to protect EVERYTHING??

    Sharing should be an exception. It should be more expensive to
    share than NOT to share. You don't want things comingling unless
    they absolutely MUST. And, the more such interaction, the more
    you should look at the parties involved to see if refactoring
    may be warranted. "Opaque" is the operative word. The more
    you expose, the more interdependencies you create.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dimiter_Popoff@21:1/5 to Don Y on Fri Jun 11 14:14:46 2021
    On 6/11/2021 7:55, Don Y wrote:
    ...
    As I tease more of your design out of you, it becomes apparent why
    you "need" a flat address space.  You push much of the responsibility
    for managing the environment into the developer's hands.  *He* decides
    which regions of memory to share.  He talks to the MMU (even if through
    an API).  He directly retrieves values from other tasks.  Etc.

    It is not true that the developer is in control of all that. Messaging
    from one task to another goes through a system call.

    Anyway, I am not interested in discussing dps here/now.

    The *only* thing I would like you to answer me is why you think
    a linear 64 bit address space can add vulnerability to a design.

    Dimiter

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to that's what you on Fri Jun 11 05:10:27 2021
    On 6/11/2021 4:14 AM, Dimiter_Popoff wrote:
    On 6/11/2021 7:55, Don Y wrote:
    ...
    As I tease more of your design out of you, it becomes apparent why
    you "need" a flat address space. You push much of the responsibility
    for managing the environment into the developer's hands. *He* decides
    which regions of memory to share. He talks to the MMU (even if through
    an API). He directly retrieves values from other tasks. Etc.

    It is not true that the developer is in control of all that. Messaging
    from one task to another goes through a system call.

    But the client directly retrieves the values. The OS doesn't provide
    them (at least, that's what you said previously)

    Anyway, I am not interested in discussing dps here/now.

    The *only* thing I would like you to answer me is why you think
    a linear 64 bit address space can add vulnerability to a design.

    Please tell me where I said it -- in and of itself -- makes a
    design vulnerable?

    HOW any aspect of an MCU is *used* is the cause of vulnerability;
    to internal bugs, external threats, etc. The more stuff that's exposed,
    the more places fault can creep into a design. It's why we litter code
    with invariants, check for the validity of input parameters, etc.
    Every interface is a potential for a fault; and an *opportunity*
    to bolster your confidence in the design (by verifying the interfaces
    are being used correctly!)

    [Do you think all of these ransomware attacks we hear of are
    the result of developers being INCREDIBLY stupid? Or, just
    "not paranoid enough"??]

    Turning off an MMU (when you have one available) is obviously
    putting you in a more "exposed" position than correctly
    *using* it (all else being equal). Unless, of course, you
    don't have the skills to use it properly.

    There are firewire implementations that actually let the external
    peripheral DMA directly into the host's memory. Any fault in the implementation *inside* the host obviously exposes the internals
    of the system to an external agent. Can you be 100.0% sure that
    the device you're plugging in (likely sold with your type of
    computer in mind and, thus, aware of what's where, inside!) is
    benign?

    <https://en.wikipedia.org/wiki/DMA_attack>

    Is there anything *inherently* wrong with DMA? Or Firewire? No.
    Do they create the potential for a VULNERABILITY in a system? Yes.
    The vulnerability is a result of how they are *used*.

    My protecting-everything-from-everything-else is intended to eliminate unanticipated attack vectors before a hostile actor (third party
    software or external agent) can discover an exploit. Or, a latent
    bug can compromise the proper operation of the system. It's why I
    *don't* have any global namespaces (if you can't NAME something,
    then you can't ACCESS it -- even if you KNOW it exists, somewhere;
    controlling the names you can see controls the things you can access)

    It's why I require you to have a valid "Handle" to every object with
    which you want to interact; if you don't have a handle to the
    object, then you can't talk to it. You can't consume its resources
    or try to exploit vulnerabilities that may be present. Or, just plain
    ask it (mistakenly) to do something incorrect!

    It's why I don't let you invoke EVERY method on a particular object,
    even if you have a valid handle! Because you don't need to be ABLE
    to do something that you don't NEED to do! Attempting to do so
    is indicative of either a bug (because you didn't declare a need
    to access that method when you were installed!) or an attempted
    exploit. In either case, there is no value to letting you continue
    with a simple error message.

    <https://en.wikipedia.org/wiki/Principle_of_least_privilege>
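
    (Sketch of the client-side check, with invented names -- the real
    stubs are generated, but this is the shape of it:)

        #include <errno.h>
        #include <stdint.h>

        typedef struct {
            uint32_t object_id;   /* which object instance this Handle names */
            uint32_t methods;     /* bitmask of the methods you were GRANTED  */
        } handle_t;

        enum { M_READ = 1u << 0, M_INCREMENT = 1u << 1, M_RESET = 1u << 2 };

        /* hypothetical transport stub */
        extern int rmi_send(uint32_t object_id, uint32_t method, void *args);

        int invoke(const handle_t *h, uint32_t method, void *args)
        {
            if ((h->methods & method) == 0)
                return -EPERM;    /* denied in YOUR context; the server never
                                     spends a cycle on the request            */
            return rmi_send(h->object_id, method, args);
        }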

    It's why each object can decide to *sever* your "legitimate" connection
    to any of its Interfaces if it doesn't like what you are doing
    or asking it to do. "Too bad, so sad. Take it up with Management!
    And, no, we won't be letting you get restarted cuz we know there's
    something unhealthy about you!"

    It's why access controls are applied on the *client* side of
    a transaction instead of requiring the server/object to make
    that determination (like some other capability-based systems).
    Because any server-side activities consume the server's
    resources, even if it will ultimately deny your request
    (move the denial into YOUR resources)

    It's why I enforce quotas on the resources you can consume -- or
    have others consume for your *benefit* -- so an application's
    (task) "load" on the system can be constrained.

    If you want to put staff in place to vet each third party application
    before "offering it in your store", then you have to assume that
    overhead -- and HOPE you catch any malevolent/buggy actors before
    folks install those apps. I think that's the wrong approach as
    it requires a sizeable effort to test/validate any submitted
    application "thoroughly" (you end up doing the developer's work
    FOR him!)

    Note that bugs also exist, even in the absence of "bad intent".
    Should they be allowed to bring down your product/system? Or,
    should their problems be constrained to THEIR demise??

    [I'm assuming your MCA has the ability to "print" hardcopy
    of <whatever>. Would it be acceptable if a bug in your print
    service brought down the instrument? This *session*?
    Silently corrupted the data that it was asked to print?]

    ANYTHING (and EVERYTHING) that I can do to make my system more robust
    is worth the effort. Hardware is cheap (relatively speaking).
    Debugging time is VERY costly. And, "user inconvenience/aggravation"
    is *outrageously* expensive! I let the OS "emulate" features that
    I wished existed in the silicon -- because, there, they would
    likely be less expensive to utilize (time, resources)

    This is especially true in my alpha site application. Imagine being
    blind, deaf, wheelchair confined, paralyzed/amputee, early onset
    Alzheimer's, or "just plain old", etc. and having to deal with something
    that is misbehaving ALL AROUND YOU (because it pervades your home
    environment). It was intended to *facilitate* your continued presence
    in YOUR home, delaying your transfer to an a$$i$ted care facility.
    Now, it's making life "very difficult"!

    "Average Joes" get pissed off when their PC misbehaves.
    Imagine your garage door opening in the middle of the night.
    Or, the stereo turns on -- loud -- while you're on the phone.
    Or, the phone hangs up mid conversation.
    Or, the wrong audio stream accompanies a movie you're viewing.
    Or, a visitor is announced at the front door, but no one is there!
    Or, the coffee maker turned on too early and your morning coffee is mud.
    Or, the heat turns on midafternoon on a summer day.
    Or, the garage door closes on your vehicle as you are exiting.
    Or, your bedside alarm goes off at 3AM.
    How long will you wait for "repair" in that sort of environment?
    When are you overwhelmed by the technology (that is supposed to be
    INVISIBLE) coupled with your current condition -- and just throw
    in the towel?

    YOU can sell a spare MCA to a customer who wants to minimize his
    downtime "at any cost". Should I make "spare houses" available?
    Maybe deeply discounted?? :<

    What about spare factories??

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dimiter_Popoff@21:1/5 to Don Y on Fri Jun 11 16:35:12 2021
    On 6/11/2021 15:10, Don Y wrote:
    On 6/11/2021 4:14 AM, Dimiter_Popoff wrote:
    On 6/11/2021 7:55, Don Y wrote:
    ...
    As I tease more of your design out of you, it becomes apparent why
    you "need" a flat address space.  You push much of the responsibility
    for managing the environment into the developer's hands.  *He* decides
    which regions of memory to share.  He talks to the MMU (even if through
    an API).  He directly retrieves values from other tasks.  Etc.

    It is not true that the developer is in control of all that. Messaging
    from one task to another goes through a system call.

    But the client directly retrieves the values.  The OS doesn't provide
    them (at least, that's what you said previously)

    I am not sure what this means. The recipient task has advertised a field
    where messages can be queued, the sending task makes a system call
    designating the message and which task is to receive it; during that
    call execution the message is written into the memory of the recipient.
    Then at some point later the recipient can see that and process the
    message. What more do you need?
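
    In rough C terms -- names invented here, this is not the actual dps
    API, just the shape of the exchange:

        #include <stdint.h>

        #define MAX_MSG     64
        #define INBOX_DEPTH  8

        typedef struct {
            uint32_t sender;              /* task id of the sender           */
            uint32_t length;              /* bytes used in payload[]         */
            uint8_t  payload[MAX_MSG];
        } msg_t;

        /* the recipient advertises this queue; it lives in ITS memory       */
        msg_t inbox[INBOX_DEPTH];

        /* hypothetical system call: during its execution the KERNEL copies
           *msg into the recipient's advertised queue -- the sender never
           writes the recipient's memory itself                              */
        extern int sys_send_msg(uint32_t recipient_task, const msg_t *msg);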


    Anyway, I am not interested in discussing dps here/now.

    The *only* thing I would like you to answer me is why you think
    a linear 64 bit address space can add vulnerability to a design.

    Please tell me where I said it -- in and of itself -- makes a
    design vulnerable?

    This is how the exchange started:


    Dimiter_Popoff wrote:


    The real value in 64 bit integer registers and 64 bit address space is
    just that, having an orthogonal "endless" space (well I remember some
    30 years ago 32 bits seemed sort of "endless" to me...).

    Not needing to assign overlapping logical addresses to anything
    can make a big difference to how the OS is done.

    That depends on what you expect from the OS. If you are
    comfortable with the possibility of bugs propagating between
    different subsystems, then you can live with a logical address
    space that exactly coincides with a physical address space.

    So how does the linear 64 bit address space get in the way of
    any protection you want to implement? Pages are still 4 k and
    each has its own protection attributes governed by the OS,
    it is like that with 32 bit processors as well (I talk power, I am
    not interested in half baked stuff like ARM, risc-v etc., I don't
    know if there could be a problem like that with one of these).

    With a linear address space, you typically have to link EVERYTHING
    as a single image to place each thing in its own piece of memory
    (or use segment based addressing).

    Now if you have missed the "logical" word in my post I can
    understand why you went into all that. But I was quite explicit
    about it.
    Anyway, I am glad we agree that a 64 bit logical address space
    is no obstacle to security. From there on it can only be something
    to make programming life easier.

    Dimiter






    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to All on Fri Jun 11 16:24:43 2021
    On 6/11/2021 6:35 AM, Dimiter_Popoff wrote:
    On 6/11/2021 15:10, Don Y wrote:
    On 6/11/2021 4:14 AM, Dimiter_Popoff wrote:
    On 6/11/2021 7:55, Don Y wrote:
    ...
    As I tease more of your design out of you, it becomes apparent why
    you "need" a flat address space. You push much of the responsibility
    for managing the environment into the developer's hands. *He* decides
    which regions of memory to share. He talks to the MMU (even if through
    an API). He directly retrieves values from other tasks. Etc.

    It is not true that the developer is in control of all that. Messaging
    from one task to another goes through a system call.

    But the client directly retrieves the values. The OS doesn't provide
    them (at least, that's what you said previously)

    I am not sure what this means. The recipient task has advertised a field where messages can be queued, the sending task makes a system call designating the message and which task is to receive it; during that
    call execution the message is written into the memory of the recipient.
    Then at some point later the recipient can see that and process the
    message. What more do you need?


    Anyway, I am not interested in discussing dps here/now.

    The *only* thing I would like you to answer me is why you think
    a linear 64 bit address space can add vulnerability to a design.

    Please tell me where I said it -- in and of itself -- makes a
    design vulnerable?

    This is how the exchange started:


    Dimiter_Popoff wrote:


    The real value in 64 bit integer registers and 64 bit address space is
    just that, having an orthogonal "endless" space (well I remember some
    30 years ago 32 bits seemed sort of "endless" to me...).

    Not needing to assign overlapping logical addresses to anything
    can make a big difference to how the OS is done.

    That depends on what you expect from the OS. If you are
    comfortable with the possibility of bugs propagating between
    different subsystems, then you can live with a logical address
    space that exactly coincides with a physical address space.

    So how does the linear 64 bit address space get in the way of
    any protection you want to implement? Pages are still 4 k and
    each has its own protection attributes governed by the OS,
    it is like that with 32 bit processors as well (I talk power, I am
    not interested in half baked stuff like ARM, risc-v etc., I don't
    know if there could be a problem like that with one of these).

    With a linear address space, you typically have to link EVERYTHING
    as a single image to place each thing in its own piece of memory
    (or use segment based addressing).

    Now if you have missed the "logical" word in my post I can
    understand why you went into all that. But I was quite explicit
    about it.

    It's easier to get some part of a flat address space *wrong*.
    And, as you've exposed (even if hiding behind an MMU) everything,
    that presents an opportunity for SOMETHING to leak -- that
    shouldn't.

    Alpha (OS) took this to an extreme. Each object had its own
    address space fitted neatly onto some number of (contiguous?)
    pages.

    When you invoked a method on an object, you trapped to the OS.
    The OS marshalled your arguments into an "input page(s)"
    and created an empty "output page(s)". It then built an address
    space consisting of the input and output pages (at logical
    addresses that an object recognized, by convention) AND the
    page(s) for the object's code. NOTHING ELSE. Then, transfered
    control to one of N entry points in the first page of the
    object's implementation (another convention).

    So, the object's code had free access to its inputs, its
    own code and its outputs. Attempting to reference anything else
    would signal a protection violation. There *is* nothing else!
    A bug, errant pointer, exploit, etc. would just land on
    unmapped memory!

    [Note that an object could invoke ANOTHER object -- but, that
    object would then be built up in yet another address space
    while the current object's address space was idled]
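
    (A rough, entirely hypothetical sketch of that sequence -- none of
    these helpers are Alpha's real internals, it's just the shape of the
    idea:)

        #include <stddef.h>
        #include <stdint.h>
        #include <string.h>

        typedef struct page   page_t;       /* a physical page               */
        typedef struct aspace aspace_t;     /* an (initially EMPTY) map      */
        typedef struct { page_t *code; } object_t;

        extern page_t   *alloc_page(void);
        extern void     *page_va(page_t *);
        extern aspace_t *aspace_create(void);
        extern void      aspace_map(aspace_t *, uintptr_t va, page_t *, int prot);
        extern void      switch_to(aspace_t *, uintptr_t entry);

        enum { PROT_R = 1, PROT_W = 2, PROT_X = 4 };
        enum { IN_VA = 0x1000, OUT_VA = 0x2000, CODE_VA = 0x10000,
               ENTRY_STRIDE = 16 };

        void invoke_method(object_t *obj, unsigned entry,
                           const void *args, size_t n)
        {
            page_t *in  = alloc_page();     /* marshalled arguments          */
            page_t *out = alloc_page();     /* results come back here        */
            memcpy(page_va(in), args, n);

            aspace_t *as = aspace_create(); /* NOTHING mapped yet            */
            aspace_map(as, IN_VA,   in,        PROT_R);
            aspace_map(as, OUT_VA,  out,       PROT_R | PROT_W);
            aspace_map(as, CODE_VA, obj->code, PROT_R | PROT_X);
            /* everything else is unmapped: a stray pointer simply faults    */

            switch_to(as, CODE_VA + (uintptr_t)entry * ENTRY_STRIDE);
        }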

    This approach makes it hard for "active" objects to be
    made persistent (e.g., a *process* that is "doing something"
    yet has an object interface) so objects tend to want to be
    passive.

    I don't go to these extremes. But close!

    An instance of a "foo" object is served by (an instance of)
    a foo_object_server. That server can serve multiple foo objects
    concurrently (and can even use multiple threads to do so).

    The object_server accepts the messages for the object and
    invokes the corresponding method in the context of that
    particular object. Because it exists in an isolated address
    space, no other objects/clients can examine the object's
    implementation -- the code executed by the object_server
    to handle each request/method as well as the "private data"
    associated with an object instance. Nor can they interfere
    with any of this.

    Another instance of a foo_object_server can serve other
    foo objects -- on the same node or on some other node.

    [i.e., to migrate a foo object to another node, instantiate
    a foo_object_server on the target node -- if one doesn't
    already exist there -- and then transfer the internal
    representation of the desired foo object to that other
    server. And, arrange for all "Interfaces" to that
    particular object to simultaneously be transferred
    (so future connections to the object are unaware of
    the migration)]

    As my *_object_servers can be persistent, an object can be
    active -- can continue doing something even after (or before!)
    every request (method invocation) has been serviced. It's
    up to the object designer to decide the semantics of
    each method.

    E.g., should garagedoor.open() start the opening process and
    wait until it is complete before returning to the caller?
    Or, should it just start the process and return immediately?
    Should rosebush.irrigate() block the caller for the hour or two
    it takes to irrigate?
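
    (Both are legitimate choices; e.g., with made-up helpers:)

        extern void start_door_motor(void);     /* hypothetical hardware ops */
        extern int  door_fully_open(void);
        extern void sleep_ms(unsigned);

        /* semantics A: caller blocks until the door is actually open        */
        void garagedoor_open_blocking(void)
        {
            start_door_motor();
            while (!door_fully_open())
                sleep_ms(50);
        }

        /* semantics B: kick it off and return at once; completion is
           reported some other way (event, status method, callback...)       */
        void garagedoor_open_async(void)
        {
            start_door_motor();
        }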

    Anyway, I am glad we agree that a 64 bit logical address space
    is no obstacle to security. From there on it can only be something
    to make programming life easier.

    It's not an obstacle. But, it's an *opportunity* for a bug to
    manifest or information to leak or an exploit to gain a beachhead.
    The less stuff that can be exposed (even accidentally), the less
    opportunity for these problems to manifest.

    Think of all the "account compromises" that have occurred.
    All of the "personal/account information" that has leaked.
    Why was that information accessible? I'm sure it wasn't
    SUPPOSED to be accessible. But, why was it even resident
    ANYWHERE on an out-facing machine?

    If I make a purchase, I provide a CC number, name ("as it
    appears on the card") and the CVC number off the back.
    To validate my card, it would be foolish to:
        select name, cvc from credit_cards where card_number = CC_provided
        if name != name_provided
        or cvc != cvc_provided
        then reject_transaction
    while this, by itself, is correct and "secure", the approach
    requires the credit_cards table to contain ALL of the data for
    every credit card. AND, has the potential to allow an
    adversary to trick the software into revealing all or part of it!

    If, OTOH, the implementation was:
        if !verify(CC_provided, name_provided, cvc_provided)
        then reject_transaction
    all of the details can be hidden behind verify() which can
    execute on a more secure processor with a more secure protocol.
    E.g., compute a hash of these three values and ask the
    DBMS if the hash is correct, without ever having the raw data
    stored!
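
    (Sketched in C with a stand-in hash -- none of this is a real payment
    API, it's just the shape that verify() takes:)

        #include <stdbool.h>
        #include <stdint.h>

        /* stand-ins: some keyed hash you trust, and a query to the box
           that holds ONLY the digests, not the raw card data               */
        extern void hash_credentials(const char *cc, const char *name,
                                     const char *cvc, uint8_t digest[32]);
        extern bool dbms_has_digest(const uint8_t digest[32]);

        bool verify(const char *cc, const char *name, const char *cvc)
        {
            uint8_t digest[32];

            hash_credentials(cc, name, cvc, digest);
            /* the out-facing code never holds the card table and never
               learns more than "valid" / "not valid"                        */
            return dbms_has_digest(digest);
        }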

    An adversary could try to *guess* a valid combination of
    name/cc/cvc --- but, would have to repeatedly issue "validate()"
    requests -- which can then attract suspicion.

    [This is why I allow an object to sever an incoming connection
    if *it* detects behaviours that it considers harmful or
    suspicious; the OS has no way of making that determination!]

    Making MORE things inaccessible (i.e., everything that doesn't
    NEED to be accessible -- like Alpha's approach) improves
    reliability and security. Because REAL implementations
    always have oversights and bugs -- things that you hadn't
    considered or for which you hadn't tested.

    Again, your application and application domain is likely far
    more benign than mine. The people operating your devices
    are likely more technically capable (which MIGHT lead to
    better process compliance). How likely would one of your
    instruments find itself targeted by an attacker? And, your
    budget is likely tighter than mine (in terms of money,
    resources and performance).

    I can *afford* a more featureful OS to offload more of the
    work/vulnerability from developers. Doing so lets more
    developers tinker in my environment. And, makes the
    resulting design (which evolves as each developer adds
    to the system) more robust from attack, fault, compromise.

    [An adversary is more likely to target one of my systems
    than yours: "Deposit $1,000 in this bitcoin account to
    regain control over your home/factory/business..."]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From George Neuner@21:1/5 to All on Sat Jun 12 12:58:11 2021
    On Wed, 9 Jun 2021 03:12:12 -0700, Don Y <blockedofcourse@foo.invalid>
    wrote:

    On 6/9/2021 12:17 AM, David Brown wrote:

    Process geometries are not targeted at 64-bit. They are targeted at
    smaller, faster and lower dynamic power. In order to produce such a big
    design as a 64-bit cpu, you'll aim for a minimum level of process
    sophistication - but that same process can be used for twice as many
    32-bit cores, or bigger sram, or graphics accelerators, or whatever else
    suits the needs of the device.

    They will apply newer process geometries to newer devices.
    No one is going to retool an existing design -- unless doing
    so will result in a significant market enhancement.

    Why don't we have 100MHz MC6800's?

    A number of years ago somebody had a 200MHz 6502. Granted, it was a
    soft core implemented in an ASIC.

    No idea what it was used for.


    But you are absolutely right about maths (floating point or integer) -
    having 32-bit gives you a lot more freedom and less messing around with
    scaling back and forth to make things fit and work efficiently in 8-bit
    or 16-bit. And if you have floating point hardware (and know how to use
    it properly), that opens up new possibilities.

    64-bit cores will extend that, but the step is almost negligible in
    comparison. It would be wrong to say "int32_t is enough for anyone",
    but it is /almost/ true. It is certainly true enough that it is not a
    problem that using "int64_t" takes two instructions instead of one.

    Except that int64_t can take *four* instead of one (add/sub/mul two
    int64_t's with 32b hardware).

    A 32b CPU could require a dozen instructions to do 64b math depending
    on whether it has condition flags, whether math ops set the condition
    flags (vs requiring explicit compare or compare/branch), and whether
    it even has carry aware ops [some chips don't]

    If detecting wrap-around/overflow requires comparing the result
    against the operands, multi-word arithmetic (even just 2 words)
    quickly becomes long and messy.
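
    E.g., with no carry flag visible from C, even a bare 64-bit add
    becomes something like:

        #include <stdint.h>

        typedef struct { uint32_t lo, hi; } u64_t;   /* two 32-bit halves   */

        u64_t add64(u64_t a, u64_t b)
        {
            u64_t r;

            r.lo = a.lo + b.lo;                 /* unsigned add may wrap     */
            uint32_t carry = (r.lo < a.lo);     /* wrapped => carry out      */
            r.hi = a.hi + b.hi + carry;
            return r;
        }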

    YMMV,
    George

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to George Neuner on Sat Jun 12 12:40:53 2021
    Hi George,

    On 6/12/2021 9:58 AM, George Neuner wrote:
    On Wed, 9 Jun 2021 03:12:12 -0700, Don Y <blockedofcourse@foo.invalid>
    wrote:

    On 6/9/2021 12:17 AM, David Brown wrote:

    Process geometries are not targeted at 64-bit. They are targeted at
    smaller, faster and lower dynamic power. In order to produce such a big
    design as a 64-bit cpu, you'll aim for a minimum level of process
    sophistication - but that same process can be used for twice as many
    32-bit cores, or bigger sram, or graphics accelerators, or whatever else
    suits the needs of the device.

    They will apply newer process geometries to newer devices.
    No one is going to retool an existing design -- unless doing
    so will result in a significant market enhancement.

    Why don't we have 100MHz MC6800's?

    A number of years ago somebody had a 200MHz 6502. Granted, it was a
    soft core implemented in an ASIC.

    No idea what it was used for.

    AFAICT, the military still uses them. I know there was a radhard
    8080 (or 8085?) made some years back.

    I suspect it would just be a curiosity piece, though. You'd need
    < 10ns memory to use it in its original implementation. Easier
    to write an emulator and run it on a faster COTS machine!

    But you are absolutely right about maths (floating point or integer) -
    having 32-bit gives you a lot more freedom and less messing around with
    scaling back and forth to make things fit and work efficiently in 8-bit
    or 16-bit. And if you have floating point hardware (and know how to use
    it properly), that opens up new possibilities.

    64-bit cores will extend that, but the step is almost negligible in
    comparison. It would be wrong to say "int32_t is enough for anyone",
    but it is /almost/ true. It is certainly true enough that it is not a
    problem that using "int64_t" takes two instructions instead of one.

    Except that int64_t can take *four* instead of one (add/sub/mul two
    int64_t's with 32b hardware).

    A 32b CPU could require a dozen instructions to do 64b math depending
    on whether it has condition flags, whether math ops set the condition
    flags (vs requiring explicit compare or compare/branch), and whether
    it even has carry aware ops [some chips don't]

    If detecting wrap-around/overflow requires comparing the result
    against the operands, multi-word arithmetic (even just 2 words)
    quickly becomes long and messy.

    If you look back to life with 8b registers, you understand the
    pain of even 32b operations.

    Wider architectures make data manipulation easier. Bigger
    *address* spaces (wider address buses) make it easier to
    "do more".

    So, an 8b CPU with extended address space (bank switching, etc.)
    can tackle a bigger (more varied) problem (at a slow rate).
    But a wider CPU with a much smaller address space can handle
    a smaller (in scope) problem at a much faster rate (all else
    being equal -- memory speed, etc.)

    When doing video games, this was a common discussion (price
    sensitive); do you move to a wider processor to gain performance?
    or, do you move to a faster one? (where you put the money changes)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From George Neuner@21:1/5 to All on Mon Jun 14 05:41:34 2021
    On Tue, 8 Jun 2021 18:29:24 -0700, Don Y <blockedofcourse@foo.invalid>
    wrote:

    On 6/8/2021 3:01 PM, Dimiter_Popoff wrote:

    Am trying to puzzle out what a 64-bit embedded processor should look like.
    At the low end, yeah, a simple RISC processor. And support for complex
    arithmetic
    using 32-bit floats? And support for pixel alpha blending using quad 16-bit
    numbers?
    32-bit pointers into the software?

    The real value in 64 bit integer registers and 64 bit address space is
    just that, having an orthogonal "endless" space (well I remember some
    30 years ago 32 bits seemed sort of "endless" to me...).

    Not needing to assign overlapping logical addresses to anything
    can make a big difference to how the OS is done.

    That depends on what you expect from the OS. If you are
    comfortable with the possibility of bugs propagating between
    different subsystems, then you can live with a logical address
    space that exactly coincides with a physical address space.

    Propagation of bugs is mostly independent of the logical address
    space. In actual fact, existing SAS operating systems are MORE
    resistant to problems than MPAS systems.


    But, consider how life was before Windows used compartmentalized
    applications (and OS). How easily it is for one "application"
    (or subsystem) to cause a reboot -- unceremoniously.

    You can kill a Windows system with 2 lines of code:
    :
    SetThreadPriority(GetCurrentThread(),THREAD_PRIORITY_TIME_CRITICAL);
    while( 1 );
    :


    The general direction (in software development, and, by
    association, hardware) seems to be to move away from unrestrained
    access to the underlying hardware in an attempt to limit the
    amount of damage that a "misbehaving" application can cause.

    You see this in languages designed to eliminate dereferencing
    pointers, pointer arithmetic, etc. Languages that claim to
    ensure your code can't misbehave because it can only do
    exactly what the language allows (no more injecting ASM
    into your HLL code).

    Managed runtimes.

    Pointers, per se, are not a problem. However, explicit pointer
    arithmetic /is/ known to be the cause of many program bugs.

    IMO high level languages should not allow it - there's nothing
    (source level) pointer arithmetic can do that can't be done
    more safely using array indexing. The vast majority of programmers
    should just let the compiler deal with it.
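
    Trivial example of the trade:

        #include <stddef.h>

        long sum_ptr(const int *p, size_t n)    /* pointer arithmetic        */
        {
            long s = 0;
            for (const int *end = p + n; p < end; p++)
                s += *p;
            return s;
        }

        long sum_idx(const int a[], size_t n)   /* indexing: same machine
                                                   code, easier to check     */
        {
            long s = 0;
            for (size_t i = 0; i < n; i++)
                s += a[i];
            return s;
        }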


    32 bit FPU seems useless to me, 64 bit is OK. Although 32 FP
    *numbers* can be quite useful for storing/passing data.

    32 bit numbers have appeal if your registers are 32b;
    they "fit nicely". Ditto 64b in 64b registers.

    Depends on the problem domain. If you don't need the extra precision, calculations with 32b floats often are twice or more as fast as with
    64b doubles.

    Particularly with SIMD, you gain both by 32b calculations taking
    fewer cycles than 64b, and by being able to perform twice as many
    simultaneous calculations.
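
    E.g., with plain 128-bit SSE2 (just the example at hand -- any SIMD
    ISA shows the same ratio):

        #include <emmintrin.h>                  /* SSE2 intrinsics           */

        /* one vector op handles FOUR 32-bit floats ...                      */
        void add4_f32(float r[4], const float a[4], const float b[4])
        {
            _mm_storeu_ps(r, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
        }

        /* ... but only TWO 64-bit doubles in the same 128-bit register      */
        void add2_f64(double r[2], const double a[2], const double b[2])
        {
            _mm_storeu_pd(r, _mm_add_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
        }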

    YMMV,
    George

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)