Sometimes things move faster than expected. As someone with an embedded background, this caught me by surprise:
Terabyte microSD cards are readily available and getting cheaper. Heck, you can carry ten of them in a credit-card pouch. They are likely to move into the same price range as hard disks ($20/TB).
That means that a 2+ square inch PCB can hold a 64-bit processor and enough storage for memory-mapped files larger than 4GB.
Is the 32-bit embedded processor cost vulnerable to 64-bit 7nm devices as
the FABs mature? Will video data move to the IOT edge? Will AI move to the edge? Will every embedded CPU have a built-in radio?
Wait a few years and find out.
James Brakefield <jim.brakefield@ieee.org> writes:
Is the 32-bit embedded processor cost vulnerable to 64-bit 7nm devices
as the FABs mature? Will video data move to the IOT edge? Will AI move
to the edge? Will every embedded CPU have a built-in radio?
I don't care what the people say--
32 bits are here to stay.
Is the 32-bit embedded processor cost vulnerable to 64-bit 7nm devices
as the FABs mature? Will video data move to the IOT edge? Will AI move
to the edge? Will every embedded CPU have a built-in radio?
8-bit microcontrollers are still far more common than 32-bit devices in
the embedded world (and 4-bit devices are not gone yet). At the other
end, 64-bit devices have been used for a decade or two in some kinds of embedded systems.
We'll see 64-bit take a greater proportion of the embedded systems that demand high throughput or processing power (network devices, hard cores
in expensive FPGAs, etc.) where the extra cost in dollars, power,
complexity, board design are not a problem. They will probably become
more common in embedded Linux systems as the core itself is not usually
the biggest part of the cost. And such systems are definitely on the increase.
But for microcontrollers - which dominate embedded systems - there has
been a lot to gain by going from 8-bit and 16-bit to 32-bit for little
cost. There is almost nothing to gain from a move to 64-bit, but the
cost would be a good deal higher.
So it is not going to happen - at
least not more than a very small and very gradual change.
The OP sounds more like a salesman than someone who actually works with embedded development in reality.
On 6/7/2021 10:59 PM, David Brown wrote:
8-bit microcontrollers are still far more common than 32-bit devices in
the embedded world (and 4-bit devices are not gone yet). At the other
end, 64-bit devices have been used for a decade or two in some kinds of embedded systems.
I contend that a good many "32b" implementations are really glorified
8/16b applications that exhausted their memory space. I still see lots
of designs built on a small platform (8/16b) and augmented -- either
with some "memory enhancement" technology or additional "slave"
processors to split the binaries. Code increases in complexity but
there doesn't seem to be a need for the "work-per-unit-time" to.
[This has actually been the case for a long time. The appeal of
newer CPUs is often in the set of peripherals that accompany the
processor, not the processor itself.]
We'll see 64-bit take a greater proportion of the embedded systems that demand high throughput or processing power (network devices, hard cores
in expensive FPGAs, etc.) where the extra cost in dollars, power, complexity, board design are not a problem. They will probably become
more common in embedded Linux systems as the core itself is not usually
the biggest part of the cost. And such systems are definitely on the increase.
But for microcontrollers - which dominate embedded systems - there has
been a lot to gain by going from 8-bit and 16-bit to 32-bit for little
I disagree. The "cost" (barrier) that I see clients facing is the
added complexity of a 32b platform and how it often implies (or even *requires*) a more formal OS underpinning the application. Where you
could hack together something on bare metal in the 8/16b worlds,
moving to 32 often requires additional complexity in managing
mechanisms that aren't usually present in smaller CPUs (caches,
MMU/MPU, DMA, etc.) Developers (and their organizations) can't just
play "coder cowboy" and coerce the hardware to behaving as they
would like. Existing staff (hired with the "bare metal" mindset)
are often not equipped to move into a more structured environment.
[I can hack together a device to meet some particular purpose
much easier on "development hardware" than I can on a "PC" -- simply
because there's too much I have to "work around" on a PC that isn't
present on development hardware.]
Not every product needs a filesystem, network stack, protected
execution domains, etc. Those come with additional costs -- often
in the form of a lack of understanding as to what the ACTUAL
code in your product is doing at any given time. (this isn't the
case in the smaller MCU world; it's possible for a developer to
have written EVERY line of code in a smaller platform)
cost. There is almost nothing to gain from a move to 64-bit, but the
cost would be a good deal higher.
Why is the cost "a good deal higher"? Code/data footprints don't
uniformly "double" in size. The CPU doesn't slow down to handle
bigger data.
The cost is driven by where the market goes. Note how many 68Ks found design-ins vs. the T11, F11, 16032, etc. My first 32b design was
physically large, consumed a boatload of power and ran at only a modest improvement (in terms of system clock) over 8b processors of its day.
Now, I can buy two orders of magnitude more horsepower PLUS a
bunch of built-in peripherals for two cups of coffee (at QTY 1)
So it is not going to happen - at
least not more than a very small and very gradual change.
We got 32b processors NOT because the embedded world cried out for
them but, rather, because of the influence of the 32b desktop world.
We've had 32b processors since the early 80's. But, we've only had
PCs since about the same timeframe! One assumes ubiquity in the
desktop world would need to happen before any real spillover to embedded. (When the "desktop" was an '11 sitting in a back room, it wasn't seen
as ubiquitous.)
In the future, we'll see the 64b *phone* world drive the evolution
of embedded designs, similarly. (do you really need 32b/64b to
make a phone? how much code is actually executing at any given
time and in how many different containers?)
[The OP suggests MCUs with radios -- maybe they'll be cell phone
radios and *not* wifi/BLE as I assume he's thinking! Why add the
need for some sort of access point to a product's deployment if
the product *itself* can make a direct connection??]
My current design can't fill a 32b address space (but, that's because
I've decomposed apps to the point that they can be relatively small).
OTOH, designing a system with a 32b limitation seems like an invitation
to do it over when 64b is "cost effective". The extra "baggage" has
proven to be relatively insignificant (I have ports of my codebase
to SPARC as well as Atom running alongside a 32b ARM)
The OP sounds more like a salesman than someone who actually works with embedded development in reality.
Possibly. Or, just someone that wanted to stir up discussion...
On Tuesday, June 8, 2021 at 2:39:29 AM UTC-5, Don Y wrote:
I contend that a good many "32b" implementations are really glorified
8/16b applications that exhausted their memory space.
The only thing that will take more than 4GB is video or a day's worth of photos.
So there are likely to be some embedded apps that need a > 32-bit address space.
Cost, size or storage capacity are no longer limiting factors.
Am trying to puzzle out what a 64-bit embedded processor should look like.
At the low end, yeah, a simple RISC processor.
And support for complex arithmetic
using 32-bit floats?
And support for pixel alpha blending using quad 16-bit numbers?
32-bit pointers into the software?
On 08/06/2021 16:46, Theo wrote:
......
Memory bus/cache width
No, that is not a common way to measure cpu "width", for many reasons.
A chip is likely to have many buses outside the cpu core itself (and the cache(s) may or may not be considered part of the core). It's common to
have 64-bit wide buses on 32-bit processors, it's also common to have
16-bit external databuses on a microcontroller. And the cache might be
128 bits wide.
David Brown <david.brown@hesbynett.no> wrote:
But for microcontrollers - which dominate embedded systems - there has
been a lot to gain by going from 8-bit and 16-bit to 32-bit for little
cost. There is almost nothing to gain from a move to 64-bit, but the
cost would be a good deal higher. So it is not going to happen - at
least not more than a very small and very gradual change.
I think there will be divergence about what people mean by an N-bit system:
Register size
Unit of logical/arithmetical processing
Memory address/pointer size
Memory bus/cache width
I think we will increasingly see parts whose width differs from one of
these areas to another.
For example, for doing some kinds of logical operations (eg crypto), having 64-bit registers and ALU makes sense, but you might only need kilobytes of memory so only have <32 address bits.
For something else, like a microcontroller that's hung off the side of a bigger system (eg the MCU on a PCIe card) you might want the ability to handle 64 bit addresses but don't need to pay the price for 64-bit
registers.
Or you might operate with 16 or 32 bit wide external RAM chip, but your
cache could extend that to a wider word width.
There are many permutations, and I think people will pay the cost where it benefits them and not where it doesn't.
This is not a new phenomenon, of course. But for a time all these numbers were in the range between 16 and 32 bits, which made 32 simplest all round. Just like we previously had various 8/16 hybrids (eg 8 bit datapath, 16 bit address) I think we're going to see more 32/64 hybrids.
On 08/06/2021 21:38, James Brakefield wrote:
Could you explain your background here, and what you are trying to get
at? That would make it easier to give you better answers.
The only thing that will take more than 4GB is video or a day's worth of photos.
No, video is not the only thing that takes 4GB or more. But it is,
perhaps, one of the more common cases. Most embedded systems don't need anything remotely like that much memory - to the nearest percent, 100%
of embedded devices don't even need close to 4MB of memory (ram and
flash put together).
So there are likely to be some embedded apps that need a > 32-bit address space.
Some, yes. Many, no.
Cost, size or storage capacity are no longer limiting factors.
Cost and size (and power) are /always/ limiting factors in embedded systems.
Am trying to puzzle out what a 64-bit embedded processor should look like.
There are plenty to look at. There are ARMs, PowerPC, MIPS, RISC-V.
And of course there are some x86 processors used in embedded systems.
At the low end, yeah, a simple RISC processor.
Pretty much all processors except x86 and brain-dead old-fashioned 8-bit
CISC devices are RISC. Not all are simple.
And support for complex arithmetic
using 32-bit floats?
A 64-bit processor will certainly support 64-bit doubles as well as
32-bit floats. Complex arithmetic is rarely needed, except perhaps for
FFT's, but is easily done using real arithmetic. You can happily do
32-bit complex arithmetic on an 8-bit AVR, albeit taking significant
code space and run time. I believe the latest gcc for the AVR will do
64-bit doubles as well - using exactly the same C code you would on any
other processor.
And support for pixel alpha blending using quad 16-bit numbers?
You would use a hardware 2D graphics accelerator for that, not the
processor.
32-bit pointers into the software?
With 64-bit processors you usually use 64-bit pointers.
On 08/06/2021 09:39, Don Y wrote:
On 6/7/2021 10:59 PM, David Brown wrote:
8-bit microcontrollers are still far more common than 32-bit devices in
the embedded world (and 4-bit devices are not gone yet). At the other
end, 64-bit devices have been used for a decade or two in some kinds of
embedded systems.
I contend that a good many "32b" implementations are really glorified
8/16b applications that exhausted their memory space.
Sure. Previously you might have used 32 kB flash on an 8-bit device,
now you can use 64 kB flash on a 32-bit device. The point is, you are
/not/ going to find yourself hitting GB limits any time soon. The step
from 8-bit or 16-bit to 32-bit is useful to get a bit more out of the
system - the step from 32-bit to 64-bit is totally pointless for 99.99%
of embedded systems. (Even for most embedded Linux systems, you usually
only have a 64-bit cpu because you want bigger and faster, not because
of memory limitations. It is only when you have a big gui with fast
graphics that 32-bit address space becomes a limitation.)
A 32-bit microcontroller is simply much easier to work with than an
8-bit or 16-bit with "extended" or banked memory to get beyond 64 K
address space limits.
We'll see 64-bit take a greater proportion of the embedded systems that
demand high throughput or processing power (network devices, hard cores
in expensive FPGAs, etc.) where the extra cost in dollars, power,
complexity, board design are not a problem. They will probably become
more common in embedded Linux systems as the core itself is not usually
the biggest part of the cost. And such systems are definitely on the
increase.
But for microcontrollers - which dominate embedded systems - there has
been a lot to gain by going from 8-bit and 16-bit to 32-bit for little
I disagree. The "cost" (barrier) that I see clients facing is the
added complexity of a 32b platform and how it often implies (or even
*requires*) a more formal OS underpinning the application.
Yes, that is definitely a cost in some cases - 32-bit microcontrollers
are usually noticeably more complicated than 8-bit ones. How
significant the cost is depends on the project's balance between development costs and production costs, and how beneficial the extra functionality can be (like moving from bare metal to RTOS, or supporting networking).
cost. There is almost nothing to gain from a move to 64-bit, but the
cost would be a good deal higher.
Why is the cost "a good deal higher"? Code/data footprints don't
uniformly "double" in size. The CPU doesn't slow down to handle
bigger data.
Some parts of code and data /do/ double in size - but not uniformly, of course. But your chip is bigger, faster, requires more power, has wider buses, needs more advanced memories, has more balls on the package,
requires finer pitched pcb layouts, etc.
In theory, you /could/ make a microcontroller in a 64-pin LQFP and
replace the 72 MHz Cortex-M4 with a 64-bit ARM core at the same clock
speed. The die would only cost two or three times more, and take
perhaps less than 10 times the power for the core. But it would be so utterly pointless that no manufacturer would make such a device.
So a move to 64-bit in practice means moving from a small, cheap, self-contained microcontroller to an embedded PC. Lots of new
possibilities, lots of new costs of all kinds.
Oh, and the cpu /could/ be slower for some tasks - bigger cpus that are optimised for throughput often have poorer latency and more jitter for interrupts and other time-critical features.
So it is not going to happen - at
least not more than a very small and very gradual change.
We got 32b processors NOT because the embedded world cried out for
them but, rather, because of the influence of the 32b desktop world.
We've had 32b processors since the early 80's. But, we've only had
PCs since about the same timeframe! One assumes ubiquity in the
desktop world would need to happen before any real spillover to embedded.
(When the "desktop" was an '11 sitting in a back room, it wasn't seen
as ubiquitous.)
I don't assume there is any direct connection between the desktop world
and the embedded world - the needs are usually very different. There is
a small overlap in the area of embedded devices with good networking and
a gui, where similarity to the desktop world is useful.
We have had 32-bit microcontrollers for decades. I used a 16-bit
Windows system when working with my first 32-bit microcontroller. But
at that time, 32-bit microcontrollers cost a lot more and required more
from the board (external memories, more power, etc.) than 8-bit or
16-bit devices. That has gradually changed with an almost total
disregard for what has happened in the desktop world.
Yes, the embedded world /did/ cry out for 32-bit microcontrollers for an increasing proportion of tasks. We cried many tears when the microcontroller manufacturers offered to give more flash space to their
8-bit devices by having different memory models, banking, far jumps, and
all the other shit that goes with not having a big enough address space.
We cried out when we wanted to have Ethernet and the microcontroller
only had a few KB of ram. I have used maybe 6 or 8 different 32-bit microcontroller processor architectures, and I used them because I
needed them for the task. It's only in the past 5+ years that I have
been using 32-bit microcontrollers for tasks that could be done fine
with 8-bit devices, but the 32-bit devices are smaller, cheaper and
easier to work with than the corresponding 8-bit parts.
In the future, we'll see the 64b *phone* world drive the evolution
of embedded designs, similarly. (do you really need 32b/64b to
make a phone? how much code is actually executing at any given
time and in how many different containers?)
We will see that on devices that are, roughly speaking, tablets -
embedded systems with a good gui, a touchscreen, networking. And that's fine. But these are a tiny proportion of the embedded devices made.
The OP sounds more like a salesman than someone who actually works with
embedded development in reality.
Possibly. Or, just someone that wanted to stir up discussion...
Could be. And there's no harm in that!
Am trying to puzzle out what a 64-bit embedded processor should look like.
At the low end, yeah, a simple RISC processor. And support for complex arithmetic
using 32-bit floats? And support for pixel alpha blending using quad 16-bit numbers?
32-bit pointers into the software?
The real value in 64 bit integer registers and 64 bit address space is
just that, having an orthogonal "endless" space (well I remember some
30 years ago 32 bits seemed sort of "endless" to me...).
Not needing to assign overlapping logical addresses to anything
can make a big difference to how the OS is done.
32 bit FPU seems useless to me, 64 bit is OK. Although 32 FP
*numbers* can be quite useful for storing/passing data.
Not long ago in a chat with a guy who knew some of ARM 64 bit I gathered there is some real mess with their out of order execution, one needs to
do... hmmmm.. "sync", whatever they call it, all the time and there is
a huge performance cost because of that. Anybody heard anything about
it? (I only know what I was told).
On 6/8/2021 4:04 AM, David Brown wrote:
On 08/06/2021 09:39, Don Y wrote:
On 6/7/2021 10:59 PM, David Brown wrote:
8-bit microcontrollers are still far more common than 32-bit devices in
the embedded world (and 4-bit devices are not gone yet). At the other
end, 64-bit devices have been used for a decade or two in some kinds of
embedded systems.
I contend that a good many "32b" implementations are really glorified
8/16b applications that exhausted their memory space.
Sure. Previously you might have used 32 kB flash on an 8-bit device,
now you can use 64 kB flash on a 32-bit device. The point is, you are
/not/ going to find yourself hitting GB limits any time soon. The step
I don't see the "problem" with 32b devices as one of address space limits (except devices utilizing VMM with insanely large page sizes). As I said, in my application, task address spaces are really just a handful of pages.
I *do* see (flat) address spaces that find themselves filling up with stack-and-heap-per-task, big chunks set aside for "onboard" I/Os,
*partial* address decoding for offboard I/Os, etc. (i.e., you're
not likely going to fully decode a single address to access a set
of DIP switches as the decode logic is disproportionately high
relative to the functionality it adds)
How often do you see a high-order address line used for kernel/user?
(gee, now your "user" space has been halved)
from 8-bit or 16-bit to 32-bit is useful to get a bit more out of the
system - the step from 32-bit to 64-bit is totally pointless for 99.99%
of embedded systems. (Even for most embedded Linux systems, you usually
only have a 64-bit cpu because you want bigger and faster, not because
of memory limitations. It is only when you have a big gui with fast
graphics that 32-bit address space becomes a limitation.)
You're assuming there has to be some "capacity" value to the 64b move.
You might discover that the ultralow power devices (for phones!)
are being offered in the process geometries targeted for the 64b
devices.
Or, that some integrated peripheral "makes sense" for
phones (but not MCUs targeting motor control applications). Or,
that there are additional power management strategies supported
in the hardware.
In my mind, the distinction brought about by "32b" was more advanced
memory protection/management -- even if not used in a particular application. You simply didn't see these sorts of mechanisms
in 8/16b offerings. Likewise, floating point accelerators. Working
in smaller processors meant you had to spend extra effort to
bullet-proof your code, economize on math operators, etc.
Some parts of code and data /do/ double in size - but not uniformly, of
course. But your chip is bigger, faster, requires more power, has wider
buses, needs more advanced memories, has more balls on the package,
requires finer pitched pcb layouts, etc.
And has been targeted to a market that is EXTREMELY power sensitive (phones!).
We will see that on devices that are, roughly speaking, tablets -
embedded systems with a good gui, a touchscreen, networking. And that's
fine. But these are a tiny proportion of the embedded devices made.
Again, I disagree.
You've already admitted to using 32b processors
where 8b could suffice. What makes you think you won't be using 64b processors when 32b could suffice?
It's just as hard for me to prototype a 64b SoC as it is a 32b SoC.
The boards are essentially the same size. "System" power consumption
is almost identical. Cost is the sole differentiating factor, today.
Possibly. Or, just someone that wanted to stir up discussion...
Could be. And there's no harm in that!
On that, we agree.
Time for ice cream (easiest -- and most enjoyable -- way to lose weight)!
On Tue, 8 Jun 2021 22:11:18 +0200, David Brown
<david.brown@hesbynett.no> wrote:
Pretty much all processors except x86 and brain-dead old-fashioned 8-bit
CISC devices are RISC...
It certainly is correct to say of the x86 that its legacy,
programmer-visible instruction set is CISC ... but it is no longer
correct to say that the chip design is CISC.
Since (at least) the Pentium 4, x86s really are a CISC decoder bolted
onto the front of what essentially is a load/store RISC.
"Complex" x86 instructions (in RAM and/or $I cache) are dynamically translated into equivalent short sequences[*] of RISC-like wide format instructions which are what actually is executed. Those sequences
also are stored into a special trace cache in case they will be used
again soon - e.g., in a loop - so they (hopefully) will not have to be translated again.
[*] Actually, a great many x86 instructions map 1:1 to internal RISC instructions - only a small percentage of complex x86 instructions
require "emulation" via a sequence of RISC instructions.
... Not all [RISC] are simple.
Correct. Every successful RISC CPU has supported a suite of complex instructions.
Of course, YMMV.
George
On 6/8/2021 23:18, David Brown wrote:
On 08/06/2021 16:46, Theo wrote:
......
Memory bus/cache width
No, that is not a common way to measure cpu "width", for many reasons.
A chip is likely to have many buses outside the cpu core itself (and the
cache(s) may or may not be considered part of the core). It's common to
have 64-bit wide buses on 32-bit processors, it's also common to have
16-bit external databuses on a microcontroller. And the cache might be
128 bits wide.
I agree with your points and those of Theo, but the cache is basically
as wide as the registers? Logically, that is; a cacheline is several
times that, probably you refer to that.
Not that it makes much of a difference to the fact that 64 bit data buses/registers in an MCU (apart from FPU registers, 32 bit FPUs are
useless to me) are unlikely to attract much interest, nothing of
significance to be gained as you said.
To me 64 bit CPUs are of interest of course and thankfully there are
some available, but this goes somewhat past what we call "embedded".
Not long ago in a chat with a guy who knew some of ARM 64 bit I gathered there is some real mess with their out of order execution, one needs to
do... hmmmm.. "sync", whatever they call it, all the time and there is
a huge performance cost because of that. Anybody heard anything about
it? (I only know what I was told).
from 8-bit or 16-bit to 32-bit is useful to get a bit more out of the
system - the step from 32-bit to 64-bit is totally pointless for 99.99%
of embedded systems. (Even for most embedded Linux systems, you usually >>> only have a 64-bit cpu because you want bigger and faster, not because
of memory limitations. It is only when you have a big gui with fast
graphics that 32-bit address space becomes a limitation.)
You're assuming there has to be some "capacity" value to the 64b move.
I'm trying to establish if there is any value at all in moving to
64-bit. And I have no doubt that for the /great/ majority of embedded systems, it would not.
I don't even see it as having noticeable added value in the solid
majority of embedded Linux systems produced. But in those systems, the
cost is minor or irrelevant once you have a big enough processor.
You might discover that the ultralow power devices (for phones!)
are being offered in the process geometries targeted for the 64b
devices.
Process geometries are not targeted at 64-bit. They are targeted at
smaller, faster and lower dynamic power. In order to produce such a big design as a 64-bit cpu, you'll aim for a minimum level of process sophistication - but that same process can be used for twice as many
32-bit cores, or bigger sram, or graphics accelerators, or whatever else suits the needs of the device.
A major reason you see 64-bit cores in big SOC's is that the die space
is primarily taken up by caches, graphics units, on-board ram,
networking, interfaces, and everything else. Moving the cpu core from
32-bit to 64-bit only increases the die size by a few percent, and for
some tasks it will also increase the performance of the code by a
small but helpful amount. So it is not uncommon, even if you don't need
the additional address space.
(The other major reason is that for some systems, you want to work with
more than about 2 GB ram, and then life is much easier with 64-bit cores.)
On microcontrollers - say, a random Cortex-M4 or M7 device - changing to
a 64-bit core will increase the die by maybe 30% and give roughly /zero/ performance increase. You don't use 64-bit unless you really need it.
Or, that some integrated peripheral "makes sense" for
phones (but not MCUs targeting motor control applications). Or,
that there are additional power management strategies supported
in the hardware.
In my mind, the distinction brought about by "32b" was more advanced
memory protection/management -- even if not used in a particular
application. You simply didn't see these sorts of mechanisms
in 8/16b offerings. Likewise, floating point accelerators. Working
in smaller processors meant you had to spend extra effort to
bullet-proof your code, economize on math operators, etc.
You need to write correct code regardless of the size of the device. I disagree entirely about memory protection being useful there. This is comp.arch.embedded, not comp.programs.windows (or whatever). An MPU
might make it easier to catch and fix bugs while developing and testing,
but code that hits MPU traps should not leave your workbench.
But you are absolutely right about maths (floating point or integer) -
having 32-bit gives you a lot more freedom and less messing around with scaling back and forth to make things fit and work efficiently in 8-bit
or 16-bit. And if you have floating point hardware (and know how to use
it properly), that opens up new possibilities.
64-bit cores will extend that, but the step is almost negligible in comparison. It would be wrong to say "int32_t is enough for anyone",
but it is /almost/ true. It is certainly true enough that it is not a problem that using "int64_t" takes two instructions instead of one.
Some parts of code and data /do/ double in size - but not uniformly, of
course. But your chip is bigger, faster, requires more power, has wider
buses, needs more advanced memories, has more balls on the package,
requires finer pitched pcb layouts, etc.
And has been targeted to a market that is EXTREMELY power sensitive
(phones!).
A phone cpu takes orders of magnitude more power to do the kinds of
tasks that might be typical for a microcontroller cpu - reading sensors, controlling outputs, handling UARTs, SPI and I²C buses, etc. Phone cpus
are optimised for doing the "big phone stuff" efficiently - because
that's what takes the time, and therefore the power.
(I'm snipping because there is far too much here - I have read your
comments, but I'm trying to limit the ones I reply to.)
We will see that on devices that are, roughly speaking, tablets -
embedded systems with a good gui, a touchscreen, networking. And that's
fine. But these are a tiny proportion of the embedded devices made.
Again, I disagree.
I assume you are disagreeing about seeing 64-bit cpus only on devices
that need a lot of memory or processing power, rather than disagreeing
that such devices are only a tiny proportion of embedded devices.
You've already admitted to using 32b processors
where 8b could suffice. What makes you think you won't be using 64b
processors when 32b could suffice?
As I have said, I think there will be an increase in the proportion of
64-bit embedded devices - but I think it will be very slow and gradual.
Perhaps in 20 years time 64-bit will be in the place that 32-bit is
now. But it won't happen for a long time.
Why do I use 32-bit microcontrollers where an 8-bit one could do the
job? Well, we mentioned above that you can be freer with the maths.
You can, in general, be freer in the code - and you can use better tools
and languages.
With ARM microcontrollers I can use the latest gcc and
C++ standards - I don't have to program in a weird almost-C dialect
using extensions to get data in flash, or pay thousands for a limited
C++ compiler with last century's standards. I don't have to try and
squeeze things into 8-bit scaled integers, or limit my use of pointers
due to cpu limitations.
And manufacturers make the devices smaller, cheaper, lower power and
faster than 8-bit devices in many cases.
If manufacturers made 64-bit devices that are smaller, cheaper and lower
power than the 32-bit ones today, I'd use them. But they would not be
better for the job, or better to work with and better for development in
the way 32-bit devices are better than 8-bit and 16-bit.
It's just as hard for me to prototype a 64b SoC as it is a 32b SoC.
The boards are essentially the same size. "System" power consumption
is almost identical. Cost is the sole differentiating factor, today.
For you, perhaps. Not necessarily for others.
We design, program and manufacture electronics. Production and testing
of simpler cards is cheaper. The pcbs are cheaper. The chips are
cheaper. The mounting is faster. The programming and testing is
faster. You don't mix big, thick tracks and high power on the same
board as tight-packed BGA with blind/buried vias - but you /can/ happily
work with less dense packages on the same board.
If you are talking about replacing one 400-ball SOC with another
400-ball SOC with a 64-bit core instead of a 32-bit core, then it will
make no difference in manufacturing. But if you are talking about
replacing a Cortex-M4 microcontroller with a Cortex-A53 SOC, it /will/
be a lot more expensive in most volumes.
I can't really tell what kinds of designs you are discussing here. When
I talk about embedded systems in general, I mean microcontrollers
running specific programs - not general-purpose computers in embedded
formats (such as phones).
(For very small volumes, the actual physical production costs are a
small proportion of the price, and for very large volumes you have
dedicated machines for the particular board.)
Possibly. Or, just someone that wanted to stir up discussion...
Could be. And there's no harm in that!
On that, we agree.
Time for ice cream (easiest -- and most enjoyable -- way to lose weight)!
I've not heard of that as a dieting method, but I shall give it a try :-)
On 6/8/2021 7:46 AM, Theo wrote:
I think there will be divergence about what people mean by an N-bit system:
Register size
Unit of logical/arithmetical processing
Memory address/pointer size
Memory bus/cache width
(General) Register size is the primary driver.
However, it supports 16b operations -- on register PAIRs
(an implicit acknowledgement that the REGISTER is smaller
than the register pair). This is common on many smaller
processors. The address space is 16b -- with a separate 16b
address space for I/Os. The Z180 extends the PHYSICAL
address space to 20b but the logical address space
remains unchanged at 16b (if you want to specify a physical
address, you must use 20+ bits to represent it -- and invoke
a separate mechanism to access it!). The ALU is *4* bits.
But you don't buy MCUs with a-la-carte pricing. How much does an extra
timer cost me? What if I want it to also serve as a *counter*? What
cost for 100K of internal ROM? 200K?
[It would be an interesting exercise to try to do a linear analysis of product prices with an idea of trying to tease out the "costs" (to
the developer) for each feature in EXISTING products!]
Instead, you see a *price* that is reflective of how widely used the
device happens to be, today. You are reliant on the preferences of others
to determine which is the most cost effective product -- for *you*.
Don Y <blockedofcourse@foo.invalid> wrote:
On 6/8/2021 7:46 AM, Theo wrote:
I think there will be divergence about what people mean by an N-bit system:
Register size
Unit of logical/arithmetical processing
Memory address/pointer size
Memory bus/cache width
(General) Register size is the primary driver.
Is it, though? What's driving that?
Why do you want larger registers without a larger ALU width?
I don't think register size is of itself a primary pressure. On larger CPUs with lots of rename or vector registers, they have kilobytes of SRAM to hold the registers, and increasing the size is a cost. On a basic in-order MCU with 16 or 32 registers, is the register width an issue? We aren't
designing them on 10 micron technology any more.
I would expect datapath width to be more critical, but again that's relatively small on an in-order CPU, especially compared with on-chip SRAM.
However, it supports 16b operations -- on register PAIRs
(an implicit acknowledgement that the REGISTER is smaller
than the register pair). This is common on many smaller
processors. The address space is 16b -- with a separate 16b
address space for I/Os. The Z180 extends the PHYSICAL
address space to 20b but the logical address space
remains unchanged at 16b (if you want to specify a physical
address, you must use 20+ bits to represent it -- and invoke
a separate mechanism to access it!). The ALU is *4* bits.
This is not really the world of a current 32-bit MCU, which has a 32 bit datapath and 32 bit registers.
Maybe it does 64 bit arithmetic in 32 bit
chunks, which then leads to the question of which MCU workloads require 64 bit arithmetic?
But you don't buy MCUs with a-la-carte pricing. How much does an extra
timer cost me? What if I want it to also serve as a *counter*? What
cost for 100K of internal ROM? 200K?
[It would be an interesting exercise to try to do a linear analysis of
product prices with an idea of trying to tease out the "costs" (to
the developer) for each feature in EXISTING products!]
Instead, you see a *price* that is reflective of how widely used the
device happens to be, today. You are reliant on the preferences of others
to determine which is the most cost effective product -- for *you*.
Sure, what you buy is a 'highest common denominator' - you get things you don't use, but that other people do. But it still depends on a significant chunk of the market demanding those features.
It's then a cost function of
how much the market wants a feature against how much it'll cost to implement (and at runtime). If the cost is tiny, it may well get implemented even if almost nobody asked for it.
If there's a use case, people will pay for it.
(although maybe not enough)
James Brakefield <jim.brakefield@ieee.org> writes:
Am trying to puzzle out what a 64-bit embedded processor should look like.
Buy yourself a Raspberry Pi 4 and set it up to run your fish tank via a remote web browser. There's your 64 bit embedded system.
David Brown <david.brown@hesbynett.no> writes:
I can't really tell what kinds of designs you are discussing here. When
I talk about embedded systems in general, I mean microcontrollers
running specific programs - not general-purpose computers in embedded
formats (such as phones).
Philip Munts made a comment a while back that stayed with me: that these days, in anything mains powered, there is usually little reason to use
an MCU instead of a Linux board.
Buy yourself a Raspberry Pi 4 and set it up to run your fish tank via a
remote web browser. There's your 64 bit embedded system.
I suppose there's a question of what embedded tasks intrinsically require
4GiB RAM, and those that do so because it makes programmers' lives easier?
There are obviously plenty of computer systems doing that, but the
question I don't know is what applications can be said to be
'embedded' but need that kind of RAM.
On 6/8/2021 3:01 PM, Dimiter_Popoff wrote:
Am trying to puzzle out what a 64-bit embedded processor should look
like.
At the low end, yeah, a simple RISC processor. And support for
complex arithmetic
using 32-bit floats? And support for pixel alpha blending using quad
16-bit numbers?
32-bit pointers into the software?
The real value in 64 bit integer registers and 64 bit address space is
just that, having an orthogonal "endless" space (well I remember some
30 years ago 32 bits seemed sort of "endless" to me...).
Not needing to assign overlapping logical addresses to anything
can make a big difference to how the OS is done.
That depends on what you expect from the OS. If you are
comfortable with the possibility of bugs propagating between
different subsystems, then you can live with a logical address
space that exactly coincides with a physical address space.
On 08/06/2021 22:39, Dimiter_Popoff wrote:
On 6/8/2021 23:18, David Brown wrote:
On 08/06/2021 16:46, Theo wrote:
[snip]
Not long ago in a chat with a guy who knew some of ARM 64 bit I gathered
there is some real mess with their out of order execution, one needs to
do... hmmmm.. "sync", whatever they call it, all the time and there is
a huge performance cost because of that. Anybody heard anything about
it? (I only know what I was told).
sync instructions of various types can be needed to handle
thread/process synchronisation, atomic accesses, and coordination
between software and hardware registers. Software normally runs with
the idea that it is the only thing running, and the cpu can re-order and re-arrange the instructions and execution as long as it maintains the illusion that the assembly instructions in the current thread are
executed one after the other. These re-arrangements and parallel
execution can give very large performance benefits.
But it also means that when you need to coordinate with other things,
you need syncs, perhaps cache flushes, etc. Full syncs can take
hundreds of cycles to execute on large processors. So you need to distinguish between reads and writes, acquires and releases, syncs on
single addresses or general memory syncs. Big processors are optimised
for throughput, not latency or quick reaction to hardware events.
There are good reasons why big cpus are often paired with a Cortex-M
core in SOCs.
On 6/9/2021 11:59, David Brown wrote:
On 08/06/2021 22:39, Dimiter_Popoff wrote:
On 6/8/2021 23:18, David Brown wrote:
On 08/06/2021 16:46, Theo wrote:
[snip]
There are good reasons why big cpus are often paired with a Cortex-M
core in SOCs.
Of course I know all that David, I have been using power processors
which do things out of order for over 20 years now.
What I was told was something about a real mess, like system memory
accesses getting wrong because of out of order execution hence
plenty of syncs needed to keep the thing working. I have not
even tried to verify that, only someone with experience with 64 bit
ARM can do that - so far none here seems to have that.
Paul Rubin wrote:
[snip]
Philip Munts made a comment a while back that stayed with me: that these
days, in anything mains powered, there is usually little reason to use
an MCU instead of a Linux board.
Except that if it has a network connection, you have to patch it
unendingly or suffer the common-as-dirt IoT security nightmares.
Cheers
Phil Hobbs
On 09/06/2021 20:00, Dimiter_Popoff wrote:
On 6/9/2021 11:59, David Brown wrote:
On 08/06/2021 22:39, Dimiter_Popoff wrote:
On 6/8/2021 23:18, David Brown wrote:
On 08/06/2021 16:46, Theo wrote:
[snip]
There are good reasons why big cpus are often paired with a Cortex-M
core in SOCs.
Of course I know all that David, I have been using power processors
which do things out of order for over 20 years now.
It depends on the actual PPC's in question - with single core devices targeted for embedded systems, you don't need much of that at all.
What I was told was something about a real mess, like system memory
accesses getting wrong because of out of order execution hence
plenty of syncs needed to keep the thing working. I have not
even tried to verify that, only someone with experience with 64 bit
ARM can do that - so far none here seems to have that.
If the person programming the device has made incorrect assumptions, or incorrect setup, then yes, things can go wrong if something other than
the current core is affected by the reads or writes.
But if you're using a RasPi or Beaglebone or something like that, you
need a reasonably well-upholstered Linux distro, which has to be
patched regularly. At very least it'll need a kernel, and kernel
patches affecting security are not exactly rare.
Dimiter_Popoff wrote:
On 6/9/2021 20:44, Phil Hobbs wrote:
Paul Rubin wrote:
David Brown <david.brown@hesbynett.no> writes:
[snip]
Philip Munts made a comment a while back that stayed with me: that these
days, in anything mains powered, there is usually little reason to use
an MCU instead of a Linux board.
Except that if it has a network connection, you have to patch it
unendingly or suffer the common-as-dirt IoT security nightmares.
Those nightmares do not apply if you are in complete control of your
firmware - which few people are nowadays indeed.
I have had netMCA devices on the net for over 10 years now in many
countries, the worst problem I have seen was some Chinese IP hanging
on port 80 to no consequences.
But if you're using a RasPi or Beaglebone or something like that, you
need a reasonably well-upholstered Linux distro, which has to be patched regularly. At very least it'll need a kernel, and kernel patches
affecting security are not exactly rare.
Cheers
Phil Hobbs
On 6/9/2021 20:44, Phil Hobbs wrote:
Paul Rubin wrote:
David Brown <david.brown@hesbynett.no> writes:
[snip]
Philip Munts made a comment a while back that stayed with me: that these
days, in anything mains powered, there is usually little reason to use
an MCU instead of a Linux board.
Except that if it has a network connection, you have to patch it
unendingly or suffer the common-as-dirt IoT security nightmares.
Those nightmares do not apply if you are in complete control of your
firmware - which few people are nowadays indeed.
I have had netMCA devices on the net for over 10 years now in many
countries, the worst problem I have seen was some Chinese IP hanging
on port 80 to no consequences.
On 09/06/2021 06:16, George Neuner wrote:
Since (at least) the Pentium 4, x86s really are a CISC decoder bolted
onto the front of what essentially is a load/store RISC.
Absolutely. But from the user viewpoint, it is the ISA that matters -
Theo <theom+news@chiark.greenend.org.uk> writes:
Buy yourself a Raspberry Pi 4 and set it up to run your fish tank via a
remote web browser. There's your 64 bit embedded system.
I suppose there's a question of what embedded tasks intrinsically require
4GiB RAM, and those that do so because it makes programmers' lives easier?
You can buy a Raspberry Pi 4 with up to 8gb of ram, but the most common configuration is 2gb. The cpu is 64 bit anyway because why not?
There are obviously plenty of computer systems doing that, but the
question I don't know is what applications can be said to be
'embedded' but need that kind of RAM.
Lots of stuff is using 32 bit cpus with a few KB of ram these days. 32
bits is displacing 8 bits in the MCU world.
Is 64 bit displacing 32 bit in application processors like the Raspberry
Pi, even when less than 4GB of ram is involved? I think yes, at least
to some extent, and it will continue. My fairly low end mobile phone
has 2GB of ram and a 64 bit 4-core processor, I think.
Will 64 bit MCU's displace 32 bit MCUs? I don't know, maybe not.
Are application processors displacing MCU's in embedded systems? Not
much in portable and wearable stuff (other than phones) at least for
now, but in larger devices I think yes, at least somewhat for now, and probably more going forward. Even if you're not using networking, it
makes software and UI development a heck of a lot easier.
Phil Hobbs <pcdhSpamMeSenseless@electrooptical.net> writes:
But if you're using a RasPi or Beaglebone or something like that, you
need a reasonably well-upholstered Linux distro, which has to be
patched regularly. At very least it'll need a kernel, and kernel
patches affecting security are not exactly rare.
You're in the same situation with almost anything else connected to the internet. Think of the notorious "smart light bulbs".
On the other hand, you are in reasonable shape if the raspberry pi
running your fish tank is only reachable through a LAN or VPN.
Non-networked low end linux boards are also a thing.
Paul Rubin <no.email@nospam.invalid> wrote:
James Brakefield <jim.brakefield@ieee.org> writes:
Am trying to puzzle out what a 64-bit embedded processor should look like.
Buy yourself a Raspberry Pi 4 and set it up to run your fish tank via a
remote web browser. There's your 64 bit embedded system.
I suppose there's a question of what embedded tasks intrinsically require
more than 4GiB RAM, and which need that much only because it makes programmers' lives easier?
In other words, you /can/ write a function to detect if your fish tank is
hot or cold in Javascript that runs in a web app on top of Chromium on top
of Linux. Or you could make it out of a 6502, or a pair of logic gates.
That's complexity that's not fundamental to the application. OTOH, maintaining a database that's larger than 4GB simply won't work without that amount of memory (or storage, etc.).
There are obviously plenty of computer systems doing that, but the question
I don't know is what applications can be said to be 'embedded' but need that kind of RAM.
On 6/9/2021 4:29, Don Y wrote:
On 6/8/2021 3:01 PM, Dimiter_Popoff wrote:
Am trying to puzzle out what a 64-bit embedded processor should look like.
At the low end, yeah, a simple RISC processor. And support for complex
arithmetic using 32-bit floats? And support for pixel alpha blending
using quad 16-bit numbers? 32-bit pointers into the software?
The real value in 64 bit integer registers and 64 bit address space is
just that, having an orthogonal "endless" space (well I remember some
30 years ago 32 bits seemed sort of "endless" to me...).
Not needing to assign overlapping logical addresses to anything
can make a big difference to how the OS is done.
That depends on what you expect from the OS. If you are
comfortable with the possibility of bugs propagating between
different subsystems, then you can live with a logical address
space that exactly coincides with a physical address space.
So how does the linear 64 bit address space get in the way of
any protection you want to implement? Pages are still 4K and
each has its own protection attributes governed by the OS;
it is like that with 32 bit processors as well (I talk Power, I am
not interested in half baked stuff like ARM, RISC-V etc., I don't
know if there could be a problem like that with one of these).
There is *nothing* to gain on a 64 bit machine from segmentation, assigning overlapping address spaces to tasks etc.
Notice I am talking *logical* addresses, I was explicit about
that.
Am 09.06.2021 um 10:40 schrieb David Brown:
On 09/06/2021 06:16, George Neuner wrote:
Since (at least) the Pentium 4, x86 really are a CISC decoder bolted
onto the front of what essentially is a load/store RISC.
... and at about that time they also abandoned the last traces of their original von-Neumann architecture. The actual core is quite strictly Harvard now, treating the external RAM banks more like mass storage
devices than an actual combined code+data memory.
Absolutely. But from the user viewpoint, it is the ISA that matters -
That depends rather a lot on who gets to be called the "user".
On 6/9/2021 10:56 AM, Dimiter_Popoff wrote:
On 6/9/2021 4:29, Don Y wrote:
On 6/8/2021 3:01 PM, Dimiter_Popoff wrote:
Am trying to puzzle out what a 64-bit embedded processor should
look like.
At the low end, yeah, a simple RISC processor. And support for
complex arithmetic
using 32-bit floats? And support for pixel alpha blending using
quad 16-bit numbers?
32-bit pointers into the software?
The real value in 64 bit integer registers and 64 bit address space is
just that, having an orthogonal "endless" space (well, I remember some
30 years ago 32 bits seemed sort of "endless" to me...).
Not needing to assign overlapping logical addresses to anything
can make a big difference to how the OS is done.
That depends on what you expect from the OS. If you are
comfortable with the possibility of bugs propagating between
different subsystems, then you can live with a logical address
space that exactly coincides with a physical address space.
So how does the linear 64 bit address space get in the way of
any protection you want to implement? Pages are still 4K and
each has its own protection attributes governed by the OS;
it is like that with 32 bit processors as well (I talk Power, I am
not interested in half baked stuff like ARM, RISC-V etc., I don't
know if there could be a problem like that with one of these).
With a linear address space, you typically have to link EVERYTHING
as a single image to place each thing in its own piece of memory
(or use segment based addressing).
I can share code between tasks without conflicting addressing;
the "data" for one instance of the app is isolated from other
instances while the code is untouched -- the code doesn't even
need to know that it is being invoked on different "data"
from one timeslice to the next. In a flat address space,
you'd need the equivalent of a "context pointer" that you'd
have to pass to the "shared code". And, have to hope that
all of your context could be represented in a single such
reference! (I can rearrange physical pages so they each
appear "where expected" to a bit of const CODE).
Similarly, the data passed (or shared) from one task (process) to
another can "appear" at entirely different logical addresses
"at the same time" as befitting the needs of each task WITHOUT
CONCERN (or awareness) of the existence of the other task.
Again, I don't need to pass a pointer to the data; the address
space has been manipulated to make sure it's where it should be.
The needs of a task can be met by resources "harvested" from
some other task. E.g., where is the stack for your TaskA?
How large is it? How much of it is in-use *now*? How much
can it GROW before it bumps into something (because that something
occupies space in "its" address space).
I start a task (thread) with a single page of stack. And, a
limit on how much it is allowed to consume during its execution.
Then, when it pushes something "off the end" of that page,
I fault a new page in and map it at the faulting address.
This continues as the task's stack needs grow.
When I run out of available pages, I do a GC cycle to
reclaim pages from (other?) tasks that are no longer using
them.
In this way, I can effectively SHARE a stack (or heap)
between multiple tasks -- without having to give any
consideration for where, in memory, they (or the stacks!)
reside.
I can move a page from one task (full of data) to another
task at some place that the destination task finds "convenient".
I can import a page from another network device or export
one *to* another device.
Because each task's address space is effectively empty/sparse,
mapping a page doesn't require much effort to find a "free"
place for it.
Not needing to assign overlapping logical addresses to anything
can make a big difference to how the OS is done.
That depends on what you expect from the OS. If you are
comfortable with the possibility of bugs propagating between
different subsystems, then you can live with a logical address
space that exactly coincides with a physical address space.
So how does the linear 64 bit address space get in the way of
any protection you want to implement? Pages are still 4K and
each has its own protection attributes governed by the OS;
it is like that with 32 bit processors as well (I talk Power, I am
not interested in half baked stuff like ARM, RISC-V etc., I don't
know if there could be a problem like that with one of these).
With a linear address space, you typically have to link EVERYTHING
as a single image to place each thing in its own piece of memory
(or use segment based addressing).
Nothing could be further from the truth. What kind of crippled
environment can make you think that? Code can be position
independent on processors which are not dead by design nowadays.
When I started dps some 27 years ago I allowed program modules
to demand a fixed address at which they would reside. This exists
to this day and has been used 0 (zero) times. The same goes for object
descriptors, program library modules etc. The first system call
I wrote is called "allocm$", allocate memory: you request a number
of bytes and you get back an address and the actual number of
bytes you were given (it comes rounded up to the memory cluster
size, typically 4K, i.e. a page). This was the *first* thing I did.
And yes, all allocation is done using a worst-fit strategy, sometimes
enhanced worst fit - things the now popular OSes have yet to get to;
they still have to defragment their disks, LOL.
I can share code between tasks without conflicting addressing;
the "data" for one instance of the app is isolated from other
instances while the code is untouched -- the code doesn't even
need to know that it is being invoked on different "data"
from one timeslice to the next. In a flat address space,
you'd need the equivalent of a "context pointer" that you'd
have to pass to the "shared code". And, have to hope that
all of your context could be represented in a single such
reference! (I can rearrange physical pages so they each
appear "where expected" to a bit of const CODE).
Similarly, the data passed (or shared) from one task (process) to
another can "appear" at entirely different logical addresses
"at the same time" as befitting the needs of each task WITHOUT
CONCERN (or awareness) of the existence of the other task.
Again, I don't need to pass a pointer to the data; the address
space has been manipulated to make sure it's where it should be.
So how do you pass the offset from the page beginning if you do
not pass an address?
And how is page manipulation simpler and/or safer than just passing
an address? Sounds like a recipe for quite a mess to me.
In a 64 bit address space there is nothing stopping you from passing
addresses or not, and from allowing access to the areas you want
while disallowing it elsewhere.
Other than that there is nothing to be gained by a 64 bit architecture, really; on 32 bit machines you do have FPUs, vector units etc.
doing calculations probably faster than the integer unit of a
64 bit processor.
The *whole point* of a 64 bit core is the 64 bit address space.
The needs of a task can be met by resources "harvested" from
some other task. E.g., where is the stack for your TaskA?
How large is it? How much of it is in-use *now*? How much
can it GROW before it bumps into something (because that something
occupies space in "its" address space).
This is the beauty of 64 bit logical address space. You allocate
enough logical memory and then you allocate physical memory on demand;
this is what MMUs are there for. If you want to grow your stack
indefinitely - the messy C style - you can just allocate it
a few gigabytes of logical memory and use only the first few kilobytes
of it, with no waste of resources. Of course there are much slicker
ways to deal with memory allocation.
I start a task (thread) with a single page of stack. And, a
limit on how much it is allowed to consume during its execution.
Then, when it pushes something "off the end" of that page,
I fault a new page in and map it at the faulting address.
This continues as the task's stack needs grow.
This is called "allocate on demand" and has been around
since time immemorial; check my former paragraph.
When I run out of available pages, I do a GC cycle to
reclaim pages from (other?) tasks that are no longer using
them.
This is called "memory swapping", also around since time immemorial.
For the case when there is no physical memory to reclaim, that
is.
The first version of dps - some decades ago - ran on a CPU32
(a 68340). It had no MMU, so I implemented "memory blocks":
a task can declare a piece of memory a swappable block and
allow/disallow its swapping. Those blocks would then be shared or
written to disk when more memory was needed etc. - memory swapping
without an MMU. It worked fine, and must still be working in code
I have not touched since on my Power machines, all those decades later.
In this way, I can effectively SHARE a stack (or heap)
between multiple tasks -- without having to give any
consideration for where, in memory, they (or the stacks!)
reside.
You can do this in a linear address space, too - this is what
the MMU is for.
I can move a page from one task (full of data) to another
task at some place that the destination task finds "convenient".
I can import a page from another network device or export
one *to* another device.
So instead of simply passing an address you have to switch page
translation entries, adjust them on each task switch, flush and
sync whatever it takes - does not sound very efficient to me.
Because each task's address space is effectively empty/sparse,
mapping a page doesn't require much effort to find a "free"
place for it.
This is the beauty of having the 64 bit address space, you always
have enough logical memory. The "64 bit address space per task"
buys you *nothing*.
On 6/10/2021 3:45 AM, Dimiter_Popoff wrote:
[attrs elided]
On 6/10/2021 16:55, Don Y wrote:
On 6/10/2021 3:45 AM, Dimiter_Popoff wrote:
[attrs elided]
Don, this becomes way too lengthy and repeating itself.
You keep on saying that a linear 64 bit address space means exposing everything to everybody after I explained this is not true at all.
You keep on claiming this or that about how I do things without
bothering to understand what I said - like your claim that I use the MMU
for "protection only".
NO, this is not true either. On 32 bit machines - as mine in
production are - mapping 4G logical space into say 128M of physical
memory goes all the way through page translation, block translation
for regions where page translation would be impractical etc.
You sound the way I would have sounded before I had written and
built on for years what is now dps. The devil is in the detail :-).
You pass "objects", pages etc. Well guess what, it *always* boils
down to an *address* for the CPU. The rest is generic talk.
And if you choose to have overlapping address spaces when you
pass a pointer from one task to another the OS has to deal with this
at a significant cost.
In a linear address space, you pass the pointer *as is* so the OS does
not have to deal with anything except access restrictions.
In dps, you can send a message to another task - the message being
data the OS will copy into that tasks memory, the data being
perfectly able to be an address of something in another task's
memory. If a task accesses an address it is not supposed to
the user is notified and allowed to press CR to kill that task.
Then there are common data sections for groups of tasks etc.,
it is pretty huge really.
The concept "one entire address space to all tasks" is from the '60s
if not earlier (I just don't know and don't care to check now) and it
has done a good job while it was necessary, mostly on 16 bit CPUs.
For today's processors this means just making them run with the
handbrake on, *nothing* is gained because of that - no more security
(please don't repeat that "expose everything" nonsense), just
burning more CPU power, constantly having to remap addresses etc.
On 6/10/2021 8:32 AM, Dimiter_Popoff wrote:
On 6/10/2021 16:55, Don Y wrote:
On 6/10/2021 3:45 AM, Dimiter_Popoff wrote:
Don, this becomes way too lengthy and repeating itself.
[attrs elided]
You keep on saying that a linear 64 bit address space means exposing
everything to everybody after I explained this is not true at all.
Task A has built a structure -- a page worth of data residing
at 0x123456. It wants to pass this to TaskB so that TaskB can perform
some operations on it.
Can TaskB access the data at 0x123456 *before* TaskA has told it
to do so?
Can TaskB access the data at 0x123456 WHILE TaskA is manipulating it?
Can TaskA alter the data at 0x123456 *after* it has "passed it along"
to TaskB -- possibly while TaskB is still using it?
You keep on claiming this or that about how I do things without
bothering to understand what I said - like your claim that I use the MMU
for "protection only".
I didn't say that YOU did that. I said that to be able to ignore
the MMU after setting it up, you can ONLY use it to protect
code from alteration, data from execution, etc. The "permissions"
that it applies have to be invariant over the execution time of
ALL of the code.
So, if you DON'T use it "for protection only", then you are admitting
to having to dynamically tweak it.
*THIS* is the cost that the OS incurs -- and having a flat address
space doesn't make it any easier! If you aren't incurring that cost,
then you're not protecting something.
NO, this is not true either. On 32 bit machines - as mine in
production are - mapping 4G logical space into say 128M of physical
memory goes all the way through page translation, block translation
for regions where page translation would be impractical etc.
You sound the way I would have sounded before I had written and
built on for years what is now dps. The devil is in the detail :-).
You pass "objects", pages etc. Well guess what, it *always* boils
down to an *address* for the CPU. The rest is generic talk.
Yes, the question is "who manages the protocol for sharing".
Since forever, you could pass pointers around and let anyone
access anything they wanted. You could impose -- but not
ENFORCE -- schemes that ensured data was shared properly
(e.g., so YOU wouldn't be altering data that *I* was using).
[Monitors can provide some structure to that sharing but
are costly when you consider the number of things that may
potentially need to be shared. And, you can still poke
directly at the data being shared, bypassing the monitor,
if you want to (or have a bug)]
But, you had to rely on programming discipline to ensure this
worked. Just like you have to rely on discipline to ensure
code is "bugfree" (how's that worked for the industry?)
And if you choose to have overlapping address spaces when you
pass a pointer from one task to another the OS has to deal with this
at a significant cost.
How does your system handle the above example? How do you "pass" the pointer from TaskA to TaskB -- if not via the OS? Do you expose a
shared memory region that both tasks can use to exchange data
and hope they follow some rules? Always use synchronization
primitives for each data exchange? RELY on the developer to
get it right? ALWAYS?
Once you've passed the pointer, how does TaskB access that data
WITHOUT having to update the MMU? Or, has TaskB had access to
the data all along?
What happens when B wants to pass the modified data to C?
Does the MMU have to be updated (C's tables) to grant that
access? Or, like B, has C had access all along? And, has
C had to remain disciplined enough not to go mucking around
with that region of memory until A *and* B have done modifying
it?
I don't allow anyone to see anything -- until the owner of that thing explicitly grants access. If you try to access something before it's
been made available for your access, the OS traps and aborts your
process -- you've violated the discipline and the OS is going to
enforce it! In an orderly manner that doesn't penalize other
tasks that have behaved properly.
In a linear address space, you pass the pointer *as is* so the OS does
not have to deal with anything except access restrictions.
In dps, you can send a message to another task - the message being
data the OS will copy into that tasks memory, the data being
perfectly able to be an address of something in another task's
So, you don't use the MMU to protect TaskA's resources from TaskB
(or TaskC!) access. You expect LESS from your OS.
memory. If a task accesses an address it is not supposed to
the user is notified and allowed to press CR to kill that task.
What are the addresses "it's not supposed to?" Some *subset* of
the addresses that "belong" to other tasks? Perhaps I can
access a buffer that belongs to TaskB but not TaskB's code? Or, some OTHER buffer that TaskB doesn't want me to see? Do
you explicitly have to locate ("org") each buffer so that you
can place SOME in protected portions of the address space and
others in shared areas? How do you change these distinctions
dynamically -- or, do you do a lot of data copying from
"protected" space to "shared" space?
Then there are common data sections for groups of tasks etc.,
it is pretty huge really.
Again, you expose things by default -- even if only a subset
of things. You create shared memory regions where there are
no protections and then rely on your application to behave and
not access data (that has been exposed for its access) until
it *should*.
Everybody does this. And everyone has bugs as a result. You
are relying on the developer to *repeatedly* implement the sharing
protocol -- instead of relying on the OS to enforce that for you.
On 6/11/2021 0:09, Don Y wrote:
On 6/10/2021 8:32 AM, Dimiter_Popoff wrote:
On 6/10/2021 16:55, Don Y wrote:
On 6/10/2021 3:45 AM, Dimiter_Popoff wrote:
Don, this becomes way too lengthy and repeating itself.
[attrs elided]
You keep on saying that a linear 64 bit address space means exposing
everything to everybody after I explained this is not true at all.
Task A has built a structure -- a page worth of data residing
at 0x123456. It wants to pass this to TaskB so that TaskB can perform
some operations on it.
Can TaskB access the data at 0x123456 *before* TaskA has told it
to do so?
Can TaskB access the data at 0x123456 WHILE TaskA is manipulating it?
Can TaskA alter the data at 0x123456 *after* it has "passed it along"
to TaskB -- possibly while TaskB is still using it?
If task A does not want any of the above, it just places the data in a
page to which only it has access. Or it can allow read access only.
*Why* do you confuse this with linear address space? What does the
one have to do with the other?
Why would you want to protect regions you don't want protected?
...
As I tease more of your design out of you, it becomes apparent why
you "need" a flat address space. You push much of the responsibility
for managing the environment into the developer's hands. *He* decides
which regions of memory to share. He talks to the MMU (even if through
an API). He directly retrieves values from other tasks. Etc.
On 6/11/2021 7:55, Don Y wrote:
...
As I tease more of your design out of you, it becomes apparent why
you "need" a flat address space. You push much of the responsibility
for managing the environment into the developer's hands. *He* decides
which regions of memory to share. He talks to the MMU (even if through
an API). He directly retrieves values from other tasks. Etc.
It is not true that the developer is in control of all that. Messaging
from one task to another goes through a system call.
Anyway, I am not interested in discussing dps here/now.
The *only* thing I would like you to answer me is why you think
a linear 64 bit address space can add vulnerability to a design.
On 6/11/2021 4:14 AM, Dimiter_Popoff wrote:
On 6/11/2021 7:55, Don Y wrote:
...
As I tease more of your design out of you, it becomes apparent why
you "need" a flat address space. You push much of the responsibility
for managing the environment into the developer's hands. *He* decides
which regions of memory to share. He talks to the MMU (even if through
an API). He directly retrieves values from other tasks. Etc.
It is not true that the developer is in control of all that. Messaging
from one task to another goes through a system call.
But the client directly retrieves the values. The OS doesn't provide
them (at least, that's what you said previously)
Anyway, I am not interested in discussing dps here/now.
The *only* thing I would like you to answer me is why you think
a linear 64 bit address space can add vulnerability to a design.
Please tell me where I said it -- in and of itself -- makes a
design vulnerable?
Dimiter_Popoff wrote:
The real value in 64 bit integer registers and 64 bit address space is
just that, having an orthogonal "endless" space (well, I remember some
30 years ago 32 bits seemed sort of "endless" to me...).
Not needing to assign overlapping logical addresses to anything
can make a big difference to how the OS is done.
That depends on what you expect from the OS. If you are
comfortable with the possibility of bugs propagating between
different subsystems, then you can live with a logical address
space that exactly coincides with a physical address space.
So how does the linear 64 bit address space get in the way of
any protection you want to implement? Pages are still 4K and
each has its own protection attributes governed by the OS;
it is like that with 32 bit processors as well (I talk Power, I am
not interested in half baked stuff like ARM, RISC-V etc., I don't
know if there could be a problem like that with one of these).
With a linear address space, you typically have to link EVERYTHING
as a single image to place each thing in its own piece of memory
(or use segment based addressing).
HOW any aspect of an MCU is *used* is the cause of vulnerability;
to internal bugs, external threats, etc. The more stuff that's exposed,
the more places fault can creep into a design. It's why we litter code
with invariants, check for the validity of input parameters, etc.
Every interface is a potential for a fault; and an *opportunity*
to bolster your confidence in the design (by verifying the interfaces
are being used correctly!)
[Do you think all of these ransomware attacks we hear of are
the result of developers being INCREDIBLY stupid? Or, just
"not paranoid enough"??]
Turning off an MMU (when you have one available) is obviously
putting you in a more "exposed" position than correctly
*using* it (all else being equal). Unless, of course, you
don't have the skills to use it properly.
There are firewire implementations that actually let the external
peripheral DMA directly into the host's memory. Any fault in the implementation *inside* the host obviously exposes the internals
of the system to an external agent. Can you be 100.0% sure that
the device you're plugging in (likely sold with your type of
computer in mind and, thus, aware of what's where, inside!) is
benign?
<https://en.wikipedia.org/wiki/DMA_attack>
Is there anything *inherently* wrong with DMA? Or Firewire? No.
Do they create the potential for a VULNERABILITY in a system? Yes.
The vulnerability is a result of how they are *used*.
My protecting-everything-from-everything-else is intended to eliminate unanticipated attack vectors before a hostile actor (third party
software or external agent) can discover an exploit. Or, a latent
bug can compromise the proper operation of the system. It's why I
*don't* have any global namespaces (if you can't NAME something,
then you can't ACCESS it -- even if you KNOW it exists, somewhere; controlling the names you can see controls the things you can access)
It's why I require you to have a valid "Handle" to every object with
which you want to interact; if you don't have a handle to the
object, then you can't talk to it. You can't consume its resources
or try to exploit vulnerabilities that may be present. Or, just plain
ask it (mistakenly) to do something incorrect!
It's why I don't let you invoke EVERY method on a particular object,
even if you have a valid handle! Because you don't need to be ABLE
to do something that you don't NEED to do! Attempting to do so
is indicative of either a bug (because you didn't declare a need
to access that method when you were installed!) or an attempted
exploit. In either case, there is no value to letting you continue
with a simple error message.
<https://en.wikipedia.org/wiki/Principle_of_least_privilege>
It's why each object can decide to *sever* your "legitimate" connection
to any of its Interfaces if it doesn't like what you are doing
or asking it to do. "Too bad, so sad. Take it up with Management!
And, no, we won't be letting you get restarted cuz we know there's
something unhealthy about you!"
It's why access controls are applied on the *client* side of
a transaction instead of requiring the server/object to make
that determination (like some other capability-based systems).
Because any server-side activities consume the server's
resources, even if it will ultimately deny your request
(move the denial into YOUR resources)
It's why I enforce quotas on the resources you can consume -- or
have others consume for your *benefit* -- so an application's
(task) "load" on the system can be constrained.
If you want to put staff in place to vet each third party application
before "offering it in your store", then you have to assume that
overhead -- and HOPE you catch any malevolent/buggy actors before
folks install those apps. I think that's the wrong approach as
it requires a sizeable effort to test/validate any submitted
application "thoroughly" (you end up doing the developer's work
FOR him!)
Note that bugs also exist, even in the absence of "bad intent".
Should they be allowed to bring down your product/system? Or,
should their problems be constrained to THEIR demise??
[I'm assuming your MCA has the ability to "print" hardcopy
of <whatever>. Would it be acceptable if a bug in your print
service brought down the instrument? This *session*?
Silently corrupted the data that it was asked to print?]
ANYTHING (and EVERYTHING) that I can do to make my system more robust
is worth the effort. Hardware is cheap (relatively speaking).
Debugging time is VERY costly. And, "user inconvenience/aggravation"
is *outrageously* expensive! I let the OS "emulate" features that
I wished existed in the silicon -- because, there, they would
likely be less expensive to utilize (time, resources)
This is especially true in my alpha site application. Imagine being
blind, deaf, wheelchair confined, paralyzed/amputee, early onset
Alzheimer's, or "just plain old", etc. and having to deal with something
that is misbehaving ALL AROUND YOU (because it pervades your home environment). It was intended to *facilitate* your continued presence
in YOUR home, delaying your transfer to an a$$i$ted care facility.
Now, it's making life "very difficult"!
"Average Joes" get pissed off when their PC misbehaves.
Imagine your garage door opening in the middle of the night.
Or, the stereo turns on -- loud -- while you're on the phone.
Or, the phone hangs up mid conversation.
Or, the wrong audio stream accompanies a movie you're viewing.
Or, a visitor is announced at the front door, but no one is there!
Or, the coffee maker turned on too early and your morning coffee is mud.
Or, the heat turns on midafternoon on a summer day.
Or, the garage door closes on your vehicle as you are exiting.
Or, your bedside alarm goes off at 3AM.
How long will you wait for "repair" in that sort of environment?
When are you overwhelmed by the technology (that is supposed to be
INVISIBLE) coupled with your current condition -- and just throw
in the towel?
YOU can sell a spare MCA to a customer who wants to minimize his
downtime "at any cost". Should I make "spare houses" available?
Maybe deeply discounted?? :<
What about spare factories??
On 6/11/2021 15:10, Don Y wrote:
On 6/11/2021 4:14 AM, Dimiter_Popoff wrote:
On 6/11/2021 7:55, Don Y wrote:
...
As I tease more of your design out of you, it becomes apparent why
you "need" a flat address space. You push much of the responsibility
for managing the environment into the developer's hands. *He* decides
which regions of memory to share. He talks to the MMU (even if through
an API). He directly retrieves values from other tasks. Etc.
It is not true that the developer is in control of all that. Messaging
from one task to another goes through a system call.
But the client directly retrieves the values. The OS doesn't provide
them (at least, that's what you said previously)
I am not sure what this means. The recipient task has advertised a field
where messages can be queued; the sending task makes a system call
designating the message and which task is to receive it. During that
call execution the message is written into the memory of the recipient.
Then at some point later the recipient can see that and process the
message. What more do you need?
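The send path described above can be sketched in C. This is purely illustrative: the names (msg_send, msg_recv, the queue layout, sizes) are my own assumptions, not the actual dps API. The idea is that the recipient advertises a queue, and the "system call" copies the sender's message into the recipient's memory; the recipient drains it later at its leisure.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative sketch only -- names and sizes are assumptions,
 * not the actual dps interface. */
#define MSG_SIZE  32
#define QUEUE_LEN  8

typedef struct {
    uint8_t  slots[QUEUE_LEN][MSG_SIZE];
    unsigned head;              /* next slot to read  */
    unsigned tail;              /* next slot to write */
    unsigned count;             /* messages pending   */
} msg_queue;

typedef struct {
    msg_queue inbox;            /* the "advertised field" */
} task;

/* The send "system call": writes into the recipient's memory
 * on the sender's behalf; the recipient sees it later. */
int msg_send(task *to, const void *msg, size_t len)
{
    if (len > MSG_SIZE || to->inbox.count == QUEUE_LEN)
        return -1;              /* message too big, or queue full */
    memcpy(to->inbox.slots[to->inbox.tail], msg, len);
    to->inbox.tail = (to->inbox.tail + 1) % QUEUE_LEN;
    to->inbox.count++;
    return 0;
}

/* The recipient polls its own queue whenever it gets around to it. */
int msg_recv(task *self, void *out, size_t len)
{
    if (self->inbox.count == 0)
        return -1;              /* nothing pending */
    memcpy(out, self->inbox.slots[self->inbox.head], len);
    self->inbox.head = (self->inbox.head + 1) % QUEUE_LEN;
    self->inbox.count--;
    return 0;
}
```

Note that nothing here requires the sender to know where in the recipient's address space the queue lives; that knowledge stays inside the system call, which is the point of the design.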
Anyway, I am not interested in discussing dps here/now.
The *only* thing I would like you to answer me is why you think
a linear 64 bit address space can add vulnerability to a design.
Please tell me where I said it -- in and of itself -- makes a
design vulnerable?
This is how the exchange started:
Dimiter_Popoff wrote:
The real value in 64 bit integer registers and 64 bit address space is
just that, having an orthogonal "endless" space (well I remember some
30 years ago 32 bits seemed sort of "endless" to me...).
Not needing to assign overlapping logical addresses to anything
can make a big difference to how the OS is done.
That depends on what you expect from the OS. If you are
comfortable with the possibility of bugs propagating between
different subsystems, then you can live with a logical address
space that exactly coincides with a physical address space.
So how does the linear 64-bit address space get in the way of
any protection you want to implement? Pages are still 4 k and
each has its own protection attributes governed by the OS,
it is like that with 32 bit processors as well (I talk power, I am
not interested in half baked stuff like ARM, risc-v etc., I don't
know if there could be a problem like that with one of these).
With a linear address space, you typically have to link EVERYTHING
as a single image to place each thing in its own piece of memory
(or use segment based addressing).
Now if you have missed the "logical" word in my post I can
understand why you went into all that. But I was quite explicit
about it.
Anyway, I am glad we agree that a 64 bit logical address space
is no obstacle to security. From there on it can only be something
to make programming life easier.
On 6/9/2021 12:17 AM, David Brown wrote:
Process geometries are not targeted at 64-bit. They are targeted at
smaller, faster and lower dynamic power. In order to produce such a big
design as a 64-bit cpu, you'll aim for a minimum level of process
sophistication - but that same process can be used for twice as many
32-bit cores, or bigger sram, or graphics accelerators, or whatever else
suits the needs of the device.
They will apply newer process geometries to newer devices.
No one is going to retool an existing design -- unless doing
so will result in a significant market enhancement.
Why don't we have 100MHz MC6800's?
But you are absolutely right about maths (floating point or integer) -
having 32-bit gives you a lot more freedom and less messing around with
scaling back and forth to make things fit and work efficiently in 8-bit
or 16-bit. And if you have floating point hardware (and know how to use
it properly), that opens up new possibilities.
64-bit cores will extend that, but the step is almost negligible in
comparison. It would be wrong to say "int32_t is enough for anyone",
but it is /almost/ true. It is certainly true enough that it is not a
problem that using "int64_t" takes two instructions instead of one.
Except that int64_t can take *four* instead of one (add/sub/mul two
int64_t's with 32b hardware).
On Wed, 9 Jun 2021 03:12:12 -0700, Don Y <blockedofcourse@foo.invalid>
wrote:
On 6/9/2021 12:17 AM, David Brown wrote:
Process geometries are not targeted at 64-bit. They are targeted at
smaller, faster and lower dynamic power. In order to produce such a big
design as a 64-bit cpu, you'll aim for a minimum level of process
sophistication - but that same process can be used for twice as many
sophistication - but that same process can be used for twice as many
32-bit cores, or bigger sram, or graphics accelerators, or whatever else
suits the needs of the device.
They will apply newer process geometries to newer devices.
No one is going to retool an existing design -- unless doing
so will result in a significant market enhancement.
Why don't we have 100MHz MC6800's?
A number of years ago somebody had a 200MHz 6502. Granted, it was a
soft core implemented in an ASIC.
No idea what it was used for.
But you are absolutely right about maths (floating point or integer) -
having 32-bit gives you a lot more freedom and less messing around with
scaling back and forth to make things fit and work efficiently in 8-bit
or 16-bit. And if you have floating point hardware (and know how to use
it properly), that opens up new possibilities.
64-bit cores will extend that, but the step is almost negligible in
comparison. It would be wrong to say "int32_t is enough for anyone",
but it is /almost/ true. It is certainly true enough that it is not a
problem that using "int64_t" takes two instructions instead of one.
Except that int64_t can take *four* instead of one (add/sub/mul two
int64_t's with 32b hardware).
A 32b CPU could require a dozen instructions to do 64b math depending
on whether it has condition flags, whether math ops set the condition
flags (vs requiring explicit compare or compare/branch), and whether
it even has carry aware ops [some chips don't]
If detecting wrap-around/overflow requires comparing the result
against the operands, multi-word arithmetic (even just 2 words)
quickly becomes long and messy.
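The "comparing the result against the operands" trick can be shown in portable C; this is roughly what a compiler must emit for a 64-bit add on a 32-bit ISA with no accessible carry flag (the names here are mine, for illustration):

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves. */
typedef struct { uint32_t lo, hi; } u64_pair;

/* 64-bit add built from 32-bit ops, without a carry flag:
 * unsigned addition wraps, so the low-word sum is smaller than
 * either operand if and only if a carry occurred. */
u64_pair add64(u64_pair a, u64_pair b)
{
    u64_pair r;
    r.lo = a.lo + b.lo;                 /* may wrap around      */
    uint32_t carry = (r.lo < a.lo);     /* 1 iff low add wrapped */
    r.hi = a.hi + b.hi + carry;
    return r;
}
```

Even this best case is four operations (add, compare, add, add) instead of one, and subtraction and multiplication fan out further still, which is the point being made above.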
On 6/8/2021 3:01 PM, Dimiter_Popoff wrote:
Am trying to puzzle out what a 64-bit embedded processor should look like.
At the low end, yeah, a simple RISC processor. And support for complex
arithmetic
using 32-bit floats? And support for pixel alpha blending using quad 16-bit
numbers?
32-bit pointers into the software?
The real value in 64 bit integer registers and 64 bit address space is
just that, having an orthogonal "endless" space (well I remember some
30 years ago 32 bits seemed sort of "endless" to me...).
Not needing to assign overlapping logical addresses to anything
can make a big difference to how the OS is done.
That depends on what you expect from the OS. If you are
comfortable with the possibility of bugs propagating between
different subsystems, then you can live with a logical address
space that exactly coincides with a physical address space.
But, consider how life was before Windows used compartmentalized
applications (and OS). How easily it is for one "application"
(or subsystem) to cause a reboot -- unceremoniously.
The general direction (in software development, and, by
association, hardware) seems to be to move away from unrestrained
access to the underlying hardware in an attempt to limit the
amount of damage that a "misbehaving" application can cause.
You see this in languages designed to eliminate dereferencing
pointers, pointer arithmetic, etc. Languages that claim to
ensure your code can't misbehave because it can only do
exactly what the language allows (no more injecting ASM
into your HLL code).
32 bit FPU seems useless to me, 64 bit is OK. Although 32-bit FP
*numbers* can be quite useful for storing/passing data.
32 bit numbers have appeal if your registers are 32b;
they "fit nicely". Ditto 64b in 64b registers.
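One common pattern behind the "32-bit FP numbers for storing/passing data" remark: keep bulk data as 32-bit floats (half the storage and bus traffic) but accumulate in 64-bit double, where precision loss over many additions matters. A small sketch (function name is mine):

```c
#include <stddef.h>

/* Data stored compactly as 32-bit floats; the running sum is kept
 * in a 64-bit double so rounding error does not accumulate at
 * single precision. Each float is widened only at the point of use. */
double sum_samples(const float *samples, size_t n)
{
    double acc = 0.0;           /* 64-bit accumulator */
    for (size_t i = 0; i < n; i++)
        acc += samples[i];
    return acc;
}
```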