Forum: >>> Magnum BBS <<<

How to write a simple driver in bare metal systems: volatile, memory ba

From pozz@21:1/5 to All on Sat Oct 23 00:07:40 2021

Even I write software for embedded systems for more than 10 years,
there's an argument that from time to time let me think for hours and
leave me with many doubts.

Consider a simple embedded system based on a MCU (AVR8 or Cortex-Mx).
The software is bare metal, without any OS. The main pattern is the well
known mainloop (background code) that is interrupted by ISR.

Interrupts are used mainly for timings and for low-level driver. For
example, the UART reception ISR move the last received char in a FIFO
buffer, while the mainloop code pops new data from the FIFO.

static struct {
unsigned char buf[RXBUF_SIZE];
uint8_t in;
uint8_t out;
} rxfifo;

/* ISR */
void uart_rx_isr(void) {
unsigned char c = UART->DATA;
rxfifo.buf[in % RXBUF_SIZE] = c;
rxfifo.in++;
// Reset interrupt flag
}

/* Called regularly from mainloop code */
int uart_task(void) {
int c = -1;
if (out != in) {
c = rxfifo.buf[out % RXBUF_SIZE];
out++;
}
return -1;
}

From a 20-years old article[1] by Nigle Jones, this seems a situation
where volatile must be used for rxfifo.in, because is modified by an ISR
and used in the mainloop code.

I don't think so, rxfifo.in is read from memory only one time in
uart_task(), so there isn't the risk that compiler can optimize badly.
Even if ISR is fired immediately after the if statement, this doesn't
bring to a dangerous state: the just received data will be processed at
the next call to uart_task().

So IMHO volatile isn't necessary here. And critical sections (i.e.
disabling interrupts) aren't useful too.

However I'm thinking about memory barrier. Suppose the compiler reorder
the instructions in uart_task() as follows:

c = rxfifo.buf[out % RXBUF_SIZE]
if (out != in) {
out++;
return c;
} else {
return -1;
}

Here there's a big problem, because compiler decided to firstly read rxfifo.buf[] and then test in and out equality. If the ISR is fired
immediately after moving data to c (most probably an internal register),
the condition in the if statement will be true and the register value is returned. However the register value isn't correct.

I don't think any modern C compiler reorder uart_task() in this way, but
we can't be sure. The result shouldn't change for the compiler, so it
can do this kind of things.

How to fix this issue if I want to be extremely sure the compiler will
not reorder this way? Applying volatile to rxfifo.in shouldn't help for
this, because compiler is allowed to reorder access of non volatile
variables yet[2].

One solution is adding a memory barrier in this way:

int uart_task(void) {
int c = -1;
if (out != in) {
memory_barrier();
c = rxfifo.buf[out % RXBUF_SIZE];
out++;
}
return -1;
}

However this approach appears to me dangerous. You have to check and
double check if, when and where memory barriers are necessary and it's
simple to skip a barrier where it's nedded and add a barrier where it
isn't needed.

So I'm thinking that a sub-optimal (regarding efficiency) but reliable (regarding the risk to skip a barrier where it is needed) could be to
enter a critical section (disabling interrupts) anyway, if it isn't
strictly needed.

int uart_task(void) {
ENTER_CRITICAL_SECTION();
int c = -1;
if (out != in) {
c = rxfifo.buf[out % RXBUF_SIZE];
out++;
}
EXIT_CRITICAL_SECTION();
return -1;
}

Another solution could be to apply volatile keyword to rxfifo.in *AND* rxfifo.buf too, so compiler can't change the order of accesses them.

Do you have other suggestions?

[1] https://barrgroup.com/embedded-systems/how-to/c-volatile-keyword
[2] https://blog.regehr.org/archives/28

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Clifford Heath@21:1/5 to pozz on Sat Oct 23 13:40:40 2021

On 23/10/21 9:07 am, pozz wrote:

Even I write software for embedded systems for more than 10 years,
there's an argument that from time to time let me think for hours and
leave me with many doubts.

Consider a simple embedded system based on a MCU (AVR8 or Cortex-Mx).
The software is bare metal, without any OS. The main pattern is the well known mainloop (background code) that is interrupted by ISR.

Interrupts are used mainly for timings and for low-level driver. For
example, the UART reception ISR move the last received char in a FIFO
buffer, while the mainloop code pops new data from the FIFO.

static struct {
� unsigned char buf[RXBUF_SIZE];
� uint8_t in;
� uint8_t out;
} rxfifo;

/* ISR */
void uart_rx_isr(void) {
� unsigned char c = UART->DATA;
� rxfifo.buf[in % RXBUF_SIZE] = c;
� rxfifo.in++;
� // Reset interrupt flag
}

/* Called regularly from mainloop code */
int uart_task(void) {
� int c = -1;
� if (out != in) {
�� c = rxfifo.buf[out % RXBUF_SIZE];
�� out++;
� }
� return -1;
}

From a 20-years old article[1] by Nigle Jones, this seems a situation
where volatile must be used for rxfifo.in, because is modified by an ISR
and used in the mainloop code.

I don't think so, rxfifo.in is read from memory only one time in
uart_task(), so there isn't the risk that compiler can optimize badly.
Even if ISR is fired immediately after the if statement, this doesn't
bring to a dangerous state: the just received data will be processed at
the next call to uart_task().

So IMHO volatile isn't necessary here. And critical sections (i.e.
disabling interrupts) aren't useful too.

However I'm thinking about memory barrier. Suppose the compiler reorder
the instructions in uart_task() as follows:

� c = rxfifo.buf[out % RXBUF_SIZE]
� if (out != in) {
�� out++;
�� return c;
� } else {
�� return -1;
� }

Here there's a big problem, because compiler decided to firstly read rxfifo.buf[] and then test in and out equality. If the ISR is fired immediately after moving data to c (most probably an internal register),
the condition in the if statement will be true and the register value is returned. However the register value isn't correct.

I don't think any modern C compiler reorder uart_task() in this way, but
we can't be sure. The result shouldn't change for the compiler, so it
can do this kind of things.

How to fix this issue if I want to be extremely sure the compiler will
not reorder this way? Applying volatile to rxfifo.in shouldn't help for
this, because compiler is allowed to reorder access of non volatile
variables yet[2].

One solution is adding a memory barrier in this way:

int uart_task(void) {
� int c = -1;
� if (out != in) {
�� memory_barrier();
�� c = rxfifo.buf[out % RXBUF_SIZE];
�� out++;
� }
� return -1;
}

However this approach appears to me dangerous. You have to check and
double check if, when and where memory barriers are necessary and it's
simple to skip a barrier where it's nedded and add a barrier where it
isn't needed.

So I'm thinking that a sub-optimal (regarding efficiency) but reliable (regarding the risk to skip a barrier where it is needed) could be to
enter a critical section (disabling interrupts) anyway, if it isn't
strictly needed.

int uart_task(void) {
� ENTER_CRITICAL_SECTION();
� int c = -1;
� if (out != in) {
�� c = rxfifo.buf[out % RXBUF_SIZE];
�� out++;
� }
� EXIT_CRITICAL_SECTION();
� return -1;
}

Another solution could be to apply volatile keyword to rxfifo.in *AND* rxfifo.buf too, so compiler can't change the order of accesses them.

Do you have other suggestions?

[1] https://barrgroup.com/embedded-systems/how-to/c-volatile-keyword
[2] https://blog.regehr.org/archives/28

This is a good introduction to how Linux makes this possible for its
horde of device-driver authors:

<https://www.kernel.org/doc/Documentation/memory-barriers.txt>

Clifford Heath

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Don Y@21:1/5 to pozz on Fri Oct 22 22:09:17 2021

On 10/22/2021 3:07 PM, pozz wrote:

static struct {
unsigned char buf[RXBUF_SIZE];
uint8_t in;
uint8_t out;
} rxfifo;

/* ISR */
void uart_rx_isr(void) {
unsigned char c = UART->DATA;
rxfifo.buf[in % RXBUF_SIZE] = c;
rxfifo.in++;
// Reset interrupt flag
}

/* Called regularly from mainloop code */
int uart_task(void) {
int c = -1;

Why? And why a retval from uart_task -- if it is always "-1"?

if (out != in) {
c = rxfifo.buf[out % RXBUF_SIZE];
out++;
}
return -1;
}

This is a bug(s) waiting to happen.

How is RXBUF_SIZE defined? How does it reflect the data rate (and,
thus, interrupt rate) as well as the maximum latency between "main
loop" accesses? I.e., what happens when the buffer is *full* -- and,
thus, appears EMPTY? What stops the "in" member from growing to the
maximum size of a uint8 -- and then wrapping? How do you convey this
to the upper level code ("Hey, we just lost a whole RXBUF_SIZE of
characters so if the character stream doesn't make sense, that might
be a cause...")? What if RXBUF_SIZE is relatively prime wrt uint8max?

When writing UART handlers, I fetch the received datum along with
the uart's flags and stuff *both* of those things in the FIFO.
If the FIFO would be full, I, instead, modify the flags of the
preceeding datum to reflect this fact ("Some number of characters
have been lost AFTER this one...") and discard the current character.

I then signal an event and let a task waiting for that specific event
wake up and retrieve the contents of the FIFO (which may include more
than one character, at that time as characters can arrive after the
initial event has been signaled).

This lets me move the line discipline out of the ISR and still keep
the system "responsive".

Figure out everything that you need to do before you start sorting out
how the compiler can "shaft" you...

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From David Brown@21:1/5 to pozz on Sat Oct 23 18:09:13 2021

On 23/10/2021 00:07, pozz wrote:

Even I write software for embedded systems for more than 10 years,
there's an argument that from time to time let me think for hours and
leave me with many doubts.

It's nice to see a thread like this here - the group needs such discussions!

Consider a simple embedded system based on a MCU (AVR8 or Cortex-Mx).
The software is bare metal, without any OS. The main pattern is the well known mainloop (background code) that is interrupted by ISR.

Interrupts are used mainly for timings and for low-level driver. For
example, the UART reception ISR move the last received char in a FIFO
buffer, while the mainloop code pops new data from the FIFO.

static struct {
� unsigned char buf[RXBUF_SIZE];
� uint8_t in;
� uint8_t out;
} rxfifo;

/* ISR */
void uart_rx_isr(void) {
� unsigned char c = UART->DATA;
� rxfifo.buf[in % RXBUF_SIZE] = c;
� rxfifo.in++;

Unless you are sure that RXBUF_SIZE is a power of two, this is going to
be quite slow on an AVR. Modulo means division, and while division by a constant is usually optimised to a multiplication by the compiler, you
still have a multiply, a shift, and some compensation for it all being
done as signed integer arithmetic.

It's also wrong, for non-power of two sizes, since the wrapping of your increment and your modulo RXBUF_SIZE get out of sync.

The usual choice is to track "head" and "tail", and use something like:

void uart_rx_isr(void) {
unsigned char c = UART->DATA;
// Reset interrupt flag
uint8_t next = rxfifo.tail;
rxfifo.buf[next] = c;
next++;
if (next >= RXBUF_SIZE) next -= RXBUF_SIZE;
rxfifo.tail = next;
}

� // Reset interrupt flag
}

/* Called regularly from mainloop code */
int uart_task(void) {
� int c = -1;
� if (out != in) {
�� c = rxfifo.buf[out % RXBUF_SIZE];
�� out++;
� }
� return -1;
}

int uart_task(void) {
int c = -1;
uint8_t next = rxfifo.head;
if (next != rxfifo.tail) {
c = rxfifo.buf[next];
next++;
if (next >= RXBUF_SIZE) next -= RXBUF_SIZE;
rxfifo.head = next;
}
return c;
}

These don't track buffer overflow at all - you need to call uart_task()
often enough to avoid that.

(I'm skipping volatiles so we don't get ahead of your point.)

From a 20-years old article[1] by Nigle Jones, this seems a situation
where volatile must be used for rxfifo.in, because is modified by an ISR
and used in the mainloop code.

Certainly whenever data is shared between ISR's and mainloop code, or
different threads, then you need to think about how to make sure data is synchronised and exchanged. "volatile" is one method, atomics are
another, and memory barriers can be used.

I don't think so, rxfifo.in is read from memory only one time in
uart_task(), so there isn't the risk that compiler can optimize badly.

That is incorrect in two ways. One - baring compiler bugs (which do
occur, but they are very rare compared to user bugs), there is no such
thing as "optimising badly". If optimising changes the behaviour of the
code, other than its size and speed, the code is wrong. Two - it is a
very bad idea to imagine that having code inside a function somehow
"protects" it from re-ordering or other optimisation.

Functions can be inlined, outlined, cloned, and shuffled about.
Link-time optimisation, code in headers, C++ modules, and other
cross-unit optimisations are becoming more and more common. So while it
might be true /today/ that the compiler has no alternative but to read rxfifo.in once per call to uart_task(), you cannot assume that will be
the case with later compilers or with more advanced optimisation
techniques enabled. It is safer, more portable, and more future-proof
to avoid such assumptions.

Even if ISR is fired immediately after the if statement, this doesn't
bring to a dangerous state: the just received data will be processed at
the next call to uart_task().

So IMHO volatile isn't necessary here. And critical sections (i.e.
disabling interrupts) aren't useful too.

However I'm thinking about memory barrier. Suppose the compiler reorder
the instructions in uart_task() as follows:

� c = rxfifo.buf[out % RXBUF_SIZE]
� if (out != in) {
�� out++;
�� return c;
� } else {
�� return -1;
� }

Here there's a big problem, because compiler decided to firstly read rxfifo.buf[] and then test in and out equality. If the ISR is fired immediately after moving data to c (most probably an internal register),
the condition in the if statement will be true and the register value is returned. However the register value isn't correct.

You are absolutely correct.

I don't think any modern C compiler reorder uart_task() in this way, but
we can't be sure. The result shouldn't change for the compiler, so it
can do this kind of things.

It is not an unreasonable re-arrangement. On processors with
out-of-order execution (which does not apply to the AVR or Cortex-M),
compilers will often push loads as early as they can in the instruction
stream so that they start the cache loading process as quickly as
possible. (But note that on such "big" processors, much of this
discussion on volatile and memory barriers is not sufficient, especially
if there is more than one core. You need atomics and fences, but that's
a story for another day.)

How to fix this issue if I want to be extremely sure the compiler will
not reorder this way? Applying volatile to rxfifo.in shouldn't help for
this, because compiler is allowed to reorder access of non volatile
variables yet[2].

The important thing about "volatile" is that it is /accesses/ that are volatile, not objects. A volatile object is nothing more than an object
for which all accesses are volatile by default. But you can use
volatile accesses on non-volatile objects. This macro is your friend:

#define volatileAccess(v) *((volatile typeof((v)) *) &(v))

(Linux has the same macro, called ACCESS_ONCE. It uses a gcc extension
- if you are using other compilers then you can make an uglier
equivalent using _Generic. However, if you are using a C compiler that supports C11, it is probably gcc or clang, and you can use the "typeof" extension.)

That macro will let you make a volatile read or write to an object
without requiring that /all/ accesses to it are volatile.

One solution is adding a memory barrier in this way:

int uart_task(void) {
� int c = -1;
� if (out != in) {
�� memory_barrier();
�� c = rxfifo.buf[out % RXBUF_SIZE];
�� out++;
� }
� return -1;
}

Note that you are forcing the compiler to read "out" twice here, as it
can't keep the value of "out" in a register across the memory barrier.
(And as I mentioned before, the compiler might be able to do larger
scale optimisation across compilation units or functions, and in that
way keep values across multiple calls to uart_task.)

However this approach appears to me dangerous. You have to check and
double check if, when and where memory barriers are necessary and it's
simple to skip a barrier where it's nedded and add a barrier where it
isn't needed.

Memory barriers are certainly useful, but they are a shotgun approach -
they affect /everything/ involving reads and writes to memory. (But
remember they don't affect ordering of calculations.)

So I'm thinking that a sub-optimal (regarding efficiency) but reliable (regarding the risk to skip a barrier where it is needed) could be to
enter a critical section (disabling interrupts) anyway, if it isn't
strictly needed.

int uart_task(void) {
� ENTER_CRITICAL_SECTION();
� int c = -1;
� if (out != in) {
�� c = rxfifo.buf[out % RXBUF_SIZE];
�� out++;
� }
� EXIT_CRITICAL_SECTION();
� return -1;
}

Critical sections for something like this are /way/ overkill. And a
critical section with a division in the middle? Not a good idea.

Another solution could be to apply volatile keyword to rxfifo.in *AND* rxfifo.buf too, so compiler can't change the order of accesses them.

Do you have other suggestions?

Marking "in" and "buf" as volatile is /far/ better than using a critical section, and likely to be more efficient than a memory barrier. You can
also use volatileAccess rather than making buf volatile, and it is often slightly more efficient to cache volatile variables in a local variable
while working with them.

[1] https://barrgroup.com/embedded-systems/how-to/c-volatile-keyword
[2] https://blog.regehr.org/archives/28

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From pozz@21:1/5 to As I on Sat Oct 23 22:12:47 2021

Il 23/10/2021 07:09, Don Y ha scritto:

On 10/22/2021 3:07 PM, pozz wrote:

static struct {
�� unsigned char buf[RXBUF_SIZE];
�� uint8_t in;
�� uint8_t out;
} rxfifo;

/* ISR */
void uart_rx_isr(void) {
�� unsigned char c = UART->DATA;
�� rxfifo.buf[in % RXBUF_SIZE] = c;
�� rxfifo.in++;
�� // Reset interrupt flag
}

/* Called regularly from mainloop code */
int uart_task(void) {
�� int c = -1;

Why?� And why a retval from uart_task -- if it is always "-1"?

It was my mistake. The last instruction of uart_task() should be

return c;

And maybe the name of uart_task() is not so good, it should be uart_rx().

�� if (out != in) {
�� c = rxfifo.buf[out % RXBUF_SIZE];
�� out++;
�� }
�� return -1;
}

This is a bug(s) waiting to happen.

How is RXBUF_SIZE defined?

Power of two.

How does it reflect the data rate (and,
thus, interrupt rate) as well as the maximum latency between "main
loop" accesses?

Rx FIFO filled by interrupt is needed to face a burst (a packet?) of
incoming characters.

If the baudrate is 9600bps 8n1, interrupt would be fired every
10/9600=1ms. If maximum interval between two successive uart_task()
calls is 10ms, it is sufficient a buffer of 10 bytes, so RXBUF_SIZE
could be 16 or 32.

I.e., what happens when the buffer is *full* -- and,
thus, appears EMPTY?

These are good questions, but I didn't want to discuss about them. Of
course ISR is not complete, because before pushing a new byte, we must
check if FIFO is full. For example:

/* The difference in-out gives a correct result even after a
* wrap-around of in only, thanks to unsigned arithmetic. */
#define RXFIFO_IS_FULL() (rxfifo.in - rxfifo.out < RXBUF_SIZE)

void uart_rx_isr(void) {
unsigned char c = UART->DATA;
if (!RXFIFO_IS_FULL()) {
rxfifo.buf[in % RXBUF_SIZE] = c;
rxfifo.in++;
} else {
// FIFO is full, ignore the char
}
// Reset interrupt flag
}

What stops the "in" member from growing to the
maximum size of a uint8 -- and then wrapping?

As I wrote, this should work even after a wrap-around.

How do you convey this
to the upper level code ("Hey, we just lost a whole RXBUF_SIZE of
characters so if the character stream doesn't make sense, that might
be a cause...")?

FIFO full is event is extremely rare if I'm able to size rx FIFO
correctly, i.e. on the worst case.
Anyway I usually ignore incoming chars when the FIFO is full. The high
level protocols are usually defined in such a way the absence of chars
are detected, mostly thanks to CRC.

What if RXBUF_SIZE is relatively prime wrt uint8max?

When writing UART handlers, I fetch the received datum along with
the uart's flags and stuff *both* of those things in the FIFO.
If the FIFO would be full, I, instead, modify the flags of the
preceeding datum to reflect this fact ("Some number of characters
have been lost AFTER this one...") and discard the current character.

I then signal an event and let a task waiting for that specific event
wake up and retrieve the contents of the FIFO (which may include more
than one character, at that time as characters can arrive after the
initial event has been signaled).

Signal an event? Task waiting for a specific event? Maybe you are
thinking of a full RTOS. I was thinking of bare metal systems.

This lets me move the line discipline out of the ISR and still keep
the system "responsive".

Figure out everything that you need to do before you start sorting out
how the compiler can "shaft" you...

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From pozz@21:1/5 to All on Sat Oct 23 22:49:37 2021

Il 23/10/2021 18:09, David Brown ha scritto:

On 23/10/2021 00:07, pozz wrote:

Even I write software for embedded systems for more than 10 years,
there's an argument that from time to time let me think for hours and
leave me with many doubts.

It's nice to see a thread like this here - the group needs such discussions!

Consider a simple embedded system based on a MCU (AVR8 or Cortex-Mx).
The software is bare metal, without any OS. The main pattern is the well
known mainloop (background code) that is interrupted by ISR.

Interrupts are used mainly for timings and for low-level driver. For
example, the UART reception ISR move the last received char in a FIFO
buffer, while the mainloop code pops new data from the FIFO.

static struct {
� unsigned char buf[RXBUF_SIZE];
� uint8_t in;
� uint8_t out;
} rxfifo;

/* ISR */
void uart_rx_isr(void) {
� unsigned char c = UART->DATA;
� rxfifo.buf[in % RXBUF_SIZE] = c;
� rxfifo.in++;

Unless you are sure that RXBUF_SIZE is a power of two, this is going to
be quite slow on an AVR. Modulo means division, and while division by a constant is usually optimised to a multiplication by the compiler, you
still have a multiply, a shift, and some compensation for it all being
done as signed integer arithmetic.

It's also wrong, for non-power of two sizes, since the wrapping of your increment and your modulo RXBUF_SIZE get out of sync.

Yes, RXBUF_SIZE is a power of two.

The usual choice is to track "head" and "tail", and use something like:

void uart_rx_isr(void) {
unsigned char c = UART->DATA;
// Reset interrupt flag
uint8_t next = rxfifo.tail;
rxfifo.buf[next] = c;
next++;
if (next >= RXBUF_SIZE) next -= RXBUF_SIZE;
rxfifo.tail = next;
}

This isn't the point of this thread, anyway...
You insist that tail is always in the range [0...RXBUF_SIZE - 1]. My
approach is different.

RXBUF_SIZE is a power of two, usualy <=256. head and tail are uint8_t
and *can* reach the maximum value of 255, even RXBUF_SIZE is 128. All
works well.

Suppose rxfifo.in=rxfifo.out=127, FIFO is empty. When a new char is
received, it is saved into rxfifo.buf[127 % 128=127] and rxfifo.in will
be increased to 128.
Now mainloop detect the new char (in != out), reads the new char at rxfifo.buf[127 % 128=127] and increase out that will be 128.

The next byte will be saved into rxfifo.rxbuf[rxfifo.in % 128=128 % 128
= 0] and rxfifo.in will be 129. Again, the next byte will be saved to rxbuf[rxfifo.in % 128=129 % 128=1] and rxfifo.in will be 130.

When the mainloop tries to pop data from fifo, it tests

rxfifo.in(130) !=rxfifo.out(128)

The test is true, so the code extracts chars from rxbuf[out % 128] that
is rxbuf[0]... and so on.

I hope that explanation is good.

� // Reset interrupt flag
}

/* Called regularly from mainloop code */
int uart_task(void) {
� int c = -1;
� if (out != in) {
�� c = rxfifo.buf[out % RXBUF_SIZE];
�� out++;
� }
� return -1;
}

int uart_task(void) {
int c = -1;
uint8_t next = rxfifo.head;
if (next != rxfifo.tail) {
c = rxfifo.buf[next];
next++;
if (next >= RXBUF_SIZE) next -= RXBUF_SIZE;
rxfifo.head = next;
}
return c;
}

These don't track buffer overflow at all - you need to call uart_task()
often enough to avoid that.

Sure, with a good number for RXBUF_SIZE, buffer overflow shouldn't
happen ever. Anyway, if it happens, the higher level layers (protocol)
should detect a corrupted packet.

(I'm skipping volatiles so we don't get ahead of your point.)

From a 20-years old article[1] by Nigle Jones, this seems a situation
where volatile must be used for rxfifo.in, because is modified by an ISR
and used in the mainloop code.

Certainly whenever data is shared between ISR's and mainloop code, or different threads, then you need to think about how to make sure data is synchronised and exchanged. "volatile" is one method, atomics are
another, and memory barriers can be used.

I don't think so, rxfifo.in is read from memory only one time in
uart_task(), so there isn't the risk that compiler can optimize badly.

That is incorrect in two ways. One - baring compiler bugs (which do
occur, but they are very rare compared to user bugs), there is no such
thing as "optimising badly". If optimising changes the behaviour of the code, other than its size and speed, the code is wrong.

Yes of course, but I don't think the absence of volatile for rxfifo.in,
even if it can change in ISR, could be a *real* problem with *modern and current* compilers.

voltile attribute needs to avoid compiler optimization (that would be a
bad thing, because of volatile nature of the variabile), but on that
code it's difficult to think of an optimization, caused by the absence
of volatile, that changes the behaviour erroneously... except reorering.

Two - it is a
very bad idea to imagine that having code inside a function somehow "protects" it from re-ordering or other optimisation.

I didn't say this, at the contrary I was thinking exactly to reordering
issues.

Functions can be inlined, outlined, cloned, and shuffled about.
Link-time optimisation, code in headers, C++ modules, and other
cross-unit optimisations are becoming more and more common. So while it might be true /today/ that the compiler has no alternative but to read rxfifo.in once per call to uart_task(), you cannot assume that will be
the case with later compilers or with more advanced optimisation
techniques enabled. It is safer, more portable, and more future-proof
to avoid such assumptions.

Ok, you are talking of future scenarios. I don't think actually this
could be a real problem. Anyway your observation makes sense.

Even if ISR is fired immediately after the if statement, this doesn't
bring to a dangerous state: the just received data will be processed at
the next call to uart_task().

So IMHO volatile isn't necessary here. And critical sections (i.e.
disabling interrupts) aren't useful too.

However I'm thinking about memory barrier. Suppose the compiler reorder
the instructions in uart_task() as follows:

� c = rxfifo.buf[out % RXBUF_SIZE]
� if (out != in) {
�� out++;
�� return c;
� } else {
�� return -1;
� }

Here there's a big problem, because compiler decided to firstly read
rxfifo.buf[] and then test in and out equality. If the ISR is fired
immediately after moving data to c (most probably an internal register),
the condition in the if statement will be true and the register value is
returned. However the register value isn't correct.

You are absolutely correct.

I don't think any modern C compiler reorder uart_task() in this way, but
we can't be sure. The result shouldn't change for the compiler, so it
can do this kind of things.

It is not an unreasonable re-arrangement. On processors with
out-of-order execution (which does not apply to the AVR or Cortex-M), compilers will often push loads as early as they can in the instruction stream so that they start the cache loading process as quickly as
possible. (But note that on such "big" processors, much of this
discussion on volatile and memory barriers is not sufficient, especially
if there is more than one core. You need atomics and fences, but that's
a story for another day.)

How to fix this issue if I want to be extremely sure the compiler will
not reorder this way? Applying volatile to rxfifo.in shouldn't help for
this, because compiler is allowed to reorder access of non volatile
variables yet[2].

The important thing about "volatile" is that it is /accesses/ that are volatile, not objects. A volatile object is nothing more than an object
for which all accesses are volatile by default. But you can use
volatile accesses on non-volatile objects. This macro is your friend:

#define volatileAccess(v) *((volatile typeof((v)) *) &(v))

(Linux has the same macro, called ACCESS_ONCE. It uses a gcc extension
- if you are using other compilers then you can make an uglier
equivalent using _Generic. However, if you are using a C compiler that supports C11, it is probably gcc or clang, and you can use the "typeof" extension.)

That macro will let you make a volatile read or write to an object
without requiring that /all/ accesses to it are volatile.

This is a good point. The code in ISR can't be interrupted, so there's
no need to have volatile access in ISR.

One solution is adding a memory barrier in this way:

int uart_task(void) {
� int c = -1;
� if (out != in) {
�� memory_barrier();
�� c = rxfifo.buf[out % RXBUF_SIZE];
�� out++;
� }
� return -1;
}

Note that you are forcing the compiler to read "out" twice here, as it
can't keep the value of "out" in a register across the memory barrier.

Yes, you're right. A small penalty to avoid the problem of reordering.

(And as I mentioned before, the compiler might be able to do larger
scale optimisation across compilation units or functions, and in that
way keep values across multiple calls to uart_task.)

However this approach appears to me dangerous. You have to check and
double check if, when and where memory barriers are necessary and it's
simple to skip a barrier where it's nedded and add a barrier where it
isn't needed.

Memory barriers are certainly useful, but they are a shotgun approach -
they affect /everything/ involving reads and writes to memory. (But
remember they don't affect ordering of calculations.)

So I'm thinking that a sub-optimal (regarding efficiency) but reliable
(regarding the risk to skip a barrier where it is needed) could be to
enter a critical section (disabling interrupts) anyway, if it isn't
strictly needed.

int uart_task(void) {
� ENTER_CRITICAL_SECTION();
� int c = -1;
� if (out != in) {
�� c = rxfifo.buf[out % RXBUF_SIZE];
�� out++;
� }
� EXIT_CRITICAL_SECTION();
� return -1;
}

Critical sections for something like this are /way/ overkill. And a
critical section with a division in the middle? Not a good idea.

Another solution could be to apply volatile keyword to rxfifo.in *AND*
rxfifo.buf too, so compiler can't change the order of accesses them.

Do you have other suggestions?

Marking "in" and "buf" as volatile is /far/ better than using a critical section, and likely to be more efficient than a memory barrier. You can
also use volatileAccess rather than making buf volatile, and it is often slightly more efficient to cache volatile variables in a local variable
while working with them.

Yes, I think so too. Lastly I read many experts say volatile is often a
bad thing, so I'm re-thinking about its use compared with other approaches.

[1] https://barrgroup.com/embedded-systems/how-to/c-volatile-keyword
[2] https://blog.regehr.org/archives/28

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Don Y@21:1/5 to pozz on Sat Oct 23 15:59:57 2021

On 10/23/2021 1:12 PM, pozz wrote:

This is a bug(s) waiting to happen.

How is RXBUF_SIZE defined?

Power of two.

The point was its relationship to the actual code.

How does it reflect the data rate (and,
thus, interrupt rate) as well as the maximum latency between "main
loop" accesses?

Rx FIFO filled by interrupt is needed to face a burst (a packet?) of incoming characters.

If the baudrate is 9600bps 8n1, interrupt would be fired every 10/9600=1ms. If
maximum interval between two successive uart_task() calls is 10ms, it is sufficient a buffer of 10 bytes, so RXBUF_SIZE could be 16 or 32.

What GUARANTEES this in your system? Folks often see things that "can't happen" -- yet DID (THEY SAW IT!). Your code/design should ensure that
"can't happen" REALLY /can't happen/. It costs very little to explain (commentary) WHY you don't have to check for X, Y or Z in your code.

[If the user's actions (or any outside agency) can affect operation,
then how can you guarantee that THEY "behave"?]

And, give that a very high degree of visibility so that when someone
decides they can increase the baudrate or add some sluggish task
to your "main loop" that this ASSUMPTION isn't silently violated.

I.e., what happens when the buffer is *full* -- and,
thus, appears EMPTY?

These are good questions, but I didn't want to discuss about them. Of course ISR is not complete, because before pushing a new byte, we must check if FIFO is full. For example:

My point is that you should fleshout your code before you start
thinking about what can go wrong.

E.g., if the ISR is the *only* entity to modify ".in" and always does so
in with interrupts off, then it can do so without worrying about conflict
with something else -- if those other things always ensure they read it atomically (if they read it just before or just after it has been modified
by the ISR, the value will still "work" -- they just may not realize, yet,
that there is an extra character in the buffer that they haven't yet seen).

Likewise, if the "task" is the only entity modifying ".out", then ensuring
that those modifications are atomic means the ISR can safely use any *single* reference to it.

How do you convey this
to the upper level code ("Hey, we just lost a whole RXBUF_SIZE of
characters so if the character stream doesn't make sense, that might
be a cause...")?

FIFO full is event is extremely rare if I'm able to size rx FIFO correctly, i.e. on the worst case.

"Rare" and "impossible" are too entirely different scenarios.
It is extremely rare for a specific individual to win the lottery.
But, any individual *can* win it!

Anyway I usually ignore incoming chars when the FIFO is full. The high level protocols are usually defined in such a way the absence of chars are detected,
mostly thanks to CRC.

What if the CRC characters disappear? Are you sure the front of one message can't appear to match the ass end of another?

"Pozz is here."
"Don is not here."

"Pozz is not here."

And that there is no value in knowing that one or more messages may have been dropped?

What if RXBUF_SIZE is relatively prime wrt uint8max?

When writing UART handlers, I fetch the received datum along with
the uart's flags and stuff *both* of those things in the FIFO.
If the FIFO would be full, I, instead, modify the flags of the
preceeding datum to reflect this fact ("Some number of characters
have been lost AFTER this one...") and discard the current character.

I then signal an event and let a task waiting for that specific event
wake up and retrieve the contents of the FIFO (which may include more
than one character, at that time as characters can arrive after the
initial event has been signaled).

Signal an event? Task waiting for a specific event? Maybe you are thinking of a
full RTOS. I was thinking of bare metal systems.

You can implement as much or as little of an OS as you choose;
you're not stuck with "all or nothing".

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Johann Klammer@21:1/5 to pozz on Sun Oct 24 12:39:02 2021

On 10/23/2021 12:07 AM, pozz wrote:

Even I write software for embedded systems for more than 10 years, there's an argument that from time to time let me think for hours and leave me with many doubts.

Consider a simple embedded system based on a MCU (AVR8 or Cortex-Mx). The software is bare metal, without any OS. The main pattern is the well known mainloop (background code) that is interrupted by ISR.

Interrupts are used mainly for timings and for low-level driver. For example, the UART reception ISR move the last received char in a FIFO buffer, while the mainloop code pops new data from the FIFO.

static struct {
unsigned char buf[RXBUF_SIZE];
uint8_t in;
uint8_t out;
} rxfifo;

/* ISR */
void uart_rx_isr(void) {
unsigned char c = UART->DATA;
rxfifo.buf[in % RXBUF_SIZE] = c;
rxfifo.in++;
// Reset interrupt flag
}

/* Called regularly from mainloop code */
int uart_task(void) {
int c = -1;
if (out != in) {
c = rxfifo.buf[out % RXBUF_SIZE];
out++;
}
return -1;
}

From a 20-years old article[1] by Nigle Jones, this seems a situation where volatile must be used for rxfifo.in, because is modified by an ISR and used in the mainloop code.

I don't think so, rxfifo.in is read from memory only one time in uart_task(), so there isn't the risk that compiler can optimize badly. Even if ISR is fired immediately after the if statement, this doesn't bring to a dangerous state: the just received

data will be processed at the next call to uart_task().

So IMHO volatile isn't necessary here. And critical sections (i.e. disabling interrupts) aren't useful too.

However I'm thinking about memory barrier. Suppose the compiler reorder the instructions in uart_task() as follows:

c = rxfifo.buf[out % RXBUF_SIZE]
if (out != in) {
out++;
return c;
} else {
return -1;
}

Here there's a big problem, because compiler decided to firstly read rxfifo.buf[] and then test in and out equality. If the ISR is fired immediately after moving data to c (most probably an internal register), the condition in the if statement will be

true and the register value is returned. However the register value isn't correct.

I don't think any modern C compiler reorder uart_task() in this way, but we can't be sure. The result shouldn't change for the compiler, so it can do this kind of things.

How to fix this issue if I want to be extremely sure the compiler will not reorder this way? Applying volatile to rxfifo.in shouldn't help for this, because compiler is allowed to reorder access of non volatile variables yet[2].

One solution is adding a memory barrier in this way:

int uart_task(void) {
int c = -1;
if (out != in) {
memory_barrier();
c = rxfifo.buf[out % RXBUF_SIZE];
out++;
}
return -1;
}

However this approach appears to me dangerous. You have to check and double check if, when and where memory barriers are necessary and it's simple to skip a barrier where it's nedded and add a barrier where it isn't needed.

So I'm thinking that a sub-optimal (regarding efficiency) but reliable (regarding the risk to skip a barrier where it is needed) could be to enter a critical section (disabling interrupts) anyway, if it isn't strictly needed.

int uart_task(void) {
ENTER_CRITICAL_SECTION();
int c = -1;
if (out != in) {
c = rxfifo.buf[out % RXBUF_SIZE];
out++;
}
EXIT_CRITICAL_SECTION();
return -1;
}

Another solution could be to apply volatile keyword to rxfifo.in *AND* rxfifo.buf too, so compiler can't change the order of accesses them.

Do you have other suggestions?

[1] https://barrgroup.com/embedded-systems/how-to/c-volatile-keyword
[2] https://blog.regehr.org/archives/28

Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Dimiter_Popoff@21:1/5 to Johann Klammer on Sun Oct 24 14:14:11 2021

On 10/24/2021 13:39, Johann Klammer wrote:

On 10/23/2021 12:07 AM, pozz wrote:

Even I write software for embedded systems for more than 10 years, there's an argument that from time to time let me think for hours and leave me with many doubts.

Consider a simple embedded system based on a MCU (AVR8 or Cortex-Mx). The software is bare metal, without any OS. The main pattern is the well known mainloop (background code) that is interrupted by ISR.

Interrupts are used mainly for timings and for low-level driver. For example, the UART reception ISR move the last received char in a FIFO buffer, while the mainloop code pops new data from the FIFO.

static struct {
unsigned char buf[RXBUF_SIZE];
uint8_t in;
uint8_t out;
} rxfifo;

/* ISR */
void uart_rx_isr(void) {
unsigned char c = UART->DATA;
rxfifo.buf[in % RXBUF_SIZE] = c;
rxfifo.in++;
// Reset interrupt flag
}

/* Called regularly from mainloop code */
int uart_task(void) {
int c = -1;
if (out != in) {
c = rxfifo.buf[out % RXBUF_SIZE];
out++;
}
return -1;
}

From a 20-years old article[1] by Nigle Jones, this seems a situation where volatile must be used for rxfifo.in, because is modified by an ISR and used in the mainloop code.

I don't think so, rxfifo.in is read from memory only one time in uart_task(), so there isn't the risk that compiler can optimize badly. Even if ISR is fired immediately after the if statement, this doesn't bring to a dangerous state: the just received

data will be processed at the next call to uart_task().

So IMHO volatile isn't necessary here. And critical sections (i.e. disabling interrupts) aren't useful too.

However I'm thinking about memory barrier. Suppose the compiler reorder the instructions in uart_task() as follows:

c = rxfifo.buf[out % RXBUF_SIZE]
if (out != in) {
out++;
return c;
} else {
return -1;
}

Here there's a big problem, because compiler decided to firstly read rxfifo.buf[] and then test in and out equality. If the ISR is fired immediately after moving data to c (most probably an internal register), the condition in the if statement will be

true and the register value is returned. However the register value isn't correct.

I don't think any modern C compiler reorder uart_task() in this way, but we can't be sure. The result shouldn't change for the compiler, so it can do this kind of things.

How to fix this issue if I want to be extremely sure the compiler will not reorder this way? Applying volatile to rxfifo.in shouldn't help for this, because compiler is allowed to reorder access of non volatile variables yet[2].

One solution is adding a memory barrier in this way:

int uart_task(void) {
int c = -1;
if (out != in) {
memory_barrier();
c = rxfifo.buf[out % RXBUF_SIZE];
out++;
}
return -1;
}

However this approach appears to me dangerous. You have to check and double check if, when and where memory barriers are necessary and it's simple to skip a barrier where it's nedded and add a barrier where it isn't needed.

So I'm thinking that a sub-optimal (regarding efficiency) but reliable (regarding the risk to skip a barrier where it is needed) could be to enter a critical section (disabling interrupts) anyway, if it isn't strictly needed.

int uart_task(void) {
ENTER_CRITICAL_SECTION();
int c = -1;
if (out != in) {
c = rxfifo.buf[out % RXBUF_SIZE];
out++;
}
EXIT_CRITICAL_SECTION();
return -1;
}

Another solution could be to apply volatile keyword to rxfifo.in *AND* rxfifo.buf too, so compiler can't change the order of accesses them.

Do you have other suggestions?

[1] https://barrgroup.com/embedded-systems/how-to/c-volatile-keyword
[2] https://blog.regehr.org/archives/28

Disable interrupts while accessing the fifo. you really have to. alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.

Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.

Although this thread is on how to wrestle a poor
language to do what you want, sort of how to use a hammer on a screw
instead of taking the screwdriver, there would be no need to
mask interrupts with C either.

======================================================
Dimiter Popoff, TGI http://www.tgi-sci.com ====================================================== http://www.flickr.com/photos/didi_tgi/

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From David Brown@21:1/5 to pozz on Sun Oct 24 13:02:32 2021

On 23/10/2021 22:49, pozz wrote:

Il 23/10/2021 18:09, David Brown ha scritto:

On 23/10/2021 00:07, pozz wrote:

Even I write software for embedded systems for more than 10 years,
there's an argument that from time to time let me think for hours and
leave me with many doubts.

It's nice to see a thread like this here - the group needs such
discussions!

Consider a simple embedded system based on a MCU (AVR8 or Cortex-Mx).
The software is bare metal, without any OS. The main pattern is the well >>> known mainloop (background code) that is interrupted by ISR.

Interrupts are used mainly for timings and for low-level driver. For
example, the UART reception ISR move the last received char in a FIFO
buffer, while the mainloop code pops new data from the FIFO.

static struct {
�� unsigned char buf[RXBUF_SIZE];
�� uint8_t in;
�� uint8_t out;
} rxfifo;

/* ISR */
void uart_rx_isr(void) {
�� unsigned char c = UART->DATA;
�� rxfifo.buf[in % RXBUF_SIZE] = c;
�� rxfifo.in++;

Unless you are sure that RXBUF_SIZE is a power of two, this is going to
be quite slow on an AVR.� Modulo means division, and while division by a
constant is usually optimised to a multiplication by the compiler, you
still have a multiply, a shift, and some compensation for it all being
done as signed integer arithmetic.

It's also wrong, for non-power of two sizes, since the wrapping of your
increment and your modulo RXBUF_SIZE get out of sync.

Yes, RXBUF_SIZE is a power of two.

If your code relies on that, make sure the code will fail to compile if
it is not the case. Documentation is good, compile-time check is better:

static_assert((RXBUF_SIZE & (RXBUF_SIZE - 1)) == 0, "Needs power of 2");

The usual choice is to track "head" and "tail", and use something like:

void uart_rx_isr(void) {
�� unsigned char c = UART->DATA;
�� // Reset interrupt flag
�� uint8_t next = rxfifo.tail;
�� rxfifo.buf[next] = c;
�� next++;
�� if (next >= RXBUF_SIZE) next -= RXBUF_SIZE;
�� rxfifo.tail = next;
}

This isn't the point of this thread, anyway...
You insist that tail is always in the range [0...RXBUF_SIZE - 1]. My
approach is different.

RXBUF_SIZE is a power of two, usualy <=256. head and tail are uint8_t
and *can* reach the maximum value of 255, even RXBUF_SIZE is 128. All
works well.

Yes, your approach will work - /if/ you have a power-of-two buffer size.
It has no noticeable efficiency advantages, merely an extra
inconvenient restriction and the possible confusion caused by doing
things in a different way from common idioms.

However, this is not the point of the thread - so I am happy to leave
that for now.

Suppose rxfifo.in=rxfifo.out=127, FIFO is empty. When a new char is
received, it is saved into rxfifo.buf[127 % 128=127] and rxfifo.in will
be increased to 128.
Now mainloop detect the new char (in != out), reads the new char at rxfifo.buf[127 % 128=127] and increase out that will be 128.

The next byte will be saved into rxfifo.rxbuf[rxfifo.in % 128=128 % 128
= 0] and rxfifo.in will be 129. Again, the next byte will be saved to rxbuf[rxfifo.in % 128=129 % 128=1] and rxfifo.in will be 130.

When the mainloop tries to pop data from fifo, it tests

�� rxfifo.in(130) !=rxfifo.out(128)

The test is true, so the code extracts chars from rxbuf[out % 128] that
is rxbuf[0]... and so on.

I hope that explanation is good.

�� // Reset interrupt flag
}

/* Called regularly from mainloop code */
int uart_task(void) {
�� int c = -1;
�� if (out != in) {
�� c = rxfifo.buf[out % RXBUF_SIZE];
�� out++;
�� }
�� return -1;
}

int uart_task(void) {
�� int c = -1;
�� uint8_t next = rxfifo.head;
�� if (next != rxfifo.tail) {
�� c = rxfifo.buf[next];
�� next++;
�� if (next >= RXBUF_SIZE) next -= RXBUF_SIZE;
�� rxfifo.head = next;
�� }
�� return c;
}

These don't track buffer overflow at all - you need to call uart_task()
often enough to avoid that.

Sure, with a good number for RXBUF_SIZE, buffer overflow shouldn't
happen ever. Anyway, if it happens, the higher level layers (protocol)
should detect a corrupted packet.

You risk getting seriously out of sync if there is an overflow.
Normally, on an overflow there will be a dropped character or two (which
as you say, must be caught at a higher level). Here you could end up
going round your buffer an extra time and /gaining/ RXBUF_SIZE extra characters.

Still, if you are sure that your functions are called fast enough so
that overflow is not a concern, then that's fine. Extra code to check
for a situation that can't occur is not helpful.

(I'm skipping volatiles so we don't get ahead of your point.)

�From a 20-years old article[1] by Nigle Jones, this seems a situation
where volatile must be used for rxfifo.in, because is modified by an ISR >>> and used in the mainloop code.

Certainly whenever data is shared between ISR's and mainloop code, or
different threads, then you need to think about how to make sure data is
synchronised and exchanged.� "volatile" is one method, atomics are
another, and memory barriers can be used.

I don't think so, rxfifo.in is read from memory only one time in
uart_task(), so there isn't the risk that compiler can optimize badly.

That is incorrect in two ways.� One - baring compiler bugs (which do
occur, but they are very rare compared to user bugs), there is no such
thing as "optimising badly".� If optimising changes the behaviour of the
code, other than its size and speed, the code is wrong.�

Yes of course, but I don't think the absence of volatile for rxfifo.in,
even if it can change in ISR, could be a *real* problem with *modern and current* compilers.

Personally, I am not satisfied with "it's unlikely to be a problem in
practice" - I prefer "The language guarantees it is not a problem".
Remember, when you know the data needs to be read at this point, then
using a volatile read is free. Volatile does not make code less
efficient unless you use it incorrectly and force more accesses than are necessary. So using volatile accesses for "rxfifo.in" here turns
"probably safe" into "certainly safe" without cost. What's not to like?

voltile attribute needs to avoid compiler optimization (that would be a
bad thing, because of volatile nature of the variabile), but on that
code it's difficult to think of an optimization, caused by the absence
of volatile, that changes the behaviour erroneously... except reorering.

Two - it is a
very bad idea to imagine that having code inside a function somehow
"protects" it from re-ordering or other optimisation.

I didn't say this, at the contrary I was thinking exactly to reordering issues.

Functions can be inlined, outlined, cloned, and shuffled about.
Link-time optimisation, code in headers, C++ modules, and other
cross-unit optimisations are becoming more and more common.� So while it
might be true /today/ that the compiler has no alternative but to read
rxfifo.in once per call to uart_task(), you cannot assume that will be
the case with later compilers or with more advanced optimisation
techniques enabled.� It is safer, more portable, and more future-proof
to avoid such assumptions.

Ok, you are talking of future scenarios. I don't think actually this
could be a real problem. Anyway your observation makes sense.

Even if ISR is fired immediately after the if statement, this doesn't
bring to a dangerous state: the just received data will be processed at
the next call to uart_task().

So IMHO volatile isn't necessary here. And critical sections (i.e.
disabling interrupts) aren't useful too.

However I'm thinking about memory barrier. Suppose the compiler reorder
the instructions in uart_task() as follows:

�� c = rxfifo.buf[out % RXBUF_SIZE]
�� if (out != in) {
�� out++;
�� return c;
�� } else {
�� return -1;
�� }

Here there's a big problem, because compiler decided to firstly read
rxfifo.buf[] and then test in and out equality. If the ISR is fired
immediately after moving data to c (most probably an internal register), >>> the condition in the if statement will be true and the register value is >>> returned. However the register value isn't correct.

You are absolutely correct.

I don't think any modern C compiler reorder uart_task() in this way, but >>> we can't be sure. The result shouldn't change for the compiler, so it
can do this kind of things.

It is not an unreasonable re-arrangement.� On processors with
out-of-order execution (which does not apply to the AVR or Cortex-M),
compilers will often push loads as early as they can in the instruction
stream so that they start the cache loading process as quickly as
possible.� (But note that on such "big" processors, much of this
discussion on volatile and memory barriers is not sufficient, especially
if there is more than one core.� You need atomics and fences, but that's
a story for another day.)

How to fix this issue if I want to be extremely sure the compiler will
not reorder this way? Applying volatile to rxfifo.in shouldn't help for
this, because compiler is allowed to reorder access of non volatile
variables yet[2].

The important thing about "volatile" is that it is /accesses/ that are
volatile, not objects.� A volatile object is nothing more than an object
for which all accesses are volatile by default.� But you can use
volatile accesses on non-volatile objects.� This macro is your friend:

#define volatileAccess(v) *((volatile typeof((v)) *) &(v))

(Linux has the same macro, called ACCESS_ONCE.� It uses a gcc extension
- if you are using other compilers then you can make an uglier
equivalent using _Generic.� However, if you are using a C compiler that
supports C11, it is probably gcc or clang, and you can use the "typeof"
extension.)

That macro will let you make a volatile read or write to an object
without requiring that /all/ accesses to it are volatile.

This is a good point. The code in ISR can't be interrupted, so there's
no need to have volatile access in ISR.

Correct. (Well, /almost/ correct - bigger microcontrollers have
multiple interrupt priorities. But it should be correct in this case,
as no other interrupt would be messing with the same variables anyway.)

One solution is adding a memory barrier in this way:

int uart_task(void) {
�� int c = -1;
�� if (out != in) {
�� memory_barrier();
�� c = rxfifo.buf[out % RXBUF_SIZE];
�� out++;
�� }
�� return -1;
}

Note that you are forcing the compiler to read "out" twice here, as it
can't keep the value of "out" in a register across the memory barrier.

Yes, you're right. A small penalty to avoid the problem of reordering.

But an unnecessary penalty.

(And as I mentioned before, the compiler might be able to do larger
scale optimisation across compilation units or functions, and in that
way keep values across multiple calls to uart_task.)

However this approach appears to me dangerous. You have to check and
double check if, when and where memory barriers are necessary and it's
simple to skip a barrier where it's nedded and add a barrier where it
isn't needed.

Memory barriers are certainly useful, but they are a shotgun approach -
they affect /everything/ involving reads and writes to memory.� (But
remember they don't affect ordering of calculations.)

So I'm thinking that a sub-optimal (regarding efficiency) but reliable
(regarding the risk to skip a barrier where it is needed) could be to
enter a critical section (disabling interrupts) anyway, if it isn't
strictly needed.

int uart_task(void) {
�� ENTER_CRITICAL_SECTION();
�� int c = -1;
�� if (out != in) {
�� c = rxfifo.buf[out % RXBUF_SIZE];
�� out++;
�� }
�� EXIT_CRITICAL_SECTION();
�� return -1;
}

Critical sections for something like this are /way/ overkill.� And a
critical section with a division in the middle?� Not a good idea.

Another solution could be to apply volatile keyword to rxfifo.in *AND*
rxfifo.buf too, so compiler can't change the order of accesses them.

Do you have other suggestions?

Marking "in" and "buf" as volatile is /far/ better than using a critical
section, and likely to be more efficient than a memory barrier.� You can
also use volatileAccess rather than making buf volatile, and it is often
slightly more efficient to cache volatile variables in a local variable
while working with them.

Yes, I think so too. Lastly I read many experts say volatile is often a
bad thing, so I'm re-thinking about its use compared with other approaches.

People who say "volatile is a bad thing" are often wrong. Remember, all generalisations are false :-)

"volatile" is a tool. It doesn't do everything that some people think
it does, but it is a very useful tool nonetheless. It has little place
in big systems - Linus Torvalds wrote a rant against it as being both
too much and too little, and in the context of writing Linux code, he
was correct. For Linux programming, you should be using OS-specific
features (which rely on "volatile" for their implementation) or atomics,
rather than using "volatile" directly.

But for small-systems embedded programming, it is very handy. Used
well, it is free - used excessively it has a cost, but an extra volatile
will not make an otherwise correct program fail.

Memory barriers are great for utility functions such as interrupt enable/disable inline functions, but are usually sub-optimal compared to specific and targeted volatile accesses.

[1] https://barrgroup.com/embedded-systems/how-to/c-volatile-keyword
[2] https://blog.regehr.org/archives/28

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From pozz@21:1/5 to All on Sun Oct 24 17:39:07 2021

Il 24/10/2021 13:02, David Brown ha scritto:

On 23/10/2021 22:49, pozz wrote:

Il 23/10/2021 18:09, David Brown ha scritto:

On 23/10/2021 00:07, pozz wrote:

Even I write software for embedded systems for more than 10 years,
there's an argument that from time to time let me think for hours and
leave me with many doubts.

It's nice to see a thread like this here - the group needs such
discussions!

Consider a simple embedded system based on a MCU (AVR8 or Cortex-Mx).
The software is bare metal, without any OS. The main pattern is the well >>>> known mainloop (background code) that is interrupted by ISR.

Interrupts are used mainly for timings and for low-level driver. For
example, the UART reception ISR move the last received char in a FIFO
buffer, while the mainloop code pops new data from the FIFO.

static struct {
�� unsigned char buf[RXBUF_SIZE];
�� uint8_t in;
�� uint8_t out;
} rxfifo;

/* ISR */
void uart_rx_isr(void) {
�� unsigned char c = UART->DATA;
�� rxfifo.buf[in % RXBUF_SIZE] = c;
�� rxfifo.in++;

Unless you are sure that RXBUF_SIZE is a power of two, this is going to
be quite slow on an AVR.� Modulo means division, and while division by a >>> constant is usually optimised to a multiplication by the compiler, you
still have a multiply, a shift, and some compensation for it all being
done as signed integer arithmetic.

It's also wrong, for non-power of two sizes, since the wrapping of your
increment and your modulo RXBUF_SIZE get out of sync.

Yes, RXBUF_SIZE is a power of two.

If your code relies on that, make sure the code will fail to compile if
it is not the case. Documentation is good, compile-time check is better:

static_assert((RXBUF_SIZE & (RXBUF_SIZE - 1)) == 0, "Needs power of 2");

Good point.

The usual choice is to track "head" and "tail", and use something like:

void uart_rx_isr(void) {
�� unsigned char c = UART->DATA;
�� // Reset interrupt flag
�� uint8_t next = rxfifo.tail;
�� rxfifo.buf[next] = c;
�� next++;
�� if (next >= RXBUF_SIZE) next -= RXBUF_SIZE;
�� rxfifo.tail = next;
}

This isn't the point of this thread, anyway...
You insist that tail is always in the range [0...RXBUF_SIZE - 1]. My
approach is different.

RXBUF_SIZE is a power of two, usualy <=256. head and tail are uint8_t
and *can* reach the maximum value of 255, even RXBUF_SIZE is 128. All
works well.

Yes, your approach will work - /if/ you have a power-of-two buffer size.
It has no noticeable efficiency advantages, merely an extra
inconvenient restriction and the possible confusion caused by doing
things in a different way from common idioms.

However, this is not the point of the thread - so I am happy to leave
that for now.

If you want, we can start a small [OT].

I know my ring-buffer implementation has the restriction of having a
buffer with a power-of-two size. However I like it, because I can avoid introducing a new variable (actual number of elements in the buffer) or
waste an element to solve the ambiguity when the buffer is full or empty.
</OT>

Suppose rxfifo.in=rxfifo.out=127, FIFO is empty. When a new char is
received, it is saved into rxfifo.buf[127 % 128=127] and rxfifo.in will
be increased to 128.
Now mainloop detect the new char (in != out), reads the new char at
rxfifo.buf[127 % 128=127] and increase out that will be 128.

The next byte will be saved into rxfifo.rxbuf[rxfifo.in % 128=128 % 128
= 0] and rxfifo.in will be 129. Again, the next byte will be saved to
rxbuf[rxfifo.in % 128=129 % 128=1] and rxfifo.in will be 130.

When the mainloop tries to pop data from fifo, it tests

�� rxfifo.in(130) !=rxfifo.out(128)

The test is true, so the code extracts chars from rxbuf[out % 128] that
is rxbuf[0]... and so on.

I hope that explanation is good.

�� // Reset interrupt flag
}

/* Called regularly from mainloop code */
int uart_task(void) {
�� int c = -1;
�� if (out != in) {
�� c = rxfifo.buf[out % RXBUF_SIZE];
�� out++;
�� }
�� return -1;
}

int uart_task(void) {
�� int c = -1;
�� uint8_t next = rxfifo.head;
�� if (next != rxfifo.tail) {
�� c = rxfifo.buf[next];
�� next++;
�� if (next >= RXBUF_SIZE) next -= RXBUF_SIZE;
�� rxfifo.head = next;
�� }
�� return c;
}

These don't track buffer overflow at all - you need to call uart_task()
often enough to avoid that.

Sure, with a good number for RXBUF_SIZE, buffer overflow shouldn't
happen ever. Anyway, if it happens, the higher level layers (protocol)
should detect a corrupted packet.

You risk getting seriously out of sync if there is an overflow.
Normally, on an overflow there will be a dropped character or two (which
as you say, must be caught at a higher level). Here you could end up
going round your buffer an extra time and /gaining/ RXBUF_SIZE extra characters.

Still, if you are sure that your functions are called fast enough so
that overflow is not a concern, then that's fine. Extra code to check
for a situation that can't occur is not helpful.

(I'm skipping volatiles so we don't get ahead of your point.)

�From a 20-years old article[1] by Nigle Jones, this seems a situation >>>> where volatile must be used for rxfifo.in, because is modified by an ISR >>>> and used in the mainloop code.

Certainly whenever data is shared between ISR's and mainloop code, or
different threads, then you need to think about how to make sure data is >>> synchronised and exchanged.� "volatile" is one method, atomics are
another, and memory barriers can be used.

I don't think so, rxfifo.in is read from memory only one time in
uart_task(), so there isn't the risk that compiler can optimize badly.

That is incorrect in two ways.� One - baring compiler bugs (which do
occur, but they are very rare compared to user bugs), there is no such
thing as "optimising badly".� If optimising changes the behaviour of the >>> code, other than its size and speed, the code is wrong.

Yes of course, but I don't think the absence of volatile for rxfifo.in,
even if it can change in ISR, could be a *real* problem with *modern and
current* compilers.

Personally, I am not satisfied with "it's unlikely to be a problem in practice" - I prefer "The language guarantees it is not a problem".
Remember, when you know the data needs to be read at this point, then
using a volatile read is free. Volatile does not make code less
efficient unless you use it incorrectly and force more accesses than are necessary. So using volatile accesses for "rxfifo.in" here turns
"probably safe" into "certainly safe" without cost. What's not to like?

voltile attribute needs to avoid compiler optimization (that would be a
bad thing, because of volatile nature of the variabile), but on that
code it's difficult to think of an optimization, caused by the absence
of volatile, that changes the behaviour erroneously... except reorering.

Two - it is a
very bad idea to imagine that having code inside a function somehow
"protects" it from re-ordering or other optimisation.

I didn't say this, at the contrary I was thinking exactly to reordering
issues.

Functions can be inlined, outlined, cloned, and shuffled about.
Link-time optimisation, code in headers, C++ modules, and other
cross-unit optimisations are becoming more and more common.� So while it >>> might be true /today/ that the compiler has no alternative but to read
rxfifo.in once per call to uart_task(), you cannot assume that will be
the case with later compilers or with more advanced optimisation
techniques enabled.� It is safer, more portable, and more future-proof
to avoid such assumptions.

Ok, you are talking of future scenarios. I don't think actually this
could be a real problem. Anyway your observation makes sense.

Even if ISR is fired immediately after the if statement, this doesn't
bring to a dangerous state: the just received data will be processed at >>>> the next call to uart_task().

So IMHO volatile isn't necessary here. And critical sections (i.e.
disabling interrupts) aren't useful too.

However I'm thinking about memory barrier. Suppose the compiler reorder >>>> the instructions in uart_task() as follows:

�� c = rxfifo.buf[out % RXBUF_SIZE]
�� if (out != in) {
�� out++;
�� return c;
�� } else {
�� return -1;
�� }

Here there's a big problem, because compiler decided to firstly read
rxfifo.buf[] and then test in and out equality. If the ISR is fired
immediately after moving data to c (most probably an internal register), >>>> the condition in the if statement will be true and the register value is >>>> returned. However the register value isn't correct.

You are absolutely correct.

I don't think any modern C compiler reorder uart_task() in this way, but >>>> we can't be sure. The result shouldn't change for the compiler, so it
can do this kind of things.

It is not an unreasonable re-arrangement.� On processors with
out-of-order execution (which does not apply to the AVR or Cortex-M),
compilers will often push loads as early as they can in the instruction
stream so that they start the cache loading process as quickly as
possible.� (But note that on such "big" processors, much of this
discussion on volatile and memory barriers is not sufficient, especially >>> if there is more than one core.� You need atomics and fences, but that's >>> a story for another day.)

How to fix this issue if I want to be extremely sure the compiler will >>>> not reorder this way? Applying volatile to rxfifo.in shouldn't help for >>>> this, because compiler is allowed to reorder access of non volatile
variables yet[2].

The important thing about "volatile" is that it is /accesses/ that are
volatile, not objects.� A volatile object is nothing more than an object >>> for which all accesses are volatile by default.� But you can use
volatile accesses on non-volatile objects.� This macro is your friend:

#define volatileAccess(v) *((volatile typeof((v)) *) &(v))

(Linux has the same macro, called ACCESS_ONCE.� It uses a gcc extension
- if you are using other compilers then you can make an uglier
equivalent using _Generic.� However, if you are using a C compiler that
supports C11, it is probably gcc or clang, and you can use the "typeof"
extension.)

That macro will let you make a volatile read or write to an object
without requiring that /all/ accesses to it are volatile.

This is a good point. The code in ISR can't be interrupted, so there's
no need to have volatile access in ISR.

Correct. (Well, /almost/ correct - bigger microcontrollers have
multiple interrupt priorities. But it should be correct in this case,
as no other interrupt would be messing with the same variables anyway.)

Yes, static variables defined in a uart driver aren't accessed from
other interrupts.

One solution is adding a memory barrier in this way:

int uart_task(void) {
�� int c = -1;
�� if (out != in) {
�� memory_barrier();
�� c = rxfifo.buf[out % RXBUF_SIZE];
�� out++;
�� }
�� return -1;
}

Note that you are forcing the compiler to read "out" twice here, as it
can't keep the value of "out" in a register across the memory barrier.

Yes, you're right. A small penalty to avoid the problem of reordering.

But an unnecessary penalty.

(And as I mentioned before, the compiler might be able to do larger
scale optimisation across compilation units or functions, and in that
way keep values across multiple calls to uart_task.)

However this approach appears to me dangerous. You have to check and
double check if, when and where memory barriers are necessary and it's >>>> simple to skip a barrier where it's nedded and add a barrier where it
isn't needed.

Memory barriers are certainly useful, but they are a shotgun approach -
they affect /everything/ involving reads and writes to memory.� (But
remember they don't affect ordering of calculations.)

So I'm thinking that a sub-optimal (regarding efficiency) but reliable >>>> (regarding the risk to skip a barrier where it is needed) could be to
enter a critical section (disabling interrupts) anyway, if it isn't
strictly needed.

int uart_task(void) {
�� ENTER_CRITICAL_SECTION();
�� int c = -1;
�� if (out != in) {
�� c = rxfifo.buf[out % RXBUF_SIZE];
�� out++;
�� }
�� EXIT_CRITICAL_SECTION();
�� return -1;
}

Critical sections for something like this are /way/ overkill.� And a
critical section with a division in the middle?� Not a good idea.

Another solution could be to apply volatile keyword to rxfifo.in *AND* >>>> rxfifo.buf too, so compiler can't change the order of accesses them.

Do you have other suggestions?

Marking "in" and "buf" as volatile is /far/ better than using a critical >>> section, and likely to be more efficient than a memory barrier.� You can >>> also use volatileAccess rather than making buf volatile, and it is often >>> slightly more efficient to cache volatile variables in a local variable
while working with them.

Yes, I think so too. Lastly I read many experts say volatile is often a
bad thing, so I'm re-thinking about its use compared with other approaches. >>

People who say "volatile is a bad thing" are often wrong. Remember, all generalisations are false :-)

Ok, I wrote "volatile is **often** a bad thing".

"volatile" is a tool. It doesn't do everything that some people think
it does, but it is a very useful tool nonetheless. It has little place
in big systems - Linus Torvalds wrote a rant against it as being both
too much and too little, and in the context of writing Linux code, he
was correct. For Linux programming, you should be using OS-specific
features (which rely on "volatile" for their implementation) or atomics, rather than using "volatile" directly.

But for small-systems embedded programming, it is very handy. Used
well, it is free - used excessively it has a cost, but an extra volatile
will not make an otherwise correct program fail.

Memory barriers are great for utility functions such as interrupt enable/disable inline functions, but are usually sub-optimal compared to specific and targeted volatile accesses.

Just to say what I read:

https://blog.regehr.org/archives/28

https://mcuoneclipse.com/2021/10/12/spilling-the-beans-volatile-qualifier/

[1] https://barrgroup.com/embedded-systems/how-to/c-volatile-keyword
[2] https://blog.regehr.org/archives/28

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From David Brown@21:1/5 to pozz on Sun Oct 24 18:37:16 2021

On 24/10/2021 17:39, pozz wrote:

Il 24/10/2021 13:02, David Brown ha scritto:

<snipping to save on electrons>

People who say "volatile is a bad thing" are often wrong.� Remember, all
generalisations are false :-)

Ok, I wrote "volatile is **often** a bad thing".

:-)

"volatile" is a tool.� It doesn't do everything that some people think
it does, but it is a very useful tool nonetheless.� It has little place
in big systems - Linus Torvalds wrote a rant against it as being both
too much and too little, and in the context of writing Linux code, he
was correct.� For Linux programming, you should be using OS-specific
features (which rely on "volatile" for their implementation) or atomics,
rather than using "volatile" directly.

But for small-systems embedded programming, it is very handy.� Used
well, it is free - used excessively it has a cost, but an extra volatile
will not make an otherwise correct program fail.

Memory barriers are great for utility functions such as interrupt
enable/disable inline functions, but are usually sub-optimal compared to
specific and targeted volatile accesses.

Just to say what I read:

https://blog.regehr.org/archives/28

Yes, you had that in your footnotes - and that is not a bad article.
(It's better than a lot of that guy's stuff - he also writes a lot of
crap about undefined behaviour and the evils of optimisation.)

https://mcuoneclipse.com/2021/10/12/spilling-the-beans-volatile-qualifier/

That article is mostly wrong - or at best, inappropriate for what you
are doing. Critical sections are a massive over-kill for a ring buffer
in most cases. If you intend to call your uart_task() from multiple
places in a re-entrant manner (e.g., multiple RTOS threads), then you
have a lot more challenges to deal with than just the ordering of the
accesses to "in", "out" and "buf" - you can't just throw in a critical
section and hope it's all okay. And if you are /not/ doing such a daft
thing, then critical sections are completely unnecessary.

[1] https://barrgroup.com/embedded-systems/how-to/c-volatile-keyword >>>>> [2] https://blog.regehr.org/archives/28

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Don Y@21:1/5 to All on Sun Oct 24 12:54:39 2021

On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:

Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.

Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.

The OPs code doesn't differentiate between FIFO full and empty.
So, *he* may gain some advantage from disabling interrupts to
ensure the character he is about to retrieve is n ot overwritten
by an incoming character, placed at that location (cuz he lets
his FIFO wrap indiscriminately).

And, if the offsets ever got larger (wider) -- or became actual
pointers -- then the possibility of PART of a value being updated
on either "side" of an ISR is also possible.

And, there's nothing to say the OP has disclosed EVERYTHING
that might be happening in his ISR (maintaining handshaking signals,
flow control, etc.) which could compound the references
(e.g., if you need to know that you have space for N characters
remaining so you can signal the remote device to stop sending,
then you're doing "pointer/offset arithmetic" and *acting* on the
result)

Although this thread is on how to wrestle a poor
language to do what you want, sort of how to use a hammer on a screw
instead of taking the screwdriver, there would be no need to
mask interrupts with C either.

The "problem" with the language is that it gives the compiler the freedom
to make EQUIVALENT changes to your code that you might not have foreseen
or that might not have been consistent with your "style" -- yet do not
alter the results.

For example, you might want to write:
x = <expr1>
y = <expr2>
just because of some "residual OCD" that makes you think in terms of
"x before y". Yet, there may be no dependencies in those statements
that *require* that ordering. So, why should the compiler be crippled
to implementing them in that order if it has found a way to alter
their order (or their actual content)?

A correctly written compiler will follow a set of rules that it *knows*
to be safe "code translations"; but many developers don't have a similar understanding of those; so the problem lies in the developer's skillset,
not the compiler or language.

After all, a programming language -- ANY programming language -- is just
a vehicle for conveying our desires to the machine in a semi-unambiguous manner. I'd much rather *SAY*, "What are the roots of ax^2 + bx + c?"
than have to implement an algorithmic solution, worry about cancellation, required precision, etc. (and, in some languages, you can do just that!)

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Dimiter_Popoff@21:1/5 to Don Y on Sun Oct 24 23:27:43 2021

On 10/24/2021 22:54, Don Y wrote:

On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:

Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.

Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.

The OPs code doesn't differentiate between FIFO full and empty.

So he should fix that first, there is no sane reason why not.
Few things are simpler to do than that.

So, *he* may gain some advantage from disabling interrupts to
ensure the character he is about to retrieve is n ot overwritten
by an incoming character, placed at that location (cuz he lets
his FIFO wrap indiscriminately).

And, if the offsets ever got larger (wider) -- or became actual
pointers -- then the possibility of PART of a value being updated
on either "side" of an ISR is also possible.

And, there's nothing to say the OP has disclosed EVERYTHING
that might be happening in his ISR (maintaining handshaking signals,
flow control, etc.) which could compound the references
(e.g., if you need to know that you have space for N characters
remaining so you can signal the remote device to stop sending,
then you're doing "pointer/offset arithmetic" and *acting* on the
result)

Whatever handshakes he makes there is no problem knowing whether
the fifo is full - just check if the position the write pointer
will have after putting the next byte matches the read pointer
at the moment. Like I said before, few things are simpler than
that, can't imagine someone working as a programmer being
stuck at *that*.

Although this thread is on how to wrestle a poor
language to do what you want, sort of how to use a hammer on a screw
instead of taking the screwdriver, there would be no need to
mask interrupts with C either.

The "problem" with the language is that it gives the compiler the freedom
to make EQUIVALENT changes to your code that you might not have foreseen
or that might not have been consistent with your "style" -- yet do not
alter the results.

Don, let us not go into this. Just looking at the thread is enough to
see it is about wrestling the language so it can be made some use of.

After all, a programming language -- ANY programming language -- is just
a vehicle for conveying our desires to the machine in a semi-unambiguous manner.� I'd much rather *SAY*, "What are the roots of ax^2 + bx + c?"
than have to implement an algorithmic solution, worry about cancellation, required precision, etc.� (and, in some languages, you can do just that!)

Indeed you don't want to write how the equation is solved every time.
This is why you can call it once you have it available. This is language independent.
Then solving expressions etc. is well within 1% of the effort in
programming if the task at hand is going to take > 2 weeks; after that
the programmer's time is wasted on wrestling the language like
demonstrated by this thread. Sadly almost everybody has accepted
C as a standard - which makes it a very popular poor language.
Similar to say Chinese, very popular, spoken by billions, yet
where are its literary masterpieces. Being hieroglyph based there
are none; you will have to look at alphabet based languages to
find some.

======================================================
Dimiter Popoff, TGI http://www.tgi-sci.com ====================================================== http://www.flickr.com/photos/didi_tgi/

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Don Y@21:1/5 to All on Sun Oct 24 14:08:13 2021

On 10/24/2021 1:27 PM, Dimiter_Popoff wrote:

On 10/24/2021 22:54, Don Y wrote:

On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:

Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.

Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.

The OPs code doesn't differentiate between FIFO full and empty.

So he should fix that first, there is no sane reason why not.
Few things are simpler to do than that.

Yes, I pointed that out earlier, to him. Why worry about what the
compiler *might* do if you haven't sorted out what you really WANT to do?

So, *he* may gain some advantage from disabling interrupts to
ensure the character he is about to retrieve is n ot overwritten
by an incoming character, placed at that location (cuz he lets
his FIFO wrap indiscriminately).

And, if the offsets ever got larger (wider) -- or became actual
pointers -- then the possibility of PART of a value being updated
on either "side" of an ISR is also possible.

And, there's nothing to say the OP has disclosed EVERYTHING
that might be happening in his ISR (maintaining handshaking signals,
flow control, etc.) which could compound the references
(e.g., if you need to know that you have space for N characters
remaining so you can signal the remote device to stop sending,
then you're doing "pointer/offset arithmetic" and *acting* on the
result)

Whatever handshakes he makes there is no problem knowing whether
the fifo is full - just check if the position the write pointer
will have after putting the next byte matches the read pointer
at the moment. Like I said before, few things are simpler than
that, can't imagine someone working as a programmer being
stuck at *that*.

Yes, but if you want to implement flow control, you have to tell the
other end of the line BEFORE you've filled your buffer. There may be
a character being deserialized AS you are retrieving the "last"
character, another one (or more) preloaded into the transmitter on
the far device, etc. And, it will take some time for your
notification to reach the far end and be recognized as a desire
to suspend transmission. etc.

If you wait until you have no more space available, you are almost
certain to lose characters.

Although this thread is on how to wrestle a poor
language to do what you want, sort of how to use a hammer on a screw
instead of taking the screwdriver, there would be no need to
mask interrupts with C either.

The "problem" with the language is that it gives the compiler the freedom
to make EQUIVALENT changes to your code that you might not have foreseen
or that might not have been consistent with your "style" -- yet do not
alter the results.

Don, let us not go into this. Just looking at the thread is enough to
see it is about wrestling the language so it can be made some use of.

The language isn't the problem. Witness the *millions* (?) of programs
written in it, over the past 5 decades.

The problem is that it never was an assembly language -- even though it
was treated as such "in days gone by" (because the compiler's were
just "language translators" and didn't add any OTHER value to the
"programming process").

It's only recently that compilers have become "independent agents",
of a sort... adding their own "spin" on the developer's code.

And, with more capable hardware (multiple cores/threads) being "dirt cheap", it's a lot easier for a developer to find himself in a situation that
was previously pie-in-the-sky.

After all, a programming language -- ANY programming language -- is just
a vehicle for conveying our desires to the machine in a semi-unambiguous
manner. I'd much rather *SAY*, "What are the roots of ax^2 + bx + c?"
than have to implement an algorithmic solution, worry about cancellation,
required precision, etc. (and, in some languages, you can do just that!)

Indeed you don't want to write how the equation is solved every time.
This is why you can call it once you have it available. This is language independent.

For a simple quadratic, you can explore the coefficients to determine which algorithm is best suited to giving you *accurate* results.

What if I present *any* expression? Can you have your solution available
to handle any case? Did you even bother to develop such a solution if you
were only encountering quadratics?

Then solving expressions etc. is well within 1% of the effort in
programming if the task at hand is going to take > 2 weeks; after that
the programmer's time is wasted on wrestling the language like
demonstrated by this thread. Sadly almost everybody has accepted
C as a standard - which makes it a very popular poor language.

It makes it *popular* but concluding that it is "poor" is an overreach.

There are (and have been) many "safer" languages. Many that are more descriptive (for certain classes of problem). But, C has survived to
handle all-of-the-above... perhaps in a suboptimal way but at least
a manner that can get to the desired solution.

Look at how few applications SNOBOL handles. Write an OS in COBOL? Ada?

A tool is only effective if it solves real problems. Under real cost
and time constraints. There are lots of externalities that come into
play in that analysis.

I've made some syntactic changes to my code that make it much easier
to read -- yet mean that I have to EXPLAIN how they work and why they
are present as any other developer would frown on encountering them.
(But, it's my opinion that, once explained, that developer will see them
as an efficient addition to the language in line with other *existing* mechanisms that are already present, there).

Similar to say Chinese, very popular, spoken by billions, yet
where are its literary masterpieces. Being hieroglyph based there
are none; you will have to look at alphabet based languages to
find some.

One can say the same thing about Unangax̂ -- spoken by ~100!
Popularity and literary masterpieces are completely different
axis.

Hear much latin or ancient greek spoken, recently?

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Dimiter_Popoff@21:1/5 to Don Y on Mon Oct 25 00:50:29 2021

On 10/25/2021 0:08, Don Y wrote:

On 10/24/2021 1:27 PM, Dimiter_Popoff wrote:
....

Whatever handshakes he makes there is no problem knowing whether
the fifo is full - just check if the position the write pointer
will have after putting the next byte matches the read pointer
at the moment. Like I said before, few things are simpler than
that, can't imagine someone working as a programmer being
stuck at *that*.

Yes, but if you want to implement flow control, you have to tell the
other end of the line BEFORE you've filled your buffer. There may be
a character being deserialized AS you are retrieving the "last"
character, another one (or more) preloaded into the transmitter on
the far device, etc. And, it will take some time for your
notification to reach the far end and be recognized as a desire
to suspend transmission. etc.

If you wait until you have no more space available, you are almost
certain to lose characters.

Well of course so, we have all done that sort of thing since the 80-s,
other people have done it before I suppose. Implementing fifo thresholds
is not (and has never been) rocket science.
The point is there is no point in throwing huge efforts at a
self-inflicted problem instead of just doing it the easy way which
is well, common knowledge.

Although this thread is on how to wrestle a poor
language to do what you want, sort of how to use a hammer on a screw
instead of taking the screwdriver, there would be no need to
mask interrupts with C either.

The "problem" with the language is that it gives the compiler the
freedom
to make EQUIVALENT changes to your code that you might not have foreseen >>> or that might not have been consistent with your "style" -- yet do not
alter the results.

Don, let us not go into this. Just looking at the thread is enough to
see it is about wrestling the language so it can be made some use of.

The language isn't the problem. Witness the *millions* (?) of programs written in it, over the past 5 decades.

This does not prove much, it has been the only language allowing
"everybody" to do what they did. I am not denying this is the best
language currently available to almost everybody. I just happened to
have been daring enough to explore my own way/language and have seen
how much is there to be gained if not having to wrestle a language
which is just a more complete phrase book than the rest of the
phrase books (aka high level languages).

Indeed you don't want to write how the equation is solved every time.
This is why you can call it once you have it available. This is language
independent.

For a simple quadratic, you can explore the coefficients to determine which algorithm is best suited to giving you *accurate* results. >
What if I present *any* expression? Can you have your solution available
to handle any case? Did you even bother to develop such a solution if you were only encountering quadratics?

Any expression solver has its limitations, why go into that? Mine (in
the dps environment) can do all arithmetic and logic for
integers, the fp can do all arithmetic, knows pi, e, haven't needed
to expand it for years. And again, solving expressions has never taken
me any significant part of the effort on a project.

I've made some syntactic changes to my code that make it much easier
to read -- yet mean that I have to EXPLAIN how they work and why they
are present as any other developer would frown on encountering them.

Oh I am well aware of the value of standardization and popularity,
these are the strongest points of C.

(But, it's my opinion that, once explained, that developer will see them
as an efficient addition to the language in line with other *existing* mechanisms that are already present, there).

Of course, but you have to have them on board first...

Similar to say Chinese, very popular, spoken by billions, yet
where are its literary masterpieces. Being hieroglyph based there
are none; you will have to look at alphabet based languages to
find some.

One can say the same thing about Unangax̂ -- spoken by ~100!
Popularity and literary masterpieces are completely different
axis.

Hear much latin or ancient greek spoken, recently?

The Latin alphabet looks pretty popular nowadays :-). Everything
evolves, including languages. And there are dead ends within them
which just die out - e.g. roman numbers. Can't see much future in
any hieroglyph based language though, inventing a symbol for each
word has been demonstrated to be a bad idea by history.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Don Y@21:1/5 to All on Sun Oct 24 15:47:39 2021

On 10/24/2021 2:50 PM, Dimiter_Popoff wrote:

On 10/25/2021 0:08, Don Y wrote:

On 10/24/2021 1:27 PM, Dimiter_Popoff wrote:
....

Whatever handshakes he makes there is no problem knowing whether
the fifo is full - just check if the position the write pointer
will have after putting the next byte matches the read pointer
at the moment. Like I said before, few things are simpler than
that, can't imagine someone working as a programmer being
stuck at *that*.

Yes, but if you want to implement flow control, you have to tell the
other end of the line BEFORE you've filled your buffer. There may be
a character being deserialized AS you are retrieving the "last"
character, another one (or more) preloaded into the transmitter on
the far device, etc. And, it will take some time for your
notification to reach the far end and be recognized as a desire
to suspend transmission. etc.

If you wait until you have no more space available, you are almost
certain to lose characters.

Well of course so, we have all done that sort of thing since the 80-s,
other people have done it before I suppose. Implementing fifo thresholds
is not (and has never been) rocket science.
The point is there is no point in throwing huge efforts at a
self-inflicted problem instead of just doing it the easy way which
is well, common knowledge.

*My* point (to the OP) was that you need to understand what you
will be doing before you can understand the "opportunities"
the compiler will have to catch you off guard.

Although this thread is on how to wrestle a poor
language to do what you want, sort of how to use a hammer on a screw >>>>> instead of taking the screwdriver, there would be no need to
mask interrupts with C either.

The "problem" with the language is that it gives the compiler the freedom >>>> to make EQUIVALENT changes to your code that you might not have foreseen >>>> or that might not have been consistent with your "style" -- yet do not >>>> alter the results.

Don, let us not go into this. Just looking at the thread is enough to
see it is about wrestling the language so it can be made some use of.

The language isn't the problem. Witness the *millions* (?) of programs
written in it, over the past 5 decades.

This does not prove much, it has been the only language allowing
"everybody" to do what they did.

ASM has always been available. Folks just found it too inefficient
to solve "big" problems, in reasonable effort.

I am not denying this is the best
language currently available to almost everybody. I just happened to
have been daring enough to explore my own way/language and have seen
how much is there to be gained if not having to wrestle a language
which is just a more complete phrase book than the rest of the
phrase books (aka high level languages).

But you only have yourself as a client. Most of us have to write code
(or modify already written code) that others will see/maintain. It
does no good to have a "great tool" if no one else uses it!

I use (scant!) ASM, a modified ("proprietary") C dialect, SQL and a scripting language in my current design. (not counting the tools that generate my documentation).

This is a LOT to expect a developer to have a firm grasp of. But, inventing
a new language that will address all of these needs would be even moreso!
At least one can find books/documentation describing each of these individual languages *and* likely find folks with proficiency in each of them. So, I
can spend my efforts describing "how things work" instead of the details
of how to TELL them to work.

I've made some syntactic changes to my code that make it much easier
to read -- yet mean that I have to EXPLAIN how they work and why they
are present as any other developer would frown on encountering them.

Oh I am well aware of the value of standardization and popularity,
these are the strongest points of C.

(But, it's my opinion that, once explained, that developer will see them
as an efficient addition to the language in line with other *existing*
mechanisms that are already present, there).

Of course, but you have to have them on board first...

Yes. They have to have incentive to want to use the codebase.
They'd not be keen on making any special effort to learn how
to modify a "program" that picks resistor values to form
voltage dividers.

And, even "well motivated", what you are asking them to
embrace has to be acceptable to their sense of reason.
E.g., expecting folks to adopt postfix notation just
because you chose to use it is probably a nonstarter
(i.e., "show me some OTHER reason that justifies its use!").

Or, the wonky operator set that APL uses...

Similar to say Chinese, very popular, spoken by billions, yet
where are its literary masterpieces. Being hieroglyph based there
are none; you will have to look at alphabet based languages to
find some.

One can say the same thing about Unangax̂ -- spoken by ~100!
Popularity and literary masterpieces are completely different
axis.

Hear much latin or ancient greek spoken, recently?

The Latin alphabet looks pretty popular nowadays :-). Everything
evolves, including languages. And there are dead ends within them
which just die out - e.g. roman numbers. Can't see much future in
any hieroglyph based language though, inventing a symbol for each
word has been demonstrated to be a bad idea by history.

Witness the rise of arabic numerals and their efficacy towards
advancing mathematics.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Dimiter_Popoff@21:1/5 to Don Y on Mon Oct 25 02:32:02 2021

On 10/25/2021 1:47, Don Y wrote:

...

ASM has always been available.

There is no such language as ASM, there is a wide variety of machines.

Folks just found it too inefficient
to solve "big" problems, in reasonable effort.

Especially with the advent of load/store machines (although C must have
been helped a lot by the clunky x86 architecture for its popularity), programming in the native assembler for any RISC machine would be
masochistic at best. Which is why I took the steps I took etc., no
need to go into that.

I am not denying this is the best
language currently available to almost everybody. I just happened to
have been daring enough to explore my own way/language and have seen
how much is there to be gained if not having to wrestle a language
which is just a more complete phrase book than the rest of the
phrase books (aka high level languages).

But you only have yourself as a client.

Yes, but this does not mean much. Looking at pieces I wrote 20 or
30 years ago - even 10 years ago sometimes - is like reading it
for the first time for many parts (tens of megabytes of sources, http://tgi-sci.com/misc/scnt21.gif ).

Most of us have to write code
(or modify already written code) that others will see/maintain. It
does no good to have a "great tool" if no one else uses it! >
I use (scant!) ASM, a modified ("proprietary") C dialect, SQL and a
scripting
language in my current design. (not counting the tools that generate my documentation).

Here comes the advantage of an "alphabet" rather than "hieroglyph" based approach/language. A lot less of lookup tables to memorize, you learn
while going etc. I am quite sure someone like you would get used to it
quite fast, much much faster than to an unknown high level language.
In fact it may take you very short to see it is something you have more
or less been familiar with forever.
Grasping the big picture of the entire environment and becoming
really good at writing within it would take longer, obviously.

....

Hear much latin or ancient greek spoken, recently?

The Latin alphabet looks pretty popular nowadays :-). Everything
evolves, including languages. And there are dead ends within them
which just die out - e.g. roman numbers. Can't see much future in
any hieroglyph based language though, inventing a symbol for each
word has been demonstrated to be a bad idea by history.

Witness the rise of arabic numerals and their efficacy towards
advancing mathematics.

Yes, another good example of how it is the foundation you step on
that really matters. Step on the Roman numbers and good luck with
your math...

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Don Y@21:1/5 to All on Sun Oct 24 18:34:07 2021

On 10/24/2021 4:32 PM, Dimiter_Popoff wrote:

On 10/25/2021 1:47, Don Y wrote:

...

ASM has always been available.

There is no such language as ASM, there is a wide variety of machines.

Of course there's a language called ASM! It's just target specific!
It is available for each different processor.

And highly NONportable!

Folks just found it too inefficient
to solve "big" problems, in reasonable effort.

Especially with the advent of load/store machines (although C must have
been helped a lot by the clunky x86 architecture for its popularity), programming in the native assembler for any RISC machine would be
masochistic at best. Which is why I took the steps I took etc., no
need to go into that.

I am not denying this is the best
language currently available to almost everybody. I just happened to
have been daring enough to explore my own way/language and have seen
how much is there to be gained if not having to wrestle a language
which is just a more complete phrase book than the rest of the
phrase books (aka high level languages).

But you only have yourself as a client.

Yes, but this does not mean much. Looking at pieces I wrote 20 or
30 years ago - even 10 years ago sometimes - is like reading it
for the first time for many parts (tens of megabytes of sources, http://tgi-sci.com/misc/scnt21.gif ).

Of course it means something! If someone else has to step into
your role *tomorrow*, there'd be little/no progress on your codebase
until they learned your toolchain/language.

An employer has to expect that any employee can "become unavailable"
at any time. And, with that, the labors for which they'd previously
paid, should still retain their value. I've had clients outright ask
me, "What happens if you get hit by a bus?"

Most of us have to write code
(or modify already written code) that others will see/maintain. It
does no good to have a "great tool" if no one else uses it! >
I use (scant!) ASM, a modified ("proprietary") C dialect, SQL and a scripting
language in my current design. (not counting the tools that generate my
documentation).

Here comes the advantage of an "alphabet" rather than "hieroglyph" based approach/language. A lot less of lookup tables to memorize, you learn
while going etc. I am quite sure someone like you would get used to it
quite fast, much much faster than to an unknown high level language.
In fact it may take you very short to see it is something you have more
or less been familiar with forever.
Grasping the big picture of the entire environment and becoming
really good at writing within it would take longer, obviously.

But that can be said of any HLL. That doesn't mean an employer
wants to pay you to *learn* (some *previous* employer was expected
to have done that!). They want to have to, at most, train you on
the needs of their applications/markets.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From David Brown@21:1/5 to All on Mon Oct 25 08:57:14 2021

On 24/10/2021 13:14, Dimiter_Popoff wrote:

On 10/24/2021 13:39, Johann Klammer wrote:

Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.

Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.

Although this thread is on how to wrestle a poor
language to do what you want, sort of how to use a hammer on a screw
instead of taking the screwdriver, there would be no need to
mask interrupts with C either.

There's nothing wrong with the language here - C is perfectly capable of expressing what the OP needs. But getting the "volatile" usage optimal
here - enough to cover what you need, but not accidentally reducing the efficiency of the code - requires a bit of thought. "volatile" is often misunderstood in C, and it's good that the OP is asking to be sure. C
also has screwdrivers in its toolbox, they are just buried under all the hammers!

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Niklas Holsti@21:1/5 to Don Y on Mon Oct 25 10:56:08 2021

On 2021-10-25 0:08, Don Y wrote:

[snip]

There are (and have been) many "safer" languages. Many that are more descriptive (for certain classes of problem). But, C has survived to
handle all-of-the-above... perhaps in a suboptimal way but at least
a manner that can get to the desired solution.

Look at how few applications SNOBOL handles. Write an OS in COBOL? Ada?

I don't know about COBOL, but typically the real-time kernels ("run-time systems") associated with Ada compilers for bare-board embedded systems
are written in Ada, with a minor amount of assembly language for the
most HW-related bits like HW context saving and restoring. I'm pretty
sure that C-language OS kernels also use assembly for those things.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Niklas Holsti@21:1/5 to All on Mon Oct 25 11:09:07 2021

On 2021-10-24 23:27, Dimiter_Popoff wrote:

On 10/24/2021 22:54, Don Y wrote:

On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:

Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.

Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.

The OPs code doesn't differentiate between FIFO full and empty.

So he should fix that first, there is no sane reason why not.
Few things are simpler to do than that.

[snip]

Whatever handshakes he makes there is no problem knowing whether
the fifo is full - just check if the position the write pointer
will have after putting the next byte matches the read pointer
at the moment.� Like I said before, few things are simpler than
that, can't imagine someone working as a programmer being
stuck at *that*.

That simple check would require keeping a maximum of only N-1 entries in
the N-position FIFO buffer, and the OP explicitly said they did not want
to allocate an unused place in the buffer (which I think is unreasonable
of the OP, but that is only IMO).

The simple explanation for the N-1 limit is that the difference between
two wrap-around pointers into an N-place buffer has at most N different
values, while there are N+1 possible filling states of the buffer, from
empty (zero items) to full (N items).

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From David Brown@21:1/5 to Don Y on Mon Oct 25 09:41:23 2021

On 24/10/2021 23:08, Don Y wrote:

The language isn't the problem. Witness the *millions* (?) of programs written in it, over the past 5 decades.

The problem is that it never was an assembly language -- even though it
was treated as such "in days gone by" (because the compiler's were
just "language translators" and didn't add any OTHER value to the "programming process").

No - the problem is that some people /thought/ it was supposed to be a
kind of assembly language. It's a people problem, not a language
problem. C has all you need to handle code such as the OP's - all it
takes is for people to understand that they need to use the right
features of the language.

It's only recently that compilers have become "independent agents",
of a sort... adding their own "spin" on the developer's code.

Baring bugs, compilers do what they are told - in the language
specified. If programmers don't properly understand the language they
are using, or think it means more than it does, that's the programmers
that are at fault - not the language or the compiler. If you go into a
French bakery and ask for horse dung instead of the end of a baguette,
that's /your/ fault - not the language's fault, and not the baker's fault.

Add to that, the idea that optimising compilers are new is equally silly.

The C language is defined in terms of an "abstract machine". The
generated code has the same effect "as if" it executed everything you
wrote - but the abstract machine and the real object code only
synchronise on the observable behaviour. In practice, that means
volatile accesses happen exactly as often, with exactly the values and
exactly the order that you gave in the code. Non-volatile accesses can
be re-ordered, re-arranged, combined, duplicated, or whatever.

This has been the situation since C was standardised and since more
advanced compilers arrived, perhaps 30 years ago.

C is what it is - a language designed long ago, but which turned out to
be surprisingly effective and long-lived. It's not perfect, but it is
pretty good and works well for many situations where you need low-level
coding or near-optimal efficiency. It's not as safe or advanced as many
new languages, and it is not a beginners' language - you have to know
what you are doing in order to write C code correctly. You have to
understand it and follow its rules, whether you like these rules or not.

Unfortunately, there are quite a few C programmers who /don't/ know
these rules. And there is a small but vocal fraction who /do/ know the
rules, but don't like them and feel the rules should therefore not apply
- and blame compilers, standards committees, and anyone else when things inevitably go wrong. Some people are always a problem, regardless of
the language!

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Don Y@21:1/5 to Niklas Holsti on Mon Oct 25 01:19:53 2021

On 10/25/2021 12:56 AM, Niklas Holsti wrote:

On 2021-10-25 0:08, Don Y wrote:

There are (and have been) many "safer" languages. Many that are more
descriptive (for certain classes of problem). But, C has survived to
handle all-of-the-above... perhaps in a suboptimal way but at least
a manner that can get to the desired solution.

Look at how few applications SNOBOL handles. Write an OS in COBOL? Ada?

I don't know about COBOL, but typically the real-time kernels ("run-time systems") associated with Ada compilers for bare-board embedded systems are written in Ada, with a minor amount of assembly language for the most HW-related bits like HW context saving and restoring. I'm pretty sure that C-language OS kernels also use assembly for those things.

Of course you *can* do these things. The question is how often
they are ACTUALLY done with these other languages.

"Suitability for a particular task" isn't often the criteria that is
used to make a selection -- for better or worse. There are countless
other factors that affect an implementation, depending on the environment
in which it is undertaken (e.g., designs from academia are considerably different than hobbyist designs which are different from commercial
designs which are...)

This is true of other disciplines, as well. How often do you think a hardware design follows a course heavily influenced by the "prejudices"/"preferences"
of the folks responsible for the design vs. the "best" approach to it?

Step back yet another level of abstraction and see that even the tools
chosen to perform those tasks are often not "optimally chosen".

If you are the sole entity involved in a decision making process, then
you've (typically) got /carte blanche/. But, in most cases, there are
other voices -- seats at the table -- that shape the final decisions. It
pays to lift one's head and see which way the wind is blowing, *today*...

[By the same token, expecting the past to mirror the present is equally
naive. People forget that tools and processes have evolved (in the 40+
years that I've been designing embedded products). And, that the isssues
folks now face often weren't issues when tools were "stupider" (I've
probably got $60K of obsolete compilers to prove this -- anyone written
any C on an 1802 recently? Or, a 2A03? 65816? Z180? 6809?) Don't
even *think* about finding an Ada compiler for them -- in the past!]

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Don Y@21:1/5 to Niklas Holsti on Mon Oct 25 01:28:08 2021

On 10/25/2021 1:09 AM, Niklas Holsti wrote:

On 2021-10-24 23:27, Dimiter_Popoff wrote:

On 10/24/2021 22:54, Don Y wrote:

On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:

Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.

Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.

The OPs code doesn't differentiate between FIFO full and empty.

So he should fix that first, there is no sane reason why not.
Few things are simpler to do than that.

[snip]

Whatever handshakes he makes there is no problem knowing whether
the fifo is full - just check if the position the write pointer
will have after putting the next byte matches the read pointer
at the moment. Like I said before, few things are simpler than
that, can't imagine someone working as a programmer being
stuck at *that*.

That simple check would require keeping a maximum of only N-1 entries in the N-position FIFO buffer, and the OP explicitly said they did not want to allocate an unused place in the buffer (which I think is unreasonable of the OP, but that is only IMO).

The simple explanation for the N-1 limit is that the difference between two wrap-around pointers into an N-place buffer has at most N different values, while there are N+1 possible filling states of the buffer, from empty (zero items) to full (N items).

But, again, that just deals with the "full check". The easiest way to do
this is just to check ".in" *after* advancement and inhibit the store if
it coincides with the ".out" value.

Checking for a "high water mark" to enable flow control requires more computation (albeit simple) as you have to accommodate the delays in
that notification reaching the remote sender (lest he continue
sending and overrun your buffer).

And, later noting when you've consumed enough of the FIFO's contents
to reach a "low water mark" and reenable the remote's transmissions.

[And, if you ever have to deal with more "established" protocols
that require the sequencing of specific control signals DURING
a transfer, the ISR quickly becomes very complex!]

When you start "fleshing out" an ISR in this way, you see the code
quickly becomes more involved than just pushing bytes into a buffer.
(and, this should give you pause to rethink what you are doing *in*
the ISR and what can best be handled out of that "precious"
environment)

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Niklas Holsti@21:1/5 to Don Y on Mon Oct 25 11:52:02 2021

On 2021-10-25 11:19, Don Y wrote:

On 10/25/2021 12:56 AM, Niklas Holsti wrote:

On 2021-10-25 0:08, Don Y wrote:

There are (and have been) many "safer" languages. Many that are more
descriptive (for certain classes of problem). But, C has survived to
handle all-of-the-above... perhaps in a suboptimal way but at least
a manner that can get to the desired solution.

Look at how few applications SNOBOL handles. Write an OS in COBOL?
Ada?

I don't know about COBOL, but typically the real-time kernels
("run-time systems") associated with Ada compilers for bare-board
embedded systems are written in Ada, with a minor amount of assembly
language for the most HW-related bits like HW context saving and
restoring. I'm pretty sure that C-language OS kernels also use
assembly for those things.

Of course you *can* do these things.

Then I misunderstood your (rhetorical?) question.

The question is how often
they are ACTUALLY done with these other languages.

I don't find that question very interesting.

It is a typical chicken-and-egg, first-to-market conundrum. There is an enormous amount of status-quo-favouring friction in awareness,
education, tool availability, and legacy code.

[By the same token, expecting the past to mirror the present is equally naive. People forget that tools and processes have evolved (in the 40+ years that I've been designing embedded products). And, that the isssues folks now face often weren't issues when tools were "stupider" (I've
probably got $60K of obsolete compilers to prove this -- anyone written
any C on an 1802 recently? Or, a 2A03? 65816? Z180? 6809?) Don't even *think* about finding an Ada compiler for them -- in the past!]

Well, the Janus/Ada compiler was available for Z80 in its day. There are
also Ada compilers that use C as an intermediate language, with
applications for example on TI MSP430's, but those were probably not
available in the past ages you refer to.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Niklas Holsti@21:1/5 to Don Y on Mon Oct 25 12:06:04 2021

On 2021-10-25 11:28, Don Y wrote:

On 10/25/2021 1:09 AM, Niklas Holsti wrote:

On 2021-10-24 23:27, Dimiter_Popoff wrote:

On 10/24/2021 22:54, Don Y wrote:

On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:

Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.

Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.

The OPs code doesn't differentiate between FIFO full and empty.

So he should fix that first, there is no sane reason why not.
Few things are simpler to do than that.

�� [snip]

Whatever handshakes he makes there is no problem knowing whether
the fifo is full - just check if the position the write pointer
will have after putting the next byte matches the read pointer
at the moment.� Like I said before, few things are simpler than
that, can't imagine someone working as a programmer being
stuck at *that*.

That simple check would require keeping a maximum of only N-1 entries
in the N-position FIFO buffer, and the OP explicitly said they did not
want to allocate an unused place in the buffer (which I think is
unreasonable of the OP, but that is only IMO).

The simple explanation for the N-1 limit is that the difference
between two wrap-around pointers into an N-place buffer has at most N
different values, while there are N+1 possible filling states of the
buffer, from empty (zero items) to full (N items).

But, again, that just deals with the "full check".� The easiest way to do this is just to check ".in" *after* advancement and inhibit the store if
it coincides with the ".out" value.

Checking for a "high water mark" to enable flow control requires more computation (albeit simple) as you have to accommodate the delays in
that notification reaching the remote sender (lest he continue
sending and overrun your buffer).

And, later noting when you've consumed enough of the FIFO's contents
to reach a "low water mark" and reenable the remote's transmissions.

[And, if you ever have to deal with more "established" protocols
that require the sequencing of specific control signals DURING
a transfer, the ISR quickly becomes very complex!]

Of course. Perhaps you (Don) did not see that I was agreeing with your
position and objecting to the "it is very simple" stance of Dimiter (considering the OP's expressed constraints).

Personally I would use critical sections to avoid relying on delicate
reasoning about interleaved executions. And to allow for easy future complexification of the concurrent activities. The overhead of interrupt disabling and enabling is seldom significant when that can be done
directly without kernel calls.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Don Y@21:1/5 to Niklas Holsti on Mon Oct 25 02:35:17 2021

On 10/25/2021 2:06 AM, Niklas Holsti wrote:

On 2021-10-25 11:28, Don Y wrote:

On 10/25/2021 1:09 AM, Niklas Holsti wrote:

On 2021-10-24 23:27, Dimiter_Popoff wrote:

On 10/24/2021 22:54, Don Y wrote:

On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:

Disable interrupts while accessing the fifo. you really have to. >>>>>>> alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code. >>>>>>

Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.

The OPs code doesn't differentiate between FIFO full and empty.

So he should fix that first, there is no sane reason why not.
Few things are simpler to do than that.

[snip]

Whatever handshakes he makes there is no problem knowing whether
the fifo is full - just check if the position the write pointer
will have after putting the next byte matches the read pointer
at the moment. Like I said before, few things are simpler than
that, can't imagine someone working as a programmer being
stuck at *that*.

That simple check would require keeping a maximum of only N-1 entries in the
N-position FIFO buffer, and the OP explicitly said they did not want to
allocate an unused place in the buffer (which I think is unreasonable of the
OP, but that is only IMO).

The simple explanation for the N-1 limit is that the difference between two >>> wrap-around pointers into an N-place buffer has at most N different values, >>> while there are N+1 possible filling states of the buffer, from empty (zero >>> items) to full (N items).

But, again, that just deals with the "full check". The easiest way to do
this is just to check ".in" *after* advancement and inhibit the store if
it coincides with the ".out" value.

Checking for a "high water mark" to enable flow control requires more
computation (albeit simple) as you have to accommodate the delays in
that notification reaching the remote sender (lest he continue
sending and overrun your buffer).

And, later noting when you've consumed enough of the FIFO's contents
to reach a "low water mark" and reenable the remote's transmissions.

[And, if you ever have to deal with more "established" protocols
that require the sequencing of specific control signals DURING
a transfer, the ISR quickly becomes very complex!]

Of course. Perhaps you (Don) did not see that I was agreeing with your position
and objecting to the "it is very simple" stance of Dimiter (considering the OP's expressed constraints).

Yes, but I was afraid the emphasis would shift away from the "more involved" case (by trivializing the "full" case).

Personally I would use critical sections to avoid relying on delicate reasoning
about interleaved executions. And to allow for easy future complexification of
the concurrent activities. The overhead of interrupt disabling and enabling is
seldom significant when that can be done directly without kernel calls.

We (developers, in general) tend to forget how often we cobble
together solutions from past implementations. And, as those past implementations tend to be lax when it comes to enumerating the
assumptions under which they were created, we end up propagating
a bunch of dubious qualifiers that ultimately affect the code's
performance and "correctness".

Someone (including ourselves) trying to pilfer code from THIS
project might incorrectly expect the ISR to protect against buffer
wrap. Or, implement flow control. Or, be designed for a higher
data rate than it actually sees (saw!) -- will the buffer size -- and
task() timing -- be adequate to handle burst transmissions at 115Kbaud?
If not, where is the upper bound? What if the CPU clock is changed?
Or, the processor load? ...

"Steal" several bits of code -- possibly from different projects -- and
you've got an assortment of such hidden assumptions, all willing to eat
your lunch! While you remain convinced that none of those things
can happen!

My first UART driver had to manage about a dozen control signals as
the standard had a different intent and interpretation in the mid 70's,
early 80's (anyone remember TWX? TELEX? DB25s?). Porting it forward
ended up with a bunch of "issues" that no longer applied. (e.g.,
RTS/CTS weren't originally used as flow control/handshaking signals as
they are commonly used, now). *Assuming* a letter revision of the standard
was benign wrt the timing of signal transitions was folly.

You only see these things when you lay out all of your assumptions
in the codebase. And, hope the next guy actually READS what you took
the time to WRITE!

[When I wrote my 9 track tape driver, I had ~200 lines of commentary
just explaining the role of the interface wrt the formatter, transports,
etc. E.g., when you can read reverse, seek forward, rewind, etc.
with multiple transports hung off that same interface. Otherwise,
an observant developer would falsely conclude that the driver was
riddled with bugs -- as it *facilitated* a multiplicity of
concurrent operations]

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Don Y@21:1/5 to Niklas Holsti on Mon Oct 25 02:50:04 2021

On 10/25/2021 1:52 AM, Niklas Holsti wrote:

On 2021-10-25 11:19, Don Y wrote:

On 10/25/2021 12:56 AM, Niklas Holsti wrote:

On 2021-10-25 0:08, Don Y wrote:

There are (and have been) many "safer" languages. Many that are more
descriptive (for certain classes of problem). But, C has survived to
handle all-of-the-above... perhaps in a suboptimal way but at least
a manner that can get to the desired solution.

Look at how few applications SNOBOL handles. Write an OS in COBOL? Ada? >>>

I don't know about COBOL, but typically the real-time kernels ("run-time >>> systems") associated with Ada compilers for bare-board embedded systems are >>> written in Ada, with a minor amount of assembly language for the most
HW-related bits like HW context saving and restoring. I'm pretty sure that >>> C-language OS kernels also use assembly for those things.

Of course you *can* do these things.

Then I misunderstood your (rhetorical?) question.

The question is how often
they are ACTUALLY done with these other languages.

I don't find that question very interesting.

Why not? If a tool isn't used for a purpose for which it *should*
be "ideal", you have to start wondering "why not?" Was it NOT
suited to the task? Was it too costly (money and/or experience)?
How do we not repeat that problem, going forward? I.e., is it
better to EVOLVE a language to acquire the characteristics of the
"better" one -- rather than trying to encourage people to
"jump ship"?

It is a typical chicken-and-egg, first-to-market conundrum. There is an enormous amount of status-quo-favouring friction in awareness, education, tool
availability, and legacy code.

Of course!

And, there is also the pressure of the market. Do *you* want to be The Guy
who tries something new and sinks a product's development or market release?
If your approach proves to be a big hit, will you benefit as much as you'd
LOSE if it was a flop?

[By the same token, expecting the past to mirror the present is equally
naive. People forget that tools and processes have evolved (in the 40+
years that I've been designing embedded products). And, that the isssues
folks now face often weren't issues when tools were "stupider" (I've
probably got $60K of obsolete compilers to prove this -- anyone written
any C on an 1802 recently? Or, a 2A03? 65816? Z180? 6809?) Don't
even *think* about finding an Ada compiler for them -- in the past!]

Well, the Janus/Ada compiler was available for Z80 in its day. There are also Ada compilers that use C as an intermediate language, with applications for example on TI MSP430's, but those were probably not available in the past ages
you refer to.

I recall JRT Pascal and PL/M as the "high level" languages, back then.
C compilers were notoriously bad. You could literally predict the
code that would be generated for any statement. The whole idea of
"peephole optimizers" looking for:
STORE A
LOAD A
sequences to elide is testament to how little global knowledge they
had of the code they were processing.

Performance? A skilled ASM coder could beat the generated code
(in time AND space) without breaking into a sweat.

And, you bought a compiler/assembler/linker/debugger for *each*
processor -- not just a simple command line switch to alter the
code generation, etc. Vendors might have a common codebase
for the tools but built each variant conditionally.

The limits of the language were largely influenced by the targeted
hardware -- "helper routines" to support longs, floats, etc.
("Oh, did you want that support to be *reentrant*? We assumed
there would be a single floating point accumulator used throughout
the code, not one per thread!") Different sizes of addresses
(e.g., for the Z180, you could have 16b "logical" addresses
and 24b physical addresses -- mapped into that logical space
by the compiler's runtime support and linkage editor.)

Portable code? Maybe -- with quite a bit of work!

Fast/small? Meh...

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From David Brown@21:1/5 to Niklas Holsti on Mon Oct 25 13:49:02 2021

On 25/10/2021 10:52, Niklas Holsti wrote:

Well, the Janus/Ada compiler was available for Z80 in its day. There are
also Ada compilers that use C as an intermediate language, with
applications for example on TI MSP430's, but those were probably not available in the past ages you refer to.

Presumably there is gcc-based Ada for the msp430 (as there is for the
8-bit AVR)? There might not be a full library available, or possibly
some missing features in the language.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Niklas Holsti@21:1/5 to David Brown on Mon Oct 25 15:16:10 2021

On 2021-10-25 14:49, David Brown wrote:

On 25/10/2021 10:52, Niklas Holsti wrote:

Well, the Janus/Ada compiler was available for Z80 in its day. There are
also Ada compilers that use C as an intermediate language, with
applications for example on TI MSP430's, but those were probably not
available in the past ages you refer to.

Presumably there is gcc-based Ada for the msp430 (as there is for the
8-bit AVR)?

Indeed there seems to be one, or at least work towards one: https://sourceforge.net/p/msp430ada/wiki/Home/.

There might not be a full library available, or possibly
some missing features in the language.

Certainly. I think that Janus/Ada for the Z80 was limited to the
original Ada (Ada 83), and may well have also had some significant
missing features. But I believe it was self-hosted on CP/M, quite a feat.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Dimiter_Popoff@21:1/5 to Niklas Holsti on Mon Oct 25 16:04:44 2021

On 10/25/2021 11:09, Niklas Holsti wrote:

On 2021-10-24 23:27, Dimiter_Popoff wrote:

On 10/24/2021 22:54, Don Y wrote:

On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:

Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.

Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.

The OPs code doesn't differentiate between FIFO full and empty.

So he should fix that first, there is no sane reason why not.
Few things are simpler to do than that.

�� [snip]

Whatever handshakes he makes there is no problem knowing whether
the fifo is full - just check if the position the write pointer
will have after putting the next byte matches the read pointer
at the moment.� Like I said before, few things are simpler than
that, can't imagine someone working as a programmer being
stuck at *that*.

That simple check would require keeping a maximum of only N-1 entries in
the N-position FIFO buffer, and the OP explicitly said they did not want
to allocate an unused place in the buffer (which I think is unreasonable
of the OP, but that is only IMO).

Well it might be reasonable if the fifo has a size of two, you know :-).

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Niklas Holsti@21:1/5 to All on Mon Oct 25 18:34:48 2021

On 2021-10-25 16:04, Dimiter_Popoff wrote:

On 10/25/2021 11:09, Niklas Holsti wrote:

On 2021-10-24 23:27, Dimiter_Popoff wrote:

On 10/24/2021 22:54, Don Y wrote:

On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:

Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.

Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.

The OPs code doesn't differentiate between FIFO full and empty.

So he should fix that first, there is no sane reason why not.
Few things are simpler to do than that.

�� [snip]

Whatever handshakes he makes there is no problem knowing whether
the fifo is full - just check if the position the write pointer
will have after putting the next byte matches the read pointer
at the moment.� Like I said before, few things are simpler than
that, can't imagine someone working as a programmer being
stuck at *that*.

That simple check would require keeping a maximum of only N-1 entries
in the N-position FIFO buffer, and the OP explicitly said they did not
want to allocate an unused place in the buffer (which I think is
unreasonable of the OP, but that is only IMO).

Well it might be reasonable if the fifo has a size of two, you know :-).

And if each of those two items is large, yes. But here we have a FIFO of
8-bit characters... few programs are so tight on memory that they cannot
stand one unused octet.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Don Y@21:1/5 to All on Mon Oct 25 11:02:03 2021

On 10/25/2021 10:53 AM, Dimiter_Popoff wrote:

On 10/25/2021 20:43, Don Y wrote:

On 10/25/2021 8:34 AM, Niklas Holsti wrote:

And if each of those two items is large, yes. But here we have a FIFO of >>> 8-bit characters... few programs are so tight on memory that they cannot >>> stand one unused octet.

It's not "unused". Rather, it's roll is that of indicating "full/overrun". >> The OP seems to have decided that this is of no concern -- in *one* app?

Oh come on, I joked about the fifo of two bytes only because this whole thread is a joke

My comment applies regardless of the size of the FIFO.

- pages and pages of C to maintain a fifo, what can be
more of a joke than this.

Where do you see "pages and pages of C to maintain a FIFO"?

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Don Y@21:1/5 to Niklas Holsti on Mon Oct 25 10:43:38 2021

On 10/25/2021 8:34 AM, Niklas Holsti wrote:

And if each of those two items is large, yes. But here we have a FIFO of 8-bit
characters... few programs are so tight on memory that they cannot stand one unused octet.

It's not "unused". Rather, it's roll is that of indicating "full/overrun".
The OP seems to have decided that this is of no concern -- in *one* app?

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Dimiter_Popoff@21:1/5 to Don Y on Mon Oct 25 20:53:18 2021

On 10/25/2021 20:43, Don Y wrote:

On 10/25/2021 8:34 AM, Niklas Holsti wrote:

And if each of those two items is large, yes. But here we have a FIFO
of 8-bit characters... few programs are so tight on memory that they
cannot stand one unused octet.

It's not "unused".� Rather, it's roll is that of indicating "full/overrun". The OP seems to have decided that this is of no concern -- in *one* app?

Oh come on, I joked about the fifo of two bytes only because this whole
thread is a joke - pages and pages of C to maintain a fifo, what can be
more of a joke than this.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From pozz@21:1/5 to All on Mon Oct 25 19:52:52 2021

Il 25/10/2021 17:34, Niklas Holsti ha scritto:

On 2021-10-25 16:04, Dimiter_Popoff wrote:

On 10/25/2021 11:09, Niklas Holsti wrote:

On 2021-10-24 23:27, Dimiter_Popoff wrote:

On 10/24/2021 22:54, Don Y wrote:

On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:

Disable interrupts while accessing the fifo. you really have to. >>>>>>> alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code. >>>>>>

Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.

The OPs code doesn't differentiate between FIFO full and empty.

So he should fix that first, there is no sane reason why not.
Few things are simpler to do than that.

�� [snip]

Whatever handshakes he makes there is no problem knowing whether
the fifo is full - just check if the position the write pointer
will have after putting the next byte matches the read pointer
at the moment.� Like I said before, few things are simpler than
that, can't imagine someone working as a programmer being
stuck at *that*.

That simple check would require keeping a maximum of only N-1 entries
in the N-position FIFO buffer, and the OP explicitly said they did
not want to allocate an unused place in the buffer (which I think is
unreasonable of the OP, but that is only IMO).

Well it might be reasonable if the fifo has a size of two, you know :-).

And if each of those two items is large, yes. But here we have a FIFO of 8-bit characters... few programs are so tight on memory that they cannot stand one unused octet.

When I have a small (<256) power-of-two (16, 32, 64, 128) buffer (and
this is the case for a UART receiving ring-buffer), I like to use this implementation that works and doesn't waste any element.

However I know this isn't the best implementation ever and it's a pity
the thread emphasis has been against this implementation (that was used
as *one* implementation just to have an example to discuss on).

The main point was the use of volatile (and other techniques) to
guarantee a correct compiler output, whatever legal (respect the C
standard) optimizations the compiler thinks to do.

It seems to me the arguments againts or for volatile are completely
indipendent from the implementation of ring-buffer.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Don Y@21:1/5 to pozz on Mon Oct 25 11:10:33 2021

On 10/25/2021 10:52 AM, pozz wrote:

However I know this isn't the best implementation ever and it's a pity the thread emphasis has been against this implementation (that was used as *one* implementation just to have an example to discuss on).

The point is that you need a COMPLETE implementation before you start
thinking about the amount of "license" the compiler can take with your code.

Here's *part* of an implementation:

a = 37;

Now, should I declare A as volatile? Use the register qualifier?
What size should the integer A be? Can the optimizer elide this
statement from my code?

All sorts of questions whose answers depend on the REST of the
implementation -- not shown!

The main point was the use of volatile (and other techniques) to guarantee a correct compiler output, whatever legal (respect the C standard) optimizations
the compiler thinks to do.

It seems to me the arguments againts or for volatile are completely indipendent
from the implementation of ring-buffer.

It has to do with indicating how YOU (the developer) see the object
being used (accessed). You, in theory, know more about the role of
the object than the compiler (because it may be accessed in other modules,
or, have "stuff" tied to it -- like special hardware, etc.) You need a way
to tell the compiler that "you know what you are doing" in your use
of the object and that it should restrain itself from making assumptions
that might not be true.

If your example doesn't bring to light those various issues, then
the decision as to its applicability is moot.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From pozz@21:1/5 to All on Mon Oct 25 20:15:03 2021

Il 23/10/2021 18:09, David Brown ha scritto:
[...]

Marking "in" and "buf" as volatile is /far/ better than using a critical section, and likely to be more efficient than a memory barrier. You can
also use volatileAccess rather than making buf volatile, and it is often slightly more efficient to cache volatile variables in a local variable
while working with them.

I think I got your point, but I'm wondering why there are plenty of
examples of ring-buffer implementations that don't use volatile at all,
even if the author explicitly refers to interrupts and multithreading.

Just an example[1] by Quantum Leaps. It promises to be a *lock-free* (I
think thread-safe) ring-buffer implementation in the scenario of single producer/single consumer (that is my scenario too).

In the source code there's no use of volatile. I could call
RingBuf_put() in my rx uart ISR and call RingBuf_get() in my mainloop code.

From what I learned from you, this code usually works, but the standard doesn't guarantee it will work with every old, current and future compilers.

[1] https://github.com/QuantumLeaps/lock-free-ring-buffer

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From David Brown@21:1/5 to Niklas Holsti on Mon Oct 25 20:58:25 2021

On 25/10/2021 17:34, Niklas Holsti wrote:

And if each of those two items is large, yes. But here we have a FIFO of 8-bit characters... few programs are so tight on memory that they cannot stand one unused octet.

I remember a program I worked with where the main challenge for the
final features was not figuring out the implementation, but finding a
few spare bytes of code space and a couple of spare bits of ram to use.
And that was with 32 KB ROM and 512 bytes RAM (plus some bits in the
registers of peripherals that weren't used). That was probably the last
big assembly program I wrote - non-portability was a killer.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From David Brown@21:1/5 to pozz on Mon Oct 25 20:54:26 2021

On 25/10/2021 20:15, pozz wrote:

Il 23/10/2021 18:09, David Brown ha scritto:
[...]

Marking "in" and "buf" as volatile is /far/ better than using a critical
section, and likely to be more efficient than a memory barrier.� You can
also use volatileAccess rather than making buf volatile, and it is often
slightly more efficient to cache volatile variables in a local variable
while working with them.

I think I got your point, but I'm wondering why there are plenty of
examples of ring-buffer implementations that don't use volatile at all,
even if the author explicitly refers to interrupts and multithreading.

You don't have to use "volatile". You can make correct code here using critical sections - it's just a lot less efficient. (If you have a
queue where more than one context can be reading it or writing it, then
you /do/ need some kind of locking mechanism.)

You can also use memory barriers instead of volatile, but it is likely
to be slightly less efficient.

You can also use atomics instead of volatiles, but it is also quite
likely to be slightly less efficient. If you have an SMP system, on the
other hand, then you need something more than volatile and compiler
memory barriers - atomics are quite possibly the most efficient solution
in that case.

And sometimes you can make code that doesn't need any special treatment
at all, because you know the way it is being called. If the two ends of
your buffer are handled by tasks in a cooperative multi-tasking
scenario, then there is no problem - you don't need to worry about
volatile or any alternatives. If you know your interrupt can't occur
while the other end of the buffer is being handled, that can reduce your
need for volatile. (In particular, that can also avoid complications if
you have counter variables that are bigger than the processor can handle atomically - usually not a problem for a 32-bit Cortex-M, but often
important on an 8-bit AVR.)

If you know, for a fact, that the code will be compiled by a weak
compiler or with weak optimisation, or that the "get" and "put"
implementations will always be in a separately compiled unit from code
calling these functions and you'll never use any kind of cross-unit optimisations, then you can get often away without using volatile.

Just an example[1] by Quantum Leaps. It promises to be a *lock-free* (I
think thread-safe) ring-buffer implementation in the scenario of single producer/single consumer (that is my scenario too).

It's lock-free, but not safe in the face of modern optimisation (gcc has
had LTO for many years, and a lot of high-end commercial embedded
compilers have used such techniques for decades). And I'd want to study
it in detail and think a lot before accepting that it is safe to use its
16-bit counters on an 8-bit AVR. That could be fixed by just changing
the definition of the RingBufCtr type, which is a nice feature in the code.

In the source code there's no use of volatile. I could call
RingBuf_put() in my rx uart ISR and call RingBuf_get() in my mainloop code.

You don't want to call functions from an ISR if you can avoid it, unless
the functions are defined in the same unit and can be inlined. On many processors (less so on the Cortex-M) calling an external function from
an ISR means a lot of overhead to save and restore the so-called
"volatile" registers (no relation to the C keyword "volatile"), usually completely unnecessarily.

From what I learned from you, this code usually works, but the standard doesn't guarantee it will work with every old, current and future
compilers.

Yes, that's a fair summary.

It might be good enough for some purposes. But since "volatile" will
cost nothing in code efficiency but greatly increase the portability and
safety of the code, I'd recommend using it. And I am certainly in
favour of thinking carefully about these things - as you did in the
first place, which is why we have this thread.

[1] https://github.com/QuantumLeaps/lock-free-ring-buffer

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Niklas Holsti@21:1/5 to pozz on Mon Oct 25 21:33:44 2021

On 2021-10-25 20:52, pozz wrote:

Il 25/10/2021 17:34, Niklas Holsti ha scritto:

On 2021-10-25 16:04, Dimiter_Popoff wrote:

On 10/25/2021 11:09, Niklas Holsti wrote:

On 2021-10-24 23:27, Dimiter_Popoff wrote:

On 10/24/2021 22:54, Don Y wrote:

On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:

Disable interrupts while accessing the fifo. you really have to. >>>>>>>> alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code. >>>>>>>

Why would you do that. The fifo write pointer is only modified by >>>>>>> the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.

The OPs code doesn't differentiate between FIFO full and empty.

(I suspect something is not quite right with the attributions of the
quotations above -- Dimiter probably did not suggest disabling
interrupts -- but no matter.)

[snip]

When I have a small (<256) power-of-two (16, 32, 64, 128) buffer (and
this is the case for a UART receiving ring-buffer), I like to use this implementation that works and doesn't waste any element.

However I know this isn't the best implementation ever and it's a pity
the thread emphasis has been against this implementation (that was used
as *one* implementation just to have an example to discuss on).

The main point was the use of volatile (and other techniques) to
guarantee a correct compiler output, whatever legal (respect the C
standard) optimizations the compiler thinks to do.

It seems to me the arguments againts or for volatile are completely indipendent from the implementation of ring-buffer.

Of course "volatile" is needed, in general, whenever anything is written
in one thread and read in another. The issue, I think, is when
"volatile" is _enough_.

I feel that detection of a full buffer (FIFO overflow) is required for a
proper ring buffer implementation, and that has implications for the
data structure needed, and that has implications for whether critical
sections are needed.

If the FIFO implementation is based on just two pointers (read and
write), and each pointer is modified by just one of the two threads
(main thread = reader, and interrupt handler = writer), and those
modifications are both "volatile" AND atomic (which has not been
discussed so far, IIRC...), then one can do without a critical region.
But then detection of a full buffer needs one "wasted" element in the
buffer.

To avoid the wasted element, one could add a "full"/"not full" Boolean
flag. But that flag would be modified by both threads, and should be
modified atomically together with the pointer modifications, which (I
think) means that a critical section is needed.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Dimiter_Popoff@21:1/5 to Niklas Holsti on Mon Oct 25 22:09:17 2021

On 10/25/2021 21:33, Niklas Holsti wrote:

....

If the FIFO implementation is based on just two pointers (read and
write), and each pointer is modified by just one of the two threads
(main thread = reader, and interrupt handler = writer), and those modifications are both "volatile" AND atomic (which has not been
discussed so far, IIRC...), then one can do without a critical region.
But then detection of a full buffer needs one "wasted" element in the
buffer.

Why atomic? No need for that unless more than one interrupted task would
want to read from the fifo at the same time, which is nonsense. [I once
wasted a day looking at a garbled input from an auxiliary HV to a netMCA
only to discover I had left a shell to start via the same UART during
boot, through an outdated (but available) driver accessing the same
UART the standard driver through which the HV was used....
This across half the planet, customer was in South Africa. Not
an experience anybody would ask for, I can tell you :)].
Just like there is no need to mask interrupts, as you mentioned I
had said before.

To avoid the wasted element, one could add a "full"/"not full" Boolean
flag. But that flag would be modified by both threads, and should be
modified atomically together with the pointer modifications, which (I
think) means that a critical section is needed.

Now this is where atomic access is necessary - for no good reason in
this case, as mentioned before, but if one wants to bang their head
in the wall this is the proper way to do it.
As for "volatile" I can't say much, but if this is the way to make
the compiler access every time the address declared such instead of
using some stale data it has then it would be needed of
course.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Niklas Holsti@21:1/5 to All on Mon Oct 25 22:53:08 2021

On 2021-10-25 22:09, Dimiter_Popoff wrote:

On 10/25/2021 21:33, Niklas Holsti wrote:

....

If the FIFO implementation is based on just two pointers (read and
write), and each pointer is modified by just one of the two threads
(main thread = reader, and interrupt handler = writer), and those
modifications are both "volatile" AND atomic (which has not been
discussed so far, IIRC...), then one can do without a critical region.
But then detection of a full buffer needs one "wasted" element in the
buffer.

Why atomic?

If the read/write pointers/indices are, say, 16 bits, but the processor
has only 8-bit store/load instructions, updating a pointer/index happens non-atomically, 8 bits at a time, and the interrupt handler can read a half-updated value if the interrupt happens in the middle of an update.
That would certainly mess up the comparison between the read and write
points in the interrupt handler.

In the OP's code, I suppose (but I don't recall) that the indices are 8
bits, so probably atomically readable and writable.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Dimiter_Popoff@21:1/5 to Niklas Holsti on Mon Oct 25 23:02:34 2021

On 10/25/2021 22:53, Niklas Holsti wrote:

On 2021-10-25 22:09, Dimiter_Popoff wrote:

On 10/25/2021 21:33, Niklas Holsti wrote:

....

If the FIFO implementation is based on just two pointers (read and
write), and each pointer is modified by just one of the two threads
(main thread = reader, and interrupt handler = writer), and those
modifications are both "volatile" AND atomic (which has not been
discussed so far, IIRC...), then one can do without a critical
region. But then detection of a full buffer needs one "wasted"
element in the buffer.

Why atomic?

If the read/write pointers/indices are, say, 16 bits, but the processor
has only 8-bit store/load instructions, updating a pointer/index happens non-atomically, 8 bits at a time, and the interrupt handler can read a half-updated value if the interrupt happens in the middle of an update.
That would certainly mess up the comparison between the read and write
points in the interrupt handler.

In the OP's code, I suppose (but I don't recall) that the indices are 8
bits, so probably atomically readable and writable.

Ah, well, this is a possible scenario in a multicore system (or single
core if the two bytes are written by separate opcodes).

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From antispam@math.uni.wroc.pl@21:1/5 to Don Y on Mon Oct 25 21:32:25 2021

Don Y <blockedofcourse@foo.invalid> wrote:

On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:

Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.

Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.

The OPs code doesn't differentiate between FIFO full and empty.

If you read carefuly what he wrote you would know that he does.
The trick he uses is that his indices may point outside buffer:
empty is equal indices, full is difference equal to buffer
size. Of course his approach has its own limitations, like
buffer size being power of 2 and with 8 bit indices maximal
buffer size is 128.

--
Waldek Hebisch

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Clifford Heath@21:1/5 to Niklas Holsti on Tue Oct 26 08:43:12 2021

On 25/10/21 7:09 pm, Niklas Holsti wrote:

On 2021-10-24 23:27, Dimiter_Popoff wrote:

On 10/24/2021 22:54, Don Y wrote:

On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:

Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.

Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.

The OPs code doesn't differentiate between FIFO full and empty.

So he should fix that first, there is no sane reason why not.
Few things are simpler to do than that.

�� [snip]

Whatever handshakes he makes there is no problem knowing whether
the fifo is full - just check if the position the write pointer
will have after putting the next byte matches the read pointer
at the moment.� Like I said before, few things are simpler than
that, can't imagine someone working as a programmer being
stuck at *that*.

That simple check would require keeping a maximum of only N-1 entries in
the N-position FIFO buffer, and the OP explicitly said they did not want
to allocate an unused place in the buffer (which I think is unreasonable
of the OP, but that is only IMO).

In my opinion too. If you're going to waste a memory cell, why not use
it for a count variable instead of an unused element?

CH

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From pozz@21:1/5 to All on Mon Oct 25 23:46:23 2021

Il 25/10/2021 20:33, Niklas Holsti ha scritto:

On 2021-10-25 20:52, pozz wrote:

Il 25/10/2021 17:34, Niklas Holsti ha scritto:

On 2021-10-25 16:04, Dimiter_Popoff wrote:

On 10/25/2021 11:09, Niklas Holsti wrote:

On 2021-10-24 23:27, Dimiter_Popoff wrote:

On 10/24/2021 22:54, Don Y wrote:

On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:

Disable interrupts while accessing the fifo. you really have to. >>>>>>>>> alternatively you'll often get away not using a fifo at all, >>>>>>>>> unless you're blocking for a long while in some part of the code. >>>>>>>>

Why would you do that. The fifo write pointer is only modified by >>>>>>>> the interrupt handler, the read pointer is only modified by the >>>>>>>> interrupted code. Has been done so for times immemorial.

The OPs code doesn't differentiate between FIFO full and empty.

(I suspect something is not quite right with the attributions of the quotations above -- Dimiter probably did not suggest disabling
interrupts -- but no matter.)

�� [snip]

When I have a small (<256) power-of-two (16, 32, 64, 128) buffer (and
this is the case for a UART receiving ring-buffer), I like to use this
implementation that works and doesn't waste any element.

However I know this isn't the best implementation ever and it's a pity
the thread emphasis has been against this implementation (that was
used as *one* implementation just to have an example to discuss on).

The main point was the use of volatile (and other techniques) to
guarantee a correct compiler output, whatever legal (respect the C
standard) optimizations the compiler thinks to do.

It seems to me the arguments againts or for volatile are completely
indipendent from the implementation of ring-buffer.

Of course "volatile" is needed, in general, whenever anything is written
in one thread and read in another. The issue, I think, is when
"volatile" is _enough_.

I feel that detection of a full buffer (FIFO overflow) is required for a proper ring buffer implementation, and that has implications for the
data structure needed, and that has implications for whether critical sections are needed.

If the FIFO implementation is based on just two pointers (read and
write), and each pointer is modified by just one of the two threads
(main thread = reader, and interrupt handler = writer), and those modifications are both "volatile" AND atomic (which has not been
discussed so far, IIRC...), then one can do without a critical region.
But then detection of a full buffer needs one "wasted" element in the
buffer.

Yeah, this is exactly the topic of my original post. Anyway it seems
what you say isn't always correct. As per C standard, the compiler could reorder instructions that involve non-volatile data. So, even in your simplified scenario (atomic access for indexes), volatile for the head
only (that ISR changes) is not sufficient.

The function called in the mainloop and that get data from the buffer
access three variables: head (changed in ISR), tail (not changed in ISR)
and buf[] (written in ISR ad read in mainloop).

The get function firstly check if some data is available in the FIFO and
*next* read from buf[]. However compiler could rearrange instructions so reading from buf[] at first and then checking FIFO empty condition.
If the compiler goes this way, errors could occur during execution.

My original question was exactly if this could happen (without breaking
C specifications) and, if yes, how to avoid this: volatile? critical
section? memory barrier?

David Brown said this is possible and suggested to access both head and
buf[] as volatile in get() function, forcing the compiler to respect the
order of instructions.

To avoid the wasted element, one could add a "full"/"not full" Boolean
flag. But that flag would be modified by both threads, and should be
modified atomically together with the pointer modifications, which (I
think) means that a critical section is needed.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From David Brown@21:1/5 to All on Tue Oct 26 00:05:43 2021

On 25/10/2021 22:02, Dimiter_Popoff wrote:

On 10/25/2021 22:53, Niklas Holsti wrote:

On 2021-10-25 22:09, Dimiter_Popoff wrote:

On 10/25/2021 21:33, Niklas Holsti wrote:

....

If the FIFO implementation is based on just two pointers (read and
write), and each pointer is modified by just one of the two threads
(main thread = reader, and interrupt handler = writer), and those
modifications are both "volatile" AND atomic (which has not been
discussed so far, IIRC...), then one can do without a critical
region. But then detection of a full buffer needs one "wasted"
element in the buffer.

Why atomic?

If the read/write pointers/indices are, say, 16 bits, but the
processor has only 8-bit store/load instructions, updating a
pointer/index happens non-atomically, 8 bits at a time, and the
interrupt handler can read a half-updated value if the interrupt
happens in the middle of an update. That would certainly mess up the
comparison between the read and write points in the interrupt handler.

In the OP's code, I suppose (but I don't recall) that the indices are
8 bits, so probably atomically readable and writable.

Ah, well, this is a possible scenario in a multicore system (or single
core if the two bytes are written by separate opcodes).

For the AVR, writing 16-bit values is not atomic - but the OP used 8-bit counters (which are more appropriate for the AVR anyway, as they don't
have enough memory to spend on big buffers).

For multi-core systems you can have added complications. Memory
accesses are always seen in assembly-code order on one core, regardless
of how they may be re-ordered by buffers, caches, out-of-order
execution, speculative execution, etc. (CPU designers sometimes have to
work quite hard to achieve this, but anything else would be impossible
to work with.) However, the order seen by other cores could be
different. So "volatile" accesses are no longer enough - you need to
use C11/C++11 atomics, or the equivalent.

(Don't use C11/C++11 atomics on gcc for the Cortex-M or AVR, at least
not with anything that can't be done with a single read or write
instruction - the library that comes with gcc is deeply flawed.)

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Don Y@21:1/5 to antispam@math.uni.wroc.pl on Mon Oct 25 15:24:38 2021

On 10/25/2021 2:32 PM, antispam@math.uni.wroc.pl wrote:

Don Y <blockedofcourse@foo.invalid> wrote:

On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:

Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.

Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.

The OPs code doesn't differentiate between FIFO full and empty.

If you read carefuly what he wrote you would know that he does.
The trick he uses is that his indices may point outside buffer:
empty is equal indices, full is difference equal to buffer

Doesn't matter as any index can increase by any amount and
invalidate the "reality" of the buffer's contents (i.e.
actual number of characters that have been tranfered to
that region of memory).

Buffer size is 128, for example. in is 127, out is 127.
What's that mean? Can you tell me what has happened prior
to this point in time? Have 127 characters been received?
Or, 383? Or, 1151?

How many characters have been removed from the buffer?
(same numeric examples).

Repeat for .in being 129, 227, 255, etc.

Remember, there is nothing that GUARANTEES that the uart
task is keeping up with the input data rate. So, the buffer
could wrap 50 times and it's instantaneous state (visible
via .in and .out) would appear unchanged to that task!

If you wanted to rely on .in != .out to indicate the
presence of data REGARDLESS OF WRAPS, then you'd need the
size of each to be "significantly larger" than the maximum
fill rate of the buffer to ensure THEY can't wrap.

size. Of course his approach has its own limitations, like
buffer size being power of 2 and with 8 bit indices maximal
buffer size is 128.

The biggest practical limitation is that of expectations of
other developers who may inherit (or copy) his code expecting
the FIFO to be "well behaved".

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Richard Damon@21:1/5 to pozz on Mon Oct 25 20:31:12 2021

On 10/25/21 2:15 PM, pozz wrote:

Il 23/10/2021 18:09, David Brown ha scritto:
[...]

Marking "in" and "buf" as volatile is /far/ better than using a critical
section, and likely to be more efficient than a memory barrier. You can
also use volatileAccess rather than making buf volatile, and it is often
slightly more efficient to cache volatile variables in a local variable
while working with them.

I think I got your point, but I'm wondering why there are plenty of
examples of ring-buffer implementations that don't use volatile at all,
even if the author explicitly refers to interrupts and multithreading.

Just an example[1] by Quantum Leaps. It promises to be a *lock-free* (I
think thread-safe) ring-buffer implementation in the scenario of single producer/single consumer (that is my scenario too).

In the source code there's no use of volatile. I could call
RingBuf_put() in my rx uart ISR and call RingBuf_get() in my mainloop code.

From what I learned from you, this code usually works, but the standard doesn't guarantee it will work with every old, current and future
compilers.

[1] https://github.com/QuantumLeaps/lock-free-ring-buffer

The issue with not using 'volatile' (or some similar memory barrier) is
that without it, the implementation is allowed to delay the actual write
of the results into the variable.

If optimization is limited to just within a single translation unit, you
can force it to work by having the execution path leave the translation
unit, but with whole program optimization, it is theoretically possible
that the implementation sees that the thread of execution NEVER needs it
to be spilled out of the registers to memory, so the ISR will never see
the change.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Don Y@21:1/5 to antispam@math.uni.wroc.pl on Tue Oct 26 17:52:59 2021

On 10/26/2021 5:20 PM, antispam@math.uni.wroc.pl wrote:

Don Y <blockedofcourse@foo.invalid> wrote:

On 10/25/2021 2:32 PM, antispam@math.uni.wroc.pl wrote:

Don Y <blockedofcourse@foo.invalid> wrote:

On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:

Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.

Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.

The OPs code doesn't differentiate between FIFO full and empty.

If you read carefuly what he wrote you would know that he does.
The trick he uses is that his indices may point outside buffer:
empty is equal indices, full is difference equal to buffer

Doesn't matter as any index can increase by any amount and
invalidate the "reality" of the buffer's contents (i.e.
actual number of characters that have been tranfered to
that region of memory).

AFAIK OP considers this not a problem in his application.

And I don't think I have to test for division by zero -- as
*my* code is the code that is passing numerator and denominator
to that operator, right?

Can you remember all of the little assumptions you've made in
any non-trivial piece of code -- a week later? a month later?
6 months later (when a bug manifests or a feature upgrade
is requested)?

Do not check the inputs of routines for validity -- assume everything is correct (cuz YOU wrote it to be so, right?).

Do not handle error conditions -- because they can't exist (because
you wrote the code and feel confident that you've anticipated
every contingency -- including those for future upgrades).

Ignore compiler warnings -- surely you know better than a silly
"generic" program!

Would you hire someone who viewed your product's quality (and
your reputation) in this regard?

Of course, if such changes were a problem he would need to
add test preventing writing to full buffer (he already have
test preventing reading from empty buffer).

Buffer size is 128, for example. in is 127, out is 127.
What's that mean?

Empty buffer.

No, it means you can't sort out *if* there have been any characters
received, based solely on this fact (and, what other facts are there
to observe?)

Can you tell me what has happened prior
to this point in time? Have 127 characters been received?
Or, 383? Or, 1151?

Does not matter.

Of course it does! Something has happened that the code MIGHT have
detected in other circumstances (e.g., if uart_task had been invoked
more frequently). The world has changed and the code doesn't know it.
Why write code that only *sometimes* works?

How many characters have been removed from the buffer?
(same numeric examples).

The same as has been stored. Point is that received is
always bigger or equal to removed and does not exceed
removed by more than 128. So you can exactly recover
difference between received and removed.

If it can wrap, then "some data" can look like "no data".
If "no data", then NOTHING has been received -- from the
viewpoint of the code.

Tell me what prevents 256 characters from being received
after .in (and .out) are initially 0 -- without any
indication of their presence. What "limits" the difference
to "128"? Do you see any conditionals in the code that
do so? Is there some magic in the hardware that enforces
this?

This is how you end up with bugs in your code. The sorts
of bugs that you can witness -- with your own eyes -- and
never reproduce (until the code has been released and
lots of customers' eyes witness it as well).

The biggest practical limitation is that of expectations of
other developers who may inherit (or copy) his code expecting
the FIFO to be "well behaved".

Well, personally I would avoid storing to full buffer. And
even on small MCU it is not clear for me if his "savings"
are worth it. But his core design is sound.

Concerning other developers, I always working on assumption
that code is "as is" and any claim what it is doing are of
limited value unless there is convincing argument (proof
or outline of proof) what it is doing.

Ever worked on 100KLoC projects? 500KLoC? Do you personally examine
the entire codebase before you get started? Do you purchase source
licenses for every library that you rely upon in your design?
(or, do you just assume software vendors are infallible?)

How would you feel if a fellow worker told you "yeah, the previous
guy had a habit of cutting corners in his FIFO management code"?
Or, "the previous guy always assumed malloc would succeed and
didn't even build an infrastructure to address the possibility
of it failing"

You could, perhaps, grep(1) for "malloc" or "FIFO" and manually
examine those code fragments. What about division operators?
Or, verifying that data types never overflow their limits? Or...

Fact that code
worked well in past system(s) is rather unconvincing.
I have seen small (few lines) pieces of code that contained
multiple bugs. And that code was in "production" use
for several years and passed its tests.

Certainly code like FIFO-s where there are multiple tradeofs
and actual code tends to be relatively small deserves
examination before re-use.

It's not "FIFO code". It's a UART driver. Do you examine every piece
of code that might *contain* a FIFO? How do you know that there *is* a FIFO
in a piece of code -- without manually inspecting it? What if it is a
FIFO mechanism but not explicitly named as a FIFO?

One wants to be able to move towards the goal of software *components*.
You don't want to have to inspect the design of every *diode* that
you use; you want to look at it's overall specifications and decide
if those fit your needs.

Unlikely that this code will describe itself as "works well enough
SOME of the time..."

And, when/if you stumble on such faults, good luck explaining to
your customer why it's going to take longer to fix and retest the
*existing* codebase before you can get on with your modifications...

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From antispam@math.uni.wroc.pl@21:1/5 to Don Y on Wed Oct 27 00:20:21 2021

Don Y <blockedofcourse@foo.invalid> wrote:

On 10/25/2021 2:32 PM, antispam@math.uni.wroc.pl wrote:

Don Y <blockedofcourse@foo.invalid> wrote:

On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:

Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.

Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.

The OPs code doesn't differentiate between FIFO full and empty.

If you read carefuly what he wrote you would know that he does.
The trick he uses is that his indices may point outside buffer:
empty is equal indices, full is difference equal to buffer

Doesn't matter as any index can increase by any amount and
invalidate the "reality" of the buffer's contents (i.e.
actual number of characters that have been tranfered to
that region of memory).

AFAIK OP considers this not a problem in his application.
Of course, if such changes were a problem he would need to
add test preventing writing to full buffer (he already have
test preventing reading from empty buffer).

Buffer size is 128, for example. in is 127, out is 127.
What's that mean?

Empty buffer.

Can you tell me what has happened prior
to this point in time? Have 127 characters been received?
Or, 383? Or, 1151?

Does not matter.

How many characters have been removed from the buffer?
(same numeric examples).

The same as has been stored. Point is that received is
always bigger or equal to removed and does not exceed
removed by more than 128. So you can exactly recover
difference between received and removed.

The biggest practical limitation is that of expectations of
other developers who may inherit (or copy) his code expecting
the FIFO to be "well behaved".

Well, personally I would avoid storing to full buffer. And
even on small MCU it is not clear for me if his "savings"
are worth it. But his core design is sound.

Concerning other developers, I always working on assumption
that code is "as is" and any claim what it is doing are of
limited value unless there is convincing argument (proof
or outline of proof) what it is doing. Fact that code
worked well in past system(s) is rather unconvincing.
I have seen small (few lines) pieces of code that contained
multiple bugs. And that code was in "production" use
for several years and passed its tests.

Certainly code like FIFO-s where there are multiple tradeofs
and actual code tends to be relatively small deserves
examination before re-use.

--
Waldek Hebisch

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From antispam@math.uni.wroc.pl@21:1/5 to Don Y on Wed Oct 27 05:22:55 2021

Don Y <blockedofcourse@foo.invalid> wrote:

On 10/26/2021 5:20 PM, antispam@math.uni.wroc.pl wrote:

Don Y <blockedofcourse@foo.invalid> wrote:

On 10/25/2021 2:32 PM, antispam@math.uni.wroc.pl wrote:

Don Y <blockedofcourse@foo.invalid> wrote:

On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:

Disable interrupts while accessing the fifo. you really have to. >>>>>> alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code. >>>>>

Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.

The OPs code doesn't differentiate between FIFO full and empty.

If you read carefuly what he wrote you would know that he does.
The trick he uses is that his indices may point outside buffer:
empty is equal indices, full is difference equal to buffer

Doesn't matter as any index can increase by any amount and
invalidate the "reality" of the buffer's contents (i.e.
actual number of characters that have been tranfered to
that region of memory).

AFAIK OP considers this not a problem in his application.

And I don't think I have to test for division by zero -- as
*my* code is the code that is passing numerator and denominator
to that operator, right?

Well, I do not test for zero if I know that divisor must be
nonzero. To put it differently, having zero in such place
is a bug and there is already enough machinery so that
such bug will not remain undetected. Having extra test
adds no value.

OTOH is zero is possible, then handling it is part of program
logic and test is needed to take correct action.

Can you remember all of the little assumptions you've made in
any non-trivial piece of code -- a week later? a month later?
6 months later (when a bug manifests or a feature upgrade
is requested)?

Well, my normal practice is that there are no "little assumptions".
To put it differently, code is structured to make things clear,
even if this requires more code than some "clever" solution.
There may be "big assumptions", that is highly nontrivial facts
used by the code. Some of them are considered "well known",
with proper naming in code it is easy to recall them years later.
Some deserve comments/referece. In most of may coding I have
pretty comfortable situation: for human there is quite clear
what is valid and what is invalid. So code makes a lot of
effort to handle valid (but possibly quite unusual) cases

Do not check the inputs of routines for validity -- assume everything is correct (cuz YOU wrote it to be so, right?).

Well, correct inputs are part of contract. Some things (like
array indices inside bounds) are checked, but in general you can
expect garbage if you pass incorrect input. Most of my code is
of sort that called routine can not really check validity of input
(there are complex invariants). Note: here I am talking mostly
about my non-embedded code (which is majority of my coding).
In most of may coding I have pretty comfortable situation: for
human there is quite clear what is valid and what is invalid.
So code makes a lot of effort to handle valid (but possibly quite
unusual) cases. User input is normally checked to give sensible
error message, but some things are deemed to tricky/expensive
to check. Other routines are deemed "system level", and here
there us up to user/caller to respect the contract.

My embedded code consists of rather small systems, and normally
there are no explicit validity checks. To clarify: when system
receives commands it recognizes and handles valid commands.
So there is implicit check: anything not recognized as valid
is invalid. OTOH frequently there is nothing to do in case
of errors: if there are no display to print error message,
no persistent store to log erreor and shuting down is not helpful,
then what else potential error handler would do?

I do not check if 12-bit ADC really returns numbers in range.
My 'print_byte' routine takes integer argument and blindly
truncates it to 8-bit without worring about possible
spurious upper bits. "Safety critical" folks my be worried
by such practice, but my embedded code is fairly non-critical.

Do not handle error conditions -- because they can't exist (because
you wrote the code and feel confident that you've anticipated
every contingency -- including those for future upgrades).

Ignore compiler warnings -- surely you know better than a silly
"generic" program!

Would you hire someone who viewed your product's quality (and
your reputation) in this regard?

Well, you do not know what OP code is doing. I would prefer
my code to be robust and I feel that I am doing resonably
well here. OTOH, coming back to serial comunication, it
is not hard to design communication protocal such that in
normal operation there is no possibility for buffer
overflow. It would still make sense to add a single line
to say drop excess characters. But it does not make
sense to make big story of lack of this line. In particular
issue that OP wanted to discuss is still valid.

Of course, if such changes were a problem he would need to
add test preventing writing to full buffer (he already have
test preventing reading from empty buffer).

Buffer size is 128, for example. in is 127, out is 127.
What's that mean?

Empty buffer.

No, it means you can't sort out *if* there have been any characters
received, based solely on this fact (and, what other facts are there
to observe?)

Of course you can connect to system and change values of variables
in debugger, so specific values mean nothing. I am telling
you what to protocal is. If all part of system (including parts
that OP skipped) obey the protocal, then you have meaning above.
If something misbehaves (say cosmic ray flipped a bit), it does
not mean that protocal is incorrect. Simply _if_ probability
of misbehaviour is too high you need to fix the system (add
radiation shielding, appropiate seal to avoid tampering with
internals, extra checks inside, etc). But what/if to fix
something is for OP to decide.

Can you tell me what has happened prior
to this point in time? Have 127 characters been received?
Or, 383? Or, 1151?

Does not matter.

Of course it does! Something has happened that the code MIGHT have
detected in other circumstances (e.g., if uart_task had been invoked
more frequently). The world has changed and the code doesn't know it.
Why write code that only *sometimes* works?

All code works only sometimes. Parafrazing famous answer to
Napoleon: fisrt you need a processor. There are a lot of
conditons so that code works as intended. Granted, I would
not skip needed check in real code. But this is obvious
thing to add. You are somewhat making OP code as "broken
beyond repair". Well, as discussion showed, OP had problem
using "volatile" and that IMHO is much more important to
fix.

How many characters have been removed from the buffer?
(same numeric examples).

The same as has been stored. Point is that received is
always bigger or equal to removed and does not exceed
removed by more than 128. So you can exactly recover
difference between received and removed.

If it can wrap, then "some data" can look like "no data".
If "no data", then NOTHING has been received -- from the
viewpoint of the code.

Tell me what prevents 256 characters from being received
after .in (and .out) are initially 0 -- without any
indication of their presence. What "limits" the difference
to "128"? Do you see any conditionals in the code that
do so? Is there some magic in the hardware that enforces
this?

That is the protocol. How to avoid violation is different
matter: dropping characters _may_ be solution. But dropping
characters means that some data is lost, and how to deal
with lost data is different issue. As is OP code will loose
some old data. It is OP problem to decide which failure
mode is more problematic and how much extra checks are
needed.

This is how you end up with bugs in your code. The sorts
of bugs that you can witness -- with your own eyes -- and
never reproduce (until the code has been released and
lots of customers' eyes witness it as well).

IME it is issues that you can not prodict that catch you.
The above is obvious issue, and should not be a problem
(unless designer is seriously incompenent and misjudged
what can happen).

The biggest practical limitation is that of expectations of
other developers who may inherit (or copy) his code expecting
the FIFO to be "well behaved".

Well, personally I would avoid storing to full buffer. And
even on small MCU it is not clear for me if his "savings"
are worth it. But his core design is sound.

Concerning other developers, I always working on assumption
that code is "as is" and any claim what it is doing are of
limited value unless there is convincing argument (proof
or outline of proof) what it is doing.

Ever worked on 100KLoC projects? 500KLoC? Do you personally examine
the entire codebase before you get started?

Of course I do not read all code before start. But I accept
risc that code may turn out to be faulty and I may be forced
to fix or abandon it. My main project has 450K wc lines.
I know that parts are wrong and I am working on fixing that
(which will probably involve substantial rewrite). I worked
a little on gcc and I can tell you that only sure thing in
such projects is that there are bugs. Of course, despite
bugs gcc is quite useful. But I also met Modula 2 compiler
that carefuly checked programs for violation of language
rules, but miscompiled nested function calls.

Do you purchase source
licenses for every library that you rely upon in your design?
(or, do you just assume software vendors are infallible?)

Well, for several years I work exclusively with open source code.
I see a lot of defects. While my experience with comercial codes
is limited I do not think that commercial codes have less defects
than open source ones. In fact, there are reasons to suspect
that there are more defects in commercial codes.

How would you feel if a fellow worker told you "yeah, the previous
guy had a habit of cutting corners in his FIFO management code"?
Or, "the previous guy always assumed malloc would succeed and
didn't even build an infrastructure to address the possibility
of it failing"

Well, there is a lot of bad code. Sometimes best solution is simply
to throw it out. In other cases (likely in your malloc scenario above)
there may be simple workaround (replace malloc by checking version).

You could, perhaps, grep(1) for "malloc" or "FIFO" and manually
examine those code fragments.

Yes, that one of possible appraches.

What about division operators?

I have a C parser. In desperation I could try to search parse
tree or transform program. Or, more likely decide that program
is broken beyond repair.

Or, verifying that data types never overflow their limits? Or...

Well, one thing is to look at structure of program. Code may
look complicated, but some programs are reasonably testable:
few random inputs can give some confidence that "main"
execution path computes correct values. Then you look if
you can hit limits. Actually, much of my coding is in
arbitrary precision, so overflow is impossible. Instead
program may run out of memory. But there parts for speed
use fixed precision. If I correctly computed limits
overflow is impossible. But this is big if.

Fact that code
worked well in past system(s) is rather unconvincing.
I have seen small (few lines) pieces of code that contained
multiple bugs. And that code was in "production" use
for several years and passed its tests.

Certainly code like FIFO-s where there are multiple tradeofs
and actual code tends to be relatively small deserves
examination before re-use.

It's not "FIFO code". It's a UART driver. Do you examine every piece
of code that might *contain* a FIFO? How do you know that there *is* a FIFO in a piece of code -- without manually inspecting it? What if it is a
FIFO mechanism but not explicitly named as a FIFO?

One wants to be able to move towards the goal of software *components*.
You don't want to have to inspect the design of every *diode* that
you use; you want to look at it's overall specifications and decide
if those fit your needs.

Sure, I would love to see really reusable components. But IMHO we
are quite far from that. There are some things which are reusable
if you accept modest to severe overhead. For example things tends
to compose nicely if you dynamically allocate everything and use
garbage collection. But performace cost may be substantial.
And in embedded setting garbage collection may be unacceptable.
In some cases I have found out that I can get much better
speed joing things that could be done as composition of library
operations into single big routine. In other cases I fixed
bugs by replacing composition of library routines by a single
routine: there were interactions making simple composition
incorrect. Correct alterantive was single routine.

As I wrote my embedded programs are simple and small. But I
use almost no external libraries. Trying some existing libraries
I have found out that some produce rather large programs, linking
in a lot of unneeded stuff. Of course, writing for scratch
will not scale to bigger programs. OTOH, I feel that with
proper tooling it would be possible to retain efficiency and
small code size at least for large class of microntroller
programs (but existing tools and libraries do not support this).

Unlikely that this code will describe itself as "works well enough
SOME of the time..."

And, when/if you stumble on such faults, good luck explaining to
your customer why it's going to take longer to fix and retest the
*existing* codebase before you can get on with your modifications...

Commercial vendors like to say how good their progam are. But
market reality is that program my be quite bad and still sell.

--
Waldek Hebisch

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Don Y@21:1/5 to antispam@math.uni.wroc.pl on Fri Oct 29 15:36:49 2021

On 10/26/2021 10:22 PM, antispam@math.uni.wroc.pl wrote:

One wants to be able to move towards the goal of software *components*.
You don't want to have to inspect the design of every *diode* that
you use; you want to look at it's overall specifications and decide
if those fit your needs.

Sure, I would love to see really reusable components. But IMHO we
are quite far from that.

Do you use the standard libraries? Aren't THEY components?
You rely on the compiler to decide how to divide X by Y -- instead
of writing your own division routine. How often do you reimplement
?printf() to avoid all of the bloat that typically accompanies it?
(when was the last time you needed ALL of those format specifiers
in an application? And modifiers?

There are some things which are reusable
if you accept modest to severe overhead.

What you need is components with varying characteristics.
You can buy diodes with all sorts of current carrying capacities,
PIVs, package styles, etc. But, they all still perform the
same function. Why so many different part numbers? Why not
just use the biggest, baddest diode in ALL your circuits?

I.e., we readily accept differences in "standard components"
in other disciplines; why not when it comes to software
modules?

For example things tends
to compose nicely if you dynamically allocate everything and use
garbage collection. But performace cost may be substantial.
And in embedded setting garbage collection may be unacceptable.
In some cases I have found out that I can get much better
speed joing things that could be done as composition of library
operations into single big routine.

Sure, but now you're tuning a solution to a specific problem.
I've designed custom chips to solve particular problems.
But, they ONLY solve those particular problems! OTOH,
I use lots of OTC components in my designs because those have
been designed (for the most part) with an eye towards
meeting a variety of market needs.

In other cases I fixed
bugs by replacing composition of library routines by a single
routine: there were interactions making simple composition
incorrect. Correct alterantive was single routine.

As I wrote my embedded programs are simple and small. But I
use almost no external libraries. Trying some existing libraries
I have found out that some produce rather large programs, linking
in a lot of unneeded stuff.

Because they try to address a variety of solution spaces without
trying to be "optimal" for any. You trade flexibility/capability
for speed/performance/etc.

Of course, writing for scratch
will not scale to bigger programs. OTOH, I feel that with
proper tooling it would be possible to retain efficiency and
small code size at least for large class of microntroller
programs (but existing tools and libraries do not support this).

Templates are an attempt in this direction. Allowing a class of
problems to be solved once and then tailored to the specific
application.

But, personal experience is where you win the most. You write
your second or third UART driver and start realizing that you
could leverage a previous design if you'd just thought it out
more fully -- instead of tailoring it to the specific needs
of the original application.

And, as you EXPECT to be reusing it in other applications (as
evidenced by the fact that it's your third time writing the same
piece of code!), you anticipate what those *might* need and
think about how to implement those features "economically".

It's rare that an application is *so* constrained that it can't
afford a couple of extra lines of code, here and there. If
you've considered efficiency in the design of your algorithms,
then these little bits of inefficiency will be below the noise floor.

Unlikely that this code will describe itself as "works well enough
SOME of the time..."

And, when/if you stumble on such faults, good luck explaining to
your customer why it's going to take longer to fix and retest the
*existing* codebase before you can get on with your modifications...

Commercial vendors like to say how good their progam are. But
market reality is that program my be quite bad and still sell.

The same is true of FOSS -- despite the claim that many eyes (may)
have looked at it (suggesting that bugs would have been caught!)

From "KLEE: Unassisted and Automatic Generation of High-Coverage
Tests for Complex Systems Programs":

KLEE finds important errors in heavily-tested code. It
found ten fatal errors in COREUTILS (including three
that had escaped detection for 15 years), which account
for more crashing bugs than were reported in 2006, 2007
and 2008 combined. It further found 24 bugs in BUSYBOX, 21
bugs in MINIX, and a security vulnerability in HISTAR– a
total of 56 serious bugs.

Ooops! I wonder how many FOSS *eyes* missed those errors?

Every time you reinvent a solution, you lose much of the benefit
of the previous TESTED solution.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From antispam@math.uni.wroc.pl@21:1/5 to Don Y on Sun Oct 31 22:54:22 2021

Don Y <blockedofcourse@foo.invalid> wrote:

On 10/26/2021 10:22 PM, antispam@math.uni.wroc.pl wrote:

One wants to be able to move towards the goal of software *components*.
You don't want to have to inspect the design of every *diode* that
you use; you want to look at it's overall specifications and decide
if those fit your needs.

Sure, I would love to see really reusable components. But IMHO we
are quite far from that.

Do you use the standard libraries?

Yes, I uses libraries when appropriate.

Aren't THEY components?

Well, some folks expect more from components than from
traditional libraries. Some evan claim to deliver.
However, libraries have limitations and ATM I see nothing
that fundamentally change situation.

You rely on the compiler to decide how to divide X by Y -- instead
of writing your own division routine.

Well, normally in C code I relay on compiler provied division.
To say the truth, my MCU code uses division sparingly, only
when I can not avoid it. OTOH I also use languages with
multiprecision integers. In one case I use complier provided
routines, but I am provider of modifed compiler and modification
includes replacement of division routine. In other case I
override compiler supplied division routine by my own (which
in turn sends real work to external library).

How often do you reimplement
?printf() to avoid all of the bloat that typically accompanies it?

I did that once (for OS kernel where standard library would not
work). If needed I can reuse it. On PC-s I am not worried by
bloat due to printf. OTOH, on MCU-s I am not sure if I ever used
printf. Rather, printing was done by specialized routines
either library provided or my own.

(when was the last time you needed ALL of those format specifiers
in an application? And modifiers?

There are some things which are reusable
if you accept modest to severe overhead.

What you need is components with varying characteristics.
You can buy diodes with all sorts of current carrying capacities,
PIVs, package styles, etc. But, they all still perform the
same function. Why so many different part numbers? Why not
just use the biggest, baddest diode in ALL your circuits?

I heard such electronic analogies many times. But they miss
important point: there is no way for me to make my own diode,
I am stuck with what is available on the market. And diode
is logically pretty simple component, yet we need many kinds.

I.e., we readily accept differences in "standard components"
in other disciplines; why not when it comes to software
modules?

Well, software is _much_ more compilcated than physical
engineering artifacts. Physical thing may have 10000 joints,
but if joints are identical, then this is moral equivalent of
simple loop that just iterates fixed number of times.
At software level number of possible pre-composed blocks
is so large that it is infeasible to deliver all of them.
Classic trick it to parametrize. However even if you
parametrize there are hundreds of design decisions going
into relatively small piece of code. If you expose all
design decisions then user as well may write his/her own
code because complexity will be similar. So normaly
parametrization is limited and there will be users who
find hardcoded desion choices inadequate.

Another things is that current tools are rather weak
at supporting parametrization.

For example things tends
to compose nicely if you dynamically allocate everything and use
garbage collection. But performace cost may be substantial.
And in embedded setting garbage collection may be unacceptable.
In some cases I have found out that I can get much better
speed joing things that could be done as composition of library
operations into single big routine.

Sure, but now you're tuning a solution to a specific problem.
I've designed custom chips to solve particular problems.
But, they ONLY solve those particular problems! OTOH,
I use lots of OTC components in my designs because those have
been designed (for the most part) with an eye towards
meeting a variety of market needs.

Maybe I made wrong impression, I think some explanation is in
place here. I am trying to make my code reusable. For my
problems performance is important part of reusablity: our
capability to solve problem is limited by performance and with
better perfomance users can solve bigger problems. I am
re-using code that I can and I would re-use more if I could
but there there are technical obstacles. Also, while I am
trying to make my code reusable, there are intrusive
design decision which may interfere with your possiobility
and willingness to re-use.

In slightly different spirit: in another thread you wrote
about accessing disc without OS file cache. Here I
normaly depend on OS and OS file caching is big thing.
It is not perfect, but OS (OK, at least Linux) is doing
this resonably well I have no temptation to avoid it.
And I appreciate that with OS cache performance is
usually much better that would be "without cache".
OTOH, I routinly avoid stdio for I/O critical things
(so no printf in I/O critical code).

In other cases I fixed
bugs by replacing composition of library routines by a single
routine: there were interactions making simple composition
incorrect. Correct alterantive was single routine.

As I wrote my embedded programs are simple and small. But I
use almost no external libraries. Trying some existing libraries
I have found out that some produce rather large programs, linking
in a lot of unneeded stuff.

Because they try to address a variety of solution spaces without
trying to be "optimal" for any. You trade flexibility/capability
for speed/performance/etc.

I think that this is more subtle: libraries frequently force some
way of doing things. Which may be good if you try to quickly roll
solution and are within capabilities of library. But if you
need/want different design, then library may be too inflexible
to deliver it.

Of course, writing for scratch
will not scale to bigger programs. OTOH, I feel that with
proper tooling it would be possible to retain efficiency and
small code size at least for large class of microntroller
programs (but existing tools and libraries do not support this).

Templates are an attempt in this direction. Allowing a class of
problems to be solved once and then tailored to the specific
application.

Yes, templates could help. But they also have problems. One
of them is that (among others) I would like to target STM8
and I have no C++ compiler for STM8. My idea is to create
custom "optimizer/generator" for (annotated) C code.
ATM it is vapourware, but I think it is feasible with
reasonable effort.

But, personal experience is where you win the most. You write
your second or third UART driver and start realizing that you
could leverage a previous design if you'd just thought it out
more fully -- instead of tailoring it to the specific needs
of the original application.

And, as you EXPECT to be reusing it in other applications (as
evidenced by the fact that it's your third time writing the same
piece of code!), you anticipate what those *might* need and
think about how to implement those features "economically".

It's rare that an application is *so* constrained that it can't
afford a couple of extra lines of code, here and there. If
you've considered efficiency in the design of your algorithms,
then these little bits of inefficiency will be below the noise floor.

Well, I am not talking about "couple of extra lines". Rather
about IMO substantial fixed overhead. As I wrote, one of my
targets is STM8 with 8k flash, another is MSP430 with 16k flash,
another is STM32 with 16k flash (there are also bigger targets).
One of libraries/frameworks for STM32 after activating few featurs
pulled in about 16k code, this is substantial overhead given
how little features I needed. Other folks reported that for
trivial programs vendor supplied frameworks pulled close to 30k
code. That may be fine if you have bigger device and need features,
but for smaller MCU-s it may be difference between not fitting into
device or (without library) having plenty of free space.

When I tried it Free RTOS for STM32 needed about 8k flash. Which
is fine if you need RTOS. But ATM my designs run without RTOS.

I have found libopencm3 to have small overhead. But is routines
are doing so little that direct register access may give simpler
code.

Unlikely that this code will describe itself as "works well enough
SOME of the time..."

And, when/if you stumble on such faults, good luck explaining to
your customer why it's going to take longer to fix and retest the
*existing* codebase before you can get on with your modifications...

Commercial vendors like to say how good their progam are. But
market reality is that program my be quite bad and still sell.

The same is true of FOSS -- despite the claim that many eyes (may)
have looked at it (suggesting that bugs would have been caught!)

From "KLEE: Unassisted and Automatic Generation of High-Coverage
Tests for Complex Systems Programs":

KLEE finds important errors in heavily-tested code. It
found ten fatal errors in COREUTILS (including three
that had escaped detection for 15 years), which account
for more crashing bugs than were reported in 2006, 2007
and 2008 combined. It further found 24 bugs in BUSYBOX, 21
bugs in MINIX, and a security vulnerability in HISTAR? a
total of 56 serious bugs.

Ooops! I wonder how many FOSS *eyes* missed those errors?

Open source folks tend to be more willing to talk about bugs.
And the above nicely shows that there is a lot of bugs, most
waiting to by discovered.

Every time you reinvent a solution, you lose much of the benefit
of the previous TESTED solution.

TESTED part works for simple repeatable tasks. But if you have
complex task it is quite likely that you will be the first
person with given use case. gcc is borderline case: if you
throw really new code at it you can expect to see bugs.
gcc user community it large and there is resonable chance that
sombody wrote earlier code which is sufficiently similar to
yours to catch troubles. But there are domains that are at
least as complicated as compilation and have much smaller
user community. You may find out that there are _no_ code
that could be reasonably re-used. Were you ever in situation
when you looked how some "standard library" solves a tricky
problem and realized that in fact library does not solve
the problem?

--
Waldek Hebisch

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Don Y@21:1/5 to antispam@math.uni.wroc.pl on Sun Oct 31 20:37:25 2021

On 10/31/2021 3:54 PM, antispam@math.uni.wroc.pl wrote:

Aren't THEY components?

Well, some folks expect more from components than from
traditional libraries. Some evan claim to deliver.
However, libraries have limitations and ATM I see nothing
that fundamentally change situation.

A component is something that you can use as a black box,
without having to reinvent it. It is the epitome of reuse.

How often do you reimplement
?printf() to avoid all of the bloat that typically accompanies it?

I did that once (for OS kernel where standard library would not
work). If needed I can reuse it. On PC-s I am not worried by
bloat due to printf. OTOH, on MCU-s I am not sure if I ever used
printf. Rather, printing was done by specialized routines
either library provided or my own.

You can also create a ?printf() that you can configure at build time to
support the modifiers and specifiers that you know you will need.

Just like you can configure a UART driver to support a FIFO size defined
at configuration, hardware handshaking, software flowcontrol, the
high and low water marks for each of those (as they can be different),
the character to send to request the remote to stop transmitting,
the character you send to request resumption of transmission, which
character YOU will recognize as requesting your Tx channel to pause,
the character (or condition) you will recognize to resume your Tx,
whether or not you will sample the condition codes in the UART, how
you read/write the data register, how you read/write the status register,
etc.

While these sound like lots of options, they are all relatively
trivial additions to the code.

(when was the last time you needed ALL of those format specifiers
in an application? And modifiers?

There are some things which are reusable
if you accept modest to severe overhead.

What you need is components with varying characteristics.
You can buy diodes with all sorts of current carrying capacities,
PIVs, package styles, etc. But, they all still perform the
same function. Why so many different part numbers? Why not
just use the biggest, baddest diode in ALL your circuits?

I heard such electronic analogies many times. But they miss
important point: there is no way for me to make my own diode,

Sure there is! It is just not an efficient way of spending your
resources when you have so many OTS offerings available.

You can design your own processor. Why do you "settle" for an
OTS device (ANS: because there is so little extra added value
you will typically gain from rolling your own vs. the "inefficiency"
of using a COTS offering)

I am stuck with what is available on the market. And diode
is logically pretty simple component, yet we need many kinds.

I.e., we readily accept differences in "standard components"
in other disciplines; why not when it comes to software
modules?

Well, software is _much_ more compilcated than physical
engineering artifacts. Physical thing may have 10000 joints,
but if joints are identical, then this is moral equivalent of
simple loop that just iterates fixed number of times.

This is the argument in favor of components. You'd much rather
read a comprehensive specification ("datasheet") for a software
component than have to read through all of the code that implements
it. What if it was implemented in some programming language in
which you aren't expert? What if it was a binary "BLOB" and
couldn't be inspected?

At software level number of possible pre-composed blocks
is so large that it is infeasible to deliver all of them.

You don't have to deliver all of them. When you wire a circuit,
you still have to *solder* connections, don't you? The
components don't magically glue themselves together...

Classic trick it to parametrize. However even if you
parametrize there are hundreds of design decisions going
into relatively small piece of code. If you expose all
design decisions then user as well may write his/her own
code because complexity will be similar. So normaly
parametrization is limited and there will be users who
find hardcoded desion choices inadequate.

Another things is that current tools are rather weak
at supporting parametrization.

Look at a fleshy UART driver and think about how you would decompose
it into N different variants that could be "compile time configurable".
You'll be surprised as to how easy it is. Even if the actual UART
hardware differs from instance to instance.

For example things tends
to compose nicely if you dynamically allocate everything and use
garbage collection. But performace cost may be substantial.
And in embedded setting garbage collection may be unacceptable.
In some cases I have found out that I can get much better
speed joing things that could be done as composition of library
operations into single big routine.

Sure, but now you're tuning a solution to a specific problem.
I've designed custom chips to solve particular problems.
But, they ONLY solve those particular problems! OTOH,
I use lots of OTC components in my designs because those have
been designed (for the most part) with an eye towards
meeting a variety of market needs.

Maybe I made wrong impression, I think some explanation is in
place here. I am trying to make my code reusable. For my
problems performance is important part of reusablity: our
capability to solve problem is limited by performance and with
better perfomance users can solve bigger problems. I am
re-using code that I can and I would re-use more if I could
but there there are technical obstacles. Also, while I am
trying to make my code reusable, there are intrusive
design decision which may interfere with your possiobility
and willingness to re-use.

If you don't know where the design is headed, then you can't
pick the components that it will need.

I approach a design from the top (down) and bottom (up). This
lets me gauge the types of information that I *may* have
available from the hardware -- so I can sort out how to
approach those limitations from above. E.g., if I can't
control the data rate of a comm channel, then I either have
to ensure I can catch every (complete) message *or* design a
protocol that lets me detect when I've missed something.

There are costs to both approaches. If I dedicate resource to
ensuring I don't miss anything, then some other aspect of the
design will bear that cost. If I rely on detecting missed
messages, then I have to put a figure on their relative
likelihood so my device doesn't fail to provide its desired
functionality (because it is always missing one or two characters
out of EVERY message -- and, thus, sees NO messages).

In slightly different spirit: in another thread you wrote
about accessing disc without OS file cache. Here I
normaly depend on OS and OS file caching is big thing.
It is not perfect, but OS (OK, at least Linux) is doing
this resonably well I have no temptation to avoid it.
And I appreciate that with OS cache performance is
usually much better that would be "without cache".
OTOH, I routinly avoid stdio for I/O critical things
(so no printf in I/O critical code).

My point about the cache was that it is of no value in my case;
I'm not going to revisit a file once I've seen it the first
time (so why hold onto that data?)

In other cases I fixed
bugs by replacing composition of library routines by a single
routine: there were interactions making simple composition
incorrect. Correct alterantive was single routine.

As I wrote my embedded programs are simple and small. But I
use almost no external libraries. Trying some existing libraries
I have found out that some produce rather large programs, linking
in a lot of unneeded stuff.

Because they try to address a variety of solution spaces without
trying to be "optimal" for any. You trade flexibility/capability
for speed/performance/etc.

I think that this is more subtle: libraries frequently force some
way of doing things. Which may be good if you try to quickly roll
solution and are within capabilities of library. But if you
need/want different design, then library may be too inflexible
to deliver it.

Use a different diode.

Of course, writing for scratch
will not scale to bigger programs. OTOH, I feel that with
proper tooling it would be possible to retain efficiency and
small code size at least for large class of microntroller
programs (but existing tools and libraries do not support this).

Templates are an attempt in this direction. Allowing a class of
problems to be solved once and then tailored to the specific
application.

Yes, templates could help. But they also have problems. One
of them is that (among others) I would like to target STM8
and I have no C++ compiler for STM8. My idea is to create
custom "optimizer/generator" for (annotated) C code.
ATM it is vapourware, but I think it is feasible with
reasonable effort.

But, personal experience is where you win the most. You write
your second or third UART driver and start realizing that you
could leverage a previous design if you'd just thought it out
more fully -- instead of tailoring it to the specific needs
of the original application.

And, as you EXPECT to be reusing it in other applications (as
evidenced by the fact that it's your third time writing the same
piece of code!), you anticipate what those *might* need and
think about how to implement those features "economically".

It's rare that an application is *so* constrained that it can't
afford a couple of extra lines of code, here and there. If
you've considered efficiency in the design of your algorithms,
then these little bits of inefficiency will be below the noise floor.

Well, I am not talking about "couple of extra lines". Rather
about IMO substantial fixed overhead. As I wrote, one of my
targets is STM8 with 8k flash, another is MSP430 with 16k flash,
another is STM32 with 16k flash (there are also bigger targets).
One of libraries/frameworks for STM32 after activating few featurs
pulled in about 16k code, this is substantial overhead given
how little features I needed. Other folks reported that for
trivial programs vendor supplied frameworks pulled close to 30k

A "framework" is considerably more than a set of individually
selectable components. I've designed products with 2KB of code and
128 bytes of RAM. The "components" were ASM modules instead of
HLL modules. Each told me how big it was, how much RAM it required,
how deep the stack penetration when invoked, how many T-states
(worst case) to execute, etc.

So, before I designed the hardware, I knew what I would need
by way of ROM/RAM (before the days of FLASH) and could commit
the hardware to foil without fear of running out of "space" or
"time".

code. That may be fine if you have bigger device and need features,
but for smaller MCU-s it may be difference between not fitting into
device or (without library) having plenty of free space.

Sure. But a component will have a datasheet that tells you what
it provides and at what *cost*.

When I tried it Free RTOS for STM32 needed about 8k flash. Which
is fine if you need RTOS. But ATM my designs run without RTOS.

RTOS is a commonly misused term. Many are more properly called
MTOSs (they provide no real timeliness guarantees, just multitasking primitives).

IMO, the advantages of writing in a multitasking environment so
far outweigh the "costs" of an MTOS that it behooves one to consider
how to shoehorn that functionality into EVERY design.

When writing in a HLL, there are complications that impose
constraints on how the MTOS provides its services. But, for small
projects written in ASM, you can gain the benefits of an MTOS
for very few bytes of code (and effectively zero RAM).

I have found libopencm3 to have small overhead. But is routines
are doing so little that direct register access may give simpler
code.

Unlikely that this code will describe itself as "works well enough
SOME of the time..."

And, when/if you stumble on such faults, good luck explaining to
your customer why it's going to take longer to fix and retest the
*existing* codebase before you can get on with your modifications...

Commercial vendors like to say how good their progam are. But
market reality is that program my be quite bad and still sell.

The same is true of FOSS -- despite the claim that many eyes (may)
have looked at it (suggesting that bugs would have been caught!)

From "KLEE: Unassisted and Automatic Generation of High-Coverage
Tests for Complex Systems Programs":

KLEE finds important errors in heavily-tested code. It
found ten fatal errors in COREUTILS (including three
that had escaped detection for 15 years), which account
for more crashing bugs than were reported in 2006, 2007
and 2008 combined. It further found 24 bugs in BUSYBOX, 21
bugs in MINIX, and a security vulnerability in HISTAR? a
total of 56 serious bugs.

Ooops! I wonder how many FOSS *eyes* missed those errors?

Open source folks tend to be more willing to talk about bugs.
And the above nicely shows that there is a lot of bugs, most
waiting to by discovered.

Part of the problem is ownership of the codebase. You are
more likely to know where your own bugs lie -- and, more
willing to fix them ("pride of ownership"). When a piece
of code is shared, over time, there seems to be less incentive
for folks to tackle big -- often dubious -- issues as the
"reward" is minimal (i.e., you may not own the code when the bug
eventually becomes a problem)

Every time you reinvent a solution, you lose much of the benefit
of the previous TESTED solution.

TESTED part works for simple repeatable tasks. But if you have
complex task it is quite likely that you will be the first
person with given use case. gcc is borderline case: if you
throw really new code at it you can expect to see bugs.
gcc user community it large and there is resonable chance that
sombody wrote earlier code which is sufficiently similar to
yours to catch troubles. But there are domains that are at
least as complicated as compilation and have much smaller
user community. You may find out that there are _no_ code
that could be reasonably re-used. Were you ever in situation
when you looked how some "standard library" solves a tricky
problem and realized that in fact library does not solve
the problem?

As I said, your *personal* experience tells you where YOU will
likely benefit. I did a stint with a company that manufactured telecommunications kit. We had all sorts of bizarre interface
protocols with which we had to contend (e.g., using RLSD as
a hardware "pacing" signal). So, it was worthwhile to spend
time developing a robust UART driver (and handler, above it)
as you *knew* the next project would likely have need of it,
in some form or other.

If you're working free-lance and client A needs a BITBLTer
for his design, you have to decide how likely client B
(that you haven't yet met) will be to need the same sort
of module/component.

For example, I've never (until recently) needed to interface
to a disk controller in a product. So, I don't have a
ready-made "component" in my bag-of-tricks. When I look
at a new project, I "take inventory" of what I am likely to
need... and compare that to what I know I have "in stock".
If there's a lot of overlap, then my confidence in my bid
goes up. If there'a a lot of new ground that I'll have to
cover, then it goes down (and the price goes up!).

Reuse helps you better estimate new projects, especially as
projects grow in complexity.

[There's nothing worse than having to upgrade someone else's
design that didn't plan for the future. It's as if you
have to redesign the entire product from scratch --- despite
the fact that it *seems* to work, "as is" (but, not "as desired"!]

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From antispam@math.uni.wroc.pl@21:1/5 to Don Y on Thu Nov 11 04:34:05 2021

Don Y <blockedofcourse@foo.invalid> wrote:

On 10/31/2021 3:54 PM, antispam@math.uni.wroc.pl wrote:

Aren't THEY components?

Well, some folks expect more from components than from
traditional libraries. Some evan claim to deliver.
However, libraries have limitations and ATM I see nothing
that fundamentally change situation.

A component is something that you can use as a black box,
without having to reinvent it. It is the epitome of reuse.

How often do you reimplement
?printf() to avoid all of the bloat that typically accompanies it?

I did that once (for OS kernel where standard library would not
work). If needed I can reuse it. On PC-s I am not worried by
bloat due to printf. OTOH, on MCU-s I am not sure if I ever used
printf. Rather, printing was done by specialized routines
either library provided or my own.

You can also create a ?printf() that you can configure at build time to support the modifiers and specifiers that you know you will need.

Just like you can configure a UART driver to support a FIFO size defined
at configuration, hardware handshaking, software flowcontrol, the
high and low water marks for each of those (as they can be different),
the character to send to request the remote to stop transmitting,
the character you send to request resumption of transmission, which
character YOU will recognize as requesting your Tx channel to pause,
the character (or condition) you will recognize to resume your Tx,
whether or not you will sample the condition codes in the UART, how
you read/write the data register, how you read/write the status register, etc.

While these sound like lots of options, they are all relatively
trivial additions to the code.

(when was the last time you needed ALL of those format specifiers
in an application? And modifiers?

There are some things which are reusable
if you accept modest to severe overhead.

What you need is components with varying characteristics.
You can buy diodes with all sorts of current carrying capacities,
PIVs, package styles, etc. But, they all still perform the
same function. Why so many different part numbers? Why not
just use the biggest, baddest diode in ALL your circuits?

<snip>

I am stuck with what is available on the market. And diode
is logically pretty simple component, yet we need many kinds.

I.e., we readily accept differences in "standard components"
in other disciplines; why not when it comes to software
modules?

Well, software is _much_ more compilcated than physical
engineering artifacts. Physical thing may have 10000 joints,
but if joints are identical, then this is moral equivalent of
simple loop that just iterates fixed number of times.

This is the argument in favor of components. You'd much rather
read a comprehensive specification ("datasheet") for a software
component than have to read through all of the code that implements
it.

Well, if there is simple to use component that performs what
you need, then using it is fine. However, for many tasks
once component is flexible enough to cover both your and
my needs its specification may be longer and more tricky
than code doing task at hand.

What if it was implemented in some programming language in
which you aren't expert? What if it was a binary "BLOB" and
couldn't be inspected?

There are many reasons when existing code can not be reused.
Concerning BLOB-s, I am trying to avoid them and in first
order approximation I am not using them. One (serious IMO)
problem with BLOB-s is that sooner or later they will be
incompatible with other things (OS/other libraries/my code).
Very old source code usually can be run on modern systems
with modest effort. BLOB-s normally would require much
more effort.

At software level number of possible pre-composed blocks
is so large that it is infeasible to deliver all of them.

You don't have to deliver all of them. When you wire a circuit,
you still have to *solder* connections, don't you? The
components don't magically glue themselves together...

Yes, one needs to make connections. In fact, in programming
most work is "making connections". So you want something
which is simple to connect. In other words, you can all
parts of your design to play nicely together. With code
deliverd by other folks that is not always the case.

Classic trick it to parametrize. However even if you
parametrize there are hundreds of design decisions going
into relatively small piece of code. If you expose all
design decisions then user as well may write his/her own
code because complexity will be similar. So normaly
parametrization is limited and there will be users who
find hardcoded desion choices inadequate.

Another things is that current tools are rather weak
at supporting parametrization.

Look at a fleshy UART driver and think about how you would decompose
it into N different variants that could be "compile time configurable". You'll be surprised as to how easy it is. Even if the actual UART
hardware differs from instance to instance.

UART-s are simple. And yet some things are tricky: in C to have
"compile time configurable" buffer size you need to use macros.
Works, but in a sense UART implementation "leaks" to user code.

For example things tends
to compose nicely if you dynamically allocate everything and use
garbage collection. But performace cost may be substantial.
And in embedded setting garbage collection may be unacceptable.
In some cases I have found out that I can get much better
speed joing things that could be done as composition of library
operations into single big routine.

Sure, but now you're tuning a solution to a specific problem.
I've designed custom chips to solve particular problems.
But, they ONLY solve those particular problems! OTOH,
I use lots of OTC components in my designs because those have
been designed (for the most part) with an eye towards
meeting a variety of market needs.

Maybe I made wrong impression, I think some explanation is in
place here. I am trying to make my code reusable. For my
problems performance is important part of reusablity: our
capability to solve problem is limited by performance and with
better perfomance users can solve bigger problems. I am
re-using code that I can and I would re-use more if I could
but there there are technical obstacles. Also, while I am
trying to make my code reusable, there are intrusive
design decision which may interfere with your possiobility
and willingness to re-use.

If you don't know where the design is headed, then you can't
pick the components that it will need.

Well, there are routine tasks, for them it is natural to
re-use existing code. There are new tasks that are "almost"
routine, than one can come with good design at the start.
But in a sense "interesting" tasks are when at start you
have only limited understanding. In such case it is hard
to know "where the design is headed", except that it is
likely to change. Of course, customer may be dissatisfied
if you tell "I will look at the problem and maybe I will
find solution". But lack of understanding is normal
in research (at starting point), and I think that software
houses also do risky projects hoping that big win on succesful
ones will cover losses on failures.

I approach a design from the top (down) and bottom (up). This
lets me gauge the types of information that I *may* have
available from the hardware -- so I can sort out how to
approach those limitations from above. E.g., if I can't
control the data rate of a comm channel, then I either have
to ensure I can catch every (complete) message *or* design a
protocol that lets me detect when I've missed something.

Well, with UART there will be some fixed transmission rate
(with wrong clock frequency UART would be unable to receive
anything). I would expect MCU to be able to receive all
incoming characters (OK, assuming hardware UART with drivier
using high priority interrupt). So, detecting that you got too
much should not be too hard. OTOH, sensibly handling
excess input is different issue: if characters are coming
faster than you can process them, then either your CPU is
underpowered or there is some failure causing excess transmission.
In either case specific application will dictate what
should be avoided.

There are costs to both approaches. If I dedicate resource to
ensuring I don't miss anything, then some other aspect of the
design will bear that cost. If I rely on detecting missed
messages, then I have to put a figure on their relative
likelihood so my device doesn't fail to provide its desired
functionality (because it is always missing one or two characters
out of EVERY message -- and, thus, sees NO messages).

My thinking goes toward using relatively short messages and
buffer big enough for two messages. If there is need for
high speed I would go for continous messages and DMA
transfers (using break interrupt to discover end of message
in case of variable length messages). So device should
be able to get all messages and in case of excess message
trafic whole message could be dropped (possibly looking
first for some high priority messages). Of course, there
may be some externaly mandated message format and/or
communitation protocal making DMA inappropriate.
Still, assuming interrupts, all characters should reach
interrupt handler, causing possibly some extra CPU
load. The only possiblity of unnoticed loss of characters
would be blocking interrupts too long. If interrupts can
be blocked for too long, then I would expect loss of whole
messages. In such case protocol should have something like
"dont talk to me for next 100 miliseconds, I will be busy"
to warn other nodes and request silence. Now, if you
need to faithfully support sillyness like Modbus RTU timeouts,
then I hope that you are adequatly paid...

In slightly different spirit: in another thread you wrote
about accessing disc without OS file cache. Here I
normaly depend on OS and OS file caching is big thing.
It is not perfect, but OS (OK, at least Linux) is doing
this resonably well I have no temptation to avoid it.
And I appreciate that with OS cache performance is
usually much better that would be "without cache".
OTOH, I routinly avoid stdio for I/O critical things
(so no printf in I/O critical code).

My point about the cache was that it is of no value in my case;
I'm not going to revisit a file once I've seen it the first
time (so why hold onto that data?)

Well, OS "cache" has many functions. One of them is read-ahead,
another is scheduling of requests to minimize seek time.
And beside data there is also meta-data. OS functions need
access to meta-data and OS-es are designed under assumption
that there is decent cache hit rate on meta-data access.

In other cases I fixed
bugs by replacing composition of library routines by a single
routine: there were interactions making simple composition
incorrect. Correct alterantive was single routine.

As I wrote my embedded programs are simple and small. But I
use almost no external libraries. Trying some existing libraries
I have found out that some produce rather large programs, linking
in a lot of unneeded stuff.

Because they try to address a variety of solution spaces without
trying to be "optimal" for any. You trade flexibility/capability
for speed/performance/etc.

I think that this is more subtle: libraries frequently force some
way of doing things. Which may be good if you try to quickly roll
solution and are within capabilities of library. But if you
need/want different design, then library may be too inflexible
to deliver it.

Use a different diode.

Well, when needed I use my own library.

Of course, writing for scratch
will not scale to bigger programs. OTOH, I feel that with
proper tooling it would be possible to retain efficiency and
small code size at least for large class of microntroller
programs (but existing tools and libraries do not support this).

Templates are an attempt in this direction. Allowing a class of
problems to be solved once and then tailored to the specific
application.

Yes, templates could help. But they also have problems. One
of them is that (among others) I would like to target STM8
and I have no C++ compiler for STM8. My idea is to create
custom "optimizer/generator" for (annotated) C code.
ATM it is vapourware, but I think it is feasible with
reasonable effort.

But, personal experience is where you win the most. You write
your second or third UART driver and start realizing that you
could leverage a previous design if you'd just thought it out
more fully -- instead of tailoring it to the specific needs
of the original application.

And, as you EXPECT to be reusing it in other applications (as
evidenced by the fact that it's your third time writing the same
piece of code!), you anticipate what those *might* need and
think about how to implement those features "economically".

It's rare that an application is *so* constrained that it can't
afford a couple of extra lines of code, here and there. If
you've considered efficiency in the design of your algorithms,
then these little bits of inefficiency will be below the noise floor.

Well, I am not talking about "couple of extra lines". Rather
about IMO substantial fixed overhead. As I wrote, one of my
targets is STM8 with 8k flash, another is MSP430 with 16k flash,
another is STM32 with 16k flash (there are also bigger targets).
One of libraries/frameworks for STM32 after activating few featurs
pulled in about 16k code, this is substantial overhead given
how little features I needed. Other folks reported that for
trivial programs vendor supplied frameworks pulled close to 30k

A "framework" is considerably more than a set of individually
selectable components. I've designed products with 2KB of code and
128 bytes of RAM. The "components" were ASM modules instead of
HLL modules. Each told me how big it was, how much RAM it required,
how deep the stack penetration when invoked, how many T-states
(worst case) to execute, etc.

Nice, but I am not sure how practical this would be in modern
times. I have C code and can resonably estimate resource use.
But there are changable parameters which may enable/disable
some parts. And size/speed/stack use depends on compiler
optimizations. So there is variation. And there are traps.
Linker transitively pulls dependencies, it there are "false"
dependencies, they can pull much more than strictly needed.
One example of "false" dependence are (or maybe were) C++
VMT-s. Namely, any use of object/class pulled VMT which in
turn pulled all ancestors and methods. If unused methods
referenced other classes that could easily cascade. In both
cases authors of libraries probably thought that provided
"goodies" justified size (intended targets were larger).

So, before I designed the hardware, I knew what I would need
by way of ROM/RAM (before the days of FLASH) and could commit
the hardware to foil without fear of running out of "space" or
"time".

code. That may be fine if you have bigger device and need features,
but for smaller MCU-s it may be difference between not fitting into
device or (without library) having plenty of free space.

Sure. But a component will have a datasheet that tells you what
it provides and at what *cost*.

My 16x2 text LCD routine may pull I2C driver. If I2C is not needed
anyway, this is additional cost, otherwise cost is shared.
LCD routine depends also on timer. Both timer and I2C affect
MCU initialization. So even in very simple situations total
cost is rather complex. And libraries that I tried presumably
were not "components" in your sense, you had to link the program
to learn total size. Documentation mentioned dependencies,
when they affected correctness but otherwise not. To say
the truth, when library supports hundreds or thousends of different
targets (combinations of CPU core, RAM/ROM sizes, peripherial
configurations) with different compilers, then there is hard
to make exact statements.

IMO, in ideal world for "standard" MCU functionality we would
have configuration tool where user can specify needed
functionality and tool would generate semi-custom code
and estimate its resource use. MCU vendor tools attempt to
offer something like this, but reports I heard were rather
unfavourable, in particular it seems that vendors simply
deliver thick library that supports "everything", and
linking to this library causes code bloat.

When I tried it Free RTOS for STM32 needed about 8k flash. Which
is fine if you need RTOS. But ATM my designs run without RTOS.

RTOS is a commonly misused term. Many are more properly called
MTOSs (they provide no real timeliness guarantees, just multitasking primitives).

Well, Free RTOS comes with "no warranty", but AFAICS they make
honest effort to have good real time behaviour. In particular,
code paths trough Free RTOS from events to user code are of
bounded and rather short length. User code still may be
delayed by interrupts/process priorities, but they give resonable
explanation. So it is up to user to code things in way that gives
needed real-time behaviour, but Free RTOS normally will not spoil it
and may help.

IMO, the advantages of writing in a multitasking environment so
far outweigh the "costs" of an MTOS that it behooves one to consider
how to shoehorn that functionality into EVERY design.

When writing in a HLL, there are complications that impose
constraints on how the MTOS provides its services. But, for small
projects written in ASM, you can gain the benefits of an MTOS
for very few bytes of code (and effectively zero RAM).

Well, looking at books and articles I did not find convincing
argument/example showing that one really need multitasking for
small systems. I tend to think rather in terms of collection
of coupled finite state machines (or if you prefer Petri net).
State machines transition in response to events and may generate
events. Each finite state machine could be a task. But it is
not clear if it should. Some transitions are simple and should
be fast and that I would do in interrupt handlers. Some
other are triggered in regular way from other machines and
are naturally handled by function calls. Some need queues.
The whole thing fits resonably well in "super loop" paradigm.

I have found one issue that at first glance "requires"
multitasking. Namely, when one wants to put system in
sleep mode when there is no work natural "super loop"
approach looks like

if (work_to_do) {
do_work();
} else {
wait_for_interrupt();
}

where 'work_to_do' is flag which may be set by interrupt handlers.
But there is nasty race condition, if interrupt comes between
test for 'work_to_do' and 'wait_for_interrupt': despite
having work to do system will go to sleep and only wake on
next interrupt (which depending on specific requirements may
be harmless or disaster). I was unable to find simple code
that avoids this race. With multitasking kernel race vanishes:
there is idle task which is only doing 'wait_for_interrupt'
and OS scheduler passes control to worker tasks when there is
work to do. But when one looks how multitasker avoids race,
then it is clear that crucial point is doing control transfer
via return from interrupt. More precisely, variables are
tested with interrupts disabled and after decision is made
return from interrupt transfers control. Important point is
that if interrupt comes after control transfer interrupt handler
will re-do test before returning to user code. So what is needed
is piece of low-level code that uses return from interrupt for
control transfer and all interrupt handlers need to jump to
this code when finished. The rest (usually majority) of
multitasker is not needed...

Unlikely that this code will describe itself as "works well enough
SOME of the time..."

And, when/if you stumble on such faults, good luck explaining to
your customer why it's going to take longer to fix and retest the
*existing* codebase before you can get on with your modifications...

Commercial vendors like to say how good their progam are. But
market reality is that program my be quite bad and still sell.

The same is true of FOSS -- despite the claim that many eyes (may)
have looked at it (suggesting that bugs would have been caught!)

From "KLEE: Unassisted and Automatic Generation of High-Coverage
Tests for Complex Systems Programs":

KLEE finds important errors in heavily-tested code. It
found ten fatal errors in COREUTILS (including three
that had escaped detection for 15 years), which account
for more crashing bugs than were reported in 2006, 2007
and 2008 combined. It further found 24 bugs in BUSYBOX, 21
bugs in MINIX, and a security vulnerability in HISTAR? a
total of 56 serious bugs.

Ooops! I wonder how many FOSS *eyes* missed those errors?

Open source folks tend to be more willing to talk about bugs.
And the above nicely shows that there is a lot of bugs, most
waiting to by discovered.

Part of the problem is ownership of the codebase. You are
more likely to know where your own bugs lie -- and, more
willing to fix them ("pride of ownership"). When a piece
of code is shared, over time, there seems to be less incentive
for folks to tackle big -- often dubious -- issues as the
"reward" is minimal (i.e., you may not own the code when the bug
eventually becomes a problem)

Ownership may cause problems: there is tendency to "solve"
problems locally, that is in code that given person "owns".
This is good if there is easy local solution. However, this
may also lead to ugly workarounds that really do not work
well, while problem is easily solvable in different part
("owned" by different programmer). I have seen such thing
several times, looking at whole codebase after some effort
it was possible to do simple fix, while there were workarounds
in different ("wrong") places. I had no contact with
original authors, but it seems that workarounds were due to
"ownership".

--
Waldek Hebisch

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Don Y@21:1/5 to antispam@math.uni.wroc.pl on Fri Nov 19 16:21:49 2021

On 11/10/2021 9:34 PM, antispam@math.uni.wroc.pl wrote:

Don Y <blockedofcourse@foo.invalid> wrote:

Classic trick it to parametrize. However even if you
parametrize there are hundreds of design decisions going
into relatively small piece of code. If you expose all
design decisions then user as well may write his/her own
code because complexity will be similar. So normaly
parametrization is limited and there will be users who
find hardcoded desion choices inadequate.

Another things is that current tools are rather weak
at supporting parametrization.

Look at a fleshy UART driver and think about how you would decompose
it into N different variants that could be "compile time configurable".
You'll be surprised as to how easy it is. Even if the actual UART
hardware differs from instance to instance.

UART-s are simple. And yet some things are tricky: in C to have
"compile time configurable" buffer size you need to use macros.
Works, but in a sense UART implementation "leaks" to user code.

You can configure using manifest constants, conditional compilation,
or even run-time switches. Or, by linking against different
"support" routines. How and where the configuration "leaks"
into user code is a function of the configuration mechanisms that
you decide to employ.

E.g., You'd likely NOT design your network stack to be tightly integrated
with your choice of NIC (all else being equal) -- simply because you'd
want to be able to reuse the stack with some *other* NIC without having
to rewrite it.

OTOH, it's not unexpected to want to isolate the caching of ARP results
in an "application specific" manner as you'll likely know the sorts (and number!) of clients/services with which the device in question will be connecting. So, that (sub)module can be replaced with something most appropriate to the application yet with a "standardized" interface to
the stack itself (*YOU* define that standard)

All of these require decisions up-front; you can't expect to be able to retrofit an existing piece of code (cheaply) to support a more modular/configurable implementation in the future.

But, personal experience teaches you what you are likely to need
by way of flexibility/configurability. Most folks tend to eork
in a very narrow set of application domains. Chances are, the
network stack you design for an embedded product will be considerably
different than one for a desktop OS. If you plan to straddle
both domains, then the configurability challenge is greater!

There are costs to both approaches. If I dedicate resource to
ensuring I don't miss anything, then some other aspect of the
design will bear that cost. If I rely on detecting missed
messages, then I have to put a figure on their relative
likelihood so my device doesn't fail to provide its desired
functionality (because it is always missing one or two characters
out of EVERY message -- and, thus, sees NO messages).

My thinking goes toward using relatively short messages and
buffer big enough for two messages.

You can also design with the intent of parsing messages before they are complete and "reducing" them along the way. This is particularly
important if messages can have varying length *or* there is a possibility
for the ass end of a message to get dropped (how do you know when the
message is complete? Imagine THE USER misconfiguring your device
to expect CRLFs and the traffic only contains newlines; the terminating
CRLF never arrives!)

[At the limit case, a message reduces to a concept -- that is represented
in some application specific manner: "Start the motor", "Clear the screen", etc.]

Barcodes are messages (character sequences) of a sort. I typically
process a barcode at several *concurrent* levels:
- an ISR that captures the times of transitions (black->white->black)
- a task that reduces the data captured by the ISR into "bar widths"
- a task that aggregates bar widths to form characters
- a task that parses character sequences to determine valid messages
- an application layer interpretation (or discard) of that message
This allows each layer to decide when the data on which it relies
does not represent a valid barcode and discard some (or all) of it...
without waiting for a complete message to be present. So, the
resources that were consumed by that (partial?) message are
freed earlier.

As such, there is never a "start time" nor "end time" for a barcode
message -- because you don't want the user to have to "do something"
to tell you that he is now going to scan a barcode (otherwise, the
efficiency of using barcodes is subverted).

[Think about the sorts of applications that use barcodes; how many
require the user to tell the device "here comes a barcode, please start
your decoder algorithm NOW!"]

As users can abuse the barcode reader (there is nothing preventing them
from continuously scanning barcodes, in violation of any "protocol"
that the product may *intend*), you have to tolerate the case where
the data arrives faster than it can be consumed. *Knowing* where
(in the event stream) you may have "lost" some data (transitions,
widths, characters or messages) lets you resync to a less pathological
event stream later (when the user starts "behaving properly")

If there is need for
high speed I would go for continous messages and DMA
transfers (using break interrupt to discover end of message
in case of variable length messages). So device should
be able to get all messages and in case of excess message
trafic whole message could be dropped (possibly looking
first for some high priority messages). Of course, there
may be some externaly mandated message format and/or
communitation protocal making DMA inappropriate.
Still, assuming interrupts, all characters should reach
interrupt handler, causing possibly some extra CPU
load. The only possiblity of unnoticed loss of characters
would be blocking interrupts too long. If interrupts can
be blocked for too long, then I would expect loss of whole
messages. In such case protocol should have something like
"dont talk to me for next 100 miliseconds, I will be busy"
to warn other nodes and request silence. Now, if you
need to faithfully support sillyness like Modbus RTU timeouts,
then I hope that you are adequatly paid...

IMO, the advantages of writing in a multitasking environment so
far outweigh the "costs" of an MTOS that it behooves one to consider
how to shoehorn that functionality into EVERY design.

When writing in a HLL, there are complications that impose
constraints on how the MTOS provides its services. But, for small
projects written in ASM, you can gain the benefits of an MTOS
for very few bytes of code (and effectively zero RAM).

Well, looking at books and articles I did not find convincing argument/example showing that one really need multitasking for
small systems.

The advantages of multitasking lie in problem decomposition.
Smaller problems are easier to "get right", in isolation.
The *challenge* of multitasking is coordinating the interactions
between these semi-concurrent actors. Experience teaches you how
to partition a "job".

I want to blink a light at 1 Hz and check for a button to be
pressed which will start some action that may be lengthy. I
can move the light blink into an ISR (which GENERALLY is a ridiculous
use of that "resource") to ensure the 1Hz timeliness is maintained
regardless of what the "lengthy" task may be doing, at the time.

Or, I can break the lengthy task into smaller chunks that
are executed sequentially with "peeks" at the "light timer"
between each of those segments.

sequence1 := sequence2 := sequence3 := sequence4 := 0;
while (FOREVER) {
task1:
case sequence1++ {
0 => do_task1_step0;
1 => do_task1_step1;
2 => do_task1_step2;
...
}

do_light;

task2:
case sequence2++ {
0 => do_task2_step0;
1 => do_task2_step1;
2 => do_task2_step2;
...
}

do_light;

task3:
switch sequence3++ {
0 => do_task3_step0;
1 => do_task3_step1;
2 => do_task3_step2;
...
}

do_light;

...
}

When you need to do seven (or fifty) other "lengthy actions"
concurrently (each of which may introduce other "blinking
lights" or timeliness constraints), its easier (less brittle)
to put a structure in place that lets those competing actions
share the processor without requiring the developer to
micromanage at this level.

[50 tasks isn't an unusual load in a small system; video arcade
games from the early 80's -- 8 bit processors, kilobytes of
ROM+RAM -- would typically treat each object on the screen
(including bullets!) as a separate process]

The above example has low overhead for the apparent concurrency.
But, pushes all of the work onto the developer's lap. He has
to carefully size each "step" of each "task" to ensure the
overall system is responsive.

A nicer approach is to just let an MTOS handle the switching
between tasks. But, this comes at a cost of additional run-time
overhead (e.g., arbitrary context switches).

I tend to think rather in terms of collection
of coupled finite state machines (or if you prefer Petri net).
State machines transition in response to events and may generate
events. Each finite state machine could be a task. But it is
not clear if it should. Some transitions are simple and should
be fast and that I would do in interrupt handlers. Some
other are triggered in regular way from other machines and
are naturally handled by function calls. Some need queues.
The whole thing fits resonably well in "super loop" paradigm.

I use FSMs for UIs and message parsing. They let the structure
of the code "rise to the top" where it is more visible (to another
developer) instead of burying it in subroutines and function calls.

"Event sources" create events which are consumed by FSMs, as
needed. So, a "power monitor" could generate POWER_FAIL, LOW_BATTERY, POWER_RESTORED, etc. events while a "keypad decoder" could put out
ENTER, CLEAR, ALPHA_M, NUMERIC_5, etc. events.

Because there is nothing *special* about an "event", *ANY* piece of
code can generate them. Their significance assigns based on where
they are "placed" (in memory) and who/what can "see" them. So,
you can use an FSM to parse a message (using "received characters"
as an ordered stream of events) and "signal" MESSAGE_COMPLETE to
another FSM that is awaiting "messages" (along with a pointer to the
completed message)

From "KLEE: Unassisted and Automatic Generation of High-Coverage
Tests for Complex Systems Programs":

KLEE finds important errors in heavily-tested code. It
found ten fatal errors in COREUTILS (including three
that had escaped detection for 15 years), which account
for more crashing bugs than were reported in 2006, 2007
and 2008 combined. It further found 24 bugs in BUSYBOX, 21
bugs in MINIX, and a security vulnerability in HISTAR? a
total of 56 serious bugs.

Ooops! I wonder how many FOSS *eyes* missed those errors?

Open source folks tend to be more willing to talk about bugs.
And the above nicely shows that there is a lot of bugs, most
waiting to by discovered.

Part of the problem is ownership of the codebase. You are
more likely to know where your own bugs lie -- and, more
willing to fix them ("pride of ownership"). When a piece
of code is shared, over time, there seems to be less incentive
for folks to tackle big -- often dubious -- issues as the
"reward" is minimal (i.e., you may not own the code when the bug
eventually becomes a problem)

Ownership may cause problems: there is tendency to "solve"
problems locally, that is in code that given person "owns".
This is good if there is easy local solution. However, this
may also lead to ugly workarounds that really do not work
well, while problem is easily solvable in different part
("owned" by different programmer). I have seen such thing
several times, looking at whole codebase after some effort
it was possible to do simple fix, while there were workarounds
in different ("wrong") places. I had no contact with
original authors, but it seems that workarounds were due to
"ownership".

You are *always* at the mercy of the code's owner. Just as folks
are at YOUR mercy for the code that you (currently) exert ownership
over. The best compliments you'll receive are from folks who
inherit your codebase and can appreciate its structure and
consistency. Conversely, your worst nightmares will be inheriting
a codebase that was "hacked together", willy-nilly, by some number
of predecessors with no real concern over their "product" (code).

E.g., For FOSS projects, ownership isn't just a matter of who takes "responsibility" for coordinating/merging diffs into the
codebase but, also, who has a compatible "vision" for the
codebase, going forward. You'd not want a radically different
vision from one owner to the next as this leads to gyrations in
the codebase that will be seen as instability by its users
(i.e., other developers).

I use PostgreSQL in my current design. I have no desire to
*develop* the RDBMS software -- let folks who understand that
sort of thing work their own magic on the codebase. I can add
value *elsewhere* in my designs.

But, I eventually have to take ownership of *a* version of the
software as I can't expect the "real owners" to maintain some
version that *I* find useful, possibly years from now. Once
I assume ownership of that chosen release, it will be my
priorities and skillset that drive how it evolves. I can
choose to cherry pick "fixes" from the main branch and back-port
them into the version I've adopted. Or, decide to live with
some particular set of problems/bugs/shortcomings.

If I am prudent, I will attempt to adopt the "style" of the
original developers in fitting any changes that I make to
that codebase. I'd want my changes to "blend in" and seem
consistent with that which preceded them.

Folks following the main distribution would likely NOT be interested
in the changes that I choose to embrace as they'll likely have
different goals than I. But that doesn't say my ownership is
"toxic", just that it doesn't suit the needs of (most) others.

---

I've got to bow out of this conversation. I made a commitment to
release 6 designs to manufacturing before year end. As it stands,
now, it looks like I'll only have time enough for four of them as
I got "distracted", spending the past few weeks gallavanting (but
it was wicked fun!).

OTOH, It won't be fun starting the new year two weeks "behind"... :<

[Damn holidays eat into my work time. And, no excuse on my part;
it's not like I didn't KNOW they were coming!! :< ]

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

Who's Online

System Info

Sysop:	Keyop
Location:	Huddersfield, West Yorkshire, UK
Users:	293
Nodes:	16 (2 / 14)
Uptime:	241:07:32
Calls:	6,624
Files:	12,173
Messages:	5,320,138

How to write a simple driver in bare metal systems: volatile, memory ba

Who's Online

System Info