Even though I've been writing software for embedded systems for more than 10 years,
there's a topic that from time to time makes me think for hours and
leaves me with many doubts.
Consider a simple embedded system based on an MCU (AVR8 or Cortex-Mx).
The software is bare metal, without any OS. The main pattern is the well-known mainloop (background code) that is interrupted by ISRs.
Interrupts are used mainly for timing and for low-level drivers. For
example, the UART reception ISR moves the last received char into a FIFO
buffer, while the mainloop code pops new data from the FIFO.
static struct {
    unsigned char buf[RXBUF_SIZE];
    uint8_t in;
    uint8_t out;
} rxfifo;

/* ISR */
void uart_rx_isr(void) {
    unsigned char c = UART->DATA;
    rxfifo.buf[in % RXBUF_SIZE] = c;
    rxfifo.in++;
    // Reset interrupt flag
}

/* Called regularly from mainloop code */
int uart_task(void) {
    int c = -1;
    if (out != in) {
        c = rxfifo.buf[out % RXBUF_SIZE];
        out++;
    }
    return -1;
}
According to a 20-year-old article[1] by Nigel Jones, this seems a situation
where volatile must be used for rxfifo.in, because it is modified by an ISR
and used in the mainloop code.
I don't think so: rxfifo.in is read from memory only once in
uart_task(), so there isn't the risk that the compiler optimizes badly.
Even if the ISR fires immediately after the if statement, this doesn't
lead to a dangerous state: the just-received data will be processed at
the next call to uart_task().
So IMHO volatile isn't necessary here. And critical sections (i.e.
disabling interrupts) aren't useful either.
However I'm thinking about memory barriers. Suppose the compiler reorders
the instructions in uart_task() as follows:
    c = rxfifo.buf[out % RXBUF_SIZE];
    if (out != in) {
        out++;
        return c;
    } else {
        return -1;
    }
Here there's a big problem: the compiler decided to read rxfifo.buf[] first and only then test in and out for equality. If the ISR fires immediately after the data has been moved to c (most probably an internal register),
the condition in the if statement will be true and the register value is returned. However, the register value isn't correct.
I don't think any modern C compiler reorders uart_task() in this way, but
we can't be sure. The observable result doesn't change from the compiler's
point of view, so it is allowed to do this kind of thing.
How do I fix this issue if I want to be extremely sure the compiler will
not reorder this way? Applying volatile to rxfifo.in shouldn't help here,
because the compiler is still allowed to reorder accesses to non-volatile
variables around it[2].
One solution is adding a memory barrier in this way:
int uart_task(void) {
    int c = -1;
    if (out != in) {
        memory_barrier();
        c = rxfifo.buf[out % RXBUF_SIZE];
        out++;
    }
    return -1;
}
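For reference, memory_barrier() is not a standard function. On GCC or Clang it is typically defined as a pure compiler barrier, something like this sketch:

```c
#include <assert.h>

/* Typical GCC/Clang definition: a pure compiler barrier. It emits no
 * instruction, but forbids the compiler from moving memory accesses
 * across it. On a single-core MCU (AVR8, Cortex-M) this is enough;
 * a multi-core chip would also need a hardware fence (e.g. DMB). */
#define memory_barrier() __asm__ volatile("" ::: "memory")
```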
However this approach appears dangerous to me. You have to check and
double-check if, when and where memory barriers are necessary, and it's
easy to skip a barrier where it's needed and add a barrier where it
isn't needed.
So I'm thinking that a sub-optimal (in terms of efficiency) but reliable (in terms of the risk of skipping a barrier where it is needed) approach could be to
enter a critical section (disabling interrupts) anyway, even where it isn't
strictly needed.
int uart_task(void) {
    ENTER_CRITICAL_SECTION();
    int c = -1;
    if (out != in) {
        c = rxfifo.buf[out % RXBUF_SIZE];
        out++;
    }
    EXIT_CRITICAL_SECTION();
    return -1;
}
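ENTER/EXIT_CRITICAL_SECTION are not standard C either. A save/restore sketch might look like the following; the irq_save()/irq_restore() hooks are hypothetical (on AVR they would read SREG and execute cli(), on Cortex-M read PRIMASK and __disable_irq()), stubbed here with a flag so the pattern can run on a host:

```c
#include <stdint.h>
#include <assert.h>

static int irq_enabled = 1;                 /* host stand-in for the IRQ flag */

static uint32_t irq_save(void) {
    uint32_t old = (uint32_t)irq_enabled;
    irq_enabled = 0;                        /* "disable interrupts" */
    return old;
}

static void irq_restore(uint32_t old) {
    irq_enabled = (int)old;
}

/* Save the previous interrupt state and restore it on exit, so that
 * nested critical sections don't re-enable interrupts too early. */
#define ENTER_CRITICAL_SECTION()  uint32_t irq_state__ = irq_save()
#define EXIT_CRITICAL_SECTION()   irq_restore(irq_state__)
```

Restoring the saved state, rather than unconditionally re-enabling interrupts, is what makes the macros safe to nest.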
Another solution could be to apply the volatile keyword to rxfifo.in *AND* rxfifo.buf too, so the compiler can't change the order of accesses to them.
Do you have other suggestions?
[1] https://barrgroup.com/embedded-systems/how-to/c-volatile-keyword
[2] https://blog.regehr.org/archives/28
On 10/22/2021 3:07 PM, pozz wrote:
static struct {
unsigned char buf[RXBUF_SIZE];
uint8_t in;
uint8_t out;
} rxfifo;
/* ISR */
void uart_rx_isr(void) {
unsigned char c = UART->DATA;
rxfifo.buf[in % RXBUF_SIZE] = c;
rxfifo.in++;
// Reset interrupt flag
}
/* Called regularly from mainloop code */
int uart_task(void) {
int c = -1;
Why? And why a retval from uart_task -- if it is always "-1"?
if (out != in) {
c = rxfifo.buf[out % RXBUF_SIZE];
out++;
}
return -1;
}
This is a bug(s) waiting to happen.
How is RXBUF_SIZE defined?
How does it reflect the data rate (and,
thus, interrupt rate) as well as the maximum latency between "main
loop" accesses?
I.e., what happens when the buffer is *full* -- and,
thus, appears EMPTY?
What stops the "in" member from growing to the
maximum size of a uint8 -- and then wrapping?
How do you convey this
to the upper level code ("Hey, we just lost a whole RXBUF_SIZE of
characters so if the character stream doesn't make sense, that might
be a cause...")?
What if RXBUF_SIZE is relatively prime wrt uint8max?
When writing UART handlers, I fetch the received datum along with
the uart's flags and stuff *both* of those things in the FIFO.
If the FIFO would be full, I, instead, modify the flags of the
preceding datum to reflect this fact ("Some number of characters
have been lost AFTER this one...") and discard the current character.
I then signal an event and let a task waiting for that specific event
wake up and retrieve the contents of the FIFO (which may include more
than one character, at that time as characters can arrive after the
initial event has been signaled).
This lets me move the line discipline out of the ISR and still keep
the system "responsive".
Figure out everything that you need to do before you start sorting out
how the compiler can "shaft" you...
On 23/10/2021 00:07, pozz wrote:
Even I write software for embedded systems for more than 10 years,
there's an argument that from time to time let me think for hours and
leave me with many doubts.
It's nice to see a thread like this here - the group needs such discussions!
Consider a simple embedded system based on a MCU (AVR8 or Cortex-Mx).
The software is bare metal, without any OS. The main pattern is the well
known mainloop (background code) that is interrupted by ISR.
Interrupts are used mainly for timings and for low-level driver. For
example, the UART reception ISR move the last received char in a FIFO
buffer, while the mainloop code pops new data from the FIFO.
static struct {
unsigned char buf[RXBUF_SIZE];
uint8_t in;
uint8_t out;
} rxfifo;
/* ISR */
void uart_rx_isr(void) {
unsigned char c = UART->DATA;
rxfifo.buf[in % RXBUF_SIZE] = c;
rxfifo.in++;
Unless you are sure that RXBUF_SIZE is a power of two, this is going to
be quite slow on an AVR. Modulo means division, and while division by a constant is usually optimised to a multiplication by the compiler, you
still have a multiply, a shift, and some compensation for it all being
done as signed integer arithmetic.
It's also wrong, for non-power of two sizes, since the wrapping of your increment and your modulo RXBUF_SIZE get out of sync.
The usual choice is to track "head" and "tail", and use something like:
void uart_rx_isr(void) {
    unsigned char c = UART->DATA;
    // Reset interrupt flag
    uint8_t next = rxfifo.tail;
    rxfifo.buf[next] = c;
    next++;
    if (next >= RXBUF_SIZE) next -= RXBUF_SIZE;
    rxfifo.tail = next;
}
/* Called regularly from mainloop code */
int uart_task(void) {
int c = -1;
if (out != in) {
c = rxfifo.buf[out % RXBUF_SIZE];
out++;
}
return -1;
}
int uart_task(void) {
    int c = -1;
    uint8_t next = rxfifo.head;
    if (next != rxfifo.tail) {
        c = rxfifo.buf[next];
        next++;
        if (next >= RXBUF_SIZE) next -= RXBUF_SIZE;
        rxfifo.head = next;
    }
    return c;
}
These don't track buffer overflow at all - you need to call uart_task()
often enough to avoid that.
(I'm skipping volatiles so we don't get ahead of your point.)
From a 20-years old article[1] by Nigle Jones, this seems a situation
where volatile must be used for rxfifo.in, because is modified by an ISR
and used in the mainloop code.
Certainly whenever data is shared between ISRs and mainloop code, or different threads, you need to think about how to make sure data is synchronised and exchanged. "volatile" is one method, atomics are
another, and memory barriers can be used.
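As a sketch of the atomics option, using C11 <stdatomic.h> on the thread's FIFO (single writer, single reader; uart_rx_push() is a stand-in for the ISR body, since there is no UART on a host):

```c
#include <stdatomic.h>
#include <stdint.h>
#include <assert.h>

#define RXBUF_SIZE 16u          /* power of two, as in the original post */

static struct {
    unsigned char buf[RXBUF_SIZE];
    _Atomic uint8_t in;         /* written by the ISR */
    uint8_t out;                /* written only by the mainloop */
} rxfifo;

static void uart_rx_push(unsigned char c) {   /* the "ISR" side */
    uint8_t in = atomic_load_explicit(&rxfifo.in, memory_order_relaxed);
    rxfifo.buf[in % RXBUF_SIZE] = c;
    /* Release store: the buffer write above cannot become visible
     * after the index update. */
    atomic_store_explicit(&rxfifo.in, (uint8_t)(in + 1u),
                          memory_order_release);
}

static int uart_task(void) {                  /* the mainloop side */
    int c = -1;
    /* Acquire load: pairs with the release store in the "ISR", so the
     * buffer read below cannot be hoisted above the index check. */
    uint8_t in = atomic_load_explicit(&rxfifo.in, memory_order_acquire);
    if (rxfifo.out != in) {
        c = rxfifo.buf[rxfifo.out % RXBUF_SIZE];
        rxfifo.out++;
    }
    return c;   /* returns the popped char (the original's return -1 looks like a typo) */
}
```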
I don't think so, rxfifo.in is read from memory only one time in
uart_task(), so there isn't the risk that compiler can optimize badly.
That is incorrect in two ways. One - barring compiler bugs (which do
occur, but they are very rare compared to user bugs), there is no such
thing as "optimising badly". If optimising changes the behaviour of the code, other than its size and speed, the code is wrong.
Two - it is a
very bad idea to imagine that having code inside a function somehow "protects" it from re-ordering or other optimisation.
Functions can be inlined, outlined, cloned, and shuffled about.
Link-time optimisation, code in headers, C++ modules, and other
cross-unit optimisations are becoming more and more common. So while it might be true /today/ that the compiler has no alternative but to read rxfifo.in once per call to uart_task(), you cannot assume that will be
the case with later compilers or with more advanced optimisation
techniques enabled. It is safer, more portable, and more future-proof
to avoid such assumptions.
Even if ISR is fired immediately after the if statement, this doesn't
bring to a dangerous state: the just received data will be processed at
the next call to uart_task().
So IMHO volatile isn't necessary here. And critical sections (i.e.
disabling interrupts) aren't useful too.
However I'm thinking about memory barrier. Suppose the compiler reorder
the instructions in uart_task() as follows:
c = rxfifo.buf[out % RXBUF_SIZE]
if (out != in) {
out++;
return c;
} else {
return -1;
}
Here there's a big problem, because compiler decided to firstly read
rxfifo.buf[] and then test in and out equality. If the ISR is fired
immediately after moving data to c (most probably an internal register),
the condition in the if statement will be true and the register value is
returned. However the register value isn't correct.
You are absolutely correct.
I don't think any modern C compiler reorder uart_task() in this way, but
we can't be sure. The result shouldn't change for the compiler, so it
can do this kind of things.
It is not an unreasonable re-arrangement. On processors with
out-of-order execution (which does not apply to the AVR or Cortex-M), compilers will often push loads as early as they can in the instruction stream so that they start the cache loading process as quickly as
possible. (But note that on such "big" processors, much of this
discussion on volatile and memory barriers is not sufficient, especially
if there is more than one core. You need atomics and fences, but that's
a story for another day.)
How to fix this issue if I want to be extremely sure the compiler will
not reorder this way? Applying volatile to rxfifo.in shouldn't help for
this, because compiler is allowed to reorder access of non volatile
variables yet[2].
The important thing about "volatile" is that it is /accesses/ that are volatile, not objects. A volatile object is nothing more than an object
for which all accesses are volatile by default. But you can use
volatile accesses on non-volatile objects. This macro is your friend:
#define volatileAccess(v) *((volatile typeof((v)) *) &(v))
(Linux has the same macro, called ACCESS_ONCE. It uses a gcc extension
- if you are using other compilers then you can make an uglier
equivalent using _Generic. However, if you are using a C compiler that supports C11, it is probably gcc or clang, and you can use the "typeof" extension.)
That macro will let you make a volatile read or write to an object
without requiring that /all/ accesses to it are volatile.
One solution is adding a memory barrier in this way:
int uart_task(void) {
int c = -1;
if (out != in) {
memory_barrier();
c = rxfifo.buf[out % RXBUF_SIZE];
out++;
}
return -1;
}
Note that you are forcing the compiler to read "out" twice here, as it
can't keep the value of "out" in a register across the memory barrier.
(And as I mentioned before, the compiler might be able to do larger
scale optimisation across compilation units or functions, and in that
way keep values across multiple calls to uart_task.)
However this approach appears to me dangerous. You have to check and
double check if, when and where memory barriers are necessary and it's
simple to skip a barrier where it's nedded and add a barrier where it
isn't needed.
Memory barriers are certainly useful, but they are a shotgun approach -
they affect /everything/ involving reads and writes to memory. (But
remember they don't affect ordering of calculations.)
So I'm thinking that a sub-optimal (regarding efficiency) but reliable
(regarding the risk to skip a barrier where it is needed) could be to
enter a critical section (disabling interrupts) anyway, if it isn't
strictly needed.
int uart_task(void) {
ENTER_CRITICAL_SECTION();
int c = -1;
if (out != in) {
c = rxfifo.buf[out % RXBUF_SIZE];
out++;
}
EXIT_CRITICAL_SECTION();
return -1;
}
Critical sections for something like this are /way/ overkill. And a
critical section with a division in the middle? Not a good idea.
Another solution could be to apply volatile keyword to rxfifo.in *AND*
rxfifo.buf too, so compiler can't change the order of accesses them.
Do you have other suggestions?
Marking "in" and "buf" as volatile is /far/ better than using a critical section, and likely to be more efficient than a memory barrier. You can
also use volatileAccess rather than making buf volatile, and it is often slightly more efficient to cache volatile variables in a local variable
while working with them.
[1] https://barrgroup.com/embedded-systems/how-to/c-volatile-keyword
[2] https://blog.regehr.org/archives/28
This is a bug(s) waiting to happen.
How is RXBUF_SIZE defined?
Power of two.
How does it reflect the data rate (and,
thus, interrupt rate) as well as the maximum latency between "main
loop" accesses?
The Rx FIFO filled by interrupt is needed to handle a burst (a packet?) of incoming characters.
If the baudrate is 9600bps 8N1, the interrupt fires about every 10/9600 s, roughly 1 ms. If the
maximum interval between two successive uart_task() calls is 10 ms, a buffer of 10 bytes is sufficient, so RXBUF_SIZE could be 16 or 32.
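That sizing reasoning can even be checked at compile time; a sketch with made-up macro names:

```c
#include <assert.h>

/* At 9600 baud 8N1 a character takes 10 bit-times = 10/9600 s, about
 * 1.04 ms, so a 10 ms gap between uart_task() calls can see at most
 * 9600*0.010/10 = 9.6 characters. */
#define BAUDRATE        9600u
#define BITS_PER_CHAR   10u     /* start + 8 data + stop */
#define MAX_GAP_MS      10u

/* Worst-case characters per gap, rounded up. */
#define MAX_CHARS_PER_GAP \
    ((BAUDRATE * MAX_GAP_MS + BITS_PER_CHAR * 1000u - 1u) / (BITS_PER_CHAR * 1000u))

#define RXBUF_SIZE 16u

_Static_assert((RXBUF_SIZE & (RXBUF_SIZE - 1u)) == 0u,
               "RXBUF_SIZE must be a power of two");
_Static_assert(RXBUF_SIZE > MAX_CHARS_PER_GAP,
               "RXBUF_SIZE too small for the worst-case burst");
```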
I.e., what happens when the buffer is *full* -- and,
thus, appears EMPTY?
These are good questions, but I didn't want to discuss them here. Of course the ISR is not complete, because before pushing a new byte, we must check if the FIFO is full. For example:
How do you convey this
to the upper level code ("Hey, we just lost a whole RXBUF_SIZE of
characters so if the character stream doesn't make sense, that might
be a cause...")?
A FIFO-full event is extremely rare if I'm able to size the rx FIFO correctly, i.e. for the worst case.
Anyway, I usually ignore incoming chars when the FIFO is full. The higher-level protocols are usually defined in such a way that the absence of chars is detected,
mostly thanks to CRC.
What if RXBUF_SIZE is relatively prime wrt uint8max?
When writing UART handlers, I fetch the received datum along with
the uart's flags and stuff *both* of those things in the FIFO.
If the FIFO would be full, I, instead, modify the flags of the
preceeding datum to reflect this fact ("Some number of characters
have been lost AFTER this one...") and discard the current character.
I then signal an event and let a task waiting for that specific event
wake up and retrieve the contents of the FIFO (which may include more
than one character, at that time as characters can arrive after the
initial event has been signaled).
Signal an event? A task waiting for a specific event? Maybe you are thinking of a
full RTOS. I was thinking of bare-metal systems.
On 10/23/2021 12:07 AM, pozz wrote:
Even I write software for embedded systems for more than 10 years, there's an argument that from time to time let me think for hours and leave me with many doubts.
Consider a simple embedded system based on a MCU (AVR8 or Cortex-Mx). The software is bare metal, without any OS. The main pattern is the well known mainloop (background code) that is interrupted by ISR.
Interrupts are used mainly for timings and for low-level driver. For example, the UART reception ISR move the last received char in a FIFO buffer, while the mainloop code pops new data from the FIFO.
static struct {
unsigned char buf[RXBUF_SIZE];
uint8_t in;
uint8_t out;
} rxfifo;
/* ISR */
void uart_rx_isr(void) {
unsigned char c = UART->DATA;
rxfifo.buf[in % RXBUF_SIZE] = c;
rxfifo.in++;
// Reset interrupt flag
}
/* Called regularly from mainloop code */
int uart_task(void) {
int c = -1;
if (out != in) {
c = rxfifo.buf[out % RXBUF_SIZE];
out++;
}
return -1;
}
From a 20-years old article[1] by Nigle Jones, this seems a situation where volatile must be used for rxfifo.in, because is modified by an ISR and used in the mainloop code.
I don't think so, rxfifo.in is read from memory only one time in uart_task(), so there isn't the risk that compiler can optimize badly. Even if ISR is fired immediately after the if statement, this doesn't bring to a dangerous state: the just received data will be processed at the next call to uart_task().
So IMHO volatile isn't necessary here. And critical sections (i.e. disabling interrupts) aren't useful too.
However I'm thinking about memory barrier. Suppose the compiler reorder the instructions in uart_task() as follows:
c = rxfifo.buf[out % RXBUF_SIZE]
if (out != in) {
out++;
return c;
} else {
return -1;
}
Here there's a big problem, because compiler decided to firstly read rxfifo.buf[] and then test in and out equality. If the ISR is fired immediately after moving data to c (most probably an internal register), the condition in the if statement will be true and the register value is returned. However the register value isn't correct.
Disable interrupts while accessing the FIFO. You really have to. Alternatively, you'll often get away with not using a FIFO at all,
I don't think any modern C compiler reorder uart_task() in this way, but we can't be sure. The result shouldn't change for the compiler, so it can do this kind of things.
How to fix this issue if I want to be extremely sure the compiler will not reorder this way? Applying volatile to rxfifo.in shouldn't help for this, because compiler is allowed to reorder access of non volatile variables yet[2].
One solution is adding a memory barrier in this way:
int uart_task(void) {
int c = -1;
if (out != in) {
memory_barrier();
c = rxfifo.buf[out % RXBUF_SIZE];
out++;
}
return -1;
}
However this approach appears to me dangerous. You have to check and double check if, when and where memory barriers are necessary and it's simple to skip a barrier where it's nedded and add a barrier where it isn't needed.
So I'm thinking that a sub-optimal (regarding efficiency) but reliable (regarding the risk to skip a barrier where it is needed) could be to enter a critical section (disabling interrupts) anyway, if it isn't strictly needed.
int uart_task(void) {
ENTER_CRITICAL_SECTION();
int c = -1;
if (out != in) {
c = rxfifo.buf[out % RXBUF_SIZE];
out++;
}
EXIT_CRITICAL_SECTION();
return -1;
}
Another solution could be to apply volatile keyword to rxfifo.in *AND* rxfifo.buf too, so compiler can't change the order of accesses them.
Do you have other suggestions?
[1] https://barrgroup.com/embedded-systems/how-to/c-volatile-keyword
[2] https://blog.regehr.org/archives/28
unless you're blocking for a long while in some part of the code.
On 23/10/2021 18:09, David Brown wrote:
On 23/10/2021 00:07, pozz wrote:
Even I write software for embedded systems for more than 10 years,
there's an argument that from time to time let me think for hours and
leave me with many doubts.
It's nice to see a thread like this here - the group needs such
discussions!
Consider a simple embedded system based on a MCU (AVR8 or Cortex-Mx).
The software is bare metal, without any OS. The main pattern is the well known mainloop (background code) that is interrupted by ISR.
Interrupts are used mainly for timings and for low-level driver. For
example, the UART reception ISR move the last received char in a FIFO
buffer, while the mainloop code pops new data from the FIFO.
static struct {
unsigned char buf[RXBUF_SIZE];
uint8_t in;
uint8_t out;
} rxfifo;
/* ISR */
void uart_rx_isr(void) {
unsigned char c = UART->DATA;
rxfifo.buf[in % RXBUF_SIZE] = c;
rxfifo.in++;
Unless you are sure that RXBUF_SIZE is a power of two, this is going to
be quite slow on an AVR. Modulo means division, and while division by a
constant is usually optimised to a multiplication by the compiler, you
still have a multiply, a shift, and some compensation for it all being
done as signed integer arithmetic.
It's also wrong, for non-power of two sizes, since the wrapping of your
increment and your modulo RXBUF_SIZE get out of sync.
Yes, RXBUF_SIZE is a power of two.
The usual choice is to track "head" and "tail", and use something like:
void uart_rx_isr(void) {
unsigned char c = UART->DATA;
// Reset interrupt flag
uint8_t next = rxfifo.tail;
rxfifo.buf[next] = c;
next++;
if (next >= RXBUF_SIZE) next -= RXBUF_SIZE;
rxfifo.tail = next;
}
This isn't the point of this thread, anyway...
You insist that tail is always in the range [0...RXBUF_SIZE - 1]. My
approach is different.
RXBUF_SIZE is a power of two, usually <= 256. head and tail are uint8_t
and *can* reach the maximum value of 255, even if RXBUF_SIZE is 128. All
works well.
Suppose rxfifo.in = rxfifo.out = 127, so the FIFO is empty. When a new char is
received, it is saved into rxfifo.buf[127 % 128 = 127] and rxfifo.in is
increased to 128.
Now the mainloop detects the new char (in != out), reads it from
rxfifo.buf[127 % 128 = 127] and increases out to 128.
The next byte will be saved into rxfifo.buf[rxfifo.in % 128 = 128 % 128
= 0] and rxfifo.in will be 129. Again, the next byte will be saved to
rxfifo.buf[rxfifo.in % 128 = 129 % 128 = 1] and rxfifo.in will be 130.
When the mainloop tries to pop data from the FIFO, it tests
rxfifo.in (130) != rxfifo.out (128).
The test is true, so the code extracts chars from rxfifo.buf[out % 128], that
is rxfifo.buf[0]... and so on.
I hope that explanation is good.
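The walkthrough above can be checked on a host with a minimal sketch; the point is that the free-running uint8_t indices stay consistent because the power-of-two size divides 256:

```c
#include <stdint.h>
#include <assert.h>

#define BUFSZ 128u              /* power of two that divides 256 */

static unsigned char buf[BUFSZ];
static uint8_t in, out;         /* free-running, wrap mod 256 */

static void push(unsigned char c) { buf[in % BUFSZ] = c; in++; }

static int pop(void) {
    int c = -1;
    if (out != in) {
        c = buf[out % BUFSZ];
        out++;
    }
    return c;
}
```

If BUFSZ did not divide 256 (say, 100), the index wrap at 255 -> 0 and the `% BUFSZ` mapping would go out of sync, which is exactly David's objection above.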
// Reset interrupt flag
}
/* Called regularly from mainloop code */
int uart_task(void) {
int c = -1;
if (out != in) {
c = rxfifo.buf[out % RXBUF_SIZE];
out++;
}
return -1;
}
int uart_task(void) {
int c = -1;
uint8_t next = rxfifo.head;
if (next != rxfifo.tail) {
c = rxfifo.buf[next];
next++;
if (next >= RXBUF_SIZE) next -= RXBUF_SIZE;
rxfifo.head = next;
}
return c;
}
These don't track buffer overflow at all - you need to call uart_task()
often enough to avoid that.
Sure, with a good value for RXBUF_SIZE, buffer overflow should never
happen. Anyway, if it happens, the higher-level layers (protocol)
should detect a corrupted packet.
(I'm skipping volatiles so we don't get ahead of your point.)
From a 20-years old article[1] by Nigle Jones, this seems a situation
where volatile must be used for rxfifo.in, because is modified by an ISR and used in the mainloop code.
Certainly whenever data is shared between ISR's and mainloop code, or
different threads, then you need to think about how to make sure data is
synchronised and exchanged. "volatile" is one method, atomics are
another, and memory barriers can be used.
I don't think so, rxfifo.in is read from memory only one time in
uart_task(), so there isn't the risk that compiler can optimize badly.
That is incorrect in two ways. One - baring compiler bugs (which do
occur, but they are very rare compared to user bugs), there is no such
thing as "optimising badly". If optimising changes the behaviour of the
code, other than its size and speed, the code is wrong.
Yes of course, but I don't think the absence of volatile for rxfifo.in,
even if it can change in the ISR, could be a *real* problem with *modern and current* compilers.
The volatile attribute is needed to avoid compiler optimizations (which would
be a bad thing, because of the volatile nature of the variable), but in that
code it's difficult to think of an optimization, caused by the absence
of volatile, that changes the behaviour erroneously... except reordering.
Two - it is a
very bad idea to imagine that having code inside a function somehow
"protects" it from re-ordering or other optimisation.
I didn't say that; on the contrary, I was thinking exactly of reordering issues.
Functions can be inlined, outlined, cloned, and shuffled about.
Link-time optimisation, code in headers, C++ modules, and other
cross-unit optimisations are becoming more and more common. So while it
might be true /today/ that the compiler has no alternative but to read
rxfifo.in once per call to uart_task(), you cannot assume that will be
the case with later compilers or with more advanced optimisation
techniques enabled. It is safer, more portable, and more future-proof
to avoid such assumptions.
Ok, you are talking of future scenarios. I don't actually think this
could be a real problem. Anyway, your observation makes sense.
Even if the ISR is fired immediately after the if statement, this doesn't
lead to a dangerous state: the just-received data will be processed at
the next call to uart_task().
So IMHO volatile isn't necessary here. And critical sections (i.e.
disabling interrupts) aren't useful either.
However I'm thinking about memory barriers. Suppose the compiler reorders
the instructions in uart_task() as follows:
c = rxfifo.buf[out % RXBUF_SIZE];
if (out != in) {
    out++;
    return c;
} else {
    return -1;
}
Here there's a big problem, because the compiler decided to first read
rxfifo.buf[] and then test in and out equality. If the ISR is fired
immediately after moving data to c (most probably an internal register),
the condition in the if statement will be true and the register value is
returned. However the register value isn't correct.
You are absolutely correct.
I don't think any modern C compiler reorders uart_task() in this way, but
we can't be sure. The result shouldn't change for the compiler, so it
can do this kind of thing.
It is not an unreasonable re-arrangement. On processors with
out-of-order execution (which does not apply to the AVR or Cortex-M),
compilers will often push loads as early as they can in the instruction
stream so that they start the cache loading process as quickly as
possible. (But note that on such "big" processors, much of this
discussion on volatile and memory barriers is not sufficient, especially
if there is more than one core. You need atomics and fences, but that's
a story for another day.)
How to fix this issue if I want to be extremely sure the compiler will
not reorder this way? Applying volatile to rxfifo.in shouldn't help for
this, because the compiler is still allowed to reorder accesses to
non-volatile variables[2].
The important thing about "volatile" is that it is /accesses/ that are
volatile, not objects. A volatile object is nothing more than an object
for which all accesses are volatile by default. But you can use
volatile accesses on non-volatile objects. This macro is your friend:
#define volatileAccess(v) *((volatile typeof((v)) *) &(v))
(Linux has the same macro, called ACCESS_ONCE. It uses a gcc extension
- if you are using other compilers then you can make an uglier
equivalent using _Generic. However, if you are using a C compiler that
supports C11, it is probably gcc or clang, and you can use the "typeof"
extension.)
That macro will let you make a volatile read or write to an object
without requiring that /all/ accesses to it are volatile.
This is a good point. The code in ISR can't be interrupted, so there's
no need to have volatile access in ISR.
One solution is adding a memory barrier in this way:
int uart_task(void) {
    int c = -1;
    if (out != in) {
        memory_barrier();
        c = rxfifo.buf[out % RXBUF_SIZE];
        out++;
    }
    return c;
}
Note that you are forcing the compiler to read "out" twice here, as it
can't keep the value of "out" in a register across the memory barrier.
Yes, you're right. A small penalty to avoid the problem of reordering.
(And as I mentioned before, the compiler might be able to do larger
scale optimisation across compilation units or functions, and in that
way keep values across multiple calls to uart_task.)
However this approach appears dangerous to me. You have to check and
double check if, when and where memory barriers are necessary, and it's
easy to skip a barrier where it's needed and add a barrier where it
isn't needed.
Memory barriers are certainly useful, but they are a shotgun approach -
they affect /everything/ involving reads and writes to memory. (But
remember they don't affect ordering of calculations.)
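For reference, memory_barrier() is never defined in this thread; on GCC or
Clang (an assumption about the toolchain) a pure compiler barrier is
commonly written like this:

```c
/* Pure compiler barrier (GCC/Clang syntax - an assumption, the thread
 * never shows the definition).  It emits no instructions, but tells
 * the compiler that memory may have changed: no load or store may be
 * moved across it, and cached values must be re-read afterwards. */
#define memory_barrier() __asm__ volatile ("" ::: "memory")
```

On a single-core AVR or Cortex-M this is sufficient for ISR/mainloop
ordering; on a multicore part you would need a real hardware fence (e.g.
ARM's DMB) or C11 atomics, as noted earlier.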
So I'm thinking that a sub-optimal (regarding efficiency) but reliable
(regarding the risk of skipping a barrier where it is needed) approach
could be to enter a critical section (disabling interrupts) anyway, even
if it isn't strictly needed.
int uart_task(void) {
    ENTER_CRITICAL_SECTION();
    int c = -1;
    if (out != in) {
        c = rxfifo.buf[out % RXBUF_SIZE];
        out++;
    }
    EXIT_CRITICAL_SECTION();
    return c;
}
Critical sections for something like this are /way/ overkill. And a
critical section with a division in the middle? Not a good idea.
Another solution could be to apply the volatile keyword to rxfifo.in *AND*
rxfifo.buf too, so the compiler can't change the order of accesses to them.
Do you have other suggestions?
Marking "in" and "buf" as volatile is /far/ better than using a critical
section, and likely to be more efficient than a memory barrier. You can
also use volatileAccess rather than making buf volatile, and it is often
slightly more efficient to cache volatile variables in a local variable
while working with them.
Yes, I think so too. Lately I have read many experts say volatile is often a
bad thing, so I'm re-thinking its use compared with other approaches.
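The "cache the volatile variable in a local" idea mentioned above might
look like this - a sketch only, with a volatile "in" and a hypothetical
uart_drain() helper that is not in the original code:

```c
#include <stdint.h>

#define RXBUF_SIZE 128

static struct {
    unsigned char buf[RXBUF_SIZE];
    volatile uint8_t in;   /* written by the ISR */
    uint8_t out;
} rxfifo;

/* Drain everything currently in the FIFO in one call.  "in" is read
 * once into a local snapshot, so the loop performs one volatile access
 * instead of one per iteration; characters that arrive meanwhile are
 * simply picked up on the next call.  (As discussed above, the buffer
 * element reads could additionally use volatile accesses.) */
int uart_drain(unsigned char *dst, int maxlen)
{
    uint8_t snapshot = rxfifo.in;   /* single volatile read */
    int n = 0;
    while (n < maxlen && rxfifo.out != snapshot) {
        dst[n++] = rxfifo.buf[rxfifo.out % RXBUF_SIZE];
        rxfifo.out++;
    }
    return n;
}
```

The snapshot also gives the mainloop a consistent view of "how much data
was pending" for the whole call.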
[1] https://barrgroup.com/embedded-systems/how-to/c-volatile-keyword
[2] https://blog.regehr.org/archives/28
On 23/10/2021 22:49, pozz wrote:
On 23/10/2021 18:09, David Brown wrote:
On 23/10/2021 00:07, pozz wrote:
Even though I have written software for embedded systems for more than 10
years, there's a topic that from time to time makes me think for hours and
leaves me with many doubts.
It's nice to see a thread like this here - the group needs such
discussions!
Consider a simple embedded system based on a MCU (AVR8 or Cortex-Mx).
The software is bare metal, without any OS. The main pattern is the
well-known mainloop (background code) that is interrupted by ISRs.
Interrupts are used mainly for timings and for low-level drivers. For
example, the UART reception ISR moves the last received char into a FIFO
buffer, while the mainloop code pops new data from the FIFO.
static struct {
    unsigned char buf[RXBUF_SIZE];
    uint8_t in;
    uint8_t out;
} rxfifo;
/* ISR */
void uart_rx_isr(void) {
    unsigned char c = UART->DATA;
    rxfifo.buf[rxfifo.in % RXBUF_SIZE] = c;
    rxfifo.in++;
Unless you are sure that RXBUF_SIZE is a power of two, this is going to
be quite slow on an AVR. Modulo means division, and while division by a
constant is usually optimised to a multiplication by the compiler, you
still have a multiply, a shift, and some compensation for it all being
done as signed integer arithmetic.
It's also wrong, for non-power of two sizes, since the wrapping of your
increment and your modulo RXBUF_SIZE get out of sync.
Yes, RXBUF_SIZE is a power of two.
If your code relies on that, make sure the code will fail to compile if
it is not the case. Documentation is good, compile-time check is better:
static_assert((RXBUF_SIZE & (RXBUF_SIZE - 1)) == 0, "Needs power of 2");
The usual choice is to track "head" and "tail", and use something like:
void uart_rx_isr(void) {
    unsigned char c = UART->DATA;
    // Reset interrupt flag
    uint8_t next = rxfifo.tail;
    rxfifo.buf[next] = c;
    next++;
    if (next >= RXBUF_SIZE) next -= RXBUF_SIZE;
    rxfifo.tail = next;
}
This isn't the point of this thread, anyway...
You insist that tail is always in the range [0...RXBUF_SIZE - 1]. My
approach is different.
RXBUF_SIZE is a power of two, usually <=256. head and tail are uint8_t
and *can* reach the maximum value of 255, even if RXBUF_SIZE is 128. All
works well.
Yes, your approach will work - /if/ you have a power-of-two buffer size.
It has no noticeable efficiency advantages, merely an extra
inconvenient restriction and the possible confusion caused by doing
things in a different way from common idioms.
However, this is not the point of the thread - so I am happy to leave
that for now.
Suppose rxfifo.in=rxfifo.out=127, FIFO is empty. When a new char is
received, it is saved into rxfifo.buf[127 % 128=127] and rxfifo.in will
be increased to 128.
Now the mainloop detects the new char (in != out), reads the new char at
rxfifo.buf[127 % 128=127] and increases out, which will be 128.
The next byte will be saved into rxfifo.buf[rxfifo.in % 128=128 % 128
= 0] and rxfifo.in will be 129. Again, the next byte will be saved to
rxfifo.buf[rxfifo.in % 128=129 % 128=1] and rxfifo.in will be 130.
When the mainloop tries to pop data from the fifo, it tests
rxfifo.in(130) != rxfifo.out(128)
The test is true, so the code extracts chars from rxfifo.buf[out % 128],
which is rxfifo.buf[0]... and so on.
I hope that explanation is good.
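That walkthrough can be checked mechanically. A small host-side sketch of
the index arithmetic (the helper names are invented; the only requirement
is that RXBUF_SIZE divides 256):

```c
#include <stdint.h>

#define RXBUF_SIZE 128   /* must divide 256, i.e. a power of two */

/* Buffer slot used by a free-running uint8_t index. */
uint8_t slot(uint8_t idx)
{
    return idx % RXBUF_SIZE;
}

/* Bytes pending: uint8_t subtraction is automatically modulo 256,
 * so this stays correct even across the 255 -> 0 wrap of the
 * free-running indices. */
uint8_t fifo_used(uint8_t in, uint8_t out)
{
    return (uint8_t)(in - out);
}
```

This is exactly why the scheme needs a power-of-two size: if RXBUF_SIZE
did not divide 256, the index wrap at 256 and the modulo would get out of
sync, as David pointed out earlier.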
// Reset interrupt flag
}
/* Called regularly from mainloop code */
int uart_task(void) {
    int c = -1;
    if (out != in) {
        c = rxfifo.buf[out % RXBUF_SIZE];
        out++;
    }
    return c;
}
int uart_task(void) {
    int c = -1;
    uint8_t next = rxfifo.head;
    if (next != rxfifo.tail) {
        c = rxfifo.buf[next];
        next++;
        if (next >= RXBUF_SIZE) next -= RXBUF_SIZE;
        rxfifo.head = next;
    }
    return c;
}
These don't track buffer overflow at all - you need to call uart_task()
often enough to avoid that.
Sure, with a good number for RXBUF_SIZE, buffer overflow should never
happen. Anyway, if it happens, the higher level layers (protocol)
should detect a corrupted packet.
You risk getting seriously out of sync if there is an overflow.
Normally, on an overflow there will be a dropped character or two (which
as you say, must be caught at a higher level). Here you could end up
going round your buffer an extra time and /gaining/ RXBUF_SIZE extra characters.
Still, if you are sure that your functions are called fast enough so
that overflow is not a concern, then that's fine. Extra code to check
for a situation that can't occur is not helpful.
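If one did want the ISR to guard against lapping the reader, a sketch
under the same free-running-index scheme could drop the incoming byte when
the FIFO is full (rx_overruns is a hypothetical counter, not from the
original code):

```c
#include <stdint.h>

#define RXBUF_SIZE 128   /* power of two */

static struct {
    unsigned char buf[RXBUF_SIZE];
    uint8_t in;
    uint8_t out;
} rxfifo;

static uint16_t rx_overruns;   /* hypothetical drop counter */

/* Push one byte, dropping it when the FIFO is full so the reader can
 * never be lapped.  "Full" means RXBUF_SIZE bytes already pending;
 * the uint8_t subtraction handles index wraparound. */
void rxfifo_push(unsigned char c)
{
    if ((uint8_t)(rxfifo.in - rxfifo.out) >= RXBUF_SIZE) {
        rx_overruns++;   /* higher layers see a gap, not garbage */
        return;
    }
    rxfifo.buf[rxfifo.in % RXBUF_SIZE] = c;
    rxfifo.in++;
}
```

A dropped character or two is then the worst case, rather than the
"gaining RXBUF_SIZE extra characters" failure described above.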
(I'm skipping volatiles so we don't get ahead of your point.)
From a 20-year-old article[1] by Nigel Jones, this seems a situation
where volatile must be used for rxfifo.in, because it is modified by an ISR
and used in the mainloop code.
Certainly whenever data is shared between ISR's and mainloop code, or
different threads, then you need to think about how to make sure data is
synchronised and exchanged. "volatile" is one method, atomics are
another, and memory barriers can be used.
I don't think so, rxfifo.in is read from memory only once in
uart_task(), so there isn't the risk that the compiler can optimize badly.
That is incorrect in two ways. One - barring compiler bugs (which do
occur, but they are very rare compared to user bugs), there is no such
thing as "optimising badly". If optimising changes the behaviour of the
code, other than its size and speed, the code is wrong.
Yes of course, but I don't think the absence of volatile for rxfifo.in,
even if it can change in an ISR, could be a *real* problem with *modern
and current* compilers.
Personally, I am not satisfied with "it's unlikely to be a problem in practice" - I prefer "The language guarantees it is not a problem".
Remember, when you know the data needs to be read at this point, then
using a volatile read is free. Volatile does not make code less
efficient unless you use it incorrectly and force more accesses than are necessary. So using volatile accesses for "rxfifo.in" here turns
"probably safe" into "certainly safe" without cost. What's not to like?
The volatile attribute is needed to avoid compiler optimizations (which
would be a bad thing, because of the volatile nature of the variable), but
on that code it's difficult to think of an optimization, caused by the
absence of volatile, that changes the behaviour erroneously... except reordering.
Two - it is a
very bad idea to imagine that having code inside a function somehow
"protects" it from re-ordering or other optimisation.
I didn't say this; on the contrary, I was thinking exactly of reordering
issues.
Functions can be inlined, outlined, cloned, and shuffled about.
Link-time optimisation, code in headers, C++ modules, and other
cross-unit optimisations are becoming more and more common. So while it
might be true /today/ that the compiler has no alternative but to read
rxfifo.in once per call to uart_task(), you cannot assume that will be
the case with later compilers or with more advanced optimisation
techniques enabled. It is safer, more portable, and more future-proof
to avoid such assumptions.
Ok, you are talking of future scenarios. I don't actually think this
could be a real problem. Anyway, your observation makes sense.
Even if the ISR is fired immediately after the if statement, this doesn't
lead to a dangerous state: the just-received data will be processed at
the next call to uart_task().
So IMHO volatile isn't necessary here. And critical sections (i.e.
disabling interrupts) aren't useful either.
However I'm thinking about memory barriers. Suppose the compiler reorders
the instructions in uart_task() as follows:
c = rxfifo.buf[out % RXBUF_SIZE];
if (out != in) {
    out++;
    return c;
} else {
    return -1;
}
Here there's a big problem, because the compiler decided to first read
rxfifo.buf[] and then test in and out equality. If the ISR is fired
immediately after moving data to c (most probably an internal register),
the condition in the if statement will be true and the register value is
returned. However the register value isn't correct.
You are absolutely correct.
I don't think any modern C compiler reorders uart_task() in this way, but
we can't be sure. The result shouldn't change for the compiler, so it
can do this kind of thing.
It is not an unreasonable re-arrangement. On processors with
out-of-order execution (which does not apply to the AVR or Cortex-M),
compilers will often push loads as early as they can in the instruction
stream so that they start the cache loading process as quickly as
possible. (But note that on such "big" processors, much of this
discussion on volatile and memory barriers is not sufficient, especially
if there is more than one core. You need atomics and fences, but that's
a story for another day.)
How to fix this issue if I want to be extremely sure the compiler will
not reorder this way? Applying volatile to rxfifo.in shouldn't help for
this, because the compiler is still allowed to reorder accesses to
non-volatile variables[2].
The important thing about "volatile" is that it is /accesses/ that are
volatile, not objects. A volatile object is nothing more than an object
for which all accesses are volatile by default. But you can use
volatile accesses on non-volatile objects. This macro is your friend:
#define volatileAccess(v) *((volatile typeof((v)) *) &(v))
(Linux has the same macro, called ACCESS_ONCE. It uses a gcc extension
- if you are using other compilers then you can make an uglier
equivalent using _Generic. However, if you are using a C compiler that
supports C11, it is probably gcc or clang, and you can use the "typeof"
extension.)
That macro will let you make a volatile read or write to an object
without requiring that /all/ accesses to it are volatile.
This is a good point. The code in the ISR can't be interrupted, so there's
no need for volatile accesses in the ISR.
Correct. (Well, /almost/ correct - bigger microcontrollers have
multiple interrupt priorities. But it should be correct in this case,
as no other interrupt would be messing with the same variables anyway.)
One solution is adding a memory barrier in this way:
int uart_task(void) {
    int c = -1;
    if (out != in) {
        memory_barrier();
        c = rxfifo.buf[out % RXBUF_SIZE];
        out++;
    }
    return c;
}
Note that you are forcing the compiler to read "out" twice here, as it
can't keep the value of "out" in a register across the memory barrier.
Yes, you're right. A small penalty to avoid the problem of reordering.
But an unnecessary penalty.
(And as I mentioned before, the compiler might be able to do larger
scale optimisation across compilation units or functions, and in that
way keep values across multiple calls to uart_task.)
However this approach appears dangerous to me. You have to check and
double check if, when and where memory barriers are necessary, and it's
easy to skip a barrier where it's needed and add a barrier where it
isn't needed.
Memory barriers are certainly useful, but they are a shotgun approach -
they affect /everything/ involving reads and writes to memory. (But
remember they don't affect ordering of calculations.)
So I'm thinking that a sub-optimal (regarding efficiency) but reliable
(regarding the risk of skipping a barrier where it is needed) approach
could be to enter a critical section (disabling interrupts) anyway, even
if it isn't strictly needed.
int uart_task(void) {
    ENTER_CRITICAL_SECTION();
    int c = -1;
    if (out != in) {
        c = rxfifo.buf[out % RXBUF_SIZE];
        out++;
    }
    EXIT_CRITICAL_SECTION();
    return c;
}
Critical sections for something like this are /way/ overkill. And a
critical section with a division in the middle? Not a good idea.
Another solution could be to apply the volatile keyword to rxfifo.in *AND*
rxfifo.buf too, so the compiler can't change the order of accesses to them.
Do you have other suggestions?
Marking "in" and "buf" as volatile is /far/ better than using a critical
section, and likely to be more efficient than a memory barrier. You can
also use volatileAccess rather than making buf volatile, and it is often
slightly more efficient to cache volatile variables in a local variable
while working with them.
Yes, I think so too. Lately I have read many experts say volatile is often a
bad thing, so I'm re-thinking its use compared with other approaches.
People who say "volatile is a bad thing" are often wrong. Remember, all generalisations are false :-)
"volatile" is a tool. It doesn't do everything that some people think
it does, but it is a very useful tool nonetheless. It has little place
in big systems - Linus Torvalds wrote a rant against it as being both
too much and too little, and in the context of writing Linux code, he
was correct. For Linux programming, you should be using OS-specific
features (which rely on "volatile" for their implementation) or atomics, rather than using "volatile" directly.
But for small-systems embedded programming, it is very handy. Used
well, it is free - used excessively it has a cost, but an extra volatile
will not make an otherwise correct program fail.
Memory barriers are great for utility functions such as interrupt enable/disable inline functions, but are usually sub-optimal compared to specific and targeted volatile accesses.
[1] https://barrgroup.com/embedded-systems/how-to/c-volatile-keyword
[2] https://blog.regehr.org/archives/28
On 24/10/2021 13:02, David Brown wrote:
People who say "volatile is a bad thing" are often wrong. Remember, all
generalisations are false :-)
Ok, I wrote "volatile is **often** a bad thing".
"volatile" is a tool. It doesn't do everything that some people think
it does, but it is a very useful tool nonetheless. It has little place
in big systems - Linus Torvalds wrote a rant against it as being both
too much and too little, and in the context of writing Linux code, he
was correct. For Linux programming, you should be using OS-specific
features (which rely on "volatile" for their implementation) or atomics,
rather than using "volatile" directly.
But for small-systems embedded programming, it is very handy. Used
well, it is free - used excessively it has a cost, but an extra volatile
will not make an otherwise correct program fail.
Memory barriers are great for utility functions such as interrupt
enable/disable inline functions, but are usually sub-optimal compared to
specific and targeted volatile accesses.
Just to say what I read:
https://blog.regehr.org/archives/28
https://mcuoneclipse.com/2021/10/12/spilling-the-beans-volatile-qualifier/
[1] https://barrgroup.com/embedded-systems/how-to/c-volatile-keyword
[2] https://blog.regehr.org/archives/28
Disable interrupts while accessing the fifo. You really have to.
Alternatively you'll often get away with not using a fifo at all,
unless you're blocking for a long while in some part of the code.
Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.
Although this thread is on how to wrestle a poor
language to do what you want, sort of how to use a hammer on a screw
instead of taking the screwdriver, there would be no need to
mask interrupts with C either.
On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
Disable interrupts while accessing the fifo. You really have to.
Alternatively you'll often get away with not using a fifo at all,
unless you're blocking for a long while in some part of the code.
Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.
The OP's code doesn't differentiate between FIFO full and empty.
So, *he* may gain some advantage from disabling interrupts to
ensure the character he is about to retrieve is not overwritten
by an incoming character placed at that location (cuz he lets
his FIFO wrap indiscriminately).
And, if the offsets ever got larger (wider) -- or became actual
pointers -- then the possibility of PART of a value being updated
on either "side" of an ISR is also possible.
And, there's nothing to say the OP has disclosed EVERYTHING
that might be happening in his ISR (maintaining handshaking signals,
flow control, etc.) which could compound the references
(e.g., if you need to know that you have space for N characters
remaining so you can signal the remote device to stop sending,
then you're doing "pointer/offset arithmetic" and *acting* on the
result)
Although this thread is on how to wrestle a poor
language to do what you want, sort of how to use a hammer on a screw
instead of taking the screwdriver, there would be no need to
mask interrupts with C either.
The "problem" with the language is that it gives the compiler the freedom
to make EQUIVALENT changes to your code that you might not have foreseen
or that might not have been consistent with your "style" -- yet do not
alter the results.
After all, a programming language -- ANY programming language -- is just
a vehicle for conveying our desires to the machine in a semi-unambiguous manner. I'd much rather *SAY*, "What are the roots of ax^2 + bx + c?"
than have to implement an algorithmic solution, worry about cancellation, required precision, etc. (and, in some languages, you can do just that!)
On 10/24/2021 22:54, Don Y wrote:
On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
Disable interrupts while accessing the fifo. You really have to.
Alternatively you'll often get away with not using a fifo at all,
unless you're blocking for a long while in some part of the code.
Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.
The OP's code doesn't differentiate between FIFO full and empty.
So he should fix that first, there is no sane reason why not.
Few things are simpler to do than that.
So, *he* may gain some advantage from disabling interrupts to
ensure the character he is about to retrieve is not overwritten
by an incoming character placed at that location (cuz he lets
his FIFO wrap indiscriminately).
And, if the offsets ever got larger (wider) -- or became actual
pointers -- then the possibility of PART of a value being updated
on either "side" of an ISR is also possible.
And, there's nothing to say the OP has disclosed EVERYTHING
that might be happening in his ISR (maintaining handshaking signals,
flow control, etc.) which could compound the references
(e.g., if you need to know that you have space for N characters
remaining so you can signal the remote device to stop sending,
then you're doing "pointer/offset arithmetic" and *acting* on the
result)
Whatever handshakes he makes there is no problem knowing whether
the fifo is full - just check if the position the write pointer
will have after putting the next byte matches the read pointer
at the moment. Like I said before, few things are simpler than
that, can't imagine someone working as a programmer being
stuck at *that*.
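For completeness, the full/empty test being described - full when the
write pointer's next position would equal the read pointer - can be
sketched like this (function names are illustrative, using the bounded
head/tail style from earlier in the thread):

```c
#include <stdint.h>

#define RXBUF_SIZE 128

static struct {
    unsigned char buf[RXBUF_SIZE];
    uint8_t head;   /* read position, kept in [0, RXBUF_SIZE)  */
    uint8_t tail;   /* write position, kept in the same range  */
} fifo;

/* Full when advancing the write pointer would make it equal the read
 * pointer.  One slot is sacrificed, so full and empty are always
 * distinguishable without extra state. */
int fifo_full(void)
{
    return (uint8_t)((fifo.tail + 1) % RXBUF_SIZE) == fifo.head;
}

int fifo_empty(void)
{
    return fifo.head == fifo.tail;
}
```

With this test the ISR can simply refuse to store a byte when fifo_full()
is true, which removes the wrap-indiscriminately hazard Don describes.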
Although this thread is on how to wrestle a poor
language to do what you want, sort of how to use a hammer on a screw
instead of taking the screwdriver, there would be no need to
mask interrupts with C either.
The "problem" with the language is that it gives the compiler the freedom
to make EQUIVALENT changes to your code that you might not have foreseen
or that might not have been consistent with your "style" -- yet do not
alter the results.
Don, let us not go into this. Just looking at the thread is enough to
see it is about wrestling the language so it can be made some use of.
After all, a programming language -- ANY programming language -- is just
a vehicle for conveying our desires to the machine in a semi-unambiguous
manner. I'd much rather *SAY*, "What are the roots of ax^2 + bx + c?"
than have to implement an algorithmic solution, worry about cancellation,
required precision, etc. (and, in some languages, you can do just that!)
Indeed you don't want to write how the equation is solved every time.
This is why you can call it once you have it available. This is language independent.
Then solving expressions etc. is well within 1% of the effort in
programming if the task at hand is going to take > 2 weeks; after that
the programmer's time is wasted on wrestling the language like
demonstrated by this thread. Sadly almost everybody has accepted
C as a standard - which makes it a very popular poor language.
Similar to say Chinese, very popular, spoken by billions, yet
where are its literary masterpieces. Being hieroglyph based there
are none; you will have to look at alphabet based languages to
find some.
On 10/24/2021 1:27 PM, Dimiter_Popoff wrote:
....
Whatever handshakes he makes there is no problem knowing whether
the fifo is full - just check if the position the write pointer
will have after putting the next byte matches the read pointer
at the moment. Like I said before, few things are simpler than
that, can't imagine someone working as a programmer being
stuck at *that*.
Yes, but if you want to implement flow control, you have to tell the
other end of the line BEFORE you've filled your buffer. There may be
a character being deserialized AS you are retrieving the "last"
character, another one (or more) preloaded into the transmitter on
the far device, etc. And, it will take some time for your
notification to reach the far end and be recognized as a desire
to suspend transmission. etc.
If you wait until you have no more space available, you are almost
certain to lose characters.
Although this thread is on how to wrestle a poor
language to do what you want, sort of how to use a hammer on a screw
instead of taking the screwdriver, there would be no need to
mask interrupts with C either.
The "problem" with the language is that it gives the compiler the freedom
to make EQUIVALENT changes to your code that you might not have foreseen
or that might not have been consistent with your "style" -- yet do not
alter the results.
Don, let us not go into this. Just looking at the thread is enough to
see it is about wrestling the language so it can be made some use of.
The language isn't the problem. Witness the *millions* (?) of programs written in it, over the past 5 decades.
Indeed you don't want to write how the equation is solved every time.
This is why you can call it once you have it available. This is language
independent.
For a simple quadratic, you can explore the coefficients to determine
which algorithm is best suited to giving you *accurate* results.
What if I present *any* expression? Can you have your solution available
to handle any case? Did you even bother to develop such a solution if
you were only encountering quadratics?
I've made some syntactic changes to my code that make it much easier
to read -- yet mean that I have to EXPLAIN how they work and why they
are present as any other developer would frown on encountering them.
(But, it's my opinion that, once explained, that developer will see them
as an efficient addition to the language in line with other *existing* mechanisms that are already present, there).
Similar to say Chinese, very popular, spoken by billions, yet
where are its literary masterpieces. Being hieroglyph based there
are none; you will have to look at alphabet based languages to
find some.
One can say the same thing about Unangax̂ -- spoken by ~100!
Popularity and literary masterpieces are completely different
axis.
Hear much latin or ancient greek spoken, recently?
On 10/25/2021 0:08, Don Y wrote:
On 10/24/2021 1:27 PM, Dimiter_Popoff wrote:
....
Whatever handshakes he makes there is no problem knowing whether
the fifo is full - just check if the position the write pointer
will have after putting the next byte matches the read pointer
at the moment. Like I said before, few things are simpler than
that, can't imagine someone working as a programmer being
stuck at *that*.
Yes, but if you want to implement flow control, you have to tell the
other end of the line BEFORE you've filled your buffer. There may be
a character being deserialized AS you are retrieving the "last"
character, another one (or more) preloaded into the transmitter on
the far device, etc. And, it will take some time for your
notification to reach the far end and be recognized as a desire
to suspend transmission. etc.
If you wait until you have no more space available, you are almost
certain to lose characters.
Well of course so, we have all done that sort of thing since the 80-s,
other people have done it before I suppose. Implementing fifo thresholds
is not (and has never been) rocket science.
The point is there is no point in throwing huge efforts at a
self-inflicted problem instead of just doing it the easy way which
is well, common knowledge.
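A threshold scheme of the kind being described might be sketched as
follows (the water marks and hook names are invented for illustration;
real code would raise RTS or send XOFF/XON in the hooks):

```c
#include <stdint.h>

#define RXBUF_SIZE    128
#define RX_HIGH_WATER  96   /* assert flow control here, not at 128 */
#define RX_LOW_WATER   32   /* release it again here */

/* Hypothetical flow-control hooks. */
static int flow_stopped;
static void flow_stop(void)  { flow_stopped = 1; }
static void flow_start(void) { flow_stopped = 0; }

/* Called with the current FIFO fill level after each push or pop.
 * Stopping well below the buffer size leaves room for the characters
 * already in flight; the gap between the two marks gives hysteresis
 * so the line isn't toggled on every byte. */
void rx_flow_update(uint8_t used)
{
    if (!flow_stopped && used >= RX_HIGH_WATER)
        flow_stop();
    else if (flow_stopped && used <= RX_LOW_WATER)
        flow_start();
}
```

This is the standard high/low water-mark pattern: the exact thresholds
depend on baud rate, notification latency and far-end transmitter depth.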
Although this thread is on how to wrestle a poor
language to do what you want, sort of how to use a hammer on a screw >>>>> instead of taking the screwdriver, there would be no need to
mask interrupts with C either.
The "problem" with the language is that it gives the compiler the freedom >>>> to make EQUIVALENT changes to your code that you might not have foreseen >>>> or that might not have been consistent with your "style" -- yet do not >>>> alter the results.
Don, let us not go into this. Just looking at the thread is enough to
see it is about wrestling the language so it can be made some use of.
The language isn't the problem. Witness the *millions* (?) of programs
written in it, over the past 5 decades.
This does not prove much, it has been the only language allowing
"everybody" to do what they did.
I am not denying this is the best
language currently available to almost everybody. I just happened to
have been daring enough to explore my own way/language and have seen
how much is there to be gained if not having to wrestle a language
which is just a more complete phrase book than the rest of the
phrase books (aka high level languages).
I've made some syntactic changes to my code that make it much easier
to read -- yet mean that I have to EXPLAIN how they work and why they
are present as any other developer would frown on encountering them.
Oh I am well aware of the value of standardization and popularity,
these are the strongest points of C.
(But, it's my opinion that, once explained, that developer will see them
as an efficient addition to the language in line with other *existing*
mechanisms that are already present, there).
Of course, but you have to have them on board first...
Similar to say Chinese, very popular, spoken by billions, yet
where are its literary masterpieces. Being hieroglyph based there
are none; you will have to look at alphabet based languages to
find some.
One can say the same thing about Unangax̂ -- spoken by ~100!
Popularity and literary masterpieces are completely different
axis.
Hear much latin or ancient greek spoken, recently?
The Latin alphabet looks pretty popular nowadays :-). Everything
evolves, including languages. And there are dead ends within them
which just die out - e.g. roman numbers. Can't see much future in
any hieroglyph based language though, inventing a symbol for each
word has been demonstrated to be a bad idea by history.
...
ASM has always been available.
Folks just found it too inefficient
to solve "big" problems, in reasonable effort.
I am not denying this is the best
language currently available to almost everybody. I just happened to
have been daring enough to explore my own way/language and have seen
how much is there to be gained if not having to wrestle a language
which is just a more complete phrase book than the rest of the
phrase books (aka high level languages).
But you only have yourself as a client.
Most of us have to write code
(or modify already written code) that others will see/maintain. It
does no good to have a "great tool" if no one else uses it!
I use (scant!) ASM, a modified ("proprietary") C dialect, SQL and a
scripting
language in my current design. (not counting the tools that generate my documentation).
....
Hear much latin or ancient greek spoken, recently?
The Latin alphabet looks pretty popular nowadays :-). Everything
evolves, including languages. And there are dead ends within them
which just die out - e.g. roman numbers. Can't see much future in
any hieroglyph based language though, inventing a symbol for each
word has been demonstrated to be a bad idea by history.
Witness the rise of arabic numerals and their efficacy towards
advancing mathematics.
On 10/25/2021 1:47, Don Y wrote:
...
ASM has always been available.
There is no such language as ASM, there is a wide variety of machines.
Folks just found it too inefficient
to solve "big" problems, in reasonable effort.
Especially with the advent of load/store machines (although C must have
been helped a lot by the clunky x86 architecture for its popularity), programming in the native assembler for any RISC machine would be
masochistic at best. Which is why I took the steps I took etc., no
need to go into that.
I am not denying this is the best
language currently available to almost everybody. I just happened to
have been daring enough to explore my own way/language and have seen
how much is there to be gained if not having to wrestle a language
which is just a more complete phrase book than the rest of the
phrase books (aka high level languages).
But you only have yourself as a client.
Yes, but this does not mean much. Looking at pieces I wrote 20 or
30 years ago - even 10 years ago sometimes - is like reading it
for the first time for many parts (tens of megabytes of sources, http://tgi-sci.com/misc/scnt21.gif ).
Most of us have to write code
(or modify already written code) that others will see/maintain. It
does no good to have a "great tool" if no one else uses it!
I use (scant!) ASM, a modified ("proprietary") C dialect, SQL and a scripting
language in my current design. (not counting the tools that generate my
documentation).
Here comes the advantage of an "alphabet" rather than a "hieroglyph" based
approach/language. A lot fewer lookup tables to memorize, you learn
while going etc. I am quite sure someone like you would get used to it
quite fast, much much faster than to an unknown high level language.
In fact it may take you very little time to see it is something you have more
or less been familiar with forever.
Grasping the big picture of the entire environment and becoming
really good at writing within it would take longer, obviously.
On 10/24/2021 13:39, Johann Klammer wrote:
Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.
Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.
Although this thread is on how to wrestle a poor
language to do what you want, sort of how to use a hammer on a screw
instead of taking the screwdriver, there would be no need to
mask interrupts with C either.
There are (and have been) many "safer" languages. Many that are more descriptive (for certain classes of problem). But, C has survived to
handle all-of-the-above... perhaps in a suboptimal way but at least
a manner that can get to the desired solution.
Look at how few applications SNOBOL handles. Write an OS in COBOL? Ada?
On 10/24/2021 22:54, Don Y wrote:
On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.
Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.
The OP's code doesn't differentiate between FIFO full and empty.
So he should fix that first, there is no sane reason why not.
Few things are simpler to do than that.
Whatever handshakes he makes there is no problem knowing whether
the fifo is full - just check if the position the write pointer
will have after putting the next byte matches the read pointer
at the moment. Like I said before, few things are simpler than
that, can't imagine someone working as a programmer being
stuck at *that*.
The language isn't the problem. Witness the *millions* (?) of programs written in it, over the past 5 decades.
The problem is that it never was an assembly language -- even though it
was treated as such "in days gone by" (because the compiler's were
just "language translators" and didn't add any OTHER value to the "programming process").
It's only recently that compilers have become "independent agents",
of a sort... adding their own "spin" on the developer's code.
On 2021-10-25 0:08, Don Y wrote:
There are (and have been) many "safer" languages. Many that are more
descriptive (for certain classes of problem). But, C has survived to
handle all-of-the-above... perhaps in a suboptimal way but at least
a manner that can get to the desired solution.
Look at how few applications SNOBOL handles. Write an OS in COBOL? Ada?
I don't know about COBOL, but typically the real-time kernels ("run-time systems") associated with Ada compilers for bare-board embedded systems are written in Ada, with a minor amount of assembly language for the most HW-related bits like HW context saving and restoring. I'm pretty sure that C-language OS kernels also use assembly for those things.
On 2021-10-24 23:27, Dimiter_Popoff wrote:
On 10/24/2021 22:54, Don Y wrote:
On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.
Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.
The OPs code doesn't differentiate between FIFO full and empty.
So he should fix that first, there is no sane reason why not.
Few things are simpler to do than that.
[snip]
Whatever handshakes he makes there is no problem knowing whether
the fifo is full - just check if the position the write pointer
will have after putting the next byte matches the read pointer
at the moment. Like I said before, few things are simpler than
that, can't imagine someone working as a programmer being
stuck at *that*.
That simple check would require keeping a maximum of only N-1 entries in the N-position FIFO buffer, and the OP explicitly said they did not want to allocate an unused place in the buffer (which I think is unreasonable of the OP, but that is only IMO).
The simple explanation for the N-1 limit is that the difference between two wrap-around pointers into an N-place buffer has at most N different values, while there are N+1 possible filling states of the buffer, from empty (zero items) to full (N items).
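Niklas's N-1 rule can be sketched as a minimal single-producer/single-consumer ring in C (hypothetical names; the 8-bit indices are assumed to be atomically readable and writable on the target, as on AVR8 or Cortex-M):

```c
#include <stdint.h>

#define BUF_SIZE 8u                 /* N places; at most N-1 are usable */

static unsigned char buf[BUF_SIZE];
static volatile uint8_t wr;         /* advanced only by the writer (ISR)  */
static volatile uint8_t rd;         /* advanced only by the reader (main) */

/* Writer side: returns 0 if the FIFO is full (the position the write
 * index would take after storing coincides with the read index). */
static int fifo_put(unsigned char c)
{
    uint8_t next = (uint8_t)((wr + 1u) % BUF_SIZE);
    if (next == rd)
        return 0;                   /* full: one slot deliberately unused */
    buf[wr] = c;
    wr = next;
    return 1;
}

/* Reader side: returns -1 if empty, else the oldest byte. */
static int fifo_get(void)
{
    if (rd == wr)
        return -1;                  /* empty: indices coincide */
    unsigned char c = buf[rd];
    rd = (uint8_t)((rd + 1u) % BUF_SIZE);
    return c;
}
```

With BUF_SIZE places, only BUF_SIZE-1 bytes fit before `fifo_put` reports full; the one "wasted" slot is what lets the two indices distinguish empty (rd == wr) from full.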
On 10/25/2021 12:56 AM, Niklas Holsti wrote:
On 2021-10-25 0:08, Don Y wrote:
There are (and have been) many "safer" languages. Many that are more
descriptive (for certain classes of problem). But, C has survived to
handle all-of-the-above... perhaps in a suboptimal way but at least
a manner that can get to the desired solution.
Look at how few applications SNOBOL handles. Write an OS in COBOL?
Ada?
I don't know about COBOL, but typically the real-time kernels
("run-time systems") associated with Ada compilers for bare-board
embedded systems are written in Ada, with a minor amount of assembly
language for the most HW-related bits like HW context saving and
restoring. I'm pretty sure that C-language OS kernels also use
assembly for those things.
Of course you *can* do these things.
The question is how often
they are ACTUALLY done with these other languages.
[By the same token, expecting the past to mirror the present is equally naive. People forget that tools and processes have evolved (in the 40+ years that I've been designing embedded products). And, that the issues folks now face often weren't issues when tools were "stupider" (I've
probably got $60K of obsolete compilers to prove this -- anyone written
any C on an 1802 recently? Or, a 2A03? 65816? Z180? 6809?) Don't even *think* about finding an Ada compiler for them -- in the past!]
On 10/25/2021 1:09 AM, Niklas Holsti wrote:
On 2021-10-24 23:27, Dimiter_Popoff wrote:
On 10/24/2021 22:54, Don Y wrote:
On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.
Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.
The OPs code doesn't differentiate between FIFO full and empty.
So he should fix that first, there is no sane reason why not.
Few things are simpler to do than that.
[snip]
Whatever handshakes he makes there is no problem knowing whether
the fifo is full - just check if the position the write pointer
will have after putting the next byte matches the read pointer
at the moment. Like I said before, few things are simpler than
that, can't imagine someone working as a programmer being
stuck at *that*.
That simple check would require keeping a maximum of only N-1 entries
in the N-position FIFO buffer, and the OP explicitly said they did not
want to allocate an unused place in the buffer (which I think is
unreasonable of the OP, but that is only IMO).
The simple explanation for the N-1 limit is that the difference
between two wrap-around pointers into an N-place buffer has at most N
different values, while there are N+1 possible filling states of the
buffer, from empty (zero items) to full (N items).
But, again, that just deals with the "full check". The easiest way to do this is just to check ".in" *after* advancement and inhibit the store if
it coincides with the ".out" value.
Checking for a "high water mark" to enable flow control requires more computation (albeit simple) as you have to accommodate the delays in
that notification reaching the remote sender (lest he continue
sending and overrun your buffer).
And, later noting when you've consumed enough of the FIFO's contents
to reach a "low water mark" and reenable the remote's transmissions.
[And, if you ever have to deal with more "established" protocols
that require the sequencing of specific control signals DURING
a transfer, the ISR quickly becomes very complex!]
On 2021-10-25 11:28, Don Y wrote:
On 10/25/2021 1:09 AM, Niklas Holsti wrote:
On 2021-10-24 23:27, Dimiter_Popoff wrote:
On 10/24/2021 22:54, Don Y wrote:
On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.
Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.
The OPs code doesn't differentiate between FIFO full and empty.
So he should fix that first, there is no sane reason why not.
Few things are simpler to do than that.
[snip]
Whatever handshakes he makes there is no problem knowing whether
the fifo is full - just check if the position the write pointer
will have after putting the next byte matches the read pointer
at the moment. Like I said before, few things are simpler than
that, can't imagine someone working as a programmer being
stuck at *that*.
That simple check would require keeping a maximum of only N-1 entries in the
N-position FIFO buffer, and the OP explicitly said they did not want to
allocate an unused place in the buffer (which I think is unreasonable of the
OP, but that is only IMO).
The simple explanation for the N-1 limit is that the difference between two
wrap-around pointers into an N-place buffer has at most N different values,
while there are N+1 possible filling states of the buffer, from empty (zero
items) to full (N items).
But, again, that just deals with the "full check". The easiest way to do
this is just to check ".in" *after* advancement and inhibit the store if
it coincides with the ".out" value.
Checking for a "high water mark" to enable flow control requires more
computation (albeit simple) as you have to accommodate the delays in
that notification reaching the remote sender (lest he continue
sending and overrun your buffer).
And, later noting when you've consumed enough of the FIFO's contents
to reach a "low water mark" and reenable the remote's transmissions.
[And, if you ever have to deal with more "established" protocols
that require the sequencing of specific control signals DURING
a transfer, the ISR quickly becomes very complex!]
Of course. Perhaps you (Don) did not see that I was agreeing with your position
and objecting to the "it is very simple" stance of Dimiter (considering the OP's expressed constraints).
Personally I would use critical sections to avoid relying on delicate reasoning
about interleaved executions. And to allow for easy future complexification of
the concurrent activities. The overhead of interrupt disabling and enabling is
seldom significant when that can be done directly without kernel calls.
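A direct interrupt-disable critical section of the kind Niklas describes might be sketched as follows (the masking calls here are host-runnable stubs; on a real Cortex-M the save/restore pair would be PRIMASK reads and writes, e.g. CMSIS `__get_PRIMASK()`/`__disable_irq()`/`__set_PRIMASK()`):

```c
#include <stdint.h>

/* Host stubs standing in for the real interrupt mask; the depth counter
 * only exists so the save/restore discipline can be checked off-target. */
static int irq_disable_depth;

static unsigned irq_save(void)          { return (unsigned)irq_disable_depth++; }
static void     irq_restore(unsigned s) { irq_disable_depth = (int)s; }

static volatile uint16_t shared_count;

/* Read-modify-write of a multi-byte shared variable, protected by a
 * critical section so an ISR cannot interleave with (or observe) a
 * torn update. Save/restore nests correctly, unlike a bare
 * disable/enable pair. */
static void count_add(uint16_t n)
{
    unsigned s = irq_save();            /* enter critical section */
    shared_count = (uint16_t)(shared_count + n);
    irq_restore(s);                     /* leave: previous state restored */
}
```

The save/restore form is what makes the overhead "seldom significant": it is a handful of instructions and needs no kernel call.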
On 2021-10-25 11:19, Don Y wrote:
On 10/25/2021 12:56 AM, Niklas Holsti wrote:
On 2021-10-25 0:08, Don Y wrote:
There are (and have been) many "safer" languages. Many that are more
descriptive (for certain classes of problem). But, C has survived to
handle all-of-the-above... perhaps in a suboptimal way but at least
a manner that can get to the desired solution.
Look at how few applications SNOBOL handles. Write an OS in COBOL? Ada?
I don't know about COBOL, but typically the real-time kernels ("run-time
systems") associated with Ada compilers for bare-board embedded systems are
written in Ada, with a minor amount of assembly language for the most
HW-related bits like HW context saving and restoring. I'm pretty sure that
C-language OS kernels also use assembly for those things.
Of course you *can* do these things.
Then I misunderstood your (rhetorical?) question.
The question is how often
they are ACTUALLY done with these other languages.
I don't find that question very interesting.
It is a typical chicken-and-egg, first-to-market conundrum. There is an enormous amount of status-quo-favouring friction in awareness, education, tool
availability, and legacy code.
[By the same token, expecting the past to mirror the present is equally
naive. People forget that tools and processes have evolved (in the 40+
years that I've been designing embedded products). And, that the issues
folks now face often weren't issues when tools were "stupider" (I've
probably got $60K of obsolete compilers to prove this -- anyone written
any C on an 1802 recently? Or, a 2A03? 65816? Z180? 6809?) Don't
even *think* about finding an Ada compiler for them -- in the past!]
Well, the Janus/Ada compiler was available for Z80 in its day. There are also Ada compilers that use C as an intermediate language, with applications for example on TI MSP430's, but those were probably not available in the past ages
you refer to.
On 25/10/2021 10:52, Niklas Holsti wrote:
Well, the Janus/Ada compiler was available for Z80 in its day. There are
also Ada compilers that use C as an intermediate language, with
applications for example on TI MSP430's, but those were probably not
available in the past ages you refer to.
Presumably there is gcc-based Ada for the msp430 (as there is for the
8-bit AVR)?
There might not be a full library available, or possibly
some missing features in the language.
On 2021-10-24 23:27, Dimiter_Popoff wrote:
On 10/24/2021 22:54, Don Y wrote:
On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.
Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.
The OPs code doesn't differentiate between FIFO full and empty.
So he should fix that first, there is no sane reason why not.
Few things are simpler to do than that.
[snip]
Whatever handshakes he makes there is no problem knowing whether
the fifo is full - just check if the position the write pointer
will have after putting the next byte matches the read pointer
at the moment. Like I said before, few things are simpler than
that, can't imagine someone working as a programmer being
stuck at *that*.
That simple check would require keeping a maximum of only N-1 entries in
the N-position FIFO buffer, and the OP explicitly said they did not want
to allocate an unused place in the buffer (which I think is unreasonable
of the OP, but that is only IMO).
On 10/25/2021 11:09, Niklas Holsti wrote:
On 2021-10-24 23:27, Dimiter_Popoff wrote:
On 10/24/2021 22:54, Don Y wrote:
On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.
Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.
The OPs code doesn't differentiate between FIFO full and empty.
So he should fix that first, there is no sane reason why not.
Few things are simpler to do than that.
[snip]
Whatever handshakes he makes there is no problem knowing whether
the fifo is full - just check if the position the write pointer
will have after putting the next byte matches the read pointer
at the moment. Like I said before, few things are simpler than
that, can't imagine someone working as a programmer being
stuck at *that*.
That simple check would require keeping a maximum of only N-1 entries
in the N-position FIFO buffer, and the OP explicitly said they did not
want to allocate an unused place in the buffer (which I think is
unreasonable of the OP, but that is only IMO).
Well it might be reasonable if the fifo has a size of two, you know :-).
On 10/25/2021 20:43, Don Y wrote:
On 10/25/2021 8:34 AM, Niklas Holsti wrote:
And if each of those two items is large, yes. But here we have a FIFO of
8-bit characters... few programs are so tight on memory that they cannot
stand one unused octet.
It's not "unused". Rather, its role is that of indicating "full/overrun".
The OP seems to have decided that this is of no concern -- in *one* app?
Oh come on, I joked about the fifo of two bytes only because this whole thread is a joke
- pages and pages of C to maintain a fifo, what can be
more of a joke than this.
On 10/25/2021 8:34 AM, Niklas Holsti wrote:
And if each of those two items is large, yes. But here we have a FIFO
of 8-bit characters... few programs are so tight on memory that they
cannot stand one unused octet.
It's not "unused". Rather, its role is that of indicating "full/overrun". The OP seems to have decided that this is of no concern -- in *one* app?
On 2021-10-25 16:04, Dimiter_Popoff wrote:
On 10/25/2021 11:09, Niklas Holsti wrote:
On 2021-10-24 23:27, Dimiter_Popoff wrote:
On 10/24/2021 22:54, Don Y wrote:
On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.
Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.
The OPs code doesn't differentiate between FIFO full and empty.
So he should fix that first, there is no sane reason why not.
Few things are simpler to do than that.
[snip]
Whatever handshakes he makes there is no problem knowing whether
the fifo is full - just check if the position the write pointer
will have after putting the next byte matches the read pointer
at the moment. Like I said before, few things are simpler than
that, can't imagine someone working as a programmer being
stuck at *that*.
That simple check would require keeping a maximum of only N-1 entries
in the N-position FIFO buffer, and the OP explicitly said they did
not want to allocate an unused place in the buffer (which I think is
unreasonable of the OP, but that is only IMO).
Well it might be reasonable if the fifo has a size of two, you know :-).
And if each of those two items is large, yes. But here we have a FIFO of 8-bit characters... few programs are so tight on memory that they cannot stand one unused octet.
However I know this isn't the best implementation ever and it's a pity the thread emphasis has been against this implementation (that was used as *one* implementation just to have an example to discuss on).
The main point was the use of volatile (and other techniques) to
guarantee correct compiler output, whatever legal (i.e. respecting the C
standard) optimizations the compiler decides to make.
It seems to me the arguments for or against volatile are completely independent
of the implementation of the ring-buffer.
Marking "in" and "buf" as volatile is /far/ better than using a critical section, and likely to be more efficient than a memory barrier. You can
also use volatileAccess rather than making buf volatile, and it is often slightly more efficient to cache volatile variables in a local variable
while working with them.
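David's advice about caching volatile variables in locals could look like this in the OP's reader (a sketch; it also returns the popped character, unlike the OP's snippet which mistakenly returns -1 unconditionally):

```c
#include <stdint.h>

#define RXBUF_SIZE 32u

static volatile uint8_t in;                     /* written by the ISR     */
static volatile uint8_t out;                    /* written by the reader  */
static volatile unsigned char rxbuf[RXBUF_SIZE];

/* Each volatile index is read exactly once per call into a local, so the
 * compiler cannot re-read "in" at a later, inconsistent moment; and
 * because rxbuf is volatile, the buffer read cannot be hoisted above
 * the emptiness test. */
static int uart_task(void)
{
    uint8_t o = out;                 /* single volatile read */
    uint8_t i = in;                  /* single volatile read */
    if (o == i)
        return -1;                   /* empty */
    int c = rxbuf[o % RXBUF_SIZE];   /* volatile access, ordered after the test */
    out = (uint8_t)(o + 1u);         /* single volatile write */
    return c;
}
```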
Il 23/10/2021 18:09, David Brown ha scritto:
[...]
Marking "in" and "buf" as volatile is /far/ better than using a critical
section, and likely to be more efficient than a memory barrier. You can
also use volatileAccess rather than making buf volatile, and it is often
slightly more efficient to cache volatile variables in a local variable
while working with them.
I think I got your point, but I'm wondering why there are plenty of
examples of ring-buffer implementations that don't use volatile at all,
even if the author explicitly refers to interrupts and multithreading.
Just an example[1] by Quantum Leaps. It promises to be a *lock-free* (I
think thread-safe) ring-buffer implementation in the scenario of single producer/single consumer (that is my scenario too).
In the source code there's no use of volatile. I could call
RingBuf_put() in my rx uart ISR and call RingBuf_get() in my mainloop code.
From what I learned from you, this code usually works, but the standard doesn't guarantee it will work with every old, current and future
compilers.
[1] https://github.com/QuantumLeaps/lock-free-ring-buffer
Il 25/10/2021 17:34, Niklas Holsti ha scritto:
On 2021-10-25 16:04, Dimiter_Popoff wrote:
On 10/25/2021 11:09, Niklas Holsti wrote:
On 2021-10-24 23:27, Dimiter_Popoff wrote:
On 10/24/2021 22:54, Don Y wrote:
On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.
Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.
The OPs code doesn't differentiate between FIFO full and empty.
When I have a small (<256) power-of-two (16, 32, 64, 128) buffer (and
this is the case for a UART receiving ring-buffer), I like to use this implementation that works and doesn't waste any element.
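The power-of-two, free-running-index scheme described above can be sketched like this (illustrative names; it works because the uint8_t indices wrap at 256, a multiple of the buffer size, so all N slots are usable with no wasted element):

```c
#include <stdint.h>

#define RXBUF_SIZE 16u   /* must be a power of two and <= 256 */

static unsigned char buf[RXBUF_SIZE];
static volatile uint8_t in, out;   /* free-running; never masked on update */

/* Fill level: (uint8_t)(in - out) is exact because 256 is a multiple
 * of RXBUF_SIZE, so the wrap-around cancels out. */
static uint8_t fifo_count(void) { return (uint8_t)(in - out); }

static int fifo_put(unsigned char c)       /* ISR side */
{
    if (fifo_count() == RXBUF_SIZE)
        return 0;                          /* full: all N slots in use */
    buf[in % RXBUF_SIZE] = c;              /* % compiles to AND for 2^k */
    in++;
    return 1;
}

static int fifo_get(void)                  /* mainloop side */
{
    if (in == out)
        return -1;                         /* empty */
    unsigned char c = buf[out % RXBUF_SIZE];
    out++;
    return c;
}
```

Empty is `in == out` and full is a count of RXBUF_SIZE, so the empty/full ambiguity of the N-1 scheme never arises.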
However I know this isn't the best implementation ever and it's a pity
the thread emphasis has been against this implementation (that was used
as *one* implementation just to have an example to discuss on).
The main point was the use of volatile (and other techniques) to
guarantee a correct compiler output, whatever legal (respect the C
standard) optimizations the compiler thinks to do.
It seems to me the arguments for or against volatile are completely independent of the implementation of the ring-buffer.
....
If the FIFO implementation is based on just two pointers (read and
write), and each pointer is modified by just one of the two threads
(main thread = reader, and interrupt handler = writer), and those modifications are both "volatile" AND atomic (which has not been
discussed so far, IIRC...), then one can do without a critical region.
But then detection of a full buffer needs one "wasted" element in the
buffer.
To avoid the wasted element, one could add a "full"/"not full" Boolean
flag. But that flag would be modified by both threads, and should be
modified atomically together with the pointer modifications, which (I
think) means that a critical section is needed.
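Niklas's full-flag variant might be sketched like this (hypothetical names; the interrupt-masking calls are empty host stubs standing in for cli/sei on AVR or PRIMASK manipulation on Cortex-M). The flag is touched by both sides, which is exactly why each access sits inside a critical section:

```c
#include <stdint.h>

#define N 8u

static unsigned char buf[N];
static volatile uint8_t rd, wr;
static volatile uint8_t full;      /* modified by BOTH threads */

/* Host stubs for interrupt masking. */
static void irq_off(void) {}
static void irq_on(void)  {}

static int put(unsigned char c)    /* writer */
{
    irq_off();
    if (full) { irq_on(); return 0; }
    buf[wr] = c;
    wr = (uint8_t)((wr + 1u) % N);
    if (wr == rd) full = 1;        /* all N slots now in use */
    irq_on();
    return 1;
}

static int get(void)               /* reader */
{
    irq_off();
    if (!full && rd == wr) { irq_on(); return -1; }   /* empty */
    unsigned char c = buf[rd];
    rd = (uint8_t)((rd + 1u) % N);
    full = 0;                      /* removing a byte always un-fills */
    irq_on();
    return c;
}
```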
On 10/25/2021 21:33, Niklas Holsti wrote:
....
If the FIFO implementation is based on just two pointers (read and
write), and each pointer is modified by just one of the two threads
(main thread = reader, and interrupt handler = writer), and those
modifications are both "volatile" AND atomic (which has not been
discussed so far, IIRC...), then one can do without a critical region.
But then detection of a full buffer needs one "wasted" element in the
buffer.
Why atomic?
On 2021-10-25 22:09, Dimiter_Popoff wrote:
On 10/25/2021 21:33, Niklas Holsti wrote:
....
If the FIFO implementation is based on just two pointers (read and
write), and each pointer is modified by just one of the two threads
(main thread = reader, and interrupt handler = writer), and those
modifications are both "volatile" AND atomic (which has not been
discussed so far, IIRC...), then one can do without a critical
region. But then detection of a full buffer needs one "wasted"
element in the buffer.
Why atomic?
If the read/write pointers/indices are, say, 16 bits, but the processor
has only 8-bit store/load instructions, updating a pointer/index happens non-atomically, 8 bits at a time, and the interrupt handler can read a half-updated value if the interrupt happens in the middle of an update.
That would certainly mess up the comparison between the read and write
points in the interrupt handler.
In the OP's code, I suppose (but I don't recall) that the indices are 8
bits, so probably atomically readable and writable.
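The half-updated-value hazard Niklas describes is usually avoided by masking interrupts around the multi-byte copy; a sketch (interrupt masking stubbed for a host build; on a real AVR this would be a SREG save, cli(), and restore):

```c
#include <stdint.h>

/* Host stubs for the interrupt mask. */
static int irq_masked;
static void irq_off(void) { irq_masked = 1; }
static void irq_on(void)  { irq_masked = 0; }

static volatile uint16_t write_index;   /* updated by the ISR */

/* On an 8-bit CPU a 16-bit load takes two instructions, so an interrupt
 * between them could yield a value with one stale byte. Masking
 * interrupts around the copy makes the snapshot atomic. */
static uint16_t read_index_atomic(void)
{
    irq_off();                          /* no ISR can run mid-copy */
    uint16_t snapshot = write_index;
    irq_on();
    return snapshot;
}
```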
On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.
Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.
The OPs code doesn't differentiate between FIFO full and empty.
On 2021-10-24 23:27, Dimiter_Popoff wrote:
On 10/24/2021 22:54, Don Y wrote:
On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.
Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.
The OPs code doesn't differentiate between FIFO full and empty.
So he should fix that first, there is no sane reason why not.
Few things are simpler to do than that.
[snip]
Whatever handshakes he makes there is no problem knowing whether
the fifo is full - just check if the position the write pointer
will have after putting the next byte matches the read pointer
at the moment. Like I said before, few things are simpler than
that, can't imagine someone working as a programmer being
stuck at *that*.
That simple check would require keeping a maximum of only N-1 entries in
the N-position FIFO buffer, and the OP explicitly said they did not want
to allocate an unused place in the buffer (which I think is unreasonable
of the OP, but that is only IMO).
On 2021-10-25 20:52, pozz wrote:
Il 25/10/2021 17:34, Niklas Holsti ha scritto:
On 2021-10-25 16:04, Dimiter_Popoff wrote:
On 10/25/2021 11:09, Niklas Holsti wrote:
On 2021-10-24 23:27, Dimiter_Popoff wrote:
On 10/24/2021 22:54, Don Y wrote:
On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.
Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.
The OPs code doesn't differentiate between FIFO full and empty.
(I suspect something is not quite right with the attributions of the quotations above -- Dimiter probably did not suggest disabling
interrupts -- but no matter.)
[snip]
When I have a small (<256) power-of-two (16, 32, 64, 128) buffer (and
this is the case for a UART receiving ring-buffer), I like to use this
implementation that works and doesn't waste any element.
However I know this isn't the best implementation ever and it's a pity
the thread emphasis has been against this implementation (that was
used as *one* implementation just to have an example to discuss on).
The main point was the use of volatile (and other techniques) to
guarantee a correct compiler output, whatever legal (respect the C
standard) optimizations the compiler thinks to do.
It seems to me the arguments for or against volatile are completely
independent of the implementation of the ring-buffer.
Of course "volatile" is needed, in general, whenever anything is written
in one thread and read in another. The issue, I think, is when
"volatile" is _enough_.
I feel that detection of a full buffer (FIFO overflow) is required for a proper ring buffer implementation, and that has implications for the
data structure needed, and that has implications for whether critical sections are needed.
If the FIFO implementation is based on just two pointers (read and
write), and each pointer is modified by just one of the two threads
(main thread = reader, and interrupt handler = writer), and those modifications are both "volatile" AND atomic (which has not been
discussed so far, IIRC...), then one can do without a critical region.
But then detection of a full buffer needs one "wasted" element in the
buffer.
To avoid the wasted element, one could add a "full"/"not full" Boolean
flag. But that flag would be modified by both threads, and should be
modified atomically together with the pointer modifications, which (I
think) means that a critical section is needed.
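To make the two designs concrete, here is a minimal sketch of the first one, the two-pointer variant with a wasted element (names are hypothetical, not from the OP's code; it assumes uint8_t indices are read and written atomically, as on AVR8 or Cortex-M):

```c
#include <stdint.h>

#define BUF_SIZE 16  /* capacity is BUF_SIZE - 1: one element is wasted */

static volatile unsigned char buf[BUF_SIZE];
static volatile uint8_t rd;  /* modified only by the reader (mainloop) */
static volatile uint8_t wr;  /* modified only by the writer (ISR) */

/* Writer side (called from the ISR): returns 0 if the buffer was full. */
int fifo_put(unsigned char c)
{
    uint8_t next = (uint8_t)((wr + 1) % BUF_SIZE);
    if (next == rd)
        return 0;            /* full: next write would catch up with rd */
    buf[wr] = c;
    wr = next;               /* publish the element last */
    return 1;
}

/* Reader side (called from the mainloop): returns -1 if empty. */
int fifo_get(void)
{
    unsigned char c;
    if (rd == wr)
        return -1;           /* empty: indices equal */
    c = buf[rd];
    rd = (uint8_t)((rd + 1) % BUF_SIZE);
    return c;
}
```

Because "full" is defined as "wr one step behind rd", the wr == rd state is unambiguously "empty", and neither thread ever writes the other's index; no critical section is needed for the indices themselves.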
On 10/25/2021 22:53, Niklas Holsti wrote:
On 2021-10-25 22:09, Dimiter_Popoff wrote:
On 10/25/2021 21:33, Niklas Holsti wrote:
....
If the FIFO implementation is based on just two pointers (read and
write), and each pointer is modified by just one of the two threads
(main thread = reader, and interrupt handler = writer), and those
modifications are both "volatile" AND atomic (which has not been
discussed so far, IIRC...), then one can do without a critical
region. But then detection of a full buffer needs one "wasted"
element in the buffer.
Why atomic?
If the read/write pointers/indices are, say, 16 bits, but the
processor has only 8-bit store/load instructions, updating a
pointer/index happens non-atomically, 8 bits at a time, and the
interrupt handler can read a half-updated value if the interrupt
happens in the middle of an update. That would certainly mess up the
comparison between the read and write points in the interrupt handler.
In the OP's code, I suppose (but I don't recall) that the indices are
8 bits, so probably atomically readable and writable.
Ah, well, this is a possible scenario in a multicore system (or single
core if the two bytes are written by separate opcodes).
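A sketch of that hazard and the usual fix. The interrupt enable/disable calls below are no-op stand-ins so the fragment is self-contained; on a real target they would be the platform primitives (cli()/sei() on AVR, __disable_irq()/__enable_irq() on Cortex-M):

```c
#include <stdint.h>

/* No-op stand-ins for the platform's interrupt control primitives. */
static void disable_irq(void) {}
static void enable_irq(void)  {}

static volatile uint16_t wr_index;  /* updated by the ISR */

/* On an 8-bit CPU a 16-bit load compiles to two byte loads. If the
   ISR fires between them, the reader sees a half-updated value
   (e.g. the old low byte combined with the new high byte).
   Bracketing the read with a critical section makes the snapshot
   consistent. */
uint16_t read_wr_index(void)
{
    uint16_t snapshot;
    disable_irq();
    snapshot = wr_index;    /* two byte loads on AVR8 */
    enable_irq();
    return snapshot;
}
```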
Don Y <blockedofcourse@foo.invalid> wrote:
On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.
Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.
The OPs code doesn't differentiate between FIFO full and empty.
If you read carefully what he wrote you would know that he does.
The trick he uses is that his indices may point outside the buffer:
empty is equal indices, full is a difference equal to the buffer
size. Of course his approach has its own limitations, like the
buffer size having to be a power of 2, and with 8-bit indices the
maximal buffer size is 128.
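A sketch of that free-running index scheme (hypothetical names, not the OP's exact code): the uint8_t indices wrap modulo 256 on their own, and unsigned subtraction recovers the element count, provided the buffer size is a power of two no larger than 128 and the writer refuses to overrun the reader:

```c
#include <stdint.h>

#define RXBUF_SIZE 128           /* power of two, at most 128 with uint8_t */

static volatile unsigned char rxbuf[RXBUF_SIZE];
static volatile uint8_t in_idx;  /* free-running, written only by the ISR */
static volatile uint8_t out_idx; /* free-running, written only by the reader */

/* Elements currently stored: unsigned wraparound makes (in - out)
   correct as long as the difference never exceeds 255, which holds
   because size <= 128 and the writer checks for full. */
static uint8_t fifo_count(void)
{
    return (uint8_t)(in_idx - out_idx);
}

static int fifo_full(void)  { return fifo_count() == RXBUF_SIZE; }
static int fifo_empty(void) { return fifo_count() == 0; }

void fifo_put(unsigned char c)   /* writer side (ISR) */
{
    if (!fifo_full()) {
        rxbuf[in_idx % RXBUF_SIZE] = c;  /* % folds to & (SIZE - 1) */
        in_idx++;
    }
}

int fifo_get(void)               /* reader side (mainloop) */
{
    unsigned char c;
    if (fifo_empty())
        return -1;
    c = rxbuf[out_idx % RXBUF_SIZE];
    out_idx++;
    return c;
}
```

Unlike the wasted-element design, this one uses all 128 slots: equal indices mean empty, a difference of exactly the buffer size means full.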
Il 23/10/2021 18:09, David Brown ha scritto:
[...]
Marking "in" and "buf" as volatile is /far/ better than using a critical
section, and likely to be more efficient than a memory barrier. You can
also use volatileAccess rather than making buf volatile, and it is often
slightly more efficient to cache volatile variables in a local variable
while working with them.
I think I got your point, but I'm wondering why there are plenty of
examples of ring-buffer implementations that don't use volatile at all,
even if the author explicitly refers to interrupts and multithreading.
Just an example[1] by Quantum Leaps. It promises to be a *lock-free* (I
think thread-safe) ring-buffer implementation in the scenario of single producer/single consumer (that is my scenario too).
In the source code there's no use of volatile. I could call
RingBuf_put() in my rx uart ISR and call RingBuf_get() in my mainloop code.
From what I learned from you, this code usually works, but the standard
doesn't guarantee it will work with every old, current and future
compiler.
[1] https://github.com/QuantumLeaps/lock-free-ring-buffer
Don Y <blockedofcourse@foo.invalid> wrote:
On 10/25/2021 2:32 PM, antispam@math.uni.wroc.pl wrote:
Don Y <blockedofcourse@foo.invalid> wrote:
On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.
Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.
The OPs code doesn't differentiate between FIFO full and empty.
If you read carefully what he wrote you would know that he does.
The trick he uses is that his indices may point outside buffer:
empty is equal indices, full is difference equal to buffer
Doesn't matter as any index can increase by any amount and
invalidate the "reality" of the buffer's contents (i.e.
actual number of characters that have been transferred to
that region of memory).
AFAIK the OP considers this not a problem in his application.
Of course, if such changes were a problem he would need to
add a test preventing writing to a full buffer (he already has a
test preventing reading from an empty buffer).
Buffer size is 128, for example. in is 127, out is 127.
What's that mean?
Empty buffer.
Can you tell me what has happened prior
to this point in time? Have 127 characters been received?
Or, 383? Or, 1151?
Does not matter.
How many characters have been removed from the buffer?
(same numeric examples).
The same as has been stored. The point is that received is
always greater than or equal to removed and does not exceed
removed by more than 128. So you can exactly recover the
difference between received and removed.
The biggest practical limitation is that of expectations of
other developers who may inherit (or copy) his code expecting
the FIFO to be "well behaved".
Well, personally I would avoid storing to a full buffer. And
even on a small MCU it is not clear to me if his "savings"
are worth it. But his core design is sound.
Concerning other developers, I always work on the assumption
that code is "as is" and any claims about what it is doing are of
limited value unless there is a convincing argument (proof
or outline of proof) of what it is doing.
The fact that code
worked well in past system(s) is rather unconvincing.
I have seen small (few lines) pieces of code that contained
multiple bugs. And that code was in "production" use
for several years and passed its tests.
Certainly code like FIFOs, where there are multiple tradeoffs
and the actual code tends to be relatively small, deserves
examination before re-use.
On 10/25/2021 2:32 PM, antispam@math.uni.wroc.pl wrote:
Don Y <blockedofcourse@foo.invalid> wrote:
On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.
Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.
The OPs code doesn't differentiate between FIFO full and empty.
If you read carefully what he wrote you would know that he does.
The trick he uses is that his indices may point outside buffer:
empty is equal indices, full is difference equal to buffer
Doesn't matter as any index can increase by any amount and
invalidate the "reality" of the buffer's contents (i.e.
actual number of characters that have been transferred to
that region of memory).
Buffer size is 128, for example. in is 127, out is 127.
What's that mean?
Can you tell me what has happened prior
to this point in time? Have 127 characters been received?
Or, 383? Or, 1151?
How many characters have been removed from the buffer?
(same numeric examples).
The biggest practical limitation is that of expectations of
other developers who may inherit (or copy) his code expecting
the FIFO to be "well behaved".
On 10/26/2021 5:20 PM, antispam@math.uni.wroc.pl wrote:
Don Y <blockedofcourse@foo.invalid> wrote:
On 10/25/2021 2:32 PM, antispam@math.uni.wroc.pl wrote:
Don Y <blockedofcourse@foo.invalid> wrote:
On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
Disable interrupts while accessing the fifo. you really have to.
alternatively you'll often get away not using a fifo at all,
unless you're blocking for a long while in some part of the code.
Why would you do that. The fifo write pointer is only modified by
the interrupt handler, the read pointer is only modified by the
interrupted code. Has been done so for times immemorial.
The OPs code doesn't differentiate between FIFO full and empty.
If you read carefully what he wrote you would know that he does.
The trick he uses is that his indices may point outside buffer:
empty is equal indices, full is difference equal to buffer
Doesn't matter as any index can increase by any amount and
invalidate the "reality" of the buffer's contents (i.e.
actual number of characters that have been transferred to
that region of memory).
AFAIK the OP considers this not a problem in his application.
And I don't think I have to test for division by zero -- as
*my* code is the code that is passing numerator and denominator
to that operator, right?
Can you remember all of the little assumptions you've made in
any non-trivial piece of code -- a week later? a month later?
6 months later (when a bug manifests or a feature upgrade
is requested)?
Do not check the inputs of routines for validity -- assume everything is correct (cuz YOU wrote it to be so, right?).
Do not handle error conditions -- because they can't exist (because
you wrote the code and feel confident that you've anticipated
every contingency -- including those for future upgrades).
Ignore compiler warnings -- surely you know better than a silly
"generic" program!
Would you hire someone who viewed your product's quality (and
your reputation) in this regard?
Of course, if such changes were a problem he would need to
add a test preventing writing to a full buffer (he already has a
test preventing reading from an empty buffer).
Buffer size is 128, for example. in is 127, out is 127.
What's that mean?
Empty buffer.
No, it means you can't sort out *if* there have been any characters
received, based solely on this fact (and, what other facts are there
to observe?)
Can you tell me what has happened prior
to this point in time? Have 127 characters been received?
Or, 383? Or, 1151?
Does not matter.
Of course it does! Something has happened that the code MIGHT have
detected in other circumstances (e.g., if uart_task had been invoked
more frequently). The world has changed and the code doesn't know it.
Why write code that only *sometimes* works?
How many characters have been removed from the buffer?
(same numeric examples).
The same as has been stored. The point is that received is
always greater than or equal to removed and does not exceed
removed by more than 128. So you can exactly recover the
difference between received and removed.
If it can wrap, then "some data" can look like "no data".
If "no data", then NOTHING has been received -- from the
viewpoint of the code.
Tell me what prevents 256 characters from being received
after .in (and .out) are initially 0 -- without any
indication of their presence. What "limits" the difference
to "128"? Do you see any conditionals in the code that
do so? Is there some magic in the hardware that enforces
this?
This is how you end up with bugs in your code. The sorts
of bugs that you can witness -- with your own eyes -- and
never reproduce (until the code has been released and
lots of customers' eyes witness it as well).
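For illustration, one way to keep 256 received characters from masquerading as "no data" is the kind of guard Don is asking for. This is a sketch under the same free-running-index assumptions, not the OP's code: the ISR checks for full before advancing, drops the character, and counts the overrun so the mainloop can at least see that data was lost.

```c
#include <stdint.h>

#define RXBUF_SIZE 128

static volatile unsigned char rxbuf[RXBUF_SIZE];
static volatile uint8_t in_idx, out_idx;
static volatile uint8_t overruns;   /* saturating count of dropped chars */

/* Body of the receive ISR, with 'c' already read from the data register. */
void uart_rx_store(unsigned char c)
{
    if ((uint8_t)(in_idx - out_idx) == RXBUF_SIZE) {
        if (overruns != 0xFF)
            overruns++;             /* full: drop the char, but remember */
        return;
    }
    rxbuf[in_idx % RXBUF_SIZE] = c;
    in_idx++;
}
```

With this check in place the difference between the indices can never exceed 128, so the modular arithmetic stays unambiguous, and the overrun counter turns a silent, unreproducible failure into an observable one.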
The biggest practical limitation is that of expectations of
other developers who may inherit (or copy) his code expecting
the FIFO to be "well behaved".
Well, personally I would avoid storing to a full buffer. And
even on a small MCU it is not clear to me if his "savings"
are worth it. But his core design is sound.
Concerning other developers, I always work on the assumption
that code is "as is" and any claims about what it is doing are of
limited value unless there is a convincing argument (proof
or outline of proof) of what it is doing.
Ever worked on 100KLoC projects? 500KLoC? Do you personally examine
the entire codebase before you get started?
Do you purchase source
licenses for every library that you rely upon in your design?
(or, do you just assume software vendors are infallible?)
How would you feel if a fellow worker told you "yeah, the previous
guy had a habit of cutting corners in his FIFO management code"?
Or, "the previous guy always assumed malloc would succeed and
didn't even build an infrastructure to address the possibility
of it failing"
You could, perhaps, grep(1) for "malloc" or "FIFO" and manually
examine those code fragments.
What about division operators?
Or, verifying that data types never overflow their limits? Or...
The fact that code
worked well in past system(s) is rather unconvincing.
I have seen small (few lines) pieces of code that contained
multiple bugs. And that code was in "production" use
for several years and passed its tests.
Certainly code like FIFOs, where there are multiple tradeoffs
and the actual code tends to be relatively small, deserves
examination before re-use.
It's not "FIFO code". It's a UART driver. Do you examine every piece
of code that might *contain* a FIFO? How do you know that there *is* a FIFO in a piece of code -- without manually inspecting it? What if it is a
FIFO mechanism but not explicitly named as a FIFO?
One wants to be able to move towards the goal of software *components*.
You don't want to have to inspect the design of every *diode* that
you use; you want to look at it's overall specifications and decide
if those fit your needs.
Unlikely that this code will describe itself as "works well enough
SOME of the time..."
And, when/if you stumble on such faults, good luck explaining to
your customer why it's going to take longer to fix and retest the
*existing* codebase before you can get on with your modifications...
One wants to be able to move towards the goal of software *components*.
You don't want to have to inspect the design of every *diode* that
you use; you want to look at it's overall specifications and decide
if those fit your needs.
Sure, I would love to see really reusable components. But IMHO we
are quite far from that.
There are some things which are reusable
if you accept modest to severe overhead.
For example, things tend
to compose nicely if you dynamically allocate everything and use
garbage collection. But the performance cost may be substantial.
And in embedded settings garbage collection may be unacceptable.
In some cases I have found out that I can get much better
speed by joining things that could be done as a composition of library
operations into a single big routine.
In other cases I fixed
bugs by replacing a composition of library routines with a single
routine: there were interactions making the simple composition
incorrect. The correct alternative was a single routine.
As I wrote my embedded programs are simple and small. But I
use almost no external libraries. Trying some existing libraries
I have found out that some produce rather large programs, linking
in a lot of unneeded stuff.
Of course, writing from scratch
will not scale to bigger programs. OTOH, I feel that with
proper tooling it would be possible to retain efficiency and
small code size at least for a large class of microcontroller
programs (but existing tools and libraries do not support this).
Unlikely that this code will describe itself as "works well enough
SOME of the time..."
And, when/if you stumble on such faults, good luck explaining to
your customer why it's going to take longer to fix and retest the
*existing* codebase before you can get on with your modifications...
Commercial vendors like to say how good their programs are. But
market reality is that a program may be quite bad and still sell.
On 10/26/2021 10:22 PM, antispam@math.uni.wroc.pl wrote:
One wants to be able to move towards the goal of software *components*.
You don't want to have to inspect the design of every *diode* that
you use; you want to look at it's overall specifications and decide
if those fit your needs.
Sure, I would love to see really reusable components. But IMHO we
are quite far from that.
Do you use the standard libraries?
Aren't THEY components?
You rely on the compiler to decide how to divide X by Y -- instead
of writing your own division routine.
How often do you reimplement
?printf() to avoid all of the bloat that typically accompanies it?
(when was the last time you needed ALL of those format specifiers
in an application? And modifiers?)
There are some things which are reusable
if you accept modest to severe overhead.
What you need is components with varying characteristics.
You can buy diodes with all sorts of current carrying capacities,
PIVs, package styles, etc. But, they all still perform the
same function. Why so many different part numbers? Why not
just use the biggest, baddest diode in ALL your circuits?
I.e., we readily accept differences in "standard components"
in other disciplines; why not when it comes to software
modules?
For example, things tend
to compose nicely if you dynamically allocate everything and use
garbage collection. But the performance cost may be substantial.
And in embedded settings garbage collection may be unacceptable.
In some cases I have found out that I can get much better
speed by joining things that could be done as a composition of library
operations into a single big routine.
Sure, but now you're tuning a solution to a specific problem.
I've designed custom chips to solve particular problems.
But, they ONLY solve those particular problems! OTOH,
I use lots of OTC components in my designs because those have
been designed (for the most part) with an eye towards
meeting a variety of market needs.
In other cases I fixed
bugs by replacing a composition of library routines with a single
routine: there were interactions making the simple composition
incorrect. The correct alternative was a single routine.
As I wrote my embedded programs are simple and small. But I
use almost no external libraries. Trying some existing libraries
I have found out that some produce rather large programs, linking
in a lot of unneeded stuff.
Because they try to address a variety of solution spaces without
trying to be "optimal" for any. You trade flexibility/capability
for speed/performance/etc.
Of course, writing from scratch
will not scale to bigger programs. OTOH, I feel that with
proper tooling it would be possible to retain efficiency and
small code size at least for a large class of microcontroller
programs (but existing tools and libraries do not support this).
Templates are an attempt in this direction. Allowing a class of
problems to be solved once and then tailored to the specific
application.
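In C, without templates, the nearest everyday substitute is macro parametrization. A hypothetical sketch (all names invented for illustration) of "instantiating" a FIFO per element type and power-of-two size:

```c
#include <stdint.h>

/* Hypothetical macro-based "template": expands to a dedicated struct
   and an inline put/get pair for the given name, element type, and
   power-of-two size (<= 128, so uint8_t indices suffice). */
#define DEFINE_FIFO(name, type, size)                               \
    static struct {                                                 \
        type buf[size];                                             \
        uint8_t in, out;   /* free-running indices */               \
    } name;                                                         \
    static int name##_put(type v) {                                 \
        if ((uint8_t)(name.in - name.out) == (size)) return 0;      \
        name.buf[name.in % (size)] = v;                             \
        name.in++;                                                  \
        return 1;                                                   \
    }                                                               \
    static int name##_get(type *v) {                                \
        if (name.in == name.out) return 0;                          \
        *v = name.buf[name.out % (size)];                           \
        name.out++;                                                 \
        return 1;                                                   \
    }

DEFINE_FIFO(rx_fifo, unsigned char, 64)   /* one "instantiation" */
```

Each instantiation is monomorphic, so the compiler can fold the modulo into a mask and inline everything; the cost is the usual macro drawbacks (no type checking of the macro body until expansion, harder debugging).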
But, personal experience is where you win the most. You write
your second or third UART driver and start realizing that you
could leverage a previous design if you'd just thought it out
more fully -- instead of tailoring it to the specific needs
of the original application.
And, as you EXPECT to be reusing it in other applications (as
evidenced by the fact that it's your third time writing the same
piece of code!), you anticipate what those *might* need and
think about how to implement those features "economically".
It's rare that an application is *so* constrained that it can't
afford a couple of extra lines of code, here and there. If
you've considered efficiency in the design of your algorithms,
then these little bits of inefficiency will be below the noise floor.
Unlikely that this code will describe itself as "works well enough
SOME of the time..."
And, when/if you stumble on such faults, good luck explaining to
your customer why it's going to take longer to fix and retest the
*existing* codebase before you can get on with your modifications...
Commercial vendors like to say how good their programs are. But
market reality is that a program may be quite bad and still sell.
The same is true of FOSS -- despite the claim that many eyes (may)
have looked at it (suggesting that bugs would have been caught!)
From "KLEE: Unassisted and Automatic Generation of High-Coverage
Tests for Complex Systems Programs":
KLEE finds important errors in heavily-tested code. It
found ten fatal errors in COREUTILS (including three
that had escaped detection for 15 years), which account
for more crashing bugs than were reported in 2006, 2007
and 2008 combined. It further found 24 bugs in BUSYBOX, 21
bugs in MINIX, and a security vulnerability in HISTAR: a
total of 56 serious bugs.
Ooops! I wonder how many FOSS *eyes* missed those errors?
Every time you reinvent a solution, you lose much of the benefit
of the previous TESTED solution.
Aren't THEY components?
Well, some folks expect more from components than from
traditional libraries. Some even claim to deliver.
However, libraries have limitations and ATM I see nothing
that fundamentally changes the situation.
How often do you reimplement
?printf() to avoid all of the bloat that typically accompanies it?
I did that once (for an OS kernel where the standard library would not
work). If needed I can reuse it. On PCs I am not worried by
bloat due to printf. OTOH, on MCUs I am not sure if I ever used
printf. Rather, printing was done by specialized routines,
either library-provided or my own.
(when was the last time you needed ALL of those format specifiers
in an application? And modifiers?)
There are some things which are reusable
if you accept modest to severe overhead.
What you need is components with varying characteristics.
You can buy diodes with all sorts of current carrying capacities,
PIVs, package styles, etc. But, they all still perform the
same function. Why so many different part numbers? Why not
just use the biggest, baddest diode in ALL your circuits?
I have heard such electronic analogies many times. But they miss an
important point: there is no way for me to make my own diode,
I am stuck with what is available on the market. And a diode
is logically a pretty simple component, yet we need many kinds.
I.e., we readily accept differences in "standard components"
in other disciplines; why not when it comes to software
modules?
Well, software is _much_ more complicated than physical
engineering artifacts. A physical thing may have 10000 joints,
but if the joints are identical, then this is the moral equivalent of
a simple loop that just iterates a fixed number of times.
At the software level the number of possible pre-composed blocks
is so large that it is infeasible to deliver all of them.
The classic trick is to parametrize. However, even if you
parametrize, there are hundreds of design decisions going
into a relatively small piece of code. If you expose all
design decisions then the user may as well write his/her own
code, because the complexity will be similar. So normally
parametrization is limited and there will be users who
find the hardcoded design choices inadequate.
Another thing is that current tools are rather weak
at supporting parametrization.
For example, things tend
to compose nicely if you dynamically allocate everything and use
garbage collection. But the performance cost may be substantial.
And in embedded settings garbage collection may be unacceptable.
In some cases I have found out that I can get much better
speed by joining things that could be done as a composition of library
operations into a single big routine.
Sure, but now you're tuning a solution to a specific problem.
I've designed custom chips to solve particular problems.
But, they ONLY solve those particular problems! OTOH,
I use lots of OTC components in my designs because those have
been designed (for the most part) with an eye towards
meeting a variety of market needs.
Maybe I made the wrong impression, I think some explanation is in
place here. I am trying to make my code reusable. For my
problems performance is an important part of reusability: our
capability to solve a problem is limited by performance, and with
better performance users can solve bigger problems. I am
re-using code where I can and I would re-use more if I could,
but there are technical obstacles. Also, while I am
trying to make my code reusable, there are intrusive
design decisions which may interfere with your possibility
and willingness to re-use.
In a slightly different spirit: in another thread you wrote
about accessing the disc without the OS file cache. Here I
normally depend on the OS, and OS file caching is a big thing.
It is not perfect, but the OS (OK, at least Linux) is doing
this reasonably well, so I have no temptation to avoid it.
And I appreciate that with the OS cache, performance is
usually much better than it would be "without cache".
OTOH, I routinely avoid stdio for I/O critical things
(so no printf in I/O critical code).
In other cases I fixed
bugs by replacing a composition of library routines with a single
routine: there were interactions making the simple composition
incorrect. The correct alternative was a single routine.
As I wrote my embedded programs are simple and small. But I
use almost no external libraries. Trying some existing libraries
I have found out that some produce rather large programs, linking
in a lot of unneeded stuff.
Because they try to address a variety of solution spaces without
trying to be "optimal" for any. You trade flexibility/capability
for speed/performance/etc.
I think that this is more subtle: libraries frequently force some
way of doing things. Which may be good if you are trying to quickly roll
a solution and are within the capabilities of the library. But if you
need/want a different design, then the library may be too inflexible
to deliver it.
Of course, writing from scratch
will not scale to bigger programs. OTOH, I feel that with
proper tooling it would be possible to retain efficiency and
small code size at least for a large class of microcontroller
programs (but existing tools and libraries do not support this).
Templates are an attempt in this direction. Allowing a class of
problems to be solved once and then tailored to the specific
application.
Yes, templates could help. But they also have problems. One
of them is that (among others) I would like to target STM8
and I have no C++ compiler for STM8. My idea is to create
custom "optimizer/generator" for (annotated) C code.
ATM it is vapourware, but I think it is feasible with
reasonable effort.
But, personal experience is where you win the most. You write
your second or third UART driver and start realizing that you
could leverage a previous design if you'd just thought it out
more fully -- instead of tailoring it to the specific needs
of the original application.
And, as you EXPECT to be reusing it in other applications (as
evidenced by the fact that it's your third time writing the same
piece of code!), you anticipate what those *might* need and
think about how to implement those features "economically".
It's rare that an application is *so* constrained that it can't
afford a couple of extra lines of code, here and there. If
you've considered efficiency in the design of your algorithms,
then these little bits of inefficiency will be below the noise floor.
Well, I am not talking about a "couple of extra lines". Rather
about IMO substantial fixed overhead. As I wrote, one of my
targets is STM8 with 8k flash, another is MSP430 with 16k flash,
another is STM32 with 16k flash (there are also bigger targets).
One of the libraries/frameworks for STM32, after activating a few features,
pulled in about 16k of code; this is substantial overhead given
how few features I needed. Other folks reported that for
trivial programs vendor-supplied frameworks pulled in close to 30k of
code. That may be fine if you have a bigger device and need the features,
but for smaller MCUs it may be the difference between not fitting into the
device or (without the library) having plenty of free space.
When I tried it, FreeRTOS for STM32 needed about 8k flash. Which
is fine if you need an RTOS. But ATM my designs run without an RTOS.
I have found libopencm3 to have small overhead. But its routines
are doing so little that direct register access may give simpler
code.
Unlikely that this code will describe itself as "works well enough
SOME of the time..."
And, when/if you stumble on such faults, good luck explaining to
your customer why it's going to take longer to fix and retest the
*existing* codebase before you can get on with your modifications...
Commercial vendors like to say how good their programs are. But
market reality is that a program may be quite bad and still sell.
The same is true of FOSS -- despite the claim that many eyes (may)
have looked at it (suggesting that bugs would have been caught!)
From "KLEE: Unassisted and Automatic Generation of High-Coverage
Tests for Complex Systems Programs":
KLEE finds important errors in heavily-tested code. It
found ten fatal errors in COREUTILS (including three
that had escaped detection for 15 years), which account
for more crashing bugs than were reported in 2006, 2007
and 2008 combined. It further found 24 bugs in BUSYBOX, 21
bugs in MINIX, and a security vulnerability in HISTAR: a
total of 56 serious bugs.
Ooops! I wonder how many FOSS *eyes* missed those errors?
Open source folks tend to be more willing to talk about bugs.
And the above nicely shows that there are a lot of bugs, most
waiting to be discovered.
Every time you reinvent a solution, you lose much of the benefit
of the previous TESTED solution.
The TESTED part works for simple repeatable tasks. But if you have
a complex task it is quite likely that you will be the first
person with a given use case. gcc is a borderline case: if you
throw really new code at it you can expect to see bugs.
The gcc user community is large and there is a reasonable chance that
somebody wrote earlier code which is sufficiently similar to
yours to catch troubles. But there are domains that are at
least as complicated as compilation and have much smaller
user communities. You may find out that there is _no_ code
that could be reasonably re-used. Were you ever in a situation
where you looked at how some "standard library" solves a tricky
problem and realized that in fact the library does not solve
the problem?
On 10/31/2021 3:54 PM, antispam@math.uni.wroc.pl wrote:
[snip]
Aren't THEY components?
Well, some folks expect more from components than from
traditional libraries. Some even claim to deliver.
However, libraries have limitations and ATM I see nothing
that fundamentally changes the situation.
A component is something that you can use as a black box,
without having to reinvent it. It is the epitome of reuse.
How often do you reimplement
?printf() to avoid all of the bloat that typically accompanies it?
I did that once (for OS kernel where standard library would not
work). If needed I can reuse it. On PC-s I am not worried by
bloat due to printf. OTOH, on MCU-s I am not sure if I ever used
printf. Rather, printing was done by specialized routines
either library provided or my own.
You can also create a ?printf() that you can configure at build time to support the modifiers and specifiers that you know you will need.
Just like you can configure a UART driver to support a FIFO size defined
at configuration, hardware handshaking, software flow control, the
high and low water marks for each of those (as they can be different),
the character to send to request the remote to stop transmitting,
the character you send to request resumption of transmission, which
character YOU will recognize as requesting your Tx channel to pause,
the character (or condition) you will recognize to resume your Tx,
whether or not you will sample the condition codes in the UART, how
you read/write the data register, how you read/write the status register, etc.
While these sound like lots of options, they are all relatively
trivial additions to the code.
(when was the last time you needed ALL of those format specifiers
in an application? And modifiers?)
There are some things which are reusable
if you accept modest to severe overhead.
What you need is components with varying characteristics.
You can buy diodes with all sorts of current carrying capacities,
PIVs, package styles, etc. But, they all still perform the
same function. Why so many different part numbers? Why not
just use the biggest, baddest diode in ALL your circuits?
I am stuck with what is available on the market. And a diode
is logically a pretty simple component, yet we need many kinds.
I.e., we readily accept differences in "standard components"
in other disciplines; why not when it comes to software
modules?
Well, software is _much_ more complicated than physical
engineering artifacts. A physical thing may have 10000 joints,
but if the joints are identical, then this is the moral equivalent
of a simple loop that just iterates a fixed number of times.
This is the argument in favor of components. You'd much rather
read a comprehensive specification ("datasheet") for a software
component than have to read through all of the code that implements
it.
What if it was implemented in some programming language in
which you aren't expert? What if it was a binary "BLOB" and
couldn't be inspected?
At the software level the number of possible pre-composed blocks
is so large that it is infeasible to deliver all of them.
You don't have to deliver all of them. When you wire a circuit,
you still have to *solder* connections, don't you? The
components don't magically glue themselves together...
The classic trick is to parametrize. However, even if you
parametrize there are hundreds of design decisions going
into a relatively small piece of code. If you expose all
design decisions then the user may as well write his/her own
code because the complexity will be similar. So normally
parametrization is limited and there will be users who
find hardcoded design choices inadequate.
Another thing is that current tools are rather weak
at supporting parametrization.
Look at a fleshy UART driver and think about how you would decompose
it into N different variants that could be "compile time configurable". You'll be surprised as to how easy it is. Even if the actual UART
hardware differs from instance to instance.
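As a sketch of one such variant point, software flow control could be compiled in or out by a configuration macro (the `UART_SW_FLOW` switch and all function names here are invented for illustration, not taken from any real driver):

```c
#include <stdint.h>

/* Illustrative configuration switches; a real driver would take
   these from a board-specific config header. */
#define UART_SW_FLOW   1        /* compile in XON/XOFF handling */
#define UART_XOFF_CHAR 0x13
#define UART_XON_CHAR  0x11

static volatile uint8_t tx_paused;  /* set when the remote sent XOFF */

/* Called with each received byte (e.g. from the RX interrupt).
   Returns 1 if the byte is data, 0 if consumed as flow control. */
int uart_rx_byte(uint8_t c)
{
#if UART_SW_FLOW
    if (c == UART_XOFF_CHAR) { tx_paused = 1; return 0; }
    if (c == UART_XON_CHAR)  { tx_paused = 0; return 0; }
#endif
    (void)c;
    return 1;                   /* ordinary data byte: caller queues it */
}

/* The transmit path polls this before pushing the next byte out. */
int uart_tx_allowed(void)
{
#if UART_SW_FLOW
    return !tx_paused;
#else
    return 1;                   /* variant without flow control */
#endif
}
```

With the switch off, the preprocessor removes both the state byte's only writers and the pause test, so the "no flow control" variant costs nothing at run time.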
For example, things tend
to compose nicely if you dynamically allocate everything and use
garbage collection. But the performance cost may be substantial.
And in an embedded setting garbage collection may be unacceptable.
In some cases I have found out that I can get much better
speed by joining things that could be done as a composition of
library operations into a single big routine.
Sure, but now you're tuning a solution to a specific problem.
I've designed custom chips to solve particular problems.
But, they ONLY solve those particular problems! OTOH,
I use lots of OTC components in my designs because those have
been designed (for the most part) with an eye towards
meeting a variety of market needs.
Maybe I made a wrong impression; I think some explanation is in
place here. I am trying to make my code reusable. For my
problems performance is an important part of reusability: our
capability to solve a problem is limited by performance, and with
better performance users can solve bigger problems. I am
re-using code where I can and I would re-use more if I could,
but there are technical obstacles. Also, while I am
trying to make my code reusable, there are intrusive
design decisions which may interfere with your possibility
and willingness to re-use.
If you don't know where the design is headed, then you can't
pick the components that it will need.
I approach a design from the top (down) and bottom (up). This
lets me gauge the types of information that I *may* have
available from the hardware -- so I can sort out how to
approach those limitations from above. E.g., if I can't
control the data rate of a comm channel, then I either have
to ensure I can catch every (complete) message *or* design a
protocol that lets me detect when I've missed something.
There are costs to both approaches. If I dedicate resource to
ensuring I don't miss anything, then some other aspect of the
design will bear that cost. If I rely on detecting missed
messages, then I have to put a figure on their relative
likelihood so my device doesn't fail to provide its desired
functionality (because it is always missing one or two characters
out of EVERY message -- and, thus, sees NO messages).
In a slightly different spirit: in another thread you wrote
about accessing the disc without the OS file cache. Here I
normally depend on the OS, and OS file caching is a big thing.
It is not perfect, but the OS (OK, at least Linux) is doing
this reasonably well, so I have no temptation to avoid it.
And I appreciate that with the OS cache performance is
usually much better than it would be "without cache".
OTOH, I routinely avoid stdio for I/O-critical things
(so no printf in I/O-critical code).
My point about the cache was that it is of no value in my case;
I'm not going to revisit a file once I've seen it the first
time (so why hold onto that data?)
In other cases I fixed
bugs by replacing a composition of library routines with a single
routine: there were interactions making the simple composition
incorrect. The correct alternative was a single routine.
As I wrote, my embedded programs are simple and small. But I
use almost no external libraries. Trying some existing libraries
I have found out that some produce rather large programs, linking
in a lot of unneeded stuff.
Because they try to address a variety of solution spaces without
trying to be "optimal" for any. You trade flexibility/capability
for speed/performance/etc.
I think that this is more subtle: libraries frequently force some
way of doing things. Which may be good if you are trying to quickly
roll a solution and are within the capabilities of the library. But
if you need/want a different design, then the library may be too
inflexible to deliver it.
Use a different diode.
Of course, writing from scratch
will not scale to bigger programs. OTOH, I feel that with
proper tooling it would be possible to retain efficiency and
small code size at least for a large class of microcontroller
programs (but existing tools and libraries do not support this).
Templates are an attempt in this direction. Allowing a class of
problems to be solved once and then tailored to the specific
application.
Yes, templates could help. But they also have problems. One
of them is that (among other targets) I would like to target
STM8, and I have no C++ compiler for STM8. My idea is to create
a custom "optimizer/generator" for (annotated) C code.
ATM it is vapourware, but I think it is feasible with
reasonable effort.
But, personal experience is where you win the most. You write
your second or third UART driver and start realizing that you
could leverage a previous design if you'd just thought it out
more fully -- instead of tailoring it to the specific needs
of the original application.
And, as you EXPECT to be reusing it in other applications (as
evidenced by the fact that it's your third time writing the same
piece of code!), you anticipate what those *might* need and
think about how to implement those features "economically".
It's rare that an application is *so* constrained that it can't
afford a couple of extra lines of code, here and there. If
you've considered efficiency in the design of your algorithms,
then these little bits of inefficiency will be below the noise floor.
Well, I am not talking about a "couple of extra lines". Rather
about what is IMO a substantial fixed overhead. As I wrote, one of my
targets is STM8 with 8k flash, another is MSP430 with 16k flash,
another is STM32 with 16k flash (there are also bigger targets).
One of the libraries/frameworks for STM32, after activating a few
features, pulled in about 16k of code; this is substantial overhead
given how few features I needed. Other folks reported that for
trivial programs vendor-supplied frameworks pulled close to 30k
A "framework" is considerably more than a set of individually
selectable components. I've designed products with 2KB of code and
128 bytes of RAM. The "components" were ASM modules instead of
HLL modules. Each told me how big it was, how much RAM it required,
how deep the stack penetration when invoked, how many T-states
(worst case) to execute, etc.
So, before I designed the hardware, I knew what I would need
by way of ROM/RAM (before the days of FLASH) and could commit
the hardware to foil without fear of running out of "space" or
"time".
code. That may be fine if you have a bigger device and need the
features, but for smaller MCU-s it may be the difference between not
fitting into the device and (without the library) having plenty of
free space.
Sure. But a component will have a datasheet that tells you what
it provides and at what *cost*.
When I tried it, FreeRTOS for STM32 needed about 8k of flash. Which
is fine if you need an RTOS. But ATM my designs run without an RTOS.
RTOS is a commonly misused term. Many are more properly called
MTOSs (they provide no real timeliness guarantees, just multitasking primitives).
IMO, the advantages of writing in a multitasking environment so
far outweigh the "costs" of an MTOS that it behooves one to consider
how to shoehorn that functionality into EVERY design.
When writing in a HLL, there are complications that impose
constraints on how the MTOS provides its services. But, for small
projects written in ASM, you can gain the benefits of an MTOS
for very few bytes of code (and effectively zero RAM).
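As an illustration of how small such a kernel can be, here is a C sketch (the text above refers to ASM; this is just the same idea in C, and all names are invented) of a run-to-completion round-robin scheduler: no context switches, so no per-task stacks and effectively no RAM beyond the task table.

```c
#include <stddef.h>

/* A task is just a function that runs briefly and returns. */
typedef void (*task_fn)(void);

#define MAX_TASKS 4
static task_fn tasks[MAX_TASKS];
static size_t  ntasks;

int task_add(task_fn f)
{
    if (ntasks >= MAX_TASKS)
        return -1;              /* table full */
    tasks[ntasks++] = f;
    return 0;
}

/* One pass of the scheduler: each task gets one turn. A real
   main() would call this in an endless loop. */
void sched_run_once(void)
{
    for (size_t i = 0; i < ntasks; i++)
        tasks[i]();
}

/* Two demo tasks, purely for illustration. */
static int a_runs, b_runs;
static void task_a(void) { a_runs++; }
static void task_b(void) { b_runs++; }
```

Strictly speaking this is cooperative multitasking without preemption or blocking, which is a fraction of what even a small MTOS offers, but it shows the "few bytes of code" end of the spectrum.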
Unlikely that this code will describe itself as "works well enough
SOME of the time..."
And, when/if you stumble on such faults, good luck explaining to
your customer why it's going to take longer to fix and retest the
*existing* codebase before you can get on with your modifications...
Commercial vendors like to say how good their programs are. But
the market reality is that a program may be quite bad and still sell.
The same is true of FOSS -- despite the claim that many eyes (may)
have looked at it (suggesting that bugs would have been caught!)
From "KLEE: Unassisted and Automatic Generation of High-Coverage
Tests for Complex Systems Programs":
KLEE finds important errors in heavily-tested code. It
found ten fatal errors in COREUTILS (including three
that had escaped detection for 15 years), which account
for more crashing bugs than were reported in 2006, 2007
and 2008 combined. It further found 24 bugs in BUSYBOX, 21
bugs in MINIX, and a security vulnerability in HISTAR -- a
total of 56 serious bugs.
Ooops! I wonder how many FOSS *eyes* missed those errors?
Open source folks tend to be more willing to talk about bugs.
And the above nicely shows that there are a lot of bugs, most
waiting to be discovered.
Part of the problem is ownership of the codebase. You are
more likely to know where your own bugs lie -- and, more
willing to fix them ("pride of ownership"). When a piece
of code is shared, over time, there seems to be less incentive
for folks to tackle big -- often dubious -- issues as the
"reward" is minimal (i.e., you may not own the code when the bug
eventually becomes a problem).
Don Y <blockedofcourse@foo.invalid> wrote:
The classic trick is to parametrize. However, even if you
parametrize there are hundreds of design decisions going
into a relatively small piece of code. If you expose all
design decisions then the user may as well write his/her own
code because the complexity will be similar. So normally
parametrization is limited and there will be users who
find hardcoded design choices inadequate.
Another thing is that current tools are rather weak
at supporting parametrization.
Look at a fleshy UART driver and think about how you would decompose
it into N different variants that could be "compile time configurable".
You'll be surprised as to how easy it is. Even if the actual UART
hardware differs from instance to instance.
UART-s are simple. And yet some things are tricky: in C, to have a
"compile time configurable" buffer size you need to use macros.
It works, but in a sense the UART implementation "leaks" into user code.
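A minimal sketch of that "leak", assuming the usual power-of-two ring buffer (macro and function names are illustrative): the size macro must be visible at the point where user code includes the header, so the configuration lives in the application's build rather than inside the library.

```c
#include <stdint.h>

/* uart.h-style fragment: the size must be fixed at compile time, so
   the configuration macro has to be visible where user code includes
   the header -- the implementation detail leaks into the app build. */
#ifndef UART_RXBUF_SIZE
#define UART_RXBUF_SIZE 64      /* must be a power of two, <= 256 */
#endif

static uint8_t rxbuf[UART_RXBUF_SIZE];
static uint8_t rx_in, rx_out;   /* free-running 8-bit indices */

int uart_put(uint8_t c)         /* producer side, e.g. the RX ISR */
{
    if ((uint8_t)(rx_in - rx_out) == UART_RXBUF_SIZE)
        return -1;              /* buffer full: drop the byte */
    rxbuf[rx_in++ % UART_RXBUF_SIZE] = c;
    return 0;
}

int uart_get(void)              /* consumer side, the main loop */
{
    if (rx_out == rx_in)
        return -1;              /* empty */
    return rxbuf[rx_out++ % UART_RXBUF_SIZE];
}
```

The user either accepts the default or defines `UART_RXBUF_SIZE` before the include (or on the compiler command line), which is exactly the kind of exposure a template or generator would hide.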
There are costs to both approaches. If I dedicate resource to
ensuring I don't miss anything, then some other aspect of the
design will bear that cost. If I rely on detecting missed
messages, then I have to put a figure on their relative
likelihood so my device doesn't fail to provide its desired
functionality (because it is always missing one or two characters
out of EVERY message -- and, thus, sees NO messages).
My thinking goes toward using relatively short messages and
a buffer big enough for two messages.
If there is need for
high speed I would go for continuous messages and DMA
transfers (using the break interrupt to discover the end of a
message in case of variable-length messages). So the device should
be able to get all messages, and in case of excess message
traffic a whole message could be dropped (possibly looking
first for some high-priority messages). Of course, there
may be some externally mandated message format and/or
communication protocol making DMA inappropriate.
Still, assuming interrupts, all characters should reach the
interrupt handler, causing possibly some extra CPU
load. The only possibility of unnoticed loss of characters
would be blocking interrupts too long. If interrupts can
be blocked for too long, then I would expect loss of whole
messages. In such a case the protocol should have something like
"don't talk to me for the next 100 milliseconds, I will be busy"
to warn other nodes and request silence. Now, if you
need to faithfully support silliness like Modbus RTU timeouts,
then I hope that you are adequately paid...
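A sketch of the two-message scheme described above, with a hypothetical fixed message length (all sizes and names invented for illustration): the availability check happens only at a message boundary, so a message is either stored whole or dropped whole.

```c
#include <stdint.h>

#define MSG_LEN 8               /* illustrative fixed message length */
#define NSLOTS  2               /* room for two complete messages */

static uint8_t slots[NSLOTS][MSG_LEN];
static uint8_t fill;            /* bytes so far in the current message */
static uint8_t wr, rd;          /* message sequence numbers */
static uint8_t dropping;        /* set when the current message is discarded */

/* Byte sink, e.g. called from the RX interrupt. */
void rx_byte(uint8_t c)
{
    if (fill == 0)              /* decide at the message boundary */
        dropping = ((uint8_t)(wr - rd) == NSLOTS);
    if (!dropping)
        slots[wr % NSLOTS][fill] = c;
    if (++fill == MSG_LEN) {
        fill = 0;
        if (!dropping)
            wr++;               /* commit the complete message */
    }
}

/* Main-loop side: oldest pending message, or NULL if none. The
   caller must finish with the data before NSLOTS more messages
   arrive, since the slot is recycled. */
const uint8_t *msg_get(void)
{
    if (rd == wr)
        return 0;
    return slots[rd++ % NSLOTS];
}
```

A DMA variant would point the controller at `slots[wr % NSLOTS]` and do the commit/drop decision in the break (end-of-message) interrupt instead of per byte.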
IMO, the advantages of writing in a multitasking environment so
far outweigh the "costs" of an MTOS that it behooves one to consider
how to shoehorn that functionality into EVERY design.
When writing in a HLL, there are complications that impose
constraints on how the MTOS provides its services. But, for small
projects written in ASM, you can gain the benefits of an MTOS
for very few bytes of code (and effectively zero RAM).
Well, looking at books and articles I did not find a convincing
argument/example showing that one really needs multitasking for
small systems.
I tend to think rather in terms of a collection
of coupled finite state machines (or, if you prefer, a Petri net).
State machines transition in response to events and may generate
events. Each finite state machine could be a task. But it is
not clear if it should be. Some transitions are simple and should
be fast, and those I would do in interrupt handlers. Some
others are triggered in a regular way from other machines and
are naturally handled by function calls. Some need queues.
The whole thing fits reasonably well in the "super loop" paradigm.
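A minimal sketch of that coupled-state-machine / super-loop structure (the machines, a debounced button driving an LED toggle, are invented purely for illustration): machines transition on events, and one machine's output event is another's input.

```c
#include <stdint.h>

enum btn_state { BTN_UP, BTN_DOWN };

static enum btn_state btn = BTN_UP;
static uint8_t press_event;     /* event from button machine to LED machine */
static uint8_t led_on;

/* Button machine: transitions on each raw input sample (1 = pressed)
   and generates a press event on the UP -> DOWN edge. */
void btn_step(uint8_t raw)
{
    switch (btn) {
    case BTN_UP:
        if (raw) { btn = BTN_DOWN; press_event = 1; }
        break;
    case BTN_DOWN:
        if (!raw) btn = BTN_UP;
        break;
    }
}

/* LED machine: consumes the press event and toggles its state. */
void led_step(void)
{
    if (press_event) { press_event = 0; led_on = !led_on; }
}

/* The super loop just steps every machine in turn, forever. */
void superloop_once(uint8_t raw_btn)
{
    btn_step(raw_btn);
    led_step();
}
```

Here the event is a single flag set by one machine and consumed by the other, the "naturally handled by function calls" case; a transition that must be fast would run in an ISR and the flag would become the queue between interrupt and loop context.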
From "KLEE: Unassisted and Automatic Generation of High-Coverage
Tests for Complex Systems Programs":
KLEE finds important errors in heavily-tested code. It
found ten fatal errors in COREUTILS (including three
that had escaped detection for 15 years), which account
for more crashing bugs than were reported in 2006, 2007
and 2008 combined. It further found 24 bugs in BUSYBOX, 21
bugs in MINIX, and a security vulnerability in HISTAR -- a
total of 56 serious bugs.
Ooops! I wonder how many FOSS *eyes* missed those errors?
Open source folks tend to be more willing to talk about bugs.
And the above nicely shows that there are a lot of bugs, most
waiting to be discovered.
Part of the problem is ownership of the codebase. You are
more likely to know where your own bugs lie -- and, more
willing to fix them ("pride of ownership"). When a piece
of code is shared, over time, there seems to be less incentive
for folks to tackle big -- often dubious -- issues as the
"reward" is minimal (i.e., you may not own the code when the bug
eventually becomes a problem).
Ownership may cause problems: there is a tendency to "solve"
problems locally, that is, in code that a given person "owns".
This is good if there is an easy local solution. However, this
may also lead to ugly workarounds that really do not work
well, while the problem is easily solvable in a different part
("owned" by a different programmer). I have seen such a thing
several times: looking at the whole codebase, after some effort
it was possible to do a simple fix, while there were workarounds
in different ("wrong") places. I had no contact with the
original authors, but it seems that the workarounds were due to
"ownership".