• How to write a simple driver in bare metal systems: volatile, memory barriers

    From pozz@21:1/5 to All on Sat Oct 23 00:07:40 2021
    Even though I have been writing software for embedded systems for more
    than 10 years, there's a topic that from time to time makes me think
    for hours and leaves me with many doubts.

    Consider a simple embedded system based on a MCU (AVR8 or Cortex-Mx).
    The software is bare metal, without any OS. The main pattern is the well
    known mainloop (background code) that is interrupted by ISR.

    Interrupts are used mainly for timing and for low-level drivers. For
    example, the UART reception ISR moves the last received char into a FIFO
    buffer, while the mainloop code pops new data from the FIFO.


    static struct {
      unsigned char buf[RXBUF_SIZE];
      uint8_t in;
      uint8_t out;
    } rxfifo;

    /* ISR */
    void uart_rx_isr(void) {
      unsigned char c = UART->DATA;
      rxfifo.buf[in % RXBUF_SIZE] = c;
      rxfifo.in++;
      // Reset interrupt flag
    }

    /* Called regularly from mainloop code */
    /* Called regularly from mainloop code */
    int uart_task(void) {
      int c = -1;
      if (out != in) {
        c = rxfifo.buf[out % RXBUF_SIZE];
        out++;
      }
      return -1;
    }


    According to a 20-year-old article[1] by Nigel Jones, this seems to be a
    situation where volatile must be used for rxfifo.in, because it is
    modified by an ISR and used in the mainloop code.

    I don't think so: rxfifo.in is read from memory only once in
    uart_task(), so there isn't the risk that the compiler optimizes badly.
    Even if the ISR fires immediately after the if statement, this doesn't
    lead to a dangerous state: the just-received data will simply be
    processed at the next call to uart_task().

    So IMHO volatile isn't necessary here. And critical sections (i.e.
    disabling interrupts) aren't useful either.

    However I'm thinking about memory barriers. Suppose the compiler
    reorders the instructions in uart_task() as follows:


      c = rxfifo.buf[out % RXBUF_SIZE];
      if (out != in) {
        out++;
        return c;
      } else {
        return -1;
      }


    Here there's a big problem, because the compiler decided to first read
    rxfifo.buf[] and then test in and out for equality. If the ISR fires
    immediately after the data is moved to c (most probably an internal
    register), the condition in the if statement will be true and the
    register value is returned. However, the register value isn't correct.

    I don't think any modern C compiler reorders uart_task() this way, but
    we can't be sure. The observable result doesn't change, so the compiler
    is allowed to do this kind of thing.

    How can I fix this issue if I want to be extremely sure the compiler
    will not reorder this way? Applying volatile to rxfifo.in alone
    shouldn't help, because the compiler is still allowed to reorder
    accesses to the non-volatile variables around it[2].

    One solution is adding a memory barrier in this way:


    int uart_task(void) {
      int c = -1;
      if (out != in) {
        memory_barrier();
        c = rxfifo.buf[out % RXBUF_SIZE];
        out++;
      }
      return -1;
    }
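    The post leaves memory_barrier() undefined. As a sketch (this assumes
    GCC or Clang; the macro name is the post's own placeholder), a pure
    compiler barrier is conventionally written as an empty asm statement
    with a "memory" clobber:

```c
/* Compiler-only barrier: GCC/Clang must assume this statement may read
 * or write any memory, so loads and stores cannot be moved across it,
 * and values cached in registers must be reloaded afterwards. It emits
 * no CPU instruction; on a single-core AVR or Cortex-M this is usually
 * all that mainloop/ISR synchronisation needs. */
#define memory_barrier()  __asm__ __volatile__("" ::: "memory")
```

    Note this only constrains the compiler; on multi-core or out-of-order
    hardware a real fence instruction would be needed as well.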


    However this approach appears dangerous to me. You have to check and
    double-check if, when and where memory barriers are necessary, and it's
    easy to skip a barrier where it's needed or to add a barrier where it
    isn't needed.

    So I'm thinking that a sub-optimal (regarding efficiency) but reliable
    (regarding the risk of skipping a barrier where it is needed) approach
    could be to enter a critical section (disabling interrupts) anyway, even
    where it isn't strictly needed.


    int uart_task(void) {
      ENTER_CRITICAL_SECTION();
      int c = -1;
      if (out != in) {
        c = rxfifo.buf[out % RXBUF_SIZE];
        out++;
      }
      EXIT_CRITICAL_SECTION();
      return -1;
    }


    Another solution could be to apply the volatile keyword to rxfifo.in
    *AND* rxfifo.buf too, so the compiler can't change the order of accesses
    to them.

    Do you have other suggestions?



    [1] https://barrgroup.com/embedded-systems/how-to/c-volatile-keyword
    [2] https://blog.regehr.org/archives/28

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Clifford Heath@21:1/5 to pozz on Sat Oct 23 13:40:40 2021
    On 23/10/21 9:07 am, pozz wrote:
    [pozz's original post quoted in full - trimmed]

    This is a good introduction to how Linux makes this possible for its
    horde of device-driver authors:

    <https://www.kernel.org/doc/Documentation/memory-barriers.txt>


    Clifford Heath

  • From Don Y@21:1/5 to pozz on Fri Oct 22 22:09:17 2021
    On 10/22/2021 3:07 PM, pozz wrote:
    static struct {
      unsigned char buf[RXBUF_SIZE];
      uint8_t in;
      uint8_t out;
    } rxfifo;

    /* ISR */
    void uart_rx_isr(void) {
      unsigned char c = UART->DATA;
      rxfifo.buf[in % RXBUF_SIZE] = c;
      rxfifo.in++;
      // Reset interrupt flag
    }

    /* Called regularly from mainloop code */
    int uart_task(void) {
      int c = -1;

    Why? And why a retval from uart_task -- if it is always "-1"?

      if (out != in) {
        c = rxfifo.buf[out % RXBUF_SIZE];
        out++;
      }
      return -1;
    }

    This is a bug(s) waiting to happen.

    How is RXBUF_SIZE defined? How does it reflect the data rate (and,
    thus, interrupt rate) as well as the maximum latency between "main
    loop" accesses? I.e., what happens when the buffer is *full* -- and,
    thus, appears EMPTY? What stops the "in" member from growing to the
    maximum size of a uint8 -- and then wrapping? How do you convey this
    to the upper level code ("Hey, we just lost a whole RXBUF_SIZE of
    characters so if the character stream doesn't make sense, that might
    be a cause...")? What if RXBUF_SIZE is relatively prime wrt uint8max?

    When writing UART handlers, I fetch the received datum along with
    the UART's flags and stuff *both* of those things into the FIFO.
    If the FIFO would be full, I instead modify the flags of the
    preceding datum to reflect this fact ("Some number of characters
    have been lost AFTER this one...") and discard the current character.

    I then signal an event and let a task waiting for that specific event
    wake up and retrieve the contents of the FIFO (which may include more
    than one character, at that time as characters can arrive after the
    initial event has been signaled).

    This lets me move the line discipline out of the ISR and still keep
    the system "responsive".
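    A minimal sketch of that scheme (my reconstruction, not Don's actual
    code; the flag name, entry layout and sizes are assumptions): each FIFO
    entry carries the datum plus status flags, and on overflow the previous
    entry is tagged instead of storing the new one:

```c
#include <stdint.h>

#define RXBUF_SIZE 16               /* power of two, assumed        */
#define RXFLAG_OVERRUN_AFTER 0x01   /* hypothetical flag bit        */

typedef struct { unsigned char c; uint8_t flags; } rx_entry;

static struct {
    rx_entry buf[RXBUF_SIZE];
    uint8_t in, out;
} rxfifo;

/* Push one datum with its status flags. If the FIFO is full, mark the
 * most recently stored entry so the reader learns that characters were
 * lost after it, and drop the new one. */
static void rx_push(unsigned char c, uint8_t flags)
{
    if ((uint8_t)(rxfifo.in - rxfifo.out) >= RXBUF_SIZE) {
        rxfifo.buf[(uint8_t)(rxfifo.in - 1) % RXBUF_SIZE].flags
            |= RXFLAG_OVERRUN_AFTER;
        return;
    }
    rxfifo.buf[rxfifo.in % RXBUF_SIZE] = (rx_entry){ c, flags };
    rxfifo.in++;
}
```

    The consumer can then surface RXFLAG_OVERRUN_AFTER to the line
    discipline instead of silently losing data.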

    Figure out everything that you need to do before you start sorting out
    how the compiler can "shaft" you...

  • From David Brown@21:1/5 to pozz on Sat Oct 23 18:09:13 2021
    On 23/10/2021 00:07, pozz wrote:
    Even I write software for embedded systems for more than 10 years,
    there's an argument that from time to time let me think for hours and
    leave me with many doubts.

    It's nice to see a thread like this here - the group needs such discussions!


    Consider a simple embedded system based on a MCU (AVR8 or Cortex-Mx).
    The software is bare metal, without any OS. The main pattern is the well known mainloop (background code) that is interrupted by ISR.

    Interrupts are used mainly for timings and for low-level driver. For
    example, the UART reception ISR move the last received char in a FIFO
    buffer, while the mainloop code pops new data from the FIFO.


    static struct {
      unsigned char buf[RXBUF_SIZE];
      uint8_t in;
      uint8_t out;
    } rxfifo;

    /* ISR */
    void uart_rx_isr(void) {
      unsigned char c = UART->DATA;
      rxfifo.buf[in % RXBUF_SIZE] = c;
      rxfifo.in++;

    Unless you are sure that RXBUF_SIZE is a power of two, this is going to
    be quite slow on an AVR. Modulo means division, and while division by a constant is usually optimised to a multiplication by the compiler, you
    still have a multiply, a shift, and some compensation for it all being
    done as signed integer arithmetic.

    It's also wrong, for non-power of two sizes, since the wrapping of your increment and your modulo RXBUF_SIZE get out of sync.

    The usual choice is to track "head" and "tail", and use something like:

    void uart_rx_isr(void) {
      unsigned char c = UART->DATA;
      // Reset interrupt flag
      uint8_t next = rxfifo.tail;
      rxfifo.buf[next] = c;
      next++;
      if (next >= RXBUF_SIZE) next -= RXBUF_SIZE;
      rxfifo.tail = next;
    }

      // Reset interrupt flag
    }

    /* Called regularly from mainloop code */
    int uart_task(void) {
      int c = -1;
      if (out != in) {
        c = rxfifo.buf[out % RXBUF_SIZE];
        out++;
      }
      return -1;
    }

    int uart_task(void) {
      int c = -1;
      uint8_t next = rxfifo.head;
      if (next != rxfifo.tail) {
        c = rxfifo.buf[next];
        next++;
        if (next >= RXBUF_SIZE) next -= RXBUF_SIZE;
        rxfifo.head = next;
      }
      return c;
    }

    These don't track buffer overflow at all - you need to call uart_task()
    often enough to avoid that.

    (I'm skipping volatiles so we don't get ahead of your point.)



    From a 20-years old article[1] by Nigle Jones, this seems a situation
    where volatile must be used for rxfifo.in, because is modified by an ISR
    and used in the mainloop code.


    Certainly whenever data is shared between ISRs and mainloop code, or
    different threads, you need to think about how to make sure data is
    synchronised and exchanged. "volatile" is one method, atomics are
    another, and memory barriers can be used.
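    As a sketch of the atomics option on this FIFO (my illustration using
    C11 <stdatomic.h>, not code from the thread): release/acquire ordering
    on the "in" index guarantees the buffer write is visible before the
    index update, which also forbids exactly the hoisted buffer read pozz
    worries about:

```c
#include <stdatomic.h>
#include <stdint.h>

#define RXBUF_SIZE 16  /* power of two, as in the thread */

static struct {
    unsigned char buf[RXBUF_SIZE];
    _Atomic uint8_t in;   /* written only by the producer (ISR)      */
    _Atomic uint8_t out;  /* written only by the consumer (mainloop) */
} rxfifo;

/* Producer side (would be the ISR). */
static void rx_push(unsigned char c)
{
    uint8_t in = atomic_load_explicit(&rxfifo.in, memory_order_relaxed);
    rxfifo.buf[in % RXBUF_SIZE] = c;
    /* Release: the buffer write above cannot move after this store. */
    atomic_store_explicit(&rxfifo.in, (uint8_t)(in + 1),
                          memory_order_release);
}

/* Consumer side (would be uart_task). */
static int rx_pop(void)
{
    uint8_t out = atomic_load_explicit(&rxfifo.out, memory_order_relaxed);
    /* Acquire: the buffer read below cannot move before this load. */
    if (atomic_load_explicit(&rxfifo.in, memory_order_acquire) == out)
        return -1;
    int c = rxfifo.buf[out % RXBUF_SIZE];
    atomic_store_explicit(&rxfifo.out, (uint8_t)(out + 1),
                          memory_order_relaxed);
    return c;
}
```

    On a single-core Cortex-M these orderings typically cost nothing at
    run time; they constrain the compiler much like targeted barriers.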

    I don't think so, rxfifo.in is read from memory only one time in
    uart_task(), so there isn't the risk that compiler can optimize badly.

    That is incorrect in two ways. One - barring compiler bugs (which do
    occur, but they are very rare compared to user bugs), there is no such
    thing as "optimising badly". If optimising changes the behaviour of the
    code, other than its size and speed, the code is wrong. Two - it is a
    very bad idea to imagine that having code inside a function somehow
    "protects" it from re-ordering or other optimisation.

    Functions can be inlined, outlined, cloned, and shuffled about.
    Link-time optimisation, code in headers, C++ modules, and other
    cross-unit optimisations are becoming more and more common. So while it
    might be true /today/ that the compiler has no alternative but to read rxfifo.in once per call to uart_task(), you cannot assume that will be
    the case with later compilers or with more advanced optimisation
    techniques enabled. It is safer, more portable, and more future-proof
    to avoid such assumptions.

    Even if ISR is fired immediately after the if statement, this doesn't
    bring to a dangerous state: the just received data will be processed at
    the next call to uart_task().

    So IMHO volatile isn't necessary here. And critical sections (i.e.
    disabling interrupts) aren't useful too.

    However I'm thinking about memory barrier. Suppose the compiler reorder
    the instructions in uart_task() as follows:


      c = rxfifo.buf[out % RXBUF_SIZE]
      if (out != in) {
        out++;
        return c;
      } else {
        return -1;
      }


    Here there's a big problem, because compiler decided to firstly read rxfifo.buf[] and then test in and out equality. If the ISR is fired immediately after moving data to c (most probably an internal register),
    the condition in the if statement will be true and the register value is returned. However the register value isn't correct.

    You are absolutely correct.


    I don't think any modern C compiler reorder uart_task() in this way, but
    we can't be sure. The result shouldn't change for the compiler, so it
    can do this kind of things.

    It is not an unreasonable re-arrangement. On processors with
    out-of-order execution (which does not apply to the AVR or Cortex-M),
    compilers will often push loads as early as they can in the instruction
    stream so that they start the cache loading process as quickly as
    possible. (But note that on such "big" processors, much of this
    discussion on volatile and memory barriers is not sufficient, especially
    if there is more than one core. You need atomics and fences, but that's
    a story for another day.)


    How to fix this issue if I want to be extremely sure the compiler will
    not reorder this way? Applying volatile to rxfifo.in shouldn't help for
    this, because compiler is allowed to reorder access of non volatile
    variables yet[2].


    The important thing about "volatile" is that it is /accesses/ that are volatile, not objects. A volatile object is nothing more than an object
    for which all accesses are volatile by default. But you can use
    volatile accesses on non-volatile objects. This macro is your friend:

    #define volatileAccess(v) *((volatile typeof((v)) *) &(v))

    (Linux has the same macro, called ACCESS_ONCE. It uses a gcc extension
    - if you are using other compilers then you can make an uglier
    equivalent using _Generic. However, if you are using a C compiler that supports C11, it is probably gcc or clang, and you can use the "typeof" extension.)

    That macro will let you make a volatile read or write to an object
    without requiring that /all/ accesses to it are volatile.


    One solution is adding a memory barrier in this way:


    int uart_task(void) {
      int c = -1;
      if (out != in) {
        memory_barrier();
        c = rxfifo.buf[out % RXBUF_SIZE];
        out++;
      }
      return -1;
    }


    Note that you are forcing the compiler to read "out" twice here, as it
    can't keep the value of "out" in a register across the memory barrier.
    (And as I mentioned before, the compiler might be able to do larger
    scale optimisation across compilation units or functions, and in that
    way keep values across multiple calls to uart_task.)


    However this approach appears to me dangerous. You have to check and
    double check if, when and where memory barriers are necessary and it's
    simple to skip a barrier where it's nedded and add a barrier where it
    isn't needed.

    Memory barriers are certainly useful, but they are a shotgun approach -
    they affect /everything/ involving reads and writes to memory. (But
    remember they don't affect ordering of calculations.)


    So I'm thinking that a sub-optimal (regarding efficiency) but reliable (regarding the risk to skip a barrier where it is needed) could be to
    enter a critical section (disabling interrupts) anyway, if it isn't
    strictly needed.


    int uart_task(void) {
      ENTER_CRITICAL_SECTION();
      int c = -1;
      if (out != in) {
        c = rxfifo.buf[out % RXBUF_SIZE];
        out++;
      }
      EXIT_CRITICAL_SECTION();
      return -1;
    }

    Critical sections for something like this are /way/ overkill. And a
    critical section with a division in the middle? Not a good idea.



    Another solution could be to apply volatile keyword to rxfifo.in *AND* rxfifo.buf too, so compiler can't change the order of accesses them.

    Do you have other suggestions?


    Marking "in" and "buf" as volatile is /far/ better than using a critical section, and likely to be more efficient than a memory barrier. You can
    also use volatileAccess rather than making buf volatile, and it is often slightly more efficient to cache volatile variables in a local variable
    while working with them.



    [1] https://barrgroup.com/embedded-systems/how-to/c-volatile-keyword
    [2] https://blog.regehr.org/archives/28

  • From pozz@21:1/5 to Don Y on Sat Oct 23 22:12:47 2021
    On 23/10/2021 07:09, Don Y wrote:
    On 10/22/2021 3:07 PM, pozz wrote:
    static struct {
       unsigned char buf[RXBUF_SIZE];
       uint8_t in;
       uint8_t out;
    } rxfifo;

    /* ISR */
    void uart_rx_isr(void) {
       unsigned char c = UART->DATA;
       rxfifo.buf[in % RXBUF_SIZE] = c;
       rxfifo.in++;
       // Reset interrupt flag
    }

    /* Called regularly from mainloop code */
    int uart_task(void) {
       int c = -1;

    Why?  And why a retval from uart_task -- if it is always "-1"?

    That was my mistake. The last instruction of uart_task() should be

    return c;

    And maybe uart_task() is not such a good name; it should be uart_rx().


       if (out != in) {
         c = rxfifo.buf[out % RXBUF_SIZE];
         out++;
       }
       return -1;
    }

    This is a bug(s) waiting to happen.

    How is RXBUF_SIZE defined?

    Power of two.


    How does it reflect the data rate (and,
    thus, interrupt rate) as well as the maximum latency between "main
    loop" accesses?

    The interrupt-filled RX FIFO is needed to absorb a burst (a packet?) of
    incoming characters.

    If the baudrate is 9600 bps 8N1, an interrupt fires roughly every
    10/9600 s ≈ 1 ms. If the maximum interval between two successive
    uart_task() calls is 10 ms, a buffer of 10 bytes is sufficient, so
    RXBUF_SIZE could be 16 or 32.


    I.e., what happens when the buffer is *full* -- and,
    thus, appears EMPTY?

    These are good questions, but I didn't want to discuss them here. Of
    course the ISR is not complete: before pushing a new byte, we must
    check whether the FIFO is full. For example:

    /* The difference in - out gives a correct count even after a
     * wrap-around of in only, thanks to unsigned arithmetic. */
    #define RXFIFO_IS_FULL() (rxfifo.in - rxfifo.out >= RXBUF_SIZE)

    void uart_rx_isr(void) {
      unsigned char c = UART->DATA;
      if (!RXFIFO_IS_FULL()) {
        rxfifo.buf[rxfifo.in % RXBUF_SIZE] = c;
        rxfifo.in++;
      } else {
        // FIFO is full, ignore the char
      }
      // Reset interrupt flag
    }




    What stops the "in" member from growing to the
    maximum size of a uint8 -- and then wrapping?

    As I wrote, this should work even after a wrap-around.
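    That claim is easy to check on a host. A small sketch (mine, mirroring
    the post's uint8_t indices): the unsigned difference in - out remains a
    correct element count modulo 256 even after "in" wraps past 255:

```c
#include <stdint.h>

/* Bytes currently in the FIFO, as used by the RXFIFO_IS_FULL() test.
 * uint8_t subtraction wraps modulo 256, so the count stays correct
 * even after "in" has wrapped and is numerically smaller than "out". */
static uint8_t fifo_count(uint8_t in, uint8_t out)
{
    return (uint8_t)(in - out);
}
```

    For example fifo_count(2, 250) is 8: "in" has wrapped past 255, yet
    eight bytes are correctly reported as pending.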


    How do you convey this
    to the upper level code ("Hey, we just lost a whole RXBUF_SIZE of
    characters so if the character stream doesn't make sense, that might
    be a cause...")?

    A FIFO-full event is extremely rare if I size the RX FIFO correctly,
    i.e. for the worst case.
    Anyway, I usually ignore incoming chars when the FIFO is full. The
    higher-level protocols are usually defined in such a way that missing
    chars are detected, mostly thanks to CRC.


    What if RXBUF_SIZE is relatively prime wrt uint8max?

    When writing UART handlers, I fetch the received datum along with
    the uart's flags and stuff *both* of those things in the FIFO.
    If the FIFO would be full, I, instead, modify the flags of the
    preceeding datum to reflect this fact ("Some number of characters
    have been lost AFTER this one...") and discard the current character.

    I then signal an event and let a task waiting for that specific event
    wake up and retrieve the contents of the FIFO (which may include more
    than one character, at that time as characters can arrive after the
    initial event has been signaled).

    Signal an event? Task waiting for a specific event? Maybe you are
    thinking of a full RTOS. I was thinking of bare metal systems.


    This lets me move the line discipline out of the ISR and still keep
    the system "responsive".

    Figure out everything that you need to do before you start sorting out
    how the compiler can "shaft" you...

  • From pozz@21:1/5 to All on Sat Oct 23 22:49:37 2021
    On 23/10/2021 18:09, David Brown wrote:
    On 23/10/2021 00:07, pozz wrote:
    Even I write software for embedded systems for more than 10 years,
    there's an argument that from time to time let me think for hours and
    leave me with many doubts.

    It's nice to see a thread like this here - the group needs such discussions!


    Consider a simple embedded system based on a MCU (AVR8 or Cortex-Mx).
    The software is bare metal, without any OS. The main pattern is the well
    known mainloop (background code) that is interrupted by ISR.

    Interrupts are used mainly for timings and for low-level driver. For
    example, the UART reception ISR move the last received char in a FIFO
    buffer, while the mainloop code pops new data from the FIFO.


    static struct {
      unsigned char buf[RXBUF_SIZE];
      uint8_t in;
      uint8_t out;
    } rxfifo;

    /* ISR */
    void uart_rx_isr(void) {
      unsigned char c = UART->DATA;
      rxfifo.buf[in % RXBUF_SIZE] = c;
      rxfifo.in++;

    Unless you are sure that RXBUF_SIZE is a power of two, this is going to
    be quite slow on an AVR. Modulo means division, and while division by a constant is usually optimised to a multiplication by the compiler, you
    still have a multiply, a shift, and some compensation for it all being
    done as signed integer arithmetic.

    It's also wrong, for non-power of two sizes, since the wrapping of your increment and your modulo RXBUF_SIZE get out of sync.

    Yes, RXBUF_SIZE is a power of two.



    The usual choice is to track "head" and "tail", and use something like:

    void uart_rx_isr(void) {
    unsigned char c = UART->DATA;
    // Reset interrupt flag
    uint8_t next = rxfifo.tail;
    rxfifo.buf[next] = c;
    next++;
    if (next >= RXBUF_SIZE) next -= RXBUF_SIZE;
    rxfifo.tail = next;
    }

    This isn't the point of this thread, anyway...
    You insist that tail is always in the range [0...RXBUF_SIZE - 1]. My
    approach is different.

    RXBUF_SIZE is a power of two, usually <= 256. head and tail are uint8_t
    and *can* reach the maximum value of 255, even if RXBUF_SIZE is 128. All
    works well.

    Suppose rxfifo.in = rxfifo.out = 127: the FIFO is empty. When a new char
    is received, it is saved into rxfifo.buf[127 % 128 = 127] and rxfifo.in
    is increased to 128.
    Now the mainloop detects the new char (in != out), reads it from
    rxfifo.buf[127 % 128 = 127] and increases out to 128.

    The next byte will be saved into rxfifo.buf[rxfifo.in % 128 = 128 % 128
    = 0] and rxfifo.in will be 129. Again, the next byte will be saved into
    rxfifo.buf[rxfifo.in % 128 = 129 % 128 = 1] and rxfifo.in will be 130.

    When the mainloop tries to pop data from the FIFO, it tests

    rxfifo.in (130) != rxfifo.out (128)

    The test is true, so the code extracts chars from rxfifo.buf[out % 128],
    that is rxfifo.buf[0]... and so on.

    I hope that explanation is clear.
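    The walkthrough can also be replayed on a host. This sketch (mine,
    using the walkthrough's numbers with RXBUF_SIZE = 128) pushes and pops
    across the in = 128 wrap point:

```c
#include <stdint.h>

#define RXBUF_SIZE 128  /* power of two, per the walkthrough */

static struct {
    unsigned char buf[RXBUF_SIZE];
    uint8_t in;   /* ISR side  */
    uint8_t out;  /* task side */
} rxfifo = { .in = 127, .out = 127 };  /* start as in the walkthrough */

/* What the ISR does: store, then advance "in" without masking it. */
static void push(unsigned char c)
{
    rxfifo.buf[rxfifo.in % RXBUF_SIZE] = c;
    rxfifo.in++;
}

/* What the mainloop does: compare raw indices, mask only on access. */
static int pop(void)
{
    int c = -1;
    if (rxfifo.out != rxfifo.in) {
        c = rxfifo.buf[rxfifo.out % RXBUF_SIZE];
        rxfifo.out++;
    }
    return c;
}
```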



      // Reset interrupt flag
    }

    /* Called regularly from mainloop code */
    int uart_task(void) {
      int c = -1;
      if (out != in) {
        c = rxfifo.buf[out % RXBUF_SIZE];
        out++;
      }
      return -1;
    }

    int uart_task(void) {
    int c = -1;
    uint8_t next = rxfifo.head;
    if (next != rxfifo.tail) {
    c = rxfifo.buf[next];
    next++;
    if (next >= RXBUF_SIZE) next -= RXBUF_SIZE;
    rxfifo.head = next;
    }
    return c;
    }

    These don't track buffer overflow at all - you need to call uart_task()
    often enough to avoid that.

    Sure, with a good value for RXBUF_SIZE, buffer overflow should never
    happen. Anyway, if it does, the higher-level layers (protocol) should
    detect a corrupted packet.


    (I'm skipping volatiles so we don't get ahead of your point.)



    From a 20-years old article[1] by Nigle Jones, this seems a situation
    where volatile must be used for rxfifo.in, because is modified by an ISR
    and used in the mainloop code.


    Certainly whenever data is shared between ISR's and mainloop code, or different threads, then you need to think about how to make sure data is synchronised and exchanged. "volatile" is one method, atomics are
    another, and memory barriers can be used.

    I don't think so, rxfifo.in is read from memory only one time in
    uart_task(), so there isn't the risk that compiler can optimize badly.

    That is incorrect in two ways. One - baring compiler bugs (which do
    occur, but they are very rare compared to user bugs), there is no such
    thing as "optimising badly". If optimising changes the behaviour of the code, other than its size and speed, the code is wrong.

    Yes, of course, but I don't think the absence of volatile on rxfifo.in,
    even though it can change in an ISR, could be a *real* problem with
    *modern and current* compilers.

    The volatile attribute is needed to prevent compiler optimizations
    (which would be a bad thing, given the volatile nature of the variable),
    but in that code it's difficult to think of an optimization, caused by
    the absence of volatile, that changes the behaviour erroneously...
    except reordering.


    Two - it is a
    very bad idea to imagine that having code inside a function somehow "protects" it from re-ordering or other optimisation.

    I didn't say that; on the contrary, I was thinking exactly of
    reordering issues.


    Functions can be inlined, outlined, cloned, and shuffled about.
    Link-time optimisation, code in headers, C++ modules, and other
    cross-unit optimisations are becoming more and more common. So while it might be true /today/ that the compiler has no alternative but to read rxfifo.in once per call to uart_task(), you cannot assume that will be
    the case with later compilers or with more advanced optimisation
    techniques enabled. It is safer, more portable, and more future-proof
    to avoid such assumptions.

    Ok, you are talking about future scenarios. I don't think this is
    currently a real problem, but your observation makes sense.



    Even if ISR is fired immediately after the if statement, this doesn't
    bring to a dangerous state: the just received data will be processed at
    the next call to uart_task().

    So IMHO volatile isn't necessary here. And critical sections (i.e.
    disabling interrupts) aren't useful too.

    However I'm thinking about memory barrier. Suppose the compiler reorder
    the instructions in uart_task() as follows:


      c = rxfifo.buf[out % RXBUF_SIZE]
      if (out != in) {
        out++;
        return c;
      } else {
        return -1;
      }


    Here there's a big problem, because compiler decided to firstly read
    rxfifo.buf[] and then test in and out equality. If the ISR is fired
    immediately after moving data to c (most probably an internal register),
    the condition in the if statement will be true and the register value is
    returned. However the register value isn't correct.

    You are absolutely correct.


    I don't think any modern C compiler reorder uart_task() in this way, but
    we can't be sure. The result shouldn't change for the compiler, so it
    can do this kind of things.

    It is not an unreasonable re-arrangement. On processors with
    out-of-order execution (which does not apply to the AVR or Cortex-M), compilers will often push loads as early as they can in the instruction stream so that they start the cache loading process as quickly as
    possible. (But note that on such "big" processors, much of this
    discussion on volatile and memory barriers is not sufficient, especially
    if there is more than one core. You need atomics and fences, but that's
    a story for another day.)


    How to fix this issue if I want to be extremely sure the compiler will
    not reorder this way? Applying volatile to rxfifo.in shouldn't help for
    this, because compiler is allowed to reorder access of non volatile
    variables yet[2].


    The important thing about "volatile" is that it is /accesses/ that are volatile, not objects. A volatile object is nothing more than an object
    for which all accesses are volatile by default. But you can use
    volatile accesses on non-volatile objects. This macro is your friend:

    #define volatileAccess(v) *((volatile typeof((v)) *) &(v))

    (Linux has the same macro, called ACCESS_ONCE. It uses a gcc extension
    - if you are using other compilers then you can make an uglier
    equivalent using _Generic. However, if you are using a C compiler that
    supports C11, it is probably gcc or clang, and you can use the "typeof"
    extension.)

    That macro will let you make a volatile read or write to an object
    without requiring that /all/ accesses to it are volatile.

    This is a good point. The code in the ISR can't be interrupted, so
    there's no need for volatile accesses inside the ISR.


    One solution is adding a memory barrier in this way:


    int uart_task(void) {
      int c = -1;
      if (rxfifo.out != rxfifo.in) {
        memory_barrier();
        c = rxfifo.buf[rxfifo.out % RXBUF_SIZE];
        rxfifo.out++;
      }
      return c;   /* not -1: return the popped character */
    }


    Note that you are forcing the compiler to read "out" twice here, as it
    can't keep the value of "out" in a register across the memory barrier.

    Yes, you're right. A small penalty to avoid the problem of reordering.


    (And as I mentioned before, the compiler might be able to do larger
    scale optimisation across compilation units or functions, and in that
    way keep values across multiple calls to uart_task.)


    However this approach appears dangerous to me. You have to check and
    double-check if, when and where memory barriers are necessary, and it's
    easy to skip a barrier where it's needed or add one where it isn't
    needed.

    Memory barriers are certainly useful, but they are a shotgun approach -
    they affect /everything/ involving reads and writes to memory. (But
    remember they don't affect ordering of calculations.)
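    For reference, on a single-core MCU the memory_barrier() used above is
    typically a pure compiler barrier; a sketch, assuming GCC or Clang:

```c
#include <stdint.h>

/* Compiler-level barrier: the empty asm is declared to clobber memory,
   so the compiler must complete pending memory writes before it and
   reload memory values after it. It emits no instructions; on a
   single-core MCU no hardware barrier (e.g. ARM DMB) is needed for
   ISR/mainloop data sharing. */
#define memory_barrier() __asm__ __volatile__("" ::: "memory")

static volatile uint8_t flag;  /* e.g. set by an ISR */

/* Spin until the ISR signals, then read the data it produced. The
   barrier stops the compiler from hoisting the data read above the
   wait loop. */
int wait_then_read(const uint8_t *data) {
    while (!flag) { }
    memory_barrier();
    return data[0];
}
```

    On multi-core or out-of-order hardware this is not enough, as noted
    above; there you need real fences or C11 atomics.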


    So I'm thinking that a sub-optimal (regarding efficiency) but reliable
    (regarding the risk of skipping a barrier where it is needed) approach
    could be to enter a critical section (disabling interrupts) anyway,
    even where it isn't strictly needed.


    int uart_task(void) {
      int c = -1;
      ENTER_CRITICAL_SECTION();
      if (rxfifo.out != rxfifo.in) {
        c = rxfifo.buf[rxfifo.out % RXBUF_SIZE];
        rxfifo.out++;
      }
      EXIT_CRITICAL_SECTION();
      return c;   /* not -1: return the popped character */
    }

    Critical sections for something like this are /way/ overkill. And a
    critical section with a division in the middle? Not a good idea.
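    For completeness, on a Cortex-M the critical-section macros are usually
    built on PRIMASK via the CMSIS __disable_irq()/__set_PRIMASK()
    intrinsics. A host-testable model of the same pattern (the irq_disabled
    counter is a stand-in for the real interrupt mask, names hypothetical):

```c
#include <stdint.h>

/* Model of a critical section. On a real Cortex-M these would save and
   restore PRIMASK so that nested sections restore the previous state:
   uint32_t m = __get_PRIMASK(); __disable_irq();  ...  __set_PRIMASK(m); */
static int irq_disabled;                 /* stand-in for the IRQ mask */
#define ENTER_CRITICAL_SECTION() (irq_disabled++)
#define EXIT_CRITICAL_SECTION()  (irq_disabled--)

#define RXBUF_SIZE 32

static struct {
    unsigned char buf[RXBUF_SIZE];
    uint8_t in, out;
} rxfifo;

int uart_task(void) {
    int c = -1;
    ENTER_CRITICAL_SECTION();
    if (rxfifo.out != rxfifo.in) {
        c = rxfifo.buf[rxfifo.out % RXBUF_SIZE];
        rxfifo.out++;
    }
    EXIT_CRITICAL_SECTION();
    return c;
}
```

    As noted above, keep the interrupts-off region as short as possible;
    if RXBUF_SIZE were not a power of two, the `%` would compile to a
    division inside the critical section.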



    Another solution could be to apply the volatile keyword to rxfifo.in
    *AND* rxfifo.buf too, so the compiler can't change the order of
    accesses to them.

    Do you have other suggestions?


    Marking "in" and "buf" as volatile is /far/ better than using a
    critical section, and likely to be more efficient than a memory
    barrier. You can also use volatileAccess rather than making buf
    volatile, and it is often slightly more efficient to cache volatile
    variables in a local variable while working with them.

    Yes, I think so too. Lately I have read many experts say volatile is
    often a bad thing, so I'm re-thinking its use compared with other
    approaches.


    [1] https://barrgroup.com/embedded-systems/how-to/c-volatile-keyword
    [2] https://blog.regehr.org/archives/28


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to pozz on Sat Oct 23 15:59:57 2021
    On 10/23/2021 1:12 PM, pozz wrote:
    This is a bug(s) waiting to happen.

    How is RXBUF_SIZE defined?

    Power of two.

    The point was its relationship to the actual code.

    How does it reflect the data rate (and,
    thus, interrupt rate) as well as the maximum latency between "main
    loop" accesses?

    The Rx FIFO filled by the interrupt is needed to absorb a burst (a packet?) of
    incoming characters.

    If the baudrate is 9600bps 8n1, an interrupt fires roughly every
    10/9600 s, i.e. about every 1 ms. If the maximum interval between two
    successive uart_task() calls is 10ms, a buffer of 10 bytes is
    sufficient, so RXBUF_SIZE could be 16 or 32.
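    That sizing arithmetic can be written down directly (rounding up, then
    rounding to the next power of two; helper names are illustrative):

```c
/* Characters that can arrive while the mainloop is away:
   chars = ceil(poll_interval * baud / bits_per_frame).
   At 9600 baud 8N1 each frame is 10 bits (start + 8 data + stop),
   so one character arrives roughly every 10/9600 s ~= 1.04 ms. */
static unsigned worst_case_chars(unsigned poll_ms, unsigned baud,
                                 unsigned bits_per_frame) {
    /* integer ceiling of poll_ms * baud / (bits_per_frame * 1000) */
    return (poll_ms * baud + bits_per_frame * 1000 - 1)
           / (bits_per_frame * 1000);
}

/* Smallest power of two >= n, suitable as RXBUF_SIZE. */
static unsigned next_pow2(unsigned n) {
    unsigned p = 1;
    while (p < n) p <<= 1;
    return p;
}
```

    For 10 ms polling at 9600 baud 8N1 this gives 10 characters, so a
    16-byte buffer is the minimum power-of-two choice.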

    What GUARANTEES this in your system? Folks often see things that "can't happen" -- yet DID (THEY SAW IT!). Your code/design should ensure that
    "can't happen" REALLY /can't happen/. It costs very little to explain (commentary) WHY you don't have to check for X, Y or Z in your code.

    [If the user's actions (or any outside agency) can affect operation,
    then how can you guarantee that THEY "behave"?]

    And, give that a very high degree of visibility so that when someone
    decides they can increase the baudrate or add some sluggish task
    to your "main loop" that this ASSUMPTION isn't silently violated.

    I.e., what happens when the buffer is *full* -- and,
    thus, appears EMPTY?

    These are good questions, but I didn't want to discuss them here. Of course
    the ISR is not complete: before pushing a new byte, we must check whether the
    FIFO is full. For example:

    My point is that you should flesh out your code before you start
    thinking about what can go wrong.

    E.g., if the ISR is the *only* entity to modify ".in" and always does so
    with interrupts off, then it can do so without worrying about conflict
    with something else -- if those other things always ensure they read it atomically (if they read it just before or just after it has been modified
    by the ISR, the value will still "work" -- they just may not realize, yet,
    that there is an extra character in the buffer that they haven't yet seen).

    Likewise, if the "task" is the only entity modifying ".out", then ensuring
    that those modifications are atomic means the ISR can safely use any *single* reference to it.

    How do you convey this
    to the upper level code ("Hey, we just lost a whole RXBUF_SIZE of
    characters so if the character stream doesn't make sense, that might
    be a cause...")?

    A FIFO-full event is extremely rare if I'm able to size the rx FIFO correctly,
    i.e. on the worst case.

    "Rare" and "impossible" are two entirely different scenarios.
    It is extremely rare for a specific individual to win the lottery.
    But, any individual *can* win it!

    Anyway I usually ignore incoming chars when the FIFO is full. The
    higher-level protocols are usually defined in such a way that the absence of
    chars is detected, mostly thanks to a CRC.

    What if the CRC characters disappear? Are you sure the front of one message can't appear to match the ass end of another?

    "Pozz is here."
    "Don is not here."

    "Pozz is not here."

    And that there is no value in knowing that one or more messages may have been dropped?

    What if RXBUF_SIZE is relatively prime wrt UINT8_MAX+1?
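    The hazard behind that question: the free-running uint8_t index only
    cooperates with `% RXBUF_SIZE` when RXBUF_SIZE divides 256. A
    non-divisor size breaks at the wrap, which a two-line check shows:

```c
#include <stdint.h>

/* With RXBUF_SIZE = 100 the slot sequence ...98, 99, 0, 1... is never
   produced: the uint8_t wrap 255 -> 0 maps, after the modulo, to the
   jump 55 -> 0, silently skipping slots 56..99 and desynchronising the
   reader and writer. */
unsigned slot_before_wrap(void) { uint8_t in = 255; return in % 100; }
unsigned slot_after_wrap(void)  { uint8_t in = 255; in++; return in % 100; }
```

    With a power-of-two size such as 128 the wrap is seamless, which is
    why the power-of-two restriction matters in this scheme.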

    When writing UART handlers, I fetch the received datum along with
    the uart's flags and stuff *both* of those things in the FIFO.
    If the FIFO would be full, I, instead, modify the flags of the
    preceding datum to reflect this fact ("Some number of characters
    have been lost AFTER this one...") and discard the current character.
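    A sketch of that datum-plus-flags layout (entry format and flag names
    are hypothetical, not from the original post):

```c
#include <stdint.h>

#define RXBUF_SIZE 32
#define RXFLAG_OVERRUN 0x0100u   /* characters were lost after this one */

/* Each FIFO slot holds the received byte in the low 8 bits and the
   UART status flags in the high bits. */
static struct {
    uint16_t buf[RXBUF_SIZE];
    uint8_t in, out;
} rxq;

static int rxq_full(void) {
    return (uint8_t)(rxq.in - rxq.out) == RXBUF_SIZE;
}

/* Called from the receive ISR with the datum and its hardware flags. */
void rxq_push(uint8_t data, uint16_t hw_flags) {
    if (rxq_full()) {
        /* Mark the previous entry instead of dropping silently. */
        rxq.buf[(uint8_t)(rxq.in - 1) % RXBUF_SIZE] |= RXFLAG_OVERRUN;
        return;
    }
    rxq.buf[rxq.in % RXBUF_SIZE] = (uint16_t)(hw_flags | data);
    rxq.in++;
}
```

    The consumer can then report "some characters lost after this one"
    to the protocol layer instead of losing the information entirely.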

    I then signal an event and let a task waiting for that specific event
    wake up and retrieve the contents of the FIFO (which may include more
    than one character, at that time as characters can arrive after the
    initial event has been signaled).

    Signal an event? A task waiting for a specific event? Maybe you are
    thinking of a full RTOS. I was thinking of bare metal systems.

    You can implement as much or as little of an OS as you choose;
    you're not stuck with "all or nothing".

  • From Johann Klammer@21:1/5 to pozz on Sun Oct 24 12:39:02 2021
    On 10/23/2021 12:07 AM, pozz wrote:
    [original post quoted in full - snipped]
    Disable interrupts while accessing the fifo. You really have to.
    Alternatively, you'll often get away without using a FIFO at all,
    unless you're blocking for a long while in some part of the code.
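    The no-FIFO variant Johann mentions can be as small as a one-slot
    mailbox; a sketch, sufficient only when the mainloop always polls
    faster than characters can arrive:

```c
#include <stdint.h>

/* One-slot "mailbox": the ISR overwrites last_char and raises the
   flag; the mainloop consumes it. If a second character arrives
   before the first is consumed, the first is simply lost. */
static volatile uint8_t last_char;
static volatile uint8_t have_char;

void uart_rx_isr_model(uint8_t data) {  /* body of the real rx ISR */
    last_char = data;
    have_char = 1;
}

/* Called regularly from mainloop code; -1 means no character. */
int uart_poll(void) {
    if (!have_char)
        return -1;
    int c = last_char;
    have_char = 0;   /* a char landing right here is dropped, not hidden */
    return c;
}
```

    Both variables are volatile since they are shared between the ISR and
    the mainloop; the price of the simplicity is silent character loss
    under bursts.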

  • From Dimiter_Popoff@21:1/5 to Johann Klammer on Sun Oct 24 14:14:11 2021
    On 10/24/2021 13:39, Johann Klammer wrote:
    On 10/23/2021 12:07 AM, pozz wrote:
    [original post quoted in full - snipped]
    Disable interrupts while accessing the fifo. you really have to. alternatively you'll often get away not using a fifo at all,
    unless you're blocking for a long while in some part of the code.


    Why would you do that? The fifo write pointer is only modified by
    the interrupt handler, the read pointer is only modified by the
    interrupted code. It has been done that way since times immemorial.

    Although this thread is really about how to wrestle a poor
    language into doing what you want - sort of using a hammer on a screw
    instead of reaching for the screwdriver - there would be no need to
    mask interrupts in C either.

    ======================================================
    Dimiter Popoff, TGI http://www.tgi-sci.com
    http://www.flickr.com/photos/didi_tgi/
    ======================================================

  • From David Brown@21:1/5 to pozz on Sun Oct 24 13:02:32 2021
    On 23/10/2021 22:49, pozz wrote:
    Il 23/10/2021 18:09, David Brown ha scritto:
    On 23/10/2021 00:07, pozz wrote:
    Even I write software for embedded systems for more than 10 years,
    there's an argument that from time to time let me think for hours and
    leave me with many doubts.

    It's nice to see a thread like this here - the group needs such
    discussions!


    Consider a simple embedded system based on a MCU (AVR8 or Cortex-Mx).
    The software is bare metal, without any OS. The main pattern is the well
    known mainloop (background code) that is interrupted by ISR.

    Interrupts are used mainly for timings and for low-level driver. For
    example, the UART reception ISR move the last received char in a FIFO
    buffer, while the mainloop code pops new data from the FIFO.


    static struct {
       unsigned char buf[RXBUF_SIZE];
       uint8_t in;
       uint8_t out;
    } rxfifo;

    /* ISR */
    void uart_rx_isr(void) {
       unsigned char c = UART->DATA;
       rxfifo.buf[in % RXBUF_SIZE] = c;
       rxfifo.in++;

    Unless you are sure that RXBUF_SIZE is a power of two, this is going to
    be quite slow on an AVR.  Modulo means division, and while division by a
    constant is usually optimised to a multiplication by the compiler, you
    still have a multiply, a shift, and some compensation for it all being
    done as signed integer arithmetic.

    It's also wrong, for non-power of two sizes, since the wrapping of your
    increment and your modulo RXBUF_SIZE get out of sync.

    Yes, RXBUF_SIZE is a power of two.


    If your code relies on that, make sure the code will fail to compile if
    it is not the case. Documentation is good, compile-time check is better:

    static_assert((RXBUF_SIZE & (RXBUF_SIZE - 1)) == 0, "Needs power of 2");




    The usual choice is to track "head" and "tail", and use something like:

    void uart_rx_isr(void) {
       unsigned char c = UART->DATA;
       // Reset interrupt flag
       uint8_t next = rxfifo.tail;
       rxfifo.buf[next] = c;
       next++;
       if (next >= RXBUF_SIZE) next -= RXBUF_SIZE;
       rxfifo.tail = next;
    }

    This isn't the point of this thread, anyway...
    You insist that tail is always in the range [0...RXBUF_SIZE - 1]. My
    approach is different.

    RXBUF_SIZE is a power of two, usually <=256. head and tail are uint8_t
    and *can* reach the maximum value of 255, even if RXBUF_SIZE is 128. All
    works well.


    Yes, your approach will work - /if/ you have a power-of-two buffer size.
    It has no noticeable efficiency advantages, merely an extra
    inconvenient restriction and the possible confusion caused by doing
    things in a different way from common idioms.

    However, this is not the point of the thread - so I am happy to leave
    that for now.

    Suppose rxfifo.in=rxfifo.out=127, FIFO is empty. When a new char is
    received, it is saved into rxfifo.buf[127 % 128=127] and rxfifo.in will
    be increased to 128.
    Now the mainloop detects the new char (in != out), reads it at
    rxfifo.buf[127 % 128 = 127] and increases out, which becomes 128.

    The next byte will be saved into rxfifo.buf[rxfifo.in % 128 = 128 % 128
    = 0] and rxfifo.in will be 129. Again, the next byte will be saved to
    rxfifo.buf[rxfifo.in % 128 = 129 % 128 = 1] and rxfifo.in will be 130.

    When the mainloop tries to pop data from fifo, it tests

       rxfifo.in(130) !=rxfifo.out(128)

    The test is true, so the code extracts chars from rxbuf[out % 128] that
    is rxbuf[0]... and so on.

    I hope that explanation is good.
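    That walkthrough can be checked mechanically: a small simulation of the
    scheme (free-running uint8_t indices, RXBUF_SIZE = 128, starting at
    in = out = 127 as in the example) pushing and popping well past one
    full uint8_t wrap confirms every byte comes out in order:

```c
#include <stdint.h>

#define RXBUF_SIZE 128  /* power of two, hence a divisor of 256 */

static struct {
    unsigned char buf[RXBUF_SIZE];
    uint8_t in, out;   /* free-running, allowed to wrap past 255 */
} f;

static void push(unsigned char c) {
    f.buf[f.in % RXBUF_SIZE] = c;
    f.in++;
}

static int pop(void) {
    if (f.out == f.in) return -1;
    int c = f.buf[f.out % RXBUF_SIZE];
    f.out++;
    return c;
}

/* Interleave pushes and pops across several uint8_t wraps; return 1
   if every byte came back in order, 0 otherwise. */
int wrap_test(void) {
    f.in = f.out = 127;           /* the starting point in the example */
    for (int i = 0; i < 1000; i++) {
        push((unsigned char)i);
        if (pop() != (unsigned char)i) return 0;
    }
    return 1;
}
```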



       // Reset interrupt flag
    }

    /* Called regularly from mainloop code */
    int uart_task(void) {
      int c = -1;
      if (rxfifo.out != rxfifo.in) {
        c = rxfifo.buf[rxfifo.out % RXBUF_SIZE];
        rxfifo.out++;
      }
      return c;
    }

    int uart_task(void) {
       int c = -1;
       uint8_t next = rxfifo.head;
       if (next != rxfifo.tail) {
           c = rxfifo.buf[next];
           next++;
           if (next >= RXBUF_SIZE) next -= RXBUF_SIZE;
           rxfifo.head = next;
       }
       return c;
    }

    These don't track buffer overflow at all - you need to call uart_task()
    often enough to avoid that.

    Sure, with a good value for RXBUF_SIZE, buffer overflow should never
    happen. Anyway, if it happens, the higher-level layers (protocol)
    should detect a corrupted packet.


    You risk getting seriously out of sync if there is an overflow.
    Normally, on an overflow there will be a dropped character or two
    (which, as you say, must be caught at a higher level). Here you could
    end up going round your buffer an extra time and /gaining/ RXBUF_SIZE
    extra characters.

    Still, if you are sure that your functions are called fast enough so
    that overflow is not a concern, then that's fine. Extra code to check
    for a situation that can't occur is not helpful.


    (I'm skipping volatiles so we don't get ahead of your point.)



    From a 20-year-old article[1] by Nigel Jones, this seems a situation
    where volatile must be used for rxfifo.in, because it is modified by an
    ISR and used in the mainloop code.


    Certainly whenever data is shared between ISR's and mainloop code, or
    different threads, then you need to think about how to make sure data is
    synchronised and exchanged.  "volatile" is one method, atomics are
    another, and memory barriers can be used.

    I don't think so, rxfifo.in is read from memory only one time in
    uart_task(), so there isn't the risk that compiler can optimize badly.

    That is incorrect in two ways. One - barring compiler bugs (which do
    occur, but they are very rare compared to user bugs), there is no such
    thing as "optimising badly". If optimising changes the behaviour of the
    code, other than its size and speed, the code is wrong.

    Yes of course, but I don't think the absence of volatile for rxfifo.in,
    even if it can change in the ISR, could be a *real* problem with
    *modern and current* compilers.


    Personally, I am not satisfied with "it's unlikely to be a problem in
    practice" - I prefer "The language guarantees it is not a problem".
    Remember, when you know the data needs to be read at this point, then
    using a volatile read is free. Volatile does not make code less
    efficient unless you use it incorrectly and force more accesses than
    are necessary. So using volatile accesses for "rxfifo.in" here turns
    "probably safe" into "certainly safe" without cost. What's not to like?

    The volatile attribute is needed to prevent compiler optimizations
    (which would be a bad thing, because of the volatile nature of the
    variable), but in that code it's difficult to think of an optimization,
    caused by the absence of volatile, that changes the behaviour
    erroneously... except reordering.


    Two - it is a
    very bad idea to imagine that having code inside a function somehow
    "protects" it from re-ordering or other optimisation.

    I didn't say this; on the contrary, I was thinking exactly of reordering issues.


    Functions can be inlined, outlined, cloned, and shuffled about.
    Link-time optimisation, code in headers, C++ modules, and other
    cross-unit optimisations are becoming more and more common.  So while it
    might be true /today/ that the compiler has no alternative but to read
    rxfifo.in once per call to uart_task(), you cannot assume that will be
    the case with later compilers or with more advanced optimisation
    techniques enabled.  It is safer, more portable, and more future-proof
    to avoid such assumptions.

    Ok, you are talking of future scenarios. I don't think actually this
    could be a real problem. Anyway your observation makes sense.



    Even if ISR is fired immediately after the if statement, this doesn't
    bring to a dangerous state: the just received data will be processed at
    the next call to uart_task().

    So IMHO volatile isn't necessary here. And critical sections (i.e.
    disabling interrupts) aren't useful too.

    However I'm thinking about memory barrier. Suppose the compiler reorder
    the instructions in uart_task() as follows:


       c = rxfifo.buf[out % RXBUF_SIZE]
       if (out != in) {
         out++;
         return c;
       } else {
         return -1;
       }


    Here there's a big problem, because compiler decided to firstly read
    rxfifo.buf[] and then test in and out equality. If the ISR is fired
    immediately after moving data to c (most probably an internal register),
    the condition in the if statement will be true and the register value is
    returned. However the register value isn't correct.

    You are absolutely correct.


    I don't think any modern C compiler reorders uart_task() in this way,
    but we can't be sure. The result shouldn't change for the compiler, so it
    can do this kind of things.

    It is not an unreasonable re-arrangement.  On processors with
    out-of-order execution (which does not apply to the AVR or Cortex-M),
    compilers will often push loads as early as they can in the instruction
    stream so that they start the cache loading process as quickly as
    possible.  (But note that on such "big" processors, much of this
    discussion on volatile and memory barriers is not sufficient, especially
    if there is more than one core.  You need atomics and fences, but that's
    a story for another day.)


    How can I fix this issue if I want to be extremely sure the compiler
    will not reorder this way? Applying volatile to rxfifo.in alone
    shouldn't help, because the compiler is still allowed to reorder
    accesses to non-volatile variables around it[2].


    The important thing about "volatile" is that it is /accesses/ that are
    volatile, not objects.  A volatile object is nothing more than an object
    for which all accesses are volatile by default.  But you can use
    volatile accesses on non-volatile objects.  This macro is your friend:

    #define volatileAccess(v) *((volatile typeof((v)) *) &(v))

    (Linux has the same macro, called ACCESS_ONCE.  It uses a gcc extension
    - if you are using other compilers then you can make an uglier
    equivalent using _Generic.  However, if you are using a C compiler that
    supports C11, it is probably gcc or clang, and you can use the "typeof"
    extension.)

    That macro will let you make a volatile read or write to an object
    without requiring that /all/ accesses to it are volatile.

    This is a good point. The code in ISR can't be interrupted, so there's
    no need to have volatile access in ISR.


    Correct. (Well, /almost/ correct - bigger microcontrollers have
    multiple interrupt priorities. But it should be correct in this case,
    as no other interrupt would be messing with the same variables anyway.)


    One solution is adding a memory barrier in this way:


    int uart_task(void) {
       int c = -1;
       if (rxfifo.out != rxfifo.in) {
         memory_barrier();
         c = rxfifo.buf[rxfifo.out % RXBUF_SIZE];
         rxfifo.out++;
       }
       return c;
    }


    Note that you are forcing the compiler to read "out" twice here, as it
    can't keep the value of "out" in a register across the memory barrier.

    Yes, you're right. A small penalty to avoid the problem of reordering.


    But an unnecessary penalty.


    (And as I mentioned before, the compiler might be able to do larger
    scale optimisation across compilation units or functions, and in that
    way keep values across multiple calls to uart_task.)


    However this approach appears dangerous to me. You have to check and
    double-check if, when and where memory barriers are necessary, and it's
    easy to skip a barrier where it's needed or add one where it isn't
    needed.

    Memory barriers are certainly useful, but they are a shotgun approach -
    they affect /everything/ involving reads and writes to memory.  (But
    remember they don't affect ordering of calculations.)


    So I'm thinking that a sub-optimal (regarding efficiency) but reliable
    (regarding the risk to skip a barrier where it is needed) could be to
    enter a critical section (disabling interrupts) anyway, if it isn't
    strictly needed.


    int uart_task(void) {
       int c = -1;
       ENTER_CRITICAL_SECTION();
       if (rxfifo.out != rxfifo.in) {
         c = rxfifo.buf[rxfifo.out % RXBUF_SIZE];
         rxfifo.out++;
       }
       EXIT_CRITICAL_SECTION();
       return c;
    }

    Critical sections for something like this are /way/ overkill.  And a
    critical section with a division in the middle?  Not a good idea.



    Another solution could be to apply the volatile keyword to rxfifo.in
    *AND* rxfifo.buf too, so the compiler can't change the order of
    accesses to them.

    Do you have other suggestions?


    Marking "in" and "buf" as volatile is /far/ better than using a critical
    section, and likely to be more efficient than a memory barrier.  You can
    also use volatileAccess rather than making buf volatile, and it is often
    slightly more efficient to cache volatile variables in a local variable
    while working with them.

    Yes, I think so too. Lately I have read many experts say volatile is
    often a bad thing, so I'm re-thinking its use compared with other
    approaches.


    People who say "volatile is a bad thing" are often wrong. Remember, all generalisations are false :-)

    "volatile" is a tool. It doesn't do everything that some people think
    it does, but it is a very useful tool nonetheless. It has little place
    in big systems - Linus Torvalds wrote a rant against it as being both
    too much and too little, and in the context of writing Linux code, he
    was correct. For Linux programming, you should be using OS-specific
    features (which rely on "volatile" for their implementation) or atomics,
    rather than using "volatile" directly.

    But for small-systems embedded programming, it is very handy. Used
    well, it is free - used excessively it has a cost, but an extra volatile
    will not make an otherwise correct program fail.

    Memory barriers are great for utility functions such as interrupt
    enable/disable inline functions, but are usually sub-optimal compared
    to specific and targeted volatile accesses.


    [1] https://barrgroup.com/embedded-systems/how-to/c-volatile-keyword
    [2] https://blog.regehr.org/archives/28



  • From pozz@21:1/5 to All on Sun Oct 24 17:39:07 2021
    Il 24/10/2021 13:02, David Brown ha scritto:
    On 23/10/2021 22:49, pozz wrote:
    Il 23/10/2021 18:09, David Brown ha scritto:
    On 23/10/2021 00:07, pozz wrote:
    Even I write software for embedded systems for more than 10 years,
    there's an argument that from time to time let me think for hours and
    leave me with many doubts.

    It's nice to see a thread like this here - the group needs such
    discussions!


    Consider a simple embedded system based on a MCU (AVR8 or Cortex-Mx).
    The software is bare metal, without any OS. The main pattern is the well
    known mainloop (background code) that is interrupted by ISR.

    Interrupts are used mainly for timings and for low-level driver. For
    example, the UART reception ISR move the last received char in a FIFO
    buffer, while the mainloop code pops new data from the FIFO.


    static struct {
       unsigned char buf[RXBUF_SIZE];
       uint8_t in;
       uint8_t out;
    } rxfifo;

    /* ISR */
    void uart_rx_isr(void) {
       unsigned char c = UART->DATA;
       rxfifo.buf[rxfifo.in % RXBUF_SIZE] = c;
       rxfifo.in++;

    Unless you are sure that RXBUF_SIZE is a power of two, this is going to
    be quite slow on an AVR.  Modulo means division, and while division by a
    constant is usually optimised to a multiplication by the compiler, you
    still have a multiply, a shift, and some compensation for it all being
    done as signed integer arithmetic.

    It's also wrong, for non-power of two sizes, since the wrapping of your
    increment and your modulo RXBUF_SIZE get out of sync.

    Yes, RXBUF_SIZE is a power of two.


    If your code relies on that, make sure the code will fail to compile if
    it is not the case. Documentation is good, compile-time check is better:

    static_assert((RXBUF_SIZE & (RXBUF_SIZE - 1)) == 0, "Needs power of 2");

    Good point.


    The usual choice is to track "head" and "tail", and use something like:

    void uart_rx_isr(void) {
       unsigned char c = UART->DATA;
       // Reset interrupt flag
       uint8_t next = rxfifo.tail;
       rxfifo.buf[next] = c;
       next++;
       if (next >= RXBUF_SIZE) next -= RXBUF_SIZE;
       rxfifo.tail = next;
    }

    This isn't the point of this thread, anyway...
    You insist that tail is always in the range [0...RXBUF_SIZE - 1]. My
    approach is different.

    RXBUF_SIZE is a power of two, usually <=256. head and tail are uint8_t
    and *can* reach the maximum value of 255, even if RXBUF_SIZE is 128. All
    works well.


    Yes, your approach will work - /if/ you have a power-of-two buffer size.
    It has no noticeable efficiency advantages, merely an extra
    inconvenient restriction and the possible confusion caused by doing
    things in a different way from common idioms.

    However, this is not the point of the thread - so I am happy to leave
    that for now.

    If you want, we can start a small <OT>.


    I know my ring-buffer implementation has the restriction of requiring a
    buffer with a power-of-two size. However I like it, because I can avoid
    introducing a new variable (the actual number of elements in the buffer)
    or wasting an element to resolve the ambiguity between a full and an
    empty buffer.
    </OT>


    Suppose rxfifo.in=rxfifo.out=127, so the FIFO is empty. When a new char
    is received, it is saved into rxfifo.buf[127 % 128 = 127] and rxfifo.in
    is increased to 128.
    Now the mainloop detects the new char (in != out), reads it at
    rxfifo.buf[127 % 128 = 127] and increases out, which becomes 128.

    The next byte is saved into rxfifo.buf[rxfifo.in % 128 = 128 % 128
    = 0] and rxfifo.in becomes 129. Again, the next byte is saved to
    rxfifo.buf[rxfifo.in % 128 = 129 % 128 = 1] and rxfifo.in becomes 130.

    When the mainloop tries to pop data from fifo, it tests

       rxfifo.in (130) != rxfifo.out (128)

    The test is true, so the code extracts chars from rxbuf[out % 128] that
    is rxbuf[0]... and so on.

    I hope that explanation is good.
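    For reference, the free-running-index scheme walked through above can be
    condensed into a small, hardware-free sketch (RXBUF_SIZE and the
    function names here are illustrative; in the real driver fifo_put()
    would live in the RX ISR):

    ```c
    #include <stdint.h>

    #define RXBUF_SIZE 128  /* power of two, <= 256 for uint8_t indices */

    static struct {
        unsigned char buf[RXBUF_SIZE];
        uint8_t in;   /* free-running write index, wraps naturally at 256 */
        uint8_t out;  /* free-running read index, wraps naturally at 256 */
    } rxfifo;

    /* Producer side (the UART RX ISR in the real driver). */
    void fifo_put(unsigned char c)
    {
        rxfifo.buf[rxfifo.in % RXBUF_SIZE] = c;
        rxfifo.in++;
    }

    /* Consumer side (called from the mainloop); -1 means "empty". */
    int fifo_get(void)
    {
        int c = -1;
        if (rxfifo.out != rxfifo.in) {
            c = rxfifo.buf[rxfifo.out % RXBUF_SIZE];
            rxfifo.out++;
        }
        return c;
    }
    ```

    Because 256 (the uint8_t wrap point) is a multiple of RXBUF_SIZE, the
    `% RXBUF_SIZE` mapping stays in sync across the index wrap, exactly as
    in the walk-through.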



       // Reset interrupt flag
    }

    /* Called regularly from mainloop code */
    int uart_task(void) {
       int c = -1;
       if (rxfifo.out != rxfifo.in) {
         c = rxfifo.buf[rxfifo.out % RXBUF_SIZE];
         rxfifo.out++;
       }
       return c;
    }

    int uart_task(void) {
       int c = -1;
       uint8_t next = rxfifo.head;
       if (next != rxfifo.tail) {
           c = rxfifo.buf[next];
           next++;
           if (next >= RXBUF_SIZE) next -= RXBUF_SIZE;
           rxfifo.head = next;
       }
       return c;
    }

    These don't track buffer overflow at all - you need to call uart_task()
    often enough to avoid that.

    Sure, with a good number for RXBUF_SIZE, buffer overflow shouldn't
    happen ever. Anyway, if it happens, the higher level layers (protocol)
    should detect a corrupted packet.


    You risk getting seriously out of sync if there is an overflow.
    Normally, on an overflow there will be a dropped character or two (which
    as you say, must be caught at a higher level). Here you could end up
    going round your buffer an extra time and /gaining/ RXBUF_SIZE extra characters.

    Still, if you are sure that your functions are called fast enough so
    that overflow is not a concern, then that's fine. Extra code to check
    for a situation that can't occur is not helpful.


    (I'm skipping volatiles so we don't get ahead of your point.)



    According to a 20-year-old article[1] by Nigel Jones, this seems a
    situation where volatile must be used for rxfifo.in, because it is
    modified by an ISR and used in the mainloop code.


    Certainly whenever data is shared between ISRs and mainloop code, or
    different threads, then you need to think about how to make sure data is
    synchronised and exchanged.  "volatile" is one method, atomics are
    another, and memory barriers can be used.

    I don't think so: rxfifo.in is read from memory only once in
    uart_task(), so there isn't a risk that the compiler optimises badly.

    That is incorrect in two ways.  One - barring compiler bugs (which do
    occur, but they are very rare compared to user bugs), there is no such
    thing as "optimising badly".  If optimising changes the behaviour of the
    code, other than its size and speed, the code is wrong.

    Yes of course, but I don't think the absence of volatile for rxfifo.in,
    even if it can change in ISR, could be a *real* problem with *modern and
    current* compilers.


    Personally, I am not satisfied with "it's unlikely to be a problem in practice" - I prefer "The language guarantees it is not a problem".
    Remember, when you know the data needs to be read at this point, then
    using a volatile read is free. Volatile does not make code less
    efficient unless you use it incorrectly and force more accesses than are necessary. So using volatile accesses for "rxfifo.in" here turns
    "probably safe" into "certainly safe" without cost. What's not to like?

    The volatile attribute is needed to prevent compiler optimisations
    (which would be a bad thing, given the volatile nature of the variable),
    but in that code it's difficult to think of an optimisation, caused by
    the absence of volatile, that erroneously changes the behaviour...
    except reordering.


    Two - it is a
    very bad idea to imagine that having code inside a function somehow
    "protects" it from re-ordering or other optimisation.

    I didn't say this; on the contrary, I was thinking exactly of reordering
    issues.


    Functions can be inlined, outlined, cloned, and shuffled about.
    Link-time optimisation, code in headers, C++ modules, and other
    cross-unit optimisations are becoming more and more common.  So while it
    might be true /today/ that the compiler has no alternative but to read
    rxfifo.in once per call to uart_task(), you cannot assume that will be
    the case with later compilers or with more advanced optimisation
    techniques enabled.  It is safer, more portable, and more future-proof
    to avoid such assumptions.

    Ok, you are talking of future scenarios. I don't think actually this
    could be a real problem. Anyway your observation makes sense.



    Even if the ISR fires immediately after the if statement, this doesn't
    lead to a dangerous state: the just-received data will be processed at
    the next call to uart_task().

    So IMHO volatile isn't necessary here. And critical sections (i.e.
    disabling interrupts) aren't useful either.

    However I'm thinking about memory barriers. Suppose the compiler
    reorders the instructions in uart_task() as follows:


       c = rxfifo.buf[rxfifo.out % RXBUF_SIZE];
       if (rxfifo.out != rxfifo.in) {
         rxfifo.out++;
         return c;
       } else {
         return -1;
       }


    Here there's a big problem, because the compiler decided to first read
    rxfifo.buf[] and then test in and out for equality. If the ISR fires
    immediately after moving the data to c (most probably an internal
    register), the condition in the if statement will be true and the
    register value is returned. However, the register value isn't correct.

    You are absolutely correct.


    I don't think any modern C compiler reorders uart_task() in this way,
    but we can't be sure. The observable result doesn't change, so the
    compiler is allowed to do this kind of thing.

    It is not an unreasonable re-arrangement.  On processors with
    out-of-order execution (which does not apply to the AVR or Cortex-M),
    compilers will often push loads as early as they can in the instruction
    stream so that they start the cache loading process as quickly as
    possible.  (But note that on such "big" processors, much of this
    discussion on volatile and memory barriers is not sufficient, especially
    if there is more than one core.  You need atomics and fences, but that's
    a story for another day.)


    How can I fix this issue if I want to be extremely sure the compiler
    will not reorder this way? Applying volatile to rxfifo.in alone
    shouldn't help, because the compiler is still allowed to reorder
    accesses to non-volatile variables[2].


    The important thing about "volatile" is that it is /accesses/ that are
    volatile, not objects.  A volatile object is nothing more than an object
    for which all accesses are volatile by default.  But you can use
    volatile accesses on non-volatile objects.  This macro is your friend:

    #define volatileAccess(v) *((volatile typeof((v)) *) &(v))

    (Linux has the same macro, called ACCESS_ONCE.  It uses a gcc extension
    - if you are using other compilers then you can make an uglier
    equivalent using _Generic.  However, if you are using a C compiler that
    supports C11, it is probably gcc or clang, and you can use the "typeof"
    extension.)

    That macro will let you make a volatile read or write to an object
    without requiring that /all/ accesses to it are volatile.

    This is a good point. The code in ISR can't be interrupted, so there's
    no need to have volatile access in ISR.


    Correct. (Well, /almost/ correct - bigger microcontrollers have
    multiple interrupt priorities. But it should be correct in this case,
    as no other interrupt would be messing with the same variables anyway.)

    Yes, static variables defined in a uart driver aren't accessed from
    other interrupts.


    One solution is adding a memory barrier in this way:


    int uart_task(void) {
       int c = -1;
       if (rxfifo.out != rxfifo.in) {
         memory_barrier();
         c = rxfifo.buf[rxfifo.out % RXBUF_SIZE];
         rxfifo.out++;
       }
       return c;
    }


    Note that you are forcing the compiler to read "out" twice here, as it
    can't keep the value of "out" in a register across the memory barrier.

    Yes, you're right. A small penalty to avoid the problem of reordering.


    But an unnecessary penalty.


    (And as I mentioned before, the compiler might be able to do larger
    scale optimisation across compilation units or functions, and in that
    way keep values across multiple calls to uart_task.)


    However this approach appears dangerous to me. You have to check and
    double-check if, when and where memory barriers are necessary, and it's
    easy to skip a barrier where it's needed and add a barrier where it
    isn't needed.

    Memory barriers are certainly useful, but they are a shotgun approach -
    they affect /everything/ involving reads and writes to memory.  (But
    remember they don't affect ordering of calculations.)


    So I'm thinking that a sub-optimal (regarding efficiency) but reliable
    (regarding the risk of skipping a barrier where it is needed) approach
    could be to enter a critical section (disabling interrupts) anyway, even
    where it isn't strictly needed.


    int uart_task(void) {
       ENTER_CRITICAL_SECTION();
       int c = -1;
       if (rxfifo.out != rxfifo.in) {
         c = rxfifo.buf[rxfifo.out % RXBUF_SIZE];
         rxfifo.out++;
       }
       EXIT_CRITICAL_SECTION();
       return c;
    }

    Critical sections for something like this are /way/ overkill.  And a
    critical section with a division in the middle?  Not a good idea.



    Another solution could be to apply the volatile keyword to rxfifo.in
    *AND* rxfifo.buf too, so the compiler can't reorder accesses to them.

    Do you have other suggestions?


    Marking "in" and "buf" as volatile is /far/ better than using a critical
    section, and likely to be more efficient than a memory barrier.  You can
    also use volatileAccess rather than making buf volatile, and it is often
    slightly more efficient to cache volatile variables in a local variable
    while working with them.

    Yes, I think so too. Lately I have read many experts say volatile is
    often a bad thing, so I'm re-thinking its use compared with other
    approaches.

    People who say "volatile is a bad thing" are often wrong. Remember, all generalisations are false :-)

    Ok, I wrote "volatile is **often** a bad thing".


    "volatile" is a tool. It doesn't do everything that some people think
    it does, but it is a very useful tool nonetheless. It has little place
    in big systems - Linus Torvalds wrote a rant against it as being both
    too much and too little, and in the context of writing Linux code, he
    was correct. For Linux programming, you should be using OS-specific
    features (which rely on "volatile" for their implementation) or atomics, rather than using "volatile" directly.

    But for small-systems embedded programming, it is very handy. Used
    well, it is free - used excessively it has a cost, but an extra volatile
    will not make an otherwise correct program fail.

    Memory barriers are great for utility functions such as interrupt enable/disable inline functions, but are usually sub-optimal compared to specific and targeted volatile accesses.

    Just to say what I read:

    https://blog.regehr.org/archives/28

    https://mcuoneclipse.com/2021/10/12/spilling-the-beans-volatile-qualifier/


    [1] https://barrgroup.com/embedded-systems/how-to/c-volatile-keyword
    [2] https://blog.regehr.org/archives/28




  • From David Brown@21:1/5 to pozz on Sun Oct 24 18:37:16 2021
    On 24/10/2021 17:39, pozz wrote:
    Il 24/10/2021 13:02, David Brown ha scritto:

    <snipping to save on electrons>

    People who say "volatile is a bad thing" are often wrong.  Remember, all
    generalisations are false :-)

    Ok, I wrote "volatile is **often** a bad thing".


    :-)


    "volatile" is a tool.  It doesn't do everything that some people think
    it does, but it is a very useful tool nonetheless.  It has little place
    in big systems - Linus Torvalds wrote a rant against it as being both
    too much and too little, and in the context of writing Linux code, he
    was correct.  For Linux programming, you should be using OS-specific
    features (which rely on "volatile" for their implementation) or atomics,
    rather than using "volatile" directly.

    But for small-systems embedded programming, it is very handy.  Used
    well, it is free - used excessively it has a cost, but an extra volatile
    will not make an otherwise correct program fail.

    Memory barriers are great for utility functions such as interrupt
    enable/disable inline functions, but are usually sub-optimal compared to
    specific and targeted volatile accesses.

    Just to say what I read:

    https://blog.regehr.org/archives/28

    Yes, you had that in your footnotes - and that is not a bad article.
    (It's better than a lot of that guy's stuff - he also writes a lot of
    crap about undefined behaviour and the evils of optimisation.)


    https://mcuoneclipse.com/2021/10/12/spilling-the-beans-volatile-qualifier/


    That article is mostly wrong - or at best, inappropriate for what you
    are doing. Critical sections are a massive over-kill for a ring buffer
    in most cases. If you intend to call your uart_task() from multiple
    places in a re-entrant manner (e.g., multiple RTOS threads), then you
    have a lot more challenges to deal with than just the ordering of the
    accesses to "in", "out" and "buf" - you can't just throw in a critical
    section and hope it's all okay. And if you are /not/ doing such a daft
    thing, then critical sections are completely unnecessary.



    [1] https://barrgroup.com/embedded-systems/how-to/c-volatile-keyword
    [2] https://blog.regehr.org/archives/28





  • From Don Y@21:1/5 to All on Sun Oct 24 12:54:39 2021
    On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
    Disable interrupts while accessing the fifo. you really have to.
    alternatively you'll often get away not using a fifo at all,
    unless you're blocking for a long while in some part of the code.

    Why would you do that. The fifo write pointer is only modified by
    the interrupt handler, the read pointer is only modified by the
    interrupted code. Has been done so for times immemorial.

    The OPs code doesn't differentiate between FIFO full and empty.
    So, *he* may gain some advantage from disabling interrupts to
    ensure the character he is about to retrieve is not overwritten
    by an incoming character, placed at that location (cuz he lets
    his FIFO wrap indiscriminately).

    And, if the offsets ever got larger (wider) -- or became actual
    pointers -- then the possibility of PART of a value being updated
    on either "side" of an ISR is also possible.

    And, there's nothing to say the OP has disclosed EVERYTHING
    that might be happening in his ISR (maintaining handshaking signals,
    flow control, etc.) which could compound the references
    (e.g., if you need to know that you have space for N characters
    remaining so you can signal the remote device to stop sending,
    then you're doing "pointer/offset arithmetic" and *acting* on the
    result)

    Although this thread is on how to wrestle a poor
    language to do what you want, sort of how to use a hammer on a screw
    instead of taking the screwdriver, there would be no need to
    mask interrupts with C either.

    The "problem" with the language is that it gives the compiler the freedom
    to make EQUIVALENT changes to your code that you might not have foreseen
    or that might not have been consistent with your "style" -- yet do not
    alter the results.

    For example, you might want to write:
    x = <expr1>
    y = <expr2>
    just because of some "residual OCD" that makes you think in terms of
    "x before y". Yet, there may be no dependencies in those statements
    that *require* that ordering. So, why should the compiler be crippled
    to implementing them in that order if it has found a way to alter
    their order (or their actual content)?

    A correctly written compiler will follow a set of rules that it *knows*
    to be safe "code translations"; but many developers don't have a similar understanding of those; so the problem lies in the developer's skillset,
    not the compiler or language.

    After all, a programming language -- ANY programming language -- is just
    a vehicle for conveying our desires to the machine in a semi-unambiguous manner. I'd much rather *SAY*, "What are the roots of ax^2 + bx + c?"
    than have to implement an algorithmic solution, worry about cancellation, required precision, etc. (and, in some languages, you can do just that!)

  • From Dimiter_Popoff@21:1/5 to Don Y on Sun Oct 24 23:27:43 2021
    On 10/24/2021 22:54, Don Y wrote:
    On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
    Disable interrupts while accessing the fifo. you really have to.
    alternatively you'll often get away not using a fifo at all,
    unless you're blocking for a long while in some part of the code.

    Why would you do that. The fifo write pointer is only modified by
    the interrupt handler, the read pointer is only modified by the
    interrupted code. Has been done so for times immemorial.

    The OPs code doesn't differentiate between FIFO full and empty.

    So he should fix that first, there is no sane reason why not.
    Few things are simpler to do than that.

    So, *he* may gain some advantage from disabling interrupts to
    ensure the character he is about to retrieve is not overwritten
    by an incoming character, placed at that location (cuz he lets
    his FIFO wrap indiscriminately).

    And, if the offsets ever got larger (wider) -- or became actual
    pointers -- then the possibility of PART of a value being updated
    on either "side" of an ISR is also possible.

    And, there's nothing to say the OP has disclosed EVERYTHING
    that might be happening in his ISR (maintaining handshaking signals,
    flow control, etc.) which could compound the references
    (e.g., if you need to know that you have space for N characters
    remaining so you can signal the remote device to stop sending,
    then you're doing "pointer/offset arithmetic" and *acting* on the
    result)

    Whatever handshakes he makes there is no problem knowing whether
    the fifo is full - just check if the position the write pointer
    will have after putting the next byte matches the read pointer
    at the moment. Like I said before, few things are simpler than
    that, can't imagine someone working as a programmer being
    stuck at *that*.
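    The full test Dimiter describes - for the conventional wrap-at-size
    scheme - can be sketched like this (one slot is deliberately left
    unused so that full and empty are distinguishable; names illustrative):

    ```c
    #include <stdint.h>

    #define RXBUF_SIZE 8   /* illustrative */

    /* Full when advancing the write pointer by one would make it equal
     * the read pointer at this moment. */
    int fifo_is_full(uint8_t wr, uint8_t rd)
    {
        return (uint8_t)((wr + 1) % RXBUF_SIZE) == rd;
    }

    /* Empty when the two pointers are already equal. */
    int fifo_is_empty(uint8_t wr, uint8_t rd)
    {
        return wr == rd;
    }
    ```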


    Although this thread is on how to wrestle a poor
    language to do what you want, sort of how to use a hammer on a screw
    instead of taking the screwdriver, there would be no need to
    mask interrupts with C either.

    The "problem" with the language is that it gives the compiler the freedom
    to make EQUIVALENT changes to your code that you might not have foreseen
    or that might not have been consistent with your "style" -- yet do not
    alter the results.

    Don, let us not go into this. Just looking at the thread is enough to
    see it is about wrestling the language so it can be made some use of.


    After all, a programming language -- ANY programming language -- is just
    a vehicle for conveying our desires to the machine in a semi-unambiguous manner.  I'd much rather *SAY*, "What are the roots of ax^2 + bx + c?"
    than have to implement an algorithmic solution, worry about cancellation, required precision, etc.  (and, in some languages, you can do just that!)

    Indeed you don't want to write how the equation is solved every time.
    This is why you can call it once you have it available. This is language independent.
    Then solving expressions etc. is well within 1% of the effort in
    programming if the task at hand is going to take > 2 weeks; after that
    the programmer's time is wasted on wrestling the language like
    demonstrated by this thread. Sadly almost everybody has accepted
    C as a standard - which makes it a very popular poor language.
    Similar to say Chinese, very popular, spoken by billions, yet
    where are its literary masterpieces. Being hieroglyph based there
    are none; you will have to look at alphabet based languages to
    find some.

    ======================================================
    Dimiter Popoff, TGI http://www.tgi-sci.com
    ======================================================
    http://www.flickr.com/photos/didi_tgi/

  • From Don Y@21:1/5 to All on Sun Oct 24 14:08:13 2021
    On 10/24/2021 1:27 PM, Dimiter_Popoff wrote:
    On 10/24/2021 22:54, Don Y wrote:
    On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
    Disable interrupts while accessing the fifo. you really have to.
    alternatively you'll often get away not using a fifo at all,
    unless you're blocking for a long while in some part of the code.

    Why would you do that. The fifo write pointer is only modified by
    the interrupt handler, the read pointer is only modified by the
    interrupted code. Has been done so for times immemorial.

    The OPs code doesn't differentiate between FIFO full and empty.

    So he should fix that first, there is no sane reason why not.
    Few things are simpler to do than that.

    Yes, I pointed that out earlier, to him. Why worry about what the
    compiler *might* do if you haven't sorted out what you really WANT to do?

    So, *he* may gain some advantage from disabling interrupts to
    ensure the character he is about to retrieve is not overwritten
    by an incoming character, placed at that location (cuz he lets
    his FIFO wrap indiscriminately).

    And, if the offsets ever got larger (wider) -- or became actual
    pointers -- then the possibility of PART of a value being updated
    on either "side" of an ISR is also possible.

    And, there's nothing to say the OP has disclosed EVERYTHING
    that might be happening in his ISR (maintaining handshaking signals,
    flow control, etc.) which could compound the references
    (e.g., if you need to know that you have space for N characters
    remaining so you can signal the remote device to stop sending,
    then you're doing "pointer/offset arithmetic" and *acting* on the
    result)

    Whatever handshakes he makes there is no problem knowing whether
    the fifo is full - just check if the position the write pointer
    will have after putting the next byte matches the read pointer
    at the moment. Like I said before, few things are simpler than
    that, can't imagine someone working as a programmer being
    stuck at *that*.

    Yes, but if you want to implement flow control, you have to tell the
    other end of the line BEFORE you've filled your buffer. There may be
    a character being deserialized AS you are retrieving the "last"
    character, another one (or more) preloaded into the transmitter on
    the far device, etc. And, it will take some time for your
    notification to reach the far end and be recognized as a desire
    to suspend transmission. etc.

    If you wait until you have no more space available, you are almost
    certain to lose characters.
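    Don's point - signal the far end *before* the buffer is actually
    full - is the usual high-watermark scheme. A sketch assuming the
    free-running uint8_t indices from earlier in the thread (the threshold
    value and function names are illustrative):

    ```c
    #include <stdint.h>

    #define RXBUF_SIZE 64                      /* power of two */
    #define RX_HIGH_WATER (RXBUF_SIZE - 16)    /* pause sender with 16 slots spare */

    /* Bytes currently queued; unsigned wrap-around subtraction is well
     * defined, so this works even after the indices wrap past 255. */
    uint8_t fifo_count(uint8_t in, uint8_t out)
    {
        return (uint8_t)(in - out);
    }

    /* Ask the remote end to stop sending (XOFF, RTS, ...) while there is
     * still room for the characters already in flight. */
    int should_pause_sender(uint8_t in, uint8_t out)
    {
        return fifo_count(in, out) >= RX_HIGH_WATER;
    }
    ```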

    Although this thread is on how to wrestle a poor
    language to do what you want, sort of how to use a hammer on a screw
    instead of taking the screwdriver, there would be no need to
    mask interrupts with C either.

    The "problem" with the language is that it gives the compiler the freedom
    to make EQUIVALENT changes to your code that you might not have foreseen
    or that might not have been consistent with your "style" -- yet do not
    alter the results.

    Don, let us not go into this. Just looking at the thread is enough to
    see it is about wrestling the language so it can be made some use of.

    The language isn't the problem. Witness the *millions* (?) of programs
    written in it, over the past 5 decades.

    The problem is that it never was an assembly language -- even though it
    was treated as such "in days gone by" (because the compiler's were
    just "language translators" and didn't add any OTHER value to the
    "programming process").

    It's only recently that compilers have become "independent agents",
    of a sort... adding their own "spin" on the developer's code.

    And, with more capable hardware (multiple cores/threads) being "dirt cheap", it's a lot easier for a developer to find himself in a situation that
    was previously pie-in-the-sky.

    After all, a programming language -- ANY programming language -- is just
    a vehicle for conveying our desires to the machine in a semi-unambiguous
    manner. I'd much rather *SAY*, "What are the roots of ax^2 + bx + c?"
    than have to implement an algorithmic solution, worry about cancellation,
    required precision, etc. (and, in some languages, you can do just that!)

    Indeed you don't want to write how the equation is solved every time.
    This is why you can call it once you have it available. This is language independent.

    For a simple quadratic, you can explore the coefficients to determine which algorithm is best suited to giving you *accurate* results.

    What if I present *any* expression? Can you have your solution available
    to handle any case? Did you even bother to develop such a solution if you
    were only encountering quadratics?

    Then solving expressions etc. is well within 1% of the effort in
    programming if the task at hand is going to take > 2 weeks; after that
    the programmer's time is wasted on wrestling the language like
    demonstrated by this thread. Sadly almost everybody has accepted
    C as a standard - which makes it a very popular poor language.

    It makes it *popular* but concluding that it is "poor" is an overreach.

    There are (and have been) many "safer" languages. Many that are more descriptive (for certain classes of problem). But, C has survived to
    handle all-of-the-above... perhaps in a suboptimal way but at least
    a manner that can get to the desired solution.

    Look at how few applications SNOBOL handles. Write an OS in COBOL? Ada?

    A tool is only effective if it solves real problems. Under real cost
    and time constraints. There are lots of externalities that come into
    play in that analysis.

    I've made some syntactic changes to my code that make it much easier
    to read -- yet mean that I have to EXPLAIN how they work and why they
    are present as any other developer would frown on encountering them.
    (But, it's my opinion that, once explained, that developer will see them
    as an efficient addition to the language in line with other *existing*
    mechanisms that are already present there).

    Similar to say Chinese, very popular, spoken by billions, yet
    where are its literary masterpieces. Being hieroglyph based there
    are none; you will have to look at alphabet based languages to
    find some.

    One can say the same thing about Unangax̂ -- spoken by ~100!
    Popularity and literary masterpieces are completely different
    axis.

    Hear much latin or ancient greek spoken, recently?

  • From Dimiter_Popoff@21:1/5 to Don Y on Mon Oct 25 00:50:29 2021
    On 10/25/2021 0:08, Don Y wrote:
    On 10/24/2021 1:27 PM, Dimiter_Popoff wrote:
    ....

    Whatever handshakes he makes there is no problem knowing whether
    the fifo is full - just check if the position the write pointer
    will have after putting the next byte matches the read pointer
    at the moment.  Like I said before, few things are simpler than
    that, can't imagine someone working as a programmer being
    stuck at *that*.

    Yes, but if you want to implement flow control, you have to tell the
    other end of the line BEFORE you've filled your buffer.  There may be
    a character being deserialized AS you are retrieving the "last"
    character, another one (or more) preloaded into the transmitter on
    the far device, etc.  And, it will take some time for your
    notification to reach the far end and be recognized as a desire
    to suspend transmission.  etc.

    If you wait until you have no more space available, you are almost
    certain to lose characters.

    Well of course, we have all done that sort of thing since the '80s;
    other people had done it before us, I suppose. Implementing fifo thresholds
    is not (and has never been) rocket science.
    The point is there is no point in throwing huge efforts at a
    self-inflicted problem instead of just doing it the easy way which
    is well, common knowledge.


    Although this thread is on how to wrestle a poor
    language to do what you want, sort of how to use a hammer on a screw
    instead of taking the screwdriver, there would be no need to
    mask interrupts with C either.

    The "problem" with the language is that it gives the compiler the freedom
    to make EQUIVALENT changes to your code that you might not have foreseen
    or that might not have been consistent with your "style" -- yet do not
    alter the results.

    Don, let us not go into this. Just looking at the thread is enough to
    see it is about wrestling the language so it can be made some use of.

    The language isn't the problem.  Witness the *millions* (?) of programs written in it, over the past 5 decades.

    This does not prove much, it has been the only language allowing
    "everybody" to do what they did. I am not denying this is the best
    language currently available to almost everybody. I just happened to
    have been daring enough to explore my own way/language and have seen
    how much is there to be gained if not having to wrestle a language
    which is just a more complete phrase book than the rest of the
    phrase books (aka high level languages).


    Indeed you don't want to write how the equation is solved every time.
    This is why you can call it once you have it available. This is language
    independent.

    For a simple quadratic, you can explore the coefficients to determine which algorithm is best suited to giving you *accurate* results.
    What if I present *any* expression?  Can you have your solution available
    to handle any case?  Did you even bother to develop such a solution if you were only encountering quadratics?

    Any expression solver has its limitations, why go into that? Mine (in
    the dps environment) can do all arithmetic and logic for
    integers, the fp can do all arithmetic, knows pi, e, haven't needed
    to expand it for years. And again, solving expressions has never taken
    me any significant part of the effort on a project.


    I've made some syntactic changes to my code that make it much easier
    to read -- yet mean that I have to EXPLAIN how they work and why they
    are present as any other developer would frown on encountering them.

    Oh I am well aware of the value of standardization and popularity,
    these are the strongest points of C.

    (But, it's my opinion that, once explained, that developer will see them
    as an efficient addition to the language in line with other *existing* mechanisms that are already present, there).

    Of course, but you have to have them on board first...

    Similar to say Chinese, very popular, spoken by billions, yet
    where are its literary masterpieces. Being hieroglyph based there
    are none; you will have to look at alphabet based languages to
    find some.

    One can say the same thing about Unangax̂ -- spoken by ~100!
    Popularity and literary masterpieces are completely different
    axes.

    Hear much latin or ancient greek spoken, recently?

    The Latin alphabet looks pretty popular nowadays :-). Everything
    evolves, including languages. And there are dead ends within them
    which just die out - e.g. Roman numerals. Can't see much future in
    any hieroglyph based language though, inventing a symbol for each
    word has been demonstrated to be a bad idea by history.

  • From Don Y@21:1/5 to All on Sun Oct 24 15:47:39 2021
    On 10/24/2021 2:50 PM, Dimiter_Popoff wrote:
    On 10/25/2021 0:08, Don Y wrote:
    On 10/24/2021 1:27 PM, Dimiter_Popoff wrote:
    ....

    Whatever handshakes he makes there is no problem knowing whether
    the fifo is full - just check if the position the write pointer
    will have after putting the next byte matches the read pointer
    at the moment. Like I said before, few things are simpler than
    that, can't imagine someone working as a programmer being
    stuck at *that*.

    Yes, but if you want to implement flow control, you have to tell the
    other end of the line BEFORE you've filled your buffer. There may be
    a character being deserialized AS you are retrieving the "last"
    character, another one (or more) preloaded into the transmitter on
    the far device, etc. And, it will take some time for your
    notification to reach the far end and be recognized as a desire
    to suspend transmission. etc.

    If you wait until you have no more space available, you are almost
    certain to lose characters.

    Well of course so, we have all done that sort of thing since the '80s,
    other people have done it before I suppose. Implementing fifo thresholds
    is not (and has never been) rocket science.
    The point is there is no point in throwing huge efforts at a
    self-inflicted problem instead of just doing it the easy way which
    is well, common knowledge.

    *My* point (to the OP) was that you need to understand what you
    will be doing before you can understand the "opportunities"
    the compiler will have to catch you off guard.

    Although this thread is on how to wrestle a poor
    language to do what you want, sort of how to use a hammer on a screw
    instead of taking the screwdriver, there would be no need to
    mask interrupts with C either.

    The "problem" with the language is that it gives the compiler the freedom
    to make EQUIVALENT changes to your code that you might not have foreseen
    or that might not have been consistent with your "style" -- yet do not
    alter the results.

    Don, let us not go into this. Just looking at the thread is enough to
    see it is about wrestling the language so it can be made some use of.

    The language isn't the problem. Witness the *millions* (?) of programs
    written in it, over the past 5 decades.

    This does not prove much, it has been the only language allowing
    "everybody" to do what they did.

    ASM has always been available. Folks just found it too inefficient
    to solve "big" problems, in reasonable effort.

    I am not denying this is the best
    language currently available to almost everybody. I just happened to
    have been daring enough to explore my own way/language and have seen
    how much is there to be gained if not having to wrestle a language
    which is just a more complete phrase book than the rest of the
    phrase books (aka high level languages).

    But you only have yourself as a client. Most of us have to write code
    (or modify already written code) that others will see/maintain. It
    does no good to have a "great tool" if no one else uses it!

    I use (scant!) ASM, a modified ("proprietary") C dialect, SQL and a scripting language in my current design. (not counting the tools that generate my documentation).

    This is a LOT to expect a developer to have a firm grasp of. But, inventing
    a new language that will address all of these needs would be even moreso!
    At least one can find books/documentation describing each of these individual languages *and* likely find folks with proficiency in each of them. So, I
    can spend my efforts describing "how things work" instead of the details
    of how to TELL them to work.

    I've made some syntactic changes to my code that make it much easier
    to read -- yet mean that I have to EXPLAIN how they work and why they
    are present as any other developer would frown on encountering them.

    Oh I am well aware of the value of standardization and popularity,
    these are the strongest points of C.

    (But, it's my opinion that, once explained, that developer will see them
    as an efficient addition to the language in line with other *existing*
    mechanisms that are already present, there).

    Of course, but you have to have them on board first...

    Yes. They have to have incentive to want to use the codebase.
    They'd not be keen on making any special effort to learn how
    to modify a "program" that picks resistor values to form
    voltage dividers.

    And, even "well motivated", what you are asking them to
    embrace has to be acceptable to their sense of reason.
    E.g., expecting folks to adopt postfix notation just
    because you chose to use it is probably a nonstarter
    (i.e., "show me some OTHER reason that justifies its use!").

    Or, the wonky operator set that APL uses...

    Similar to say Chinese, very popular, spoken by billions, yet
    where are its literary masterpieces. Being hieroglyph based there
    are none; you will have to look at alphabet based languages to
    find some.

    One can say the same thing about Unangax̂ -- spoken by ~100!
    Popularity and literary masterpieces are completely different
    axes.

    Hear much latin or ancient greek spoken, recently?

    The Latin alphabet looks pretty popular nowadays :-). Everything
    evolves, including languages. And there are dead ends within them
    which just die out - e.g. Roman numerals. Can't see much future in
    any hieroglyph based language though, inventing a symbol for each
    word has been demonstrated to be a bad idea by history.

    Witness the rise of arabic numerals and their efficacy towards
    advancing mathematics.

  • From Dimiter_Popoff@21:1/5 to Don Y on Mon Oct 25 02:32:02 2021
    On 10/25/2021 1:47, Don Y wrote:
    ...

    ASM has always been available.

    There is no such language as ASM, there is a wide variety of machines.

    Folks just found it too inefficient
    to solve "big" problems, in reasonable effort.

    Especially with the advent of load/store machines (although C must have
    been helped a lot by the clunky x86 architecture for its popularity), programming in the native assembler for any RISC machine would be
    masochistic at best. Which is why I took the steps I took etc., no
    need to go into that.


    I am not denying this is the best
    language currently available to almost everybody. I just happened to
    have been daring enough to explore my own way/language and have seen
    how much is there to be gained if not having to wrestle a language
    which is just a more complete phrase book than the rest of the
    phrase books (aka high level languages).

    But you only have yourself as a client.

    Yes, but this does not mean much. Looking at pieces I wrote 20 or
    30 years ago - even 10 years ago sometimes - is like reading it
    for the first time for many parts (tens of megabytes of sources, http://tgi-sci.com/misc/scnt21.gif ).

    Most of us have to write code
    (or modify already written code) that others will see/maintain.  It
    does no good to have a "great tool" if no one else uses it!
    I use (scant!) ASM, a modified ("proprietary") C dialect, SQL and a
    scripting
    language in my current design.  (not counting the tools that generate my documentation).

    Here comes the advantage of an "alphabet" rather than "hieroglyph" based approach/language. A lot less of lookup tables to memorize, you learn
    while going etc. I am quite sure someone like you would get used to it
    quite fast, much much faster than to an unknown high level language.
    In fact it may take you very short to see it is something you have more
    or less been familiar with forever.
    Grasping the big picture of the entire environment and becoming
    really good at writing within it would take longer, obviously.

    ....

    Hear much latin or ancient greek spoken, recently?

    The Latin alphabet looks pretty popular nowadays :-). Everything
    evolves, including languages. And there are dead ends within them
    which just die out - e.g. Roman numerals. Can't see much future in
    any hieroglyph based language though, inventing a symbol for each
    word has been demonstrated to be a bad idea by history.

    Witness the rise of arabic numerals and their efficacy towards
    advancing mathematics.

    Yes, another good example of how it is the foundation you step on
    that really matters. Step on the Roman numbers and good luck with
    your math...

  • From Don Y@21:1/5 to All on Sun Oct 24 18:34:07 2021
    On 10/24/2021 4:32 PM, Dimiter_Popoff wrote:
    On 10/25/2021 1:47, Don Y wrote:
    ...

    ASM has always been available.

    There is no such language as ASM, there is a wide variety of machines.

    Of course there's a language called ASM! It's just target specific!
    It is available for each different processor.

    And highly NONportable!

    Folks just found it too inefficient
    to solve "big" problems, in reasonable effort.

    Especially with the advent of load/store machines (although C must have
    been helped a lot by the clunky x86 architecture for its popularity), programming in the native assembler for any RISC machine would be
    masochistic at best. Which is why I took the steps I took etc., no
    need to go into that.

    I am not denying this is the best
    language currently available to almost everybody. I just happened to
    have been daring enough to explore my own way/language and have seen
    how much is there to be gained if not having to wrestle a language
    which is just a more complete phrase book than the rest of the
    phrase books (aka high level languages).

    But you only have yourself as a client.

    Yes, but this does not mean much. Looking at pieces I wrote 20 or
    30 years ago - even 10 years ago sometimes - is like reading it
    for the first time for many parts (tens of megabytes of sources, http://tgi-sci.com/misc/scnt21.gif ).

    Of course it means something! If someone else has to step into
    your role *tomorrow*, there'd be little/no progress on your codebase
    until they learned your toolchain/language.

    An employer has to expect that any employee can "become unavailable"
    at any time. And, with that, the labors for which they'd previously
    paid, should still retain their value. I've had clients outright ask
    me, "What happens if you get hit by a bus?"

    Most of us have to write code
    (or modify already written code) that others will see/maintain. It
    does no good to have a "great tool" if no one else uses it!
    I use (scant!) ASM, a modified ("proprietary") C dialect, SQL and a scripting
    language in my current design. (not counting the tools that generate my
    documentation).

    Here comes the advantage of an "alphabet" rather than "hieroglyph" based approach/language. A lot less of lookup tables to memorize, you learn
    while going etc. I am quite sure someone like you would get used to it
    quite fast, much much faster than to an unknown high level language.
    In fact it may take you very short to see it is something you have more
    or less been familiar with forever.
    Grasping the big picture of the entire environment and becoming
    really good at writing within it would take longer, obviously.

    But that can be said of any HLL. That doesn't mean an employer
    wants to pay you to *learn* (some *previous* employer was expected
    to have done that!). They want to have to, at most, train you on
    the needs of their applications/markets.

  • From David Brown@21:1/5 to All on Mon Oct 25 08:57:14 2021
    On 24/10/2021 13:14, Dimiter_Popoff wrote:
    On 10/24/2021 13:39, Johann Klammer wrote:

    Disable interrupts while accessing the fifo. you really have to.
    alternatively you'll often get away not using a fifo at all,
    unless you're blocking for a long while in some part of the code.


    Why would you do that. The fifo write pointer is only modified by
    the interrupt handler, the read pointer is only modified by the
    interrupted code. Has been done so for times immemorial.

    Although this thread is on how to wrestle a poor
    language to do what you want, sort of how to use a hammer on a screw
    instead of taking the screwdriver, there would be no need to
    mask interrupts with C either.


    There's nothing wrong with the language here - C is perfectly capable of expressing what the OP needs. But getting the "volatile" usage optimal
    here - enough to cover what you need, but not accidentally reducing the efficiency of the code - requires a bit of thought. "volatile" is often misunderstood in C, and it's good that the OP is asking to be sure. C
    also has screwdrivers in its toolbox, they are just buried under all the hammers!
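    A minimal sketch of that "enough but not too much" placement, reusing the shape of the OP's snippet: only the indices that cross the ISR/mainloop boundary are qualified, not the whole struct. The ordering caveat in the comment is an assumption worth stating, not something "volatile" itself guarantees.

```c
#include <stdint.h>

#define RXBUF_SIZE 64

/* Only the fields shared across the ISR/mainloop boundary are volatile.
   Qualifying the whole struct would also force pessimized accesses to
   'buf' and cost efficiency for nothing. */
static struct {
    unsigned char buf[RXBUF_SIZE];
    volatile uint8_t in;   /* advanced only by the RX ISR   */
    volatile uint8_t out;  /* advanced only by the mainloop */
} rxfifo;

/* Mainloop side.  Caveat: C does not order the non-volatile 'buf' read
   against the volatile 'in' read, so a sufficiently aggressive compiler
   could in principle speculate the 'buf' load above the check; a compiler
   barrier between check and read removes that doubt on targets where it
   matters. */
int uart_task(void)
{
    int c = -1;
    if (rxfifo.out != rxfifo.in) {
        c = rxfifo.buf[rxfifo.out % RXBUF_SIZE];
        rxfifo.out++;          /* publish consumption after the read */
    }
    return c;
}
```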

  • From Niklas Holsti@21:1/5 to Don Y on Mon Oct 25 10:56:08 2021
    On 2021-10-25 0:08, Don Y wrote:

    [snip]


    There are (and have been) many "safer" languages.  Many that are more descriptive (for certain classes of problem).  But, C has survived to
    handle all-of-the-above... perhaps in a suboptimal way but at least
    a manner that can get to the desired solution.

    Look at how few applications SNOBOL handles.  Write an OS in COBOL?  Ada?


    I don't know about COBOL, but typically the real-time kernels ("run-time systems") associated with Ada compilers for bare-board embedded systems
    are written in Ada, with a minor amount of assembly language for the
    most HW-related bits like HW context saving and restoring. I'm pretty
    sure that C-language OS kernels also use assembly for those things.

  • From Niklas Holsti@21:1/5 to All on Mon Oct 25 11:09:07 2021
    On 2021-10-24 23:27, Dimiter_Popoff wrote:
    On 10/24/2021 22:54, Don Y wrote:
    On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
    Disable interrupts while accessing the fifo. you really have to.
    alternatively you'll often get away not using a fifo at all,
    unless you're blocking for a long while in some part of the code.

    Why would you do that. The fifo write pointer is only modified by
    the interrupt handler, the read pointer is only modified by the
    interrupted code. Has been done so for times immemorial.

    The OP's code doesn't differentiate between FIFO full and empty.

    So he should fix that first, there is no sane reason why not.
    Few things are simpler to do than that.


    [snip]


    Whatever handshakes he makes there is no problem knowing whether
    the fifo is full - just check if the position the write pointer
    will have after putting the next byte matches the read pointer
    at the moment.  Like I said before, few things are simpler than
    that, can't imagine someone working as a programmer being
    stuck at *that*.

    That simple check would require keeping a maximum of only N-1 entries in
    the N-position FIFO buffer, and the OP explicitly said they did not want
    to allocate an unused place in the buffer (which I think is unreasonable
    of the OP, but that is only IMO).

    The simple explanation for the N-1 limit is that the difference between
    two wrap-around pointers into an N-place buffer has at most N different
    values, while there are N+1 possible filling states of the buffer, from
    empty (zero items) to full (N items).
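    In C the N-1 rule comes out as a one-line full test; a minimal sketch (names and the size of 16 are illustrative, and the indices are kept reduced modulo N so no power-of-two size is required):

```c
#include <stdint.h>

#define N 16   /* buffer places; at most N-1 bytes can be stored */

static uint8_t buf[N];
static volatile uint8_t wr;   /* advanced only by the ISR      */
static volatile uint8_t rd;   /* advanced only by the mainloop */

/* Producer (ISR) side: full when advancing wr would land on rd. */
static int fifo_put(uint8_t c)
{
    uint8_t next = (uint8_t)((wr + 1u) % N);
    if (next == rd)
        return 0;             /* full: the N-th place stays unused */
    buf[wr] = c;
    wr = next;                /* publish the byte last             */
    return 1;
}

/* Consumer (mainloop) side: empty when the two indices coincide. */
static int fifo_get(void)
{
    if (rd == wr)
        return -1;            /* empty */
    uint8_t c = buf[rd];
    rd = (uint8_t)((rd + 1u) % N);
    return c;
}
```

    Giving up one place is the price of distinguishing "full" from "empty" with nothing but the two indices; the alternative is a separate count or non-wrapping indices.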

  • From David Brown@21:1/5 to Don Y on Mon Oct 25 09:41:23 2021
    On 24/10/2021 23:08, Don Y wrote:

    The language isn't the problem.  Witness the *millions* (?) of programs written in it, over the past 5 decades.

    The problem is that it never was an assembly language -- even though it
    was treated as such "in days gone by" (because the compiler's were
    just "language translators" and didn't add any OTHER value to the "programming process").


    No - the problem is that some people /thought/ it was supposed to be a
    kind of assembly language. It's a people problem, not a language
    problem. C has all you need to handle code such as the OP's - all it
    takes is for people to understand that they need to use the right
    features of the language.

    It's only recently that compilers have become "independent agents",
    of a sort... adding their own "spin" on the developer's code.


    Barring bugs, compilers do what they are told - in the language
    specified. If programmers don't properly understand the language they
    are using, or think it means more than it does, that's the programmers
    that are at fault - not the language or the compiler. If you go into a
    French bakery and ask for horse dung instead of the end of a baguette,
    that's /your/ fault - not the language's fault, and not the baker's fault.

    Add to that, the idea that optimising compilers are new is equally silly.

    The C language is defined in terms of an "abstract machine". The
    generated code has the same effect "as if" it executed everything you
    wrote - but the abstract machine and the real object code only
    synchronise on the observable behaviour. In practice, that means
    volatile accesses happen exactly as often, with exactly the values and
    exactly the order that you gave in the code. Non-volatile accesses can
    be re-ordered, re-arranged, combined, duplicated, or whatever.

    This has been the situation since C was standardised and since more
    advanced compilers arrived, perhaps 30 years ago.
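    Concretely, for the FIFO under discussion, that rule is what makes a busy-wait on an ISR-written index legal (a sketch; the names are illustrative and the size is a power of two so the uint8_t indices wrap cleanly):

```c
#include <stdint.h>

#define N 16   /* power of two, so the uint8_t indices wrap cleanly */

static volatile uint8_t in;   /* advanced only by the RX ISR   */
static uint8_t out;           /* advanced only by the mainloop */
static uint8_t buf[N];

/* Because 'in' is volatile, every iteration performs a fresh load: the
   access is observable behaviour and cannot be elided or reordered past
   other volatile accesses.  With a plain uint8_t the compiler could
   legally load 'in' once and turn the loop into an infinite one when
   the FIFO starts out empty. */
uint8_t uart_getc_blocking(void)
{
    while (in == out)
        ;                     /* spin until the ISR advances 'in' */
    uint8_t c = buf[out % N];
    out++;
    return c;
}
```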


    C is what it is - a language designed long ago, but which turned out to
    be surprisingly effective and long-lived. It's not perfect, but it is
    pretty good and works well for many situations where you need low-level
    coding or near-optimal efficiency. It's not as safe or advanced as many
    new languages, and it is not a beginners' language - you have to know
    what you are doing in order to write C code correctly. You have to
    understand it and follow its rules, whether you like these rules or not.

    Unfortunately, there are quite a few C programmers who /don't/ know
    these rules. And there is a small but vocal fraction who /do/ know the
    rules, but don't like them and feel the rules should therefore not apply
    - and blame compilers, standards committees, and anyone else when things inevitably go wrong. Some people are always a problem, regardless of
    the language!

  • From Don Y@21:1/5 to Niklas Holsti on Mon Oct 25 01:19:53 2021
    On 10/25/2021 12:56 AM, Niklas Holsti wrote:
    On 2021-10-25 0:08, Don Y wrote:

    There are (and have been) many "safer" languages. Many that are more
    descriptive (for certain classes of problem). But, C has survived to
    handle all-of-the-above... perhaps in a suboptimal way but at least
    a manner that can get to the desired solution.

    Look at how few applications SNOBOL handles. Write an OS in COBOL? Ada?

    I don't know about COBOL, but typically the real-time kernels ("run-time systems") associated with Ada compilers for bare-board embedded systems are written in Ada, with a minor amount of assembly language for the most HW-related bits like HW context saving and restoring. I'm pretty sure that C-language OS kernels also use assembly for those things.

    Of course you *can* do these things. The question is how often
    they are ACTUALLY done with these other languages.

    "Suitability for a particular task" isn't often the criteria that is
    used to make a selection -- for better or worse. There are countless
    other factors that affect an implementation, depending on the environment
    in which it is undertaken (e.g., designs from academia are considerably different than hobbyist designs which are different from commercial
    designs which are...)

    This is true of other disciplines, as well. How often do you think a hardware design follows a course heavily influenced by the "prejudices"/"preferences"
    of the folks responsible for the design vs. the "best" approach to it?

    Step back yet another level of abstraction and see that even the tools
    chosen to perform those tasks are often not "optimally chosen".

    If you are the sole entity involved in a decision making process, then
    you've (typically) got /carte blanche/. But, in most cases, there are
    other voices -- seats at the table -- that shape the final decisions. It
    pays to lift one's head and see which way the wind is blowing, *today*...

    [By the same token, expecting the past to mirror the present is equally
    naive. People forget that tools and processes have evolved (in the 40+
    years that I've been designing embedded products). And, that the issues
    folks now face often weren't issues when tools were "stupider" (I've
    probably got $60K of obsolete compilers to prove this -- anyone written
    any C on an 1802 recently? Or, a 2A03? 65816? Z180? 6809?) Don't
    even *think* about finding an Ada compiler for them -- in the past!]

  • From Don Y@21:1/5 to Niklas Holsti on Mon Oct 25 01:28:08 2021
    On 10/25/2021 1:09 AM, Niklas Holsti wrote:
    On 2021-10-24 23:27, Dimiter_Popoff wrote:
    On 10/24/2021 22:54, Don Y wrote:
    On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
    Disable interrupts while accessing the fifo. you really have to.
    alternatively you'll often get away not using a fifo at all,
    unless you're blocking for a long while in some part of the code.

    Why would you do that. The fifo write pointer is only modified by
    the interrupt handler, the read pointer is only modified by the
    interrupted code. Has been done so for times immemorial.

    The OP's code doesn't differentiate between FIFO full and empty.

    So he should fix that first, there is no sane reason why not.
    Few things are simpler to do than that.


    [snip]


    Whatever handshakes he makes there is no problem knowing whether
    the fifo is full - just check if the position the write pointer
    will have after putting the next byte matches the read pointer
    at the moment. Like I said before, few things are simpler than
    that, can't imagine someone working as a programmer being
    stuck at *that*.

    That simple check would require keeping a maximum of only N-1 entries in the N-position FIFO buffer, and the OP explicitly said they did not want to allocate an unused place in the buffer (which I think is unreasonable of the OP, but that is only IMO).

    The simple explanation for the N-1 limit is that the difference between two wrap-around pointers into an N-place buffer has at most N different values, while there are N+1 possible filling states of the buffer, from empty (zero items) to full (N items).

    But, again, that just deals with the "full check". The easiest way to do
    this is just to check ".in" *after* advancement and inhibit the store if
    it coincides with the ".out" value.

    Checking for a "high water mark" to enable flow control requires more computation (albeit simple) as you have to accommodate the delays in
    that notification reaching the remote sender (lest he continue
    sending and overrun your buffer).

    And, later noting when you've consumed enough of the FIFO's contents
    to reach a "low water mark" and reenable the remote's transmissions.

    [And, if you ever have to deal with more "established" protocols
    that require the sequencing of specific control signals DURING
    a transfer, the ISR quickly becomes very complex!]

    When you start "fleshing out" an ISR in this way, you see the code
    quickly becomes more involved than just pushing bytes into a buffer.
    (and, this should give you pause to rethink what you are doing *in*
    the ISR and what can best be handled out of that "precious"
    environment)
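    The watermark bookkeeping itself can stay tiny; a sketch with made-up thresholds, where pause_sender()/resume_sender() stand in for whatever the hardware offers (raising/dropping RTS, queuing XOFF/XON, etc.):

```c
#define BUF_SIZE   64
#define HIGH_WATER (3 * BUF_SIZE / 4)  /* pause well before full */
#define LOW_WATER  (BUF_SIZE / 4)      /* resume with hysteresis */

static int flow_stopped;

/* Placeholders: on real hardware these raise/drop RTS or queue XOFF/XON. */
static void pause_sender(void)  { flow_stopped = 1; }
static void resume_sender(void) { flow_stopped = 0; }

/* Call from the RX ISR after storing a byte; 'used' is the fill level.
   The margin above HIGH_WATER absorbs the characters already in flight
   before the far end sees the notification. */
static void flow_after_rx(unsigned used)
{
    if (!flow_stopped && used >= HIGH_WATER)
        pause_sender();
}

/* Call from the mainloop after consuming bytes; the gap between the two
   thresholds keeps the flow-control line from toggling on every byte. */
static void flow_after_consume(unsigned used)
{
    if (flow_stopped && used <= LOW_WATER)
        resume_sender();
}
```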

  • From Niklas Holsti@21:1/5 to Don Y on Mon Oct 25 11:52:02 2021
    On 2021-10-25 11:19, Don Y wrote:
    On 10/25/2021 12:56 AM, Niklas Holsti wrote:
    On 2021-10-25 0:08, Don Y wrote:

    There are (and have been) many "safer" languages.  Many that are more
    descriptive (for certain classes of problem).  But, C has survived to
    handle all-of-the-above... perhaps in a suboptimal way but at least
    a manner that can get to the desired solution.

    Look at how few applications SNOBOL handles.  Write an OS in COBOL?
    Ada?

    I don't know about COBOL, but typically the real-time kernels
    ("run-time systems") associated with Ada compilers for bare-board
    embedded systems are written in Ada, with a minor amount of assembly
    language for the most HW-related bits like HW context saving and
    restoring. I'm pretty sure that C-language OS kernels also use
    assembly for those things.

    Of course you *can* do these things.


    Then I misunderstood your (rhetorical?) question.


    The question is how often
    they are ACTUALLY done with these other languages.


    I don't find that question very interesting.

    It is a typical chicken-and-egg, first-to-market conundrum. There is an enormous amount of status-quo-favouring friction in awareness,
    education, tool availability, and legacy code.


    [By the same token, expecting the past to mirror the present is equally naive. People forget that tools and processes have evolved (in the 40+ years that I've been designing embedded products). And, that the issues folks now face often weren't issues when tools were "stupider" (I've
    probably got $60K of obsolete compilers to prove this -- anyone written
    any C on an 1802 recently?  Or, a 2A03?  65816?  Z180?  6809?)  Don't even *think* about finding an Ada compiler for them -- in the past!]


    Well, the Janus/Ada compiler was available for Z80 in its day. There are
    also Ada compilers that use C as an intermediate language, with
    applications for example on TI MSP430's, but those were probably not
    available in the past ages you refer to.

  • From Niklas Holsti@21:1/5 to Don Y on Mon Oct 25 12:06:04 2021
    On 2021-10-25 11:28, Don Y wrote:
    On 10/25/2021 1:09 AM, Niklas Holsti wrote:
    On 2021-10-24 23:27, Dimiter_Popoff wrote:
    On 10/24/2021 22:54, Don Y wrote:
    On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
    Disable interrupts while accessing the fifo. you really have to.
    alternatively you'll often get away not using a fifo at all,
    unless you're blocking for a long while in some part of the code.

    Why would you do that. The fifo write pointer is only modified by
    the interrupt handler, the read pointer is only modified by the
    interrupted code. Has been done so for times immemorial.

    The OP's code doesn't differentiate between FIFO full and empty.

    So he should fix that first, there is no sane reason why not.
    Few things are simpler to do than that.


        [snip]


    Whatever handshakes he makes there is no problem knowing whether
    the fifo is full - just check if the position the write pointer
    will have after putting the next byte matches the read pointer
    at the moment.  Like I said before, few things are simpler than
    that, can't imagine someone working as a programmer being
    stuck at *that*.

    That simple check would require keeping a maximum of only N-1 entries
    in the N-position FIFO buffer, and the OP explicitly said they did not
    want to allocate an unused place in the buffer (which I think is
    unreasonable of the OP, but that is only IMO).

    The simple explanation for the N-1 limit is that the difference
    between two wrap-around pointers into an N-place buffer has at most N
    different values, while there are N+1 possible filling states of the
    buffer, from empty (zero items) to full (N items).

    But, again, that just deals with the "full check".  The easiest way to
    do this is just to check ".in" *after* advancement and inhibit the store
    if it coincides with the ".out" value.
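    (For concreteness, here is one way to read the "check after advancement"
    idea -- a minimal sketch in C, not anyone's actual code: compute the
    advanced index first and inhibit the store if it would collide with
    ".out". With this scheme "in == out" unambiguously means empty, at the
    cost of keeping at most N-1 bytes:)

```c
#include <stdint.h>

#define RXBUF_SIZE 64

static struct {
    unsigned char buf[RXBUF_SIZE];
    volatile uint8_t in;   /* advanced only by the ISR */
    volatile uint8_t out;  /* advanced only by the main loop */
} rxq;

/* ISR side: returns 0 and drops the byte when the buffer is full. */
int rb_put(unsigned char c)
{
    uint8_t next = (uint8_t)((rxq.in + 1) % RXBUF_SIZE);
    if (next == rxq.out)          /* advancing would collide: full */
        return 0;                 /* inhibit the store */
    rxq.buf[rxq.in] = c;
    rxq.in = next;                /* publish the new byte last */
    return 1;
}

/* Main-loop side: returns -1 when the buffer is empty. */
int rb_get(void)
{
    if (rxq.out == rxq.in)        /* in == out can only mean empty */
        return -1;
    int c = rxq.buf[rxq.out];
    rxq.out = (uint8_t)((rxq.out + 1) % RXBUF_SIZE);
    return c;
}
```

    (Because each index is written by exactly one side, no critical section
    is needed on a single-core MCU; the one permanently unused slot is the
    price of the unambiguous empty test.)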

    Checking for a "high water mark" to enable flow control requires more
    computation (albeit simple) as you have to accommodate the delays in
    that notification reaching the remote sender (lest he continue
    sending and overrun your buffer).

    And, later noting when you've consumed enough of the FIFO's contents
    to reach a "low water mark" and reenable the remote's transmissions.

    [And, if you ever have to deal with more "established" protocols
    that require the sequencing of specific control signals DURING
    a transfer, the ISR quickly becomes very complex!]


    Of course. Perhaps you (Don) did not see that I was agreeing with your
    position and objecting to the "it is very simple" stance of Dimiter
    (considering the OP's expressed constraints).

    Personally I would use critical sections to avoid relying on delicate
    reasoning about interleaved executions. And to allow for easy future
    complexification of the concurrent activities. The overhead of interrupt
    disabling and enabling is seldom significant when that can be done
    directly without kernel calls.
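    (A sketch of that approach -- illustrative only, with the
    critical-section hooks stubbed out; on a bare-metal Cortex-M they would
    typically map to something like __disable_irq()/__enable_irq(), names
    varying by toolchain. A shared count updated under the critical section
    lets all N slots be used and removes the interleaving reasoning:)

```c
#include <stdint.h>

#define RXBUF_SIZE 64

/* Platform hooks: stubbed here so the sketch stays portable; on a real
   target these would disable/restore interrupts (toolchain-specific). */
#ifndef ENTER_CRITICAL
#define ENTER_CRITICAL()
#define EXIT_CRITICAL()
#endif

static struct {
    unsigned char buf[RXBUF_SIZE];
    uint8_t in, out;
    uint8_t count;                /* updated by BOTH sides */
} csq;

/* ISR side: on a simple single-level interrupt scheme interrupts are
   already disabled inside the ISR, so no explicit critical section. */
int cs_put(unsigned char c)
{
    if (csq.count == RXBUF_SIZE)
        return 0;                 /* full: all N slots in use */
    csq.buf[csq.in] = c;
    csq.in = (uint8_t)((csq.in + 1) % RXBUF_SIZE);
    csq.count++;
    return 1;
}

/* Main-loop side: count-- must not interleave with the ISR's count++,
   hence the critical section. */
int cs_get(void)
{
    int c = -1;
    ENTER_CRITICAL();
    if (csq.count > 0) {
        c = csq.buf[csq.out];
        csq.out = (uint8_t)((csq.out + 1) % RXBUF_SIZE);
        csq.count--;
    }
    EXIT_CRITICAL();
    return c;
}
```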

  • From Don Y@21:1/5 to Niklas Holsti on Mon Oct 25 02:35:17 2021
    On 10/25/2021 2:06 AM, Niklas Holsti wrote:
    On 2021-10-25 11:28, Don Y wrote:
    On 10/25/2021 1:09 AM, Niklas Holsti wrote:
    On 2021-10-24 23:27, Dimiter_Popoff wrote:
    On 10/24/2021 22:54, Don Y wrote:
    On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
    Disable interrupts while accessing the fifo. you really have to.
    alternatively you'll often get away not using a fifo at all,
    unless you're blocking for a long while in some part of the code.
    Why would you do that. The fifo write pointer is only modified by
    the interrupt handler, the read pointer is only modified by the
    interrupted code. Has been done so for times immemorial.

    The OPs code doesn't differentiate between FIFO full and empty.

    So he should fix that first, there is no sane reason why not.
    Few things are simpler to do than that.


    [snip]


    Whatever handshakes he makes there is no problem knowing whether
    the fifo is full - just check if the position the write pointer
    will have after putting the next byte matches the read pointer
    at the moment. Like I said before, few things are simpler than
    that, can't imagine someone working as a programmer being
    stuck at *that*.

    That simple check would require keeping a maximum of only N-1 entries in the
    N-position FIFO buffer, and the OP explicitly said they did not want to
    allocate an unused place in the buffer (which I think is unreasonable of the
    OP, but that is only IMO).

    The simple explanation for the N-1 limit is that the difference between
    two wrap-around pointers into an N-place buffer has at most N different
    values, while there are N+1 possible filling states of the buffer, from
    empty (zero items) to full (N items).

    But, again, that just deals with the "full check". The easiest way to do
    this is just to check ".in" *after* advancement and inhibit the store if
    it coincides with the ".out" value.

    Checking for a "high water mark" to enable flow control requires more
    computation (albeit simple) as you have to accommodate the delays in
    that notification reaching the remote sender (lest he continue
    sending and overrun your buffer).

    And, later noting when you've consumed enough of the FIFO's contents
    to reach a "low water mark" and reenable the remote's transmissions.

    [And, if you ever have to deal with more "established" protocols
    that require the sequencing of specific control signals DURING
    a transfer, the ISR quickly becomes very complex!]

    Of course. Perhaps you (Don) did not see that I was agreeing with your
    position and objecting to the "it is very simple" stance of Dimiter
    (considering the OP's expressed constraints).

    Yes, but I was afraid the emphasis would shift away from the "more
    involved" case (by trivializing the "full" case).

    Personally I would use critical sections to avoid relying on delicate
    reasoning about interleaved executions. And to allow for easy future
    complexification of the concurrent activities. The overhead of interrupt
    disabling and enabling is seldom significant when that can be done
    directly without kernel calls.

    We (developers, in general) tend to forget how often we cobble
    together solutions from past implementations. And, as those past
    implementations tend to be lax when it comes to enumerating the
    assumptions under which they were created, we end up propagating
    a bunch of dubious qualifiers that ultimately affect the code's
    performance and "correctness".

    Someone (including ourselves) trying to pilfer code from THIS
    project might incorrectly expect the ISR to protect against buffer
    wrap. Or, implement flow control. Or, be designed for a higher
    data rate than it actually sees (saw!) -- will the buffer size -- and
    task() timing -- be adequate to handle burst transmissions at 115Kbaud?
    If not, where is the upper bound? What if the CPU clock is changed?
    Or, the processor load? ...

    "Steal" several bits of code -- possibly from different projects -- and
    you've got an assortment of such hidden assumptions, all willing to eat
    your lunch! While you remain convinced that none of those things
    can happen!

    My first UART driver had to manage about a dozen control signals as
    the standard had a different intent and interpretation in the mid 70's,
    early 80's (anyone remember TWX? TELEX? DB25s?). Porting it forward
    ended up with a bunch of "issues" that no longer applied. (e.g.,
    RTS/CTS weren't originally used as flow control/handshaking signals as
    they are commonly used, now). *Assuming* a letter revision of the standard
    was benign wrt the timing of signal transitions was folly.

    You only see these things when you lay out all of your assumptions
    in the codebase. And, hope the next guy actually READS what you took
    the time to WRITE!

    [When I wrote my 9 track tape driver, I had ~200 lines of commentary
    just explaining the role of the interface wrt the formatter, transports,
    etc. E.g., when you can read reverse, seek forward, rewind, etc.
    with multiple transports hung off that same interface. Otherwise,
    an observant developer would falsely conclude that the driver was
    riddled with bugs -- as it *facilitated* a multiplicity of
    concurrent operations]

  • From Don Y@21:1/5 to Niklas Holsti on Mon Oct 25 02:50:04 2021
    On 10/25/2021 1:52 AM, Niklas Holsti wrote:
    On 2021-10-25 11:19, Don Y wrote:
    On 10/25/2021 12:56 AM, Niklas Holsti wrote:
    On 2021-10-25 0:08, Don Y wrote:

    There are (and have been) many "safer" languages. Many that are more
    descriptive (for certain classes of problem). But, C has survived to
    handle all-of-the-above... perhaps in a suboptimal way but at least
    a manner that can get to the desired solution.

    Look at how few applications SNOBOL handles. Write an OS in COBOL? Ada?

    I don't know about COBOL, but typically the real-time kernels ("run-time
    systems") associated with Ada compilers for bare-board embedded systems
    are written in Ada, with a minor amount of assembly language for the most
    HW-related bits like HW context saving and restoring. I'm pretty sure
    that C-language OS kernels also use assembly for those things.

    Of course you *can* do these things.

    Then I misunderstood your (rhetorical?) question.

    The question is how often
    they are ACTUALLY done with these other languages.

    I don't find that question very interesting.

    Why not? If a tool isn't used for a purpose for which it *should*
    be "ideal", you have to start wondering "why not?" Was it NOT
    suited to the task? Was it too costly (money and/or experience)?
    How do we not repeat that problem, going forward? I.e., is it
    better to EVOLVE a language to acquire the characteristics of the
    "better" one -- rather than trying to encourage people to
    "jump ship"?

    It is a typical chicken-and-egg, first-to-market conundrum. There is an
    enormous amount of status-quo-favouring friction in awareness, education,
    tool availability, and legacy code.

    Of course!

    And, there is also the pressure of the market. Do *you* want to be The Guy
    who tries something new and sinks a product's development or market release?
    If your approach proves to be a big hit, will you benefit as much as you'd
    LOSE if it was a flop?

    [By the same token, expecting the past to mirror the present is equally
    naive. People forget that tools and processes have evolved (in the 40+
    years that I've been designing embedded products). And, that the issues
    folks now face often weren't issues when tools were "stupider" (I've
    probably got $60K of obsolete compilers to prove this -- anyone written
    any C on an 1802 recently? Or, a 2A03? 65816? Z180? 6809?) Don't
    even *think* about finding an Ada compiler for them -- in the past!]

    Well, the Janus/Ada compiler was available for Z80 in its day. There are
    also Ada compilers that use C as an intermediate language, with
    applications for example on TI MSP430's, but those were probably not
    available in the past ages you refer to.

    I recall JRT Pascal and PL/M as the "high level" languages, back then.
    C compilers were notoriously bad. You could literally predict the
    code that would be generated for any statement. The whole idea of
    "peephole optimizers" looking for:
    STORE A
    LOAD A
    sequences to elide is testament to how little global knowledge they
    had of the code they were processing.

    Performance? A skilled ASM coder could beat the generated code
    (in time AND space) without breaking into a sweat.

    And, you bought a compiler/assembler/linker/debugger for *each*
    processor -- not just a simple command line switch to alter the
    code generation, etc. Vendors might have a common codebase
    for the tools but built each variant conditionally.

    The limits of the language were largely influenced by the targeted
    hardware -- "helper routines" to support longs, floats, etc.
    ("Oh, did you want that support to be *reentrant*? We assumed
    there would be a single floating point accumulator used throughout
    the code, not one per thread!") Different sizes of addresses
    (e.g., for the Z180, you could have 16b "logical" addresses
    and 24b physical addresses -- mapped into that logical space
    by the compiler's runtime support and linkage editor.)

    Portable code? Maybe -- with quite a bit of work!

    Fast/small? Meh...

  • From David Brown@21:1/5 to Niklas Holsti on Mon Oct 25 13:49:02 2021
    On 25/10/2021 10:52, Niklas Holsti wrote:


    Well, the Janus/Ada compiler was available for Z80 in its day. There are
    also Ada compilers that use C as an intermediate language, with
    applications for example on TI MSP430's, but those were probably not available in the past ages you refer to.

    Presumably there is gcc-based Ada for the msp430 (as there is for the
    8-bit AVR)? There might not be a full library available, or possibly
    some missing features in the language.

  • From Niklas Holsti@21:1/5 to David Brown on Mon Oct 25 15:16:10 2021
    On 2021-10-25 14:49, David Brown wrote:
    On 25/10/2021 10:52, Niklas Holsti wrote:


    Well, the Janus/Ada compiler was available for Z80 in its day. There are
    also Ada compilers that use C as an intermediate language, with
    applications for example on TI MSP430's, but those were probably not
    available in the past ages you refer to.

    Presumably there is gcc-based Ada for the msp430 (as there is for the
    8-bit AVR)?


    Indeed there seems to be one, or at least work towards one: https://sourceforge.net/p/msp430ada/wiki/Home/.


    There might not be a full library available, or possibly
    some missing features in the language.


    Certainly. I think that Janus/Ada for the Z80 was limited to the
    original Ada (Ada 83), and may well have also had some significant
    missing features. But I believe it was self-hosted on CP/M, quite a feat.

  • From Dimiter_Popoff@21:1/5 to Niklas Holsti on Mon Oct 25 16:04:44 2021
    On 10/25/2021 11:09, Niklas Holsti wrote:
    On 2021-10-24 23:27, Dimiter_Popoff wrote:
    On 10/24/2021 22:54, Don Y wrote:
    On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
    Disable interrupts while accessing the fifo. you really have to.
    alternatively you'll often get away not using a fifo at all,
    unless you're blocking for a long while in some part of the code.

    Why would you do that. The fifo write pointer is only modified by
    the interrupt handler, the read pointer is only modified by the
    interrupted code. Has been done so for times immemorial.

    The OPs code doesn't differentiate between FIFO full and empty.

    So he should fix that first, there is no sane reason why not.
    Few things are simpler to do than that.


       [snip]


    Whatever handshakes he makes there is no problem knowing whether
    the fifo is full - just check if the position the write pointer
    will have after putting the next byte matches the read pointer
    at the moment.  Like I said before, few things are simpler than
    that, can't imagine someone working as a programmer being
    stuck at *that*.

    That simple check would require keeping a maximum of only N-1 entries in
    the N-position FIFO buffer, and the OP explicitly said they did not want
    to allocate an unused place in the buffer (which I think is unreasonable
    of the OP, but that is only IMO).

    Well it might be reasonable if the fifo has a size of two, you know :-).

  • From Niklas Holsti@21:1/5 to All on Mon Oct 25 18:34:48 2021
    On 2021-10-25 16:04, Dimiter_Popoff wrote:
    On 10/25/2021 11:09, Niklas Holsti wrote:
    On 2021-10-24 23:27, Dimiter_Popoff wrote:
    On 10/24/2021 22:54, Don Y wrote:
    On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
    Disable interrupts while accessing the fifo. you really have to.
    alternatively you'll often get away not using a fifo at all,
    unless you're blocking for a long while in some part of the code.

    Why would you do that. The fifo write pointer is only modified by
    the interrupt handler, the read pointer is only modified by the
    interrupted code. Has been done so for times immemorial.

    The OPs code doesn't differentiate between FIFO full and empty.

    So he should fix that first, there is no sane reason why not.
    Few things are simpler to do than that.


        [snip]


    Whatever handshakes he makes there is no problem knowing whether
    the fifo is full - just check if the position the write pointer
    will have after putting the next byte matches the read pointer
    at the moment.  Like I said before, few things are simpler than
    that, can't imagine someone working as a programmer being
    stuck at *that*.

    That simple check would require keeping a maximum of only N-1 entries
    in the N-position FIFO buffer, and the OP explicitly said they did not
    want to allocate an unused place in the buffer (which I think is
    unreasonable of the OP, but that is only IMO).

    Well it might be reasonable if the fifo has a size of two, you know :-).


    And if each of those two items is large, yes. But here we have a FIFO of
    8-bit characters... few programs are so tight on memory that they cannot
    stand one unused octet.

  • From Don Y@21:1/5 to All on Mon Oct 25 11:02:03 2021
    On 10/25/2021 10:53 AM, Dimiter_Popoff wrote:
    On 10/25/2021 20:43, Don Y wrote:
    On 10/25/2021 8:34 AM, Niklas Holsti wrote:
    And if each of those two items is large, yes. But here we have a FIFO of >>> 8-bit characters... few programs are so tight on memory that they cannot >>> stand one unused octet.

    It's not "unused". Rather, its role is that of indicating "full/overrun".
    The OP seems to have decided that this is of no concern -- in *one* app?

    Oh come on, I joked about the fifo of two bytes only because this whole thread is a joke

    My comment applies regardless of the size of the FIFO.

    - pages and pages of C to maintain a fifo, what can be
    more of a joke than this.

    Where do you see "pages and pages of C to maintain a FIFO"?

  • From Don Y@21:1/5 to Niklas Holsti on Mon Oct 25 10:43:38 2021
    On 10/25/2021 8:34 AM, Niklas Holsti wrote:
    And if each of those two items is large, yes. But here we have a FIFO of 8-bit
    characters... few programs are so tight on memory that they cannot stand one unused octet.

    It's not "unused". Rather, its role is that of indicating "full/overrun".
    The OP seems to have decided that this is of no concern -- in *one* app?

  • From Dimiter_Popoff@21:1/5 to Don Y on Mon Oct 25 20:53:18 2021
    On 10/25/2021 20:43, Don Y wrote:
    On 10/25/2021 8:34 AM, Niklas Holsti wrote:
    And if each of those two items is large, yes. But here we have a FIFO
    of 8-bit characters... few programs are so tight on memory that they
    cannot stand one unused octet.

    It's not "unused".  Rather, its role is that of indicating "full/overrun".
    The OP seems to have decided that this is of no concern -- in *one* app?

    Oh come on, I joked about the fifo of two bytes only because this whole
    thread is a joke - pages and pages of C to maintain a fifo, what can be
    more of a joke than this.

  • From pozz@21:1/5 to All on Mon Oct 25 19:52:52 2021
    Il 25/10/2021 17:34, Niklas Holsti ha scritto:
    On 2021-10-25 16:04, Dimiter_Popoff wrote:
    On 10/25/2021 11:09, Niklas Holsti wrote:
    On 2021-10-24 23:27, Dimiter_Popoff wrote:
    On 10/24/2021 22:54, Don Y wrote:
    On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
    Disable interrupts while accessing the fifo. you really have to.
    alternatively you'll often get away not using a fifo at all,
    unless you're blocking for a long while in some part of the code.
    Why would you do that. The fifo write pointer is only modified by
    the interrupt handler, the read pointer is only modified by the
    interrupted code. Has been done so for times immemorial.

    The OPs code doesn't differentiate between FIFO full and empty.

    So he should fix that first, there is no sane reason why not.
    Few things are simpler to do than that.


        [snip]


    Whatever handshakes he makes there is no problem knowing whether
    the fifo is full - just check if the position the write pointer
    will have after putting the next byte matches the read pointer
    at the moment.  Like I said before, few things are simpler than
    that, can't imagine someone working as a programmer being
    stuck at *that*.

    That simple check would require keeping a maximum of only N-1 entries
    in the N-position FIFO buffer, and the OP explicitly said they did
    not want to allocate an unused place in the buffer (which I think is
    unreasonable of the OP, but that is only IMO).

    Well it might be reasonable if the fifo has a size of two, you know :-).


    And if each of those two items is large, yes. But here we have a FIFO of 8-bit characters... few programs are so tight on memory that they cannot stand one unused octet.

    When I have a small (<256) power-of-two (16, 32, 64, 128) buffer (and
    this is the case for a UART receiving ring-buffer), I like to use this
    implementation that works and doesn't waste any element.
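    (A reconstruction of that scheme, as I understand it from the original
    post -- treat it as a sketch: the uint8_t indices run freely over 0..255
    and are masked only when indexing. Because the power-of-two size divides
    256, in == out means empty and in - out == size means full, so no slot
    is wasted:)

```c
#include <stdint.h>

#define RXBUF_SIZE 64u            /* power of two, and < 256 */

static struct {
    unsigned char buf[RXBUF_SIZE];
    volatile uint8_t in;          /* free-running, advanced only by ISR */
    volatile uint8_t out;         /* free-running, advanced only by main */
} p2q;

int p2_put(unsigned char c)       /* ISR side */
{
    if ((uint8_t)(p2q.in - p2q.out) == RXBUF_SIZE)
        return 0;                 /* full: every slot holds unread data */
    p2q.buf[p2q.in % RXBUF_SIZE] = c;
    p2q.in++;                     /* wraps naturally at 256 */
    return 1;
}

int p2_get(void)                  /* main-loop side */
{
    if (p2q.out == p2q.in)
        return -1;                /* empty */
    unsigned char c = p2q.buf[p2q.out % RXBUF_SIZE];
    p2q.out++;
    return c;
}
```

    (The modulo on a power-of-two size compiles to a cheap AND mask; the
    trick fails if the size does not divide 256, which is why the indices
    must be uint8_t and the size a power of two below 256.)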

    However I know this isn't the best implementation ever and it's a pity
    the thread emphasis has been against this implementation (that was used
    as *one* implementation just to have an example to discuss on).

    The main point was the use of volatile (and other techniques) to
    guarantee correct compiler output, whatever legal (respecting the C
    standard) optimizations the compiler chooses to do.

    It seems to me the arguments against or for volatile are completely
    independent of the ring-buffer implementation.

  • From Don Y@21:1/5 to pozz on Mon Oct 25 11:10:33 2021
    On 10/25/2021 10:52 AM, pozz wrote:
    However I know this isn't the best implementation ever and it's a pity
    the thread emphasis has been against this implementation (that was used
    as *one* implementation just to have an example to discuss on).

    The point is that you need a COMPLETE implementation before you start
    thinking about the amount of "license" the compiler can take with your code.

    Here's *part* of an implementation:

    a = 37;

    Now, should I declare A as volatile? Use the register qualifier?
    What size should the integer A be? Can the optimizer elide this
    statement from my code?

    All sorts of questions whose answers depend on the REST of the
    implementation -- not shown!

    The main point was the use of volatile (and other techniques) to
    guarantee correct compiler output, whatever legal (respecting the C
    standard) optimizations the compiler chooses to do.

    It seems to me the arguments against or for volatile are completely
    independent of the ring-buffer implementation.

    It has to do with indicating how YOU (the developer) see the object
    being used (accessed). You, in theory, know more about the role of
    the object than the compiler (because it may be accessed in other modules,
    or, have "stuff" tied to it -- like special hardware, etc.) You need a way
    to tell the compiler that "you know what you are doing" in your use
    of the object and that it should restrain itself from making assumptions
    that might not be true.

    If your example doesn't bring to light those various issues, then
    the decision as to its applicability is moot.

  • From pozz@21:1/5 to All on Mon Oct 25 20:15:03 2021
    Il 23/10/2021 18:09, David Brown ha scritto:
    [...]
    Marking "in" and "buf" as volatile is /far/ better than using a critical
    section, and likely to be more efficient than a memory barrier. You can
    also use volatileAccess rather than making buf volatile, and it is often
    slightly more efficient to cache volatile variables in a local variable
    while working with them.

    I think I got your point, but I'm wondering why there are plenty of
    examples of ring-buffer implementations that don't use volatile at all,
    even if the author explicitly refers to interrupts and multithreading.

    Just an example[1] by Quantum Leaps. It promises to be a *lock-free* (I
    think thread-safe) ring-buffer implementation in the scenario of single
    producer/single consumer (that is my scenario too).

    In the source code there's no use of volatile. I could call
    RingBuf_put() in my rx uart ISR and call RingBuf_get() in my mainloop code.

    From what I learned from you, this code usually works, but the standard
    doesn't guarantee it will work with every old, current and future
    compiler.



    [1] https://github.com/QuantumLeaps/lock-free-ring-buffer

  • From David Brown@21:1/5 to Niklas Holsti on Mon Oct 25 20:58:25 2021
    On 25/10/2021 17:34, Niklas Holsti wrote:

    And if each of those two items is large, yes. But here we have a FIFO of 8-bit characters... few programs are so tight on memory that they cannot stand one unused octet.

    I remember a program I worked with where the main challenge for the
    final features was not figuring out the implementation, but finding a
    few spare bytes of code space and a couple of spare bits of ram to use.
    And that was with 32 KB ROM and 512 bytes RAM (plus some bits in the
    registers of peripherals that weren't used). That was probably the last
    big assembly program I wrote - non-portability was a killer.

  • From David Brown@21:1/5 to pozz on Mon Oct 25 20:54:26 2021
    On 25/10/2021 20:15, pozz wrote:
    Il 23/10/2021 18:09, David Brown ha scritto:
    [...]
    Marking "in" and "buf" as volatile is /far/ better than using a critical
    section, and likely to be more efficient than a memory barrier.  You can
    also use volatileAccess rather than making buf volatile, and it is often
    slightly more efficient to cache volatile variables in a local variable
    while working with them.

    I think I got your point, but I'm wondering why there are plenty of
    examples of ring-buffer implementations that don't use volatile at all,
    even if the author explicitly refers to interrupts and multithreading.

    You don't have to use "volatile". You can make correct code here using
    critical sections - it's just a lot less efficient. (If you have a
    queue where more than one context can be reading it or writing it, then
    you /do/ need some kind of locking mechanism.)

    You can also use memory barriers instead of volatile, but it is likely
    to be slightly less efficient.
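    (For illustration, a sketch of the memory-barrier variant using a
    GCC/Clang-style compiler barrier -- an assumption about the toolchain;
    other compilers spell it differently. The barrier emits no instruction,
    it only stops the compiler from caching or reordering memory accesses
    across it:)

```c
#include <stdint.h>

#define RXBUF_SIZE 64u

/* Compiler-only barrier (GCC/Clang syntax -- a toolchain assumption). */
#define COMPILER_BARRIER() __asm__ volatile("" ::: "memory")

static unsigned char mb_buf[RXBUF_SIZE];
static uint8_t mb_in;             /* advanced only by the ISR */
static uint8_t mb_out;            /* advanced only by the main loop */

/* Main-loop pop relying on barriers instead of volatile. */
int mb_get(void)
{
    COMPILER_BARRIER();               /* force a fresh read of mb_in */
    if (mb_out == mb_in)
        return -1;
    int c = mb_buf[mb_out % RXBUF_SIZE];
    COMPILER_BARRIER();               /* read the byte before publishing */
    mb_out++;                         /* hand the slot back to the ISR */
    return c;
}
```

    (On a single-core MCU a compiler barrier is enough; an SMP system would
    additionally need real hardware barriers or atomics.)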

    You can also use atomics instead of volatiles, but it is also quite
    likely to be slightly less efficient. If you have an SMP system, on the
    other hand, then you need something more than volatile and compiler
    memory barriers - atomics are quite possibly the most efficient solution
    in that case.

    And sometimes you can make code that doesn't need any special treatment
    at all, because you know the way it is being called. If the two ends of
    your buffer are handled by tasks in a cooperative multi-tasking
    scenario, then there is no problem - you don't need to worry about
    volatile or any alternatives. If you know your interrupt can't occur
    while the other end of the buffer is being handled, that can reduce your
    need for volatile. (In particular, that can also avoid complications if
    you have counter variables that are bigger than the processor can handle
    atomically - usually not a problem for a 32-bit Cortex-M, but often
    important on an 8-bit AVR.)

    If you know, for a fact, that the code will be compiled by a weak
    compiler or with weak optimisation, or that the "get" and "put"
    implementations will always be in a separately compiled unit from code
    calling these functions and you'll never use any kind of cross-unit
    optimisations, then you can often get away without using volatile.


    Just an example[1] by Quantum Leaps. It promises to be a *lock-free* (I
    think thread-safe) ring-buffer implementation in the scenario of single producer/single consumer (that is my scenario too).

    It's lock-free, but not safe in the face of modern optimisation (gcc has
    had LTO for many years, and a lot of high-end commercial embedded
    compilers have used such techniques for decades). And I'd want to study
    it in detail and think a lot before accepting that it is safe to use its
    16-bit counters on an 8-bit AVR. That could be fixed by just changing
    the definition of the RingBufCtr type, which is a nice feature in the code.


    In the source code there's no use of volatile. I could call
    RingBuf_put() in my rx uart ISR and call RingBuf_get() in my mainloop code.


    You don't want to call functions from an ISR if you can avoid it, unless
    the functions are defined in the same unit and can be inlined. On many processors (less so on the Cortex-M) calling an external function from
    an ISR means a lot of overhead to save and restore the so-called
    "volatile" registers (no relation to the C keyword "volatile"), usually completely unnecessarily.

    From what I learned from you, this code usually works, but the standard doesn't guarantee it will work with every old, current and future
    compilers.


    Yes, that's a fair summary.

    It might be good enough for some purposes. But since "volatile" will
    cost nothing in code efficiency but greatly increase the portability and
    safety of the code, I'd recommend using it. And I am certainly in
    favour of thinking carefully about these things - as you did in the
    first place, which is why we have this thread.
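    (As a sketch of that recommendation -- names are illustrative, not from
    any real driver: the indices are volatile, and the volatile read is
    cached in a local while draining, so the loop compares against one
    stable snapshot instead of re-reading the volatile every iteration:)

```c
#include <stdint.h>

#define RXBUF_SIZE 64u

static unsigned char v_buf[RXBUF_SIZE];
static volatile uint8_t v_in;     /* written only by the ISR */
static volatile uint8_t v_out;    /* written only by the main loop */

/* Drain everything currently available into dst (at most max bytes). */
int uart_drain(unsigned char *dst, int max)
{
    uint8_t snap = v_in;          /* single volatile read, then cached */
    uint8_t out  = v_out;
    int n = 0;
    while (out != snap && n < max) {
        dst[n++] = v_buf[out % RXBUF_SIZE];
        out++;
    }
    v_out = out;                  /* single volatile write at the end */
    return n;
}
```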



    [1] https://github.com/QuantumLeaps/lock-free-ring-buffer

  • From Niklas Holsti@21:1/5 to pozz on Mon Oct 25 21:33:44 2021
    On 2021-10-25 20:52, pozz wrote:
    Il 25/10/2021 17:34, Niklas Holsti ha scritto:
    On 2021-10-25 16:04, Dimiter_Popoff wrote:
    On 10/25/2021 11:09, Niklas Holsti wrote:
    On 2021-10-24 23:27, Dimiter_Popoff wrote:
    On 10/24/2021 22:54, Don Y wrote:
    On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
    Disable interrupts while accessing the fifo. you really have to.
    alternatively you'll often get away not using a fifo at all,
    unless you're blocking for a long while in some part of the code.

    Why would you do that. The fifo write pointer is only modified by
    the interrupt handler, the read pointer is only modified by the
    interrupted code. Has been done so for times immemorial.

    The OPs code doesn't differentiate between FIFO full and empty.


    (I suspect something is not quite right with the attributions of the
    quotations above -- Dimiter probably did not suggest disabling
    interrupts -- but no matter.)

    [snip]


    When I have a small (<256) power-of-two (16, 32, 64, 128) buffer (and
    this is the case for a UART receiving ring-buffer), I like to use this
    implementation that works and doesn't waste any element.

    However I know this isn't the best implementation ever and it's a pity
    the thread emphasis has been against this implementation (that was used
    as *one* implementation just to have an example to discuss on).

    The main point was the use of volatile (and other techniques) to
    guarantee correct compiler output, whatever legal (respecting the C
    standard) optimizations the compiler chooses to do.

    It seems to me the arguments against or for volatile are completely
    independent of the ring-buffer implementation.


    Of course "volatile" is needed, in general, whenever anything is written
    in one thread and read in another. The issue, I think, is when
    "volatile" is _enough_.

    I feel that detection of a full buffer (FIFO overflow) is required for a
    proper ring buffer implementation, and that has implications for the
    data structure needed, and that has implications for whether critical
    sections are needed.

    If the FIFO implementation is based on just two pointers (read and
    write), and each pointer is modified by just one of the two threads
    (main thread = reader, and interrupt handler = writer), and those
    modifications are both "volatile" AND atomic (which has not been
    discussed so far, IIRC...), then one can do without a critical region.
    But then detection of a full buffer needs one "wasted" element in the
    buffer.

    To avoid the wasted element, one could add a "full"/"not full" Boolean
    flag. But that flag would be modified by both threads, and should be
    modified atomically together with the pointer modifications, which (I
    think) means that a critical section is needed.
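    The two-index scheme described above (each index written by only one
    side, volatile and atomic 8-bit accesses, one "wasted" slot so that
    full and empty are distinguishable) can be sketched as follows; the
    names and the buffer size are illustrative, not anyone's exact code:

```c
#include <assert.h>
#include <stdint.h>

#define RXBUF_SIZE 64u              /* power of two; one slot stays unused */

static unsigned char buf[RXBUF_SIZE];
static volatile uint8_t head;       /* written only by the ISR (producer)  */
static volatile uint8_t tail;       /* written only by the main loop       */

/* ISR side: returns 0 and drops the byte when the buffer is full. */
static int fifo_put(unsigned char c)
{
    uint8_t h = head;
    uint8_t next = (uint8_t)((h + 1u) & (RXBUF_SIZE - 1u));
    if (next == tail)
        return 0;                   /* full: head would catch up with tail */
    buf[h] = c;
    head = next;                    /* publish the byte last */
    return 1;
}

/* Main-loop side: returns -1 when the buffer is empty. */
static int fifo_get(void)
{
    uint8_t t = tail;
    if (t == head)
        return -1;                  /* empty */
    int c = buf[t];
    tail = (uint8_t)((t + 1u) & (RXBUF_SIZE - 1u));
    return c;
}
```

    With this layout the buffer holds at most RXBUF_SIZE - 1 bytes, which
    is exactly the "wasted element" trade-off discussed above.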

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dimiter_Popoff@21:1/5 to Niklas Holsti on Mon Oct 25 22:09:17 2021
    On 10/25/2021 21:33, Niklas Holsti wrote:
    ....

    If the FIFO implementation is based on just two pointers (read and
    write), and each pointer is modified by just one of the two threads
    (main thread = reader, and interrupt handler = writer), and those
    modifications are both "volatile" AND atomic (which has not been
    discussed so far, IIRC...), then one can do without a critical region.
    But then detection of a full buffer needs one "wasted" element in the
    buffer.

    Why atomic? No need for that unless more than one interrupted task would
    want to read from the fifo at the same time, which is nonsense. [I once
    wasted a day looking at a garbled input from an auxiliary HV to a netMCA
    only to discover I had left a shell to start via the same UART during
    boot, through an outdated (but available) driver accessing the same
    UART the standard driver through which the HV was used....
    This across half the planet, customer was in South Africa. Not
    an experience anybody would ask for, I can tell you :)].
    Just like there is no need to mask interrupts, as you mentioned I
    had said before.

    To avoid the wasted element, one could add a "full"/"not full" Boolean
    flag. But that flag would be modified by both threads, and should be
    modified atomically together with the pointer modifications, which (I
    think) means that a critical section is needed.

    Now this is where atomic access is necessary - for no good reason in
    this case, as mentioned before, but if one wants to bang their head
    in the wall this is the proper way to do it.
    As for "volatile" I can't say much, but if it is the way to make
    the compiler access the address declared as such every time, instead
    of using some stale value it already has, then of course it would
    be needed.

  • From Niklas Holsti@21:1/5 to All on Mon Oct 25 22:53:08 2021
    On 2021-10-25 22:09, Dimiter_Popoff wrote:
    On 10/25/2021 21:33, Niklas Holsti wrote:
    ....

    If the FIFO implementation is based on just two pointers (read and
    write), and each pointer is modified by just one of the two threads
    (main thread = reader, and interrupt handler = writer), and those
    modifications are both "volatile" AND atomic (which has not been
    discussed so far, IIRC...), then one can do without a critical region.
    But then detection of a full buffer needs one "wasted" element in the
    buffer.

    Why atomic?


    If the read/write pointers/indices are, say, 16 bits, but the processor
    has only 8-bit store/load instructions, updating a pointer/index happens
    non-atomically, 8 bits at a time, and the interrupt handler can read a
    half-updated value if the interrupt happens in the middle of an update.
    That would certainly mess up the comparison between the read and write
    pointers in the interrupt handler.

    In the OP's code, I suppose (but I don't recall) that the indices are 8
    bits, so probably atomically readable and writable.
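    When an index is wider than the CPU's native load/store width, the
    usual fix is to mask interrupts around the multi-byte read. A sketch,
    where IRQ_DISABLE()/IRQ_ENABLE() are hypothetical placeholders (on AVR
    they would be cli()/sei() from <avr/interrupt.h>, or an ATOMIC_BLOCK):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical placeholders: defined as no-ops here so the sketch is
   self-contained; on a real 8-bit target they must mask interrupts. */
#define IRQ_DISABLE() ((void)0)
#define IRQ_ENABLE()  ((void)0)

static volatile uint16_t head;      /* updated 8 bits at a time by the ISR */

/* With interrupts masked, the ISR cannot run between the two byte
   loads, so the 16-bit value read is always consistent. */
static uint16_t read_head_atomic(void)
{
    IRQ_DISABLE();
    uint16_t h = head;
    IRQ_ENABLE();
    return h;
}
```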

  • From Dimiter_Popoff@21:1/5 to Niklas Holsti on Mon Oct 25 23:02:34 2021
    On 10/25/2021 22:53, Niklas Holsti wrote:
    On 2021-10-25 22:09, Dimiter_Popoff wrote:
    On 10/25/2021 21:33, Niklas Holsti wrote:
    ....

    If the FIFO implementation is based on just two pointers (read and
    write), and each pointer is modified by just one of the two threads
    (main thread = reader, and interrupt handler = writer), and those
    modifications are both "volatile" AND atomic (which has not been
    discussed so far, IIRC...), then one can do without a critical
    region. But then detection of a full buffer needs one "wasted"
    element in the buffer.

    Why atomic?


    If the read/write pointers/indices are, say, 16 bits, but the processor
    has only 8-bit store/load instructions, updating a pointer/index happens
    non-atomically, 8 bits at a time, and the interrupt handler can read a
    half-updated value if the interrupt happens in the middle of an update.
    That would certainly mess up the comparison between the read and write
    pointers in the interrupt handler.

    In the OP's code, I suppose (but I don't recall) that the indices are 8
    bits, so probably atomically readable and writable.


    Ah, well, this is a possible scenario in a multicore system (or single
    core if the two bytes are written by separate opcodes).

  • From antispam@math.uni.wroc.pl@21:1/5 to Don Y on Mon Oct 25 21:32:25 2021
    Don Y <blockedofcourse@foo.invalid> wrote:
    On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
    Disable interrupts while accessing the fifo. you really have to.
    alternatively you'll often get away not using a fifo at all,
    unless you're blocking for a long while in some part of the code.

    Why would you do that. The fifo write pointer is only modified by
    the interrupt handler, the read pointer is only modified by the
    interrupted code. Has been done so for times immemorial.

    The OP's code doesn't differentiate between FIFO full and empty.

    If you read carefully what he wrote you would know that he does.
    The trick he uses is that his indices may point outside the buffer:
    empty is equal indices, full is a difference equal to the buffer
    size. Of course his approach has its own limitations, like the
    buffer size being a power of 2, and with 8-bit indices the maximal
    buffer size is 128.
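    The free-running-index trick being described can be sketched like
    this (a sketch under the stated assumptions: power-of-two size of at
    most 128, 8-bit indices; unsigned wraparound makes in - out the
    element count even after both indices overflow):

```c
#include <assert.h>
#include <stdint.h>

#define RXBUF_SIZE 128u  /* power of two; 128 is the maximum with uint8_t */

static unsigned char buf[RXBUF_SIZE];
static volatile uint8_t in;     /* free-running; incremented only by the ISR */
static volatile uint8_t out;    /* free-running; incremented only by main    */

/* Bytes currently stored: (in - out) in uint8_t arithmetic is correct
   as long as no more than RXBUF_SIZE bytes are ever outstanding. */
static uint8_t fifo_count(void) { return (uint8_t)(in - out); }
static int     fifo_empty(void) { return fifo_count() == 0; }
static int     fifo_full(void)  { return fifo_count() == RXBUF_SIZE; }

static void fifo_put(unsigned char c)       /* ISR side */
{
    buf[in % RXBUF_SIZE] = c;
    in++;
}

static int fifo_get(void)                   /* main-loop side */
{
    if (fifo_empty())
        return -1;
    int c = buf[out % RXBUF_SIZE];
    out++;
    return c;
}
```

    Empty is equal indices (count 0), full is a difference equal to the
    buffer size, and no element is wasted; the cost is the power-of-two
    and size limitations mentioned above.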

    --
    Waldek Hebisch

  • From Clifford Heath@21:1/5 to Niklas Holsti on Tue Oct 26 08:43:12 2021
    On 25/10/21 7:09 pm, Niklas Holsti wrote:
    On 2021-10-24 23:27, Dimiter_Popoff wrote:
    On 10/24/2021 22:54, Don Y wrote:
    On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
    Disable interrupts while accessing the fifo. you really have to.
    alternatively you'll often get away not using a fifo at all,
    unless you're blocking for a long while in some part of the code.

    Why would you do that. The fifo write pointer is only modified by
    the interrupt handler, the read pointer is only modified by the
    interrupted code. Has been done so for times immemorial.

    The OP's code doesn't differentiate between FIFO full and empty.

    So he should fix that first, there is no sane reason why not.
    Few things are simpler to do than that.


       [snip]
    Whatever handshakes he makes there is no problem knowing whether
    the fifo is full - just check if the position the write pointer
    will have after putting the next byte matches the read pointer
    at the moment.  Like I said before, few things are simpler than
    that, can't imagine someone working as a programmer being
    stuck at *that*.

    That simple check would require keeping a maximum of only N-1 entries in
    the N-position FIFO buffer, and the OP explicitly said they did not want
    to allocate an unused place in the buffer (which I think is unreasonable
    of the OP, but that is only IMO).

    In my opinion too. If you're going to waste a memory cell, why not use
    it for a count variable instead of an unused element?

    CH

  • From pozz@21:1/5 to All on Mon Oct 25 23:46:23 2021
    On 25/10/2021 20:33, Niklas Holsti wrote:
    On 2021-10-25 20:52, pozz wrote:
    On 25/10/2021 17:34, Niklas Holsti wrote:
    On 2021-10-25 16:04, Dimiter_Popoff wrote:
    On 10/25/2021 11:09, Niklas Holsti wrote:
    On 2021-10-24 23:27, Dimiter_Popoff wrote:
    On 10/24/2021 22:54, Don Y wrote:
    On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
    Disable interrupts while accessing the fifo. you really have to.
    alternatively you'll often get away not using a fifo at all,
    unless you're blocking for a long while in some part of the code.

    Why would you do that. The fifo write pointer is only modified by
    the interrupt handler, the read pointer is only modified by the
    interrupted code. Has been done so for times immemorial.

    The OP's code doesn't differentiate between FIFO full and empty.


    (I suspect something is not quite right with the attributions of the
    quotations above -- Dimiter probably did not suggest disabling
    interrupts -- but no matter.)

       [snip]


    When I have a small (<256) power-of-two (16, 32, 64, 128) buffer (and
    this is the case for a UART receiving ring-buffer), I like to use this
    implementation, which works and doesn't waste any element.

    However I know this isn't the best implementation ever, and it's a pity
    the thread emphasis has been against this implementation (which was
    used as *one* implementation just to have an example to discuss).

    The main point was the use of volatile (and other techniques) to
    guarantee correct compiler output, whatever legal (C-standard-conforming)
    optimizations the compiler chooses to make.

    It seems to me the arguments against or for volatile are completely
    independent of the ring-buffer implementation.


    Of course "volatile" is needed, in general, whenever anything is written
    in one thread and read in another. The issue, I think, is when
    "volatile" is _enough_.

    I feel that detection of a full buffer (FIFO overflow) is required for
    a proper ring buffer implementation, and that has implications for the
    data structure needed, and that has implications for whether critical
    sections are needed.

    If the FIFO implementation is based on just two pointers (read and
    write), and each pointer is modified by just one of the two threads
    (main thread = reader, and interrupt handler = writer), and those
    modifications are both "volatile" AND atomic (which has not been
    discussed so far, IIRC...), then one can do without a critical region.
    But then detection of a full buffer needs one "wasted" element in the
    buffer.

    Yeah, this is exactly the topic of my original post. Anyway it seems
    what you say isn't always correct. As per the C standard, the compiler
    could reorder instructions that involve non-volatile data. So, even in
    your simplified scenario (atomic access for the indexes), volatile for
    the head only (the one the ISR changes) is not sufficient.

    The function called in the mainloop to get data from the buffer
    accesses three variables: head (changed in the ISR), tail (not changed
    in the ISR) and buf[] (written in the ISR and read in the mainloop).

    The get function first checks whether some data is available in the
    FIFO and only *then* reads from buf[]. However the compiler could
    rearrange the instructions so that buf[] is read first and the
    FIFO-empty condition checked afterwards. If the compiler goes this
    way, errors could occur during execution.

    My original question was exactly if this could happen (without breaking
    C specifications) and, if yes, how to avoid this: volatile? critical
    section? memory barrier?

    David Brown said this is possible and suggested accessing both head
    and buf[] as volatile in the get() function, forcing the compiler to
    respect the order of the instructions.
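    A sketch of what such a get() might look like. The VOLATILE_READ
    macro is illustrative (not David Brown's exact code), and __typeof__
    is a GCC/Clang extension; the point is that the two volatile loads
    cannot be reordered against each other:

```c
#include <assert.h>
#include <stdint.h>

#define RXBUF_SIZE 64u                  /* power of two */

static unsigned char buf[RXBUF_SIZE];
static uint8_t head;                    /* written by the ISR       */
static uint8_t tail;                    /* written only by the main loop */

/* Read x through a volatile lvalue: forces an actual load and keeps it
   ordered with respect to other volatile accesses. */
#define VOLATILE_READ(x) (*(volatile __typeof__(x) *)&(x))

/* Stand-in for the real receive ISR. */
static void isr_put(unsigned char c)
{
    buf[head % RXBUF_SIZE] = c;
    head++;
}

static int uart_get(void)
{
    uint8_t t = tail;
    if (t == VOLATILE_READ(head))       /* volatile load #1 */
        return -1;
    /* Volatile load #2: cannot be hoisted above load #1. */
    int c = VOLATILE_READ(buf[t % RXBUF_SIZE]);
    tail = (uint8_t)(t + 1u);
    return c;
}
```

    Casting only the accesses, rather than declaring buf itself volatile,
    keeps the rest of the code free to cache values in locals.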


    To avoid the wasted element, one could add a "full"/"not full" Boolean
    flag. But that flag would be modified by both threads, and should be
    modified atomically together with the pointer modifications, which (I
    think) means that a critical section is needed.

  • From David Brown@21:1/5 to All on Tue Oct 26 00:05:43 2021
    On 25/10/2021 22:02, Dimiter_Popoff wrote:
    On 10/25/2021 22:53, Niklas Holsti wrote:
    On 2021-10-25 22:09, Dimiter_Popoff wrote:
    On 10/25/2021 21:33, Niklas Holsti wrote:
    ....

    If the FIFO implementation is based on just two pointers (read and
    write), and each pointer is modified by just one of the two threads
    (main thread = reader, and interrupt handler = writer), and those
    modifications are both "volatile" AND atomic (which has not been
    discussed so far, IIRC...), then one can do without a critical
    region. But then detection of a full buffer needs one "wasted"
    element in the buffer.

    Why atomic?


    If the read/write pointers/indices are, say, 16 bits, but the
    processor has only 8-bit store/load instructions, updating a
    pointer/index happens non-atomically, 8 bits at a time, and the
    interrupt handler can read a half-updated value if the interrupt
    happens in the middle of an update. That would certainly mess up the
    comparison between the read and write points in the interrupt handler.

    In the OP's code, I suppose (but I don't recall) that the indices are
    8 bits, so probably atomically readable and writable.


    Ah, well, this is a possible scenario in a multicore system (or single
    core if the two bytes are written by separate opcodes).

    For the AVR, writing 16-bit values is not atomic - but the OP used
    8-bit counters (which are more appropriate for the AVR anyway, as they
    don't have enough memory to spend on big buffers).

    For multi-core systems you can have added complications. Memory
    accesses are always seen in assembly-code order on one core, regardless
    of how they may be re-ordered by buffers, caches, out-of-order
    execution, speculative execution, etc. (CPU designers sometimes have to
    work quite hard to achieve this, but anything else would be impossible
    to work with.) However, the order seen by other cores could be
    different. So "volatile" accesses are no longer enough - you need to
    use C11/C++11 atomics, or the equivalent.

    (Don't use C11/C++11 atomics on gcc for the Cortex-M or AVR, at least
    not with anything that can't be done with a single read or write
    instruction - the library that comes with gcc is deeply flawed.)
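    For the multicore case, the C11 idiom is acquire/release atomics on
    the indices instead of volatile. A sketch for a single-producer/
    single-consumer buffer (names illustrative); single-byte atomic loads
    and stores compile to plain accesses plus ordering, so they avoid the
    library calls David Brown cautions about:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RXBUF_SIZE 64u                  /* power of two; one slot unused */

static unsigned char buf[RXBUF_SIZE];
static _Atomic uint8_t head;            /* producer-side index */
static _Atomic uint8_t tail;            /* consumer-side index */

/* Producer: the release store on head publishes buf[] to the consumer. */
static int fifo_put(unsigned char c)
{
    uint8_t h = atomic_load_explicit(&head, memory_order_relaxed);
    uint8_t next = (uint8_t)((h + 1u) & (RXBUF_SIZE - 1u));
    if (next == atomic_load_explicit(&tail, memory_order_acquire))
        return 0;                       /* full */
    buf[h] = c;
    atomic_store_explicit(&head, next, memory_order_release);
    return 1;
}

/* Consumer: the acquire load on head orders the buf[] read after it. */
static int fifo_get(void)
{
    uint8_t t = atomic_load_explicit(&tail, memory_order_relaxed);
    if (t == atomic_load_explicit(&head, memory_order_acquire))
        return -1;                      /* empty */
    int c = buf[t];
    atomic_store_explicit(&tail, (uint8_t)((t + 1u) & (RXBUF_SIZE - 1u)),
                          memory_order_release);
    return c;
}
```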

  • From Don Y@21:1/5 to antispam@math.uni.wroc.pl on Mon Oct 25 15:24:38 2021
    On 10/25/2021 2:32 PM, antispam@math.uni.wroc.pl wrote:
    Don Y <blockedofcourse@foo.invalid> wrote:
    On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
    Disable interrupts while accessing the fifo. you really have to.
    alternatively you'll often get away not using a fifo at all,
    unless you're blocking for a long while in some part of the code.

    Why would you do that. The fifo write pointer is only modified by
    the interrupt handler, the read pointer is only modified by the
    interrupted code. Has been done so for times immemorial.

    The OP's code doesn't differentiate between FIFO full and empty.

    If you read carefully what he wrote you would know that he does.
    The trick he uses is that his indices may point outside buffer:
    empty is equal indices, full is difference equal to buffer

    Doesn't matter as any index can increase by any amount and
    invalidate the "reality" of the buffer's contents (i.e.
    actual number of characters that have been transferred to
    that region of memory).

    Buffer size is 128, for example. in is 127, out is 127.
    What's that mean? Can you tell me what has happened prior
    to this point in time? Have 127 characters been received?
    Or, 383? Or, 1151?

    How many characters have been removed from the buffer?
    (same numeric examples).

    Repeat for .in being 129, 227, 255, etc.

    Remember, there is nothing that GUARANTEES that the uart
    task is keeping up with the input data rate. So, the buffer
    could wrap 50 times and it's instantaneous state (visible
    via .in and .out) would appear unchanged to that task!

    If you wanted to rely on .in != .out to indicate the
    presence of data REGARDLESS OF WRAPS, then you'd need the
    size of each to be "significantly larger" than the maximum
    fill rate of the buffer to ensure THEY can't wrap.
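    The wrap scenario can be checked with a few lines: with 8-bit
    free-running indices, 256 arrivals with no removals leave the in
    index exactly where it started (names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Returns the value of an 8-bit free-running `in` index after
   `received` bytes arrive with none consumed: in == received mod 256,
   so 256 unconsumed bytes look exactly like an empty FIFO. */
static uint8_t demo_in_after(unsigned received)
{
    uint8_t in = 0;
    while (received--)
        in++;                   /* what the receive ISR does per byte */
    return in;
}
```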

    size. Of course his approach has its own limitations, like the
    buffer size being a power of 2, and with 8-bit indices the maximal
    buffer size is 128.

    The biggest practical limitation is that of expectations of
    other developers who may inherit (or copy) his code expecting
    the FIFO to be "well behaved".

  • From Richard Damon@21:1/5 to pozz on Mon Oct 25 20:31:12 2021
    On 10/25/21 2:15 PM, pozz wrote:
    On 23/10/2021 18:09, David Brown wrote:
    [...]
    Marking "in" and "buf" as volatile is /far/ better than using a critical
    section, and likely to be more efficient than a memory barrier.  You can
    also use volatileAccess rather than making buf volatile, and it is often
    slightly more efficient to cache volatile variables in a local variable
    while working with them.

    I think I got your point, but I'm wondering why there are plenty of
    examples of ring-buffer implementations that don't use volatile at all,
    even if the author explicitly refers to interrupts and multithreading.

    Just an example[1] by Quantum Leaps. It promises to be a *lock-free*
    (I think thread-safe) ring-buffer implementation in the scenario of a
    single producer/single consumer (that is my scenario too).

    In the source code there's no use of volatile. I could call
    RingBuf_put() in my rx uart ISR and call RingBuf_get() in my
    mainloop code.

    From what I learned from you, this code usually works, but the
    standard doesn't guarantee it will work with every old, current and
    future compiler.



    [1] https://github.com/QuantumLeaps/lock-free-ring-buffer

    The issue with not using 'volatile' (or some similar memory barrier) is
    that without it, the implementation is allowed to delay the actual write
    of the results into the variable.

    If optimization is limited to just within a single translation unit, you
    can force it to work by having the execution path leave the translation
    unit, but with whole program optimization, it is theoretically possible
    that the implementation sees that the thread of execution NEVER needs it
    to be spilled out of the registers to memory, so the ISR will never see
    the change.
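    The point about a value never being spilled to memory is the classic
    volatile spin-loop example: without the qualifier, the compiler may
    legally load the flag once and never re-read it.

```c
#include <assert.h>

/* Set by an ISR. Without volatile the compiler may legally load `flag`
   once, keep it in a register, and turn the loop into an infinite spin
   that never observes the ISR's store. */
static volatile int flag;

static void wait_for_flag(void)
{
    while (!flag)
        ;                       /* volatile forces a fresh load each time */
}
```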

  • From Don Y@21:1/5 to antispam@math.uni.wroc.pl on Tue Oct 26 17:52:59 2021
    On 10/26/2021 5:20 PM, antispam@math.uni.wroc.pl wrote:
    Don Y <blockedofcourse@foo.invalid> wrote:
    On 10/25/2021 2:32 PM, antispam@math.uni.wroc.pl wrote:
    Don Y <blockedofcourse@foo.invalid> wrote:
    On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
    Disable interrupts while accessing the fifo. you really have to.
    alternatively you'll often get away not using a fifo at all,
    unless you're blocking for a long while in some part of the code.

    Why would you do that. The fifo write pointer is only modified by
    the interrupt handler, the read pointer is only modified by the
    interrupted code. Has been done so for times immemorial.

    The OP's code doesn't differentiate between FIFO full and empty.

    If you read carefully what he wrote you would know that he does.
    The trick he uses is that his indices may point outside buffer:
    empty is equal indices, full is difference equal to buffer

    Doesn't matter as any index can increase by any amount and
    invalidate the "reality" of the buffer's contents (i.e.
    actual number of characters that have been transferred to
    that region of memory).

    AFAIK OP considers this not a problem in his application.

    And I don't think I have to test for division by zero -- as
    *my* code is the code that is passing numerator and denominator
    to that operator, right?

    Can you remember all of the little assumptions you've made in
    any non-trivial piece of code -- a week later? a month later?
    6 months later (when a bug manifests or a feature upgrade
    is requested)?

    Do not check the inputs of routines for validity -- assume everything
    is correct (cuz YOU wrote it to be so, right?).

    Do not handle error conditions -- because they can't exist (because
    you wrote the code and feel confident that you've anticipated
    every contingency -- including those for future upgrades).

    Ignore compiler warnings -- surely you know better than a silly
    "generic" program!

    Would you hire someone who viewed your product's quality (and
    your reputation) in this regard?

    Of course, if such changes were a problem he would need to
    add test preventing writing to full buffer (he already have
    test preventing reading from empty buffer).

    Buffer size is 128, for example. in is 127, out is 127.
    What's that mean?

    Empty buffer.

    No, it means you can't sort out *if* there have been any characters
    received, based solely on this fact (and, what other facts are there
    to observe?)

    Can you tell me what has happened prior
    to this point in time? Have 127 characters been received?
    Or, 383? Or, 1151?

    Does not matter.

    Of course it does! Something has happened that the code MIGHT have
    detected in other circumstances (e.g., if uart_task had been invoked
    more frequently). The world has changed and the code doesn't know it.
    Why write code that only *sometimes* works?

    How many characters have been removed from the buffer?
    (same numeric examples).

    The same as has been stored. Point is that received is
    always greater than or equal to removed and does not exceed
    removed by more than 128. So you can exactly recover the
    difference between received and removed.

    If it can wrap, then "some data" can look like "no data".
    If "no data", then NOTHING has been received -- from the
    viewpoint of the code.

    Tell me what prevents 256 characters from being received
    after .in (and .out) are initially 0 -- without any
    indication of their presence. What "limits" the difference
    to "128"? Do you see any conditionals in the code that
    do so? Is there some magic in the hardware that enforces
    this?

    This is how you end up with bugs in your code. The sorts
    of bugs that you can witness -- with your own eyes -- and
    never reproduce (until the code has been released and
    lots of customers' eyes witness it as well).

    The biggest practical limitation is that of expectations of
    other developers who may inherit (or copy) his code expecting
    the FIFO to be "well behaved".

    Well, personally I would avoid storing to a full buffer. And
    even on a small MCU it is not clear to me whether his "savings"
    are worth it. But his core design is sound.

    Concerning other developers, I always work on the assumption
    that code is "as is" and any claims about what it is doing are of
    limited value unless there is a convincing argument (proof
    or outline of proof) of what it is doing.

    Ever worked on 100KLoC projects? 500KLoC? Do you personally examine
    the entire codebase before you get started? Do you purchase source
    licenses for every library that you rely upon in your design?
    (or, do you just assume software vendors are infallible?)

    How would you feel if a fellow worker told you "yeah, the previous
    guy had a habit of cutting corners in his FIFO management code"?
    Or, "the previous guy always assumed malloc would succeed and
    didn't even build an infrastructure to address the possibility
    of it failing"

    You could, perhaps, grep(1) for "malloc" or "FIFO" and manually
    examine those code fragments. What about division operators?
    Or, verifying that data types never overflow their limits? Or...

    The fact that code
    worked well in past system(s) is rather unconvincing.
    I have seen small (few lines) pieces of code that contained
    multiple bugs. And that code was in "production" use
    for several years and passed its tests.

    Certainly code like FIFO-s where there are multiple tradeoffs
    and actual code tends to be relatively small deserves
    examination before re-use.

    It's not "FIFO code". It's a UART driver. Do you examine every piece
    of code that might *contain* a FIFO? How do you know that there *is* a FIFO
    in a piece of code -- without manually inspecting it? What if it is a
    FIFO mechanism but not explicitly named as a FIFO?

    One wants to be able to move towards the goal of software *components*.
    You don't want to have to inspect the design of every *diode* that
    you use; you want to look at its overall specifications and decide
    if those fit your needs.

    Unlikely that this code will describe itself as "works well enough
    SOME of the time..."

    And, when/if you stumble on such faults, good luck explaining to
    your customer why it's going to take longer to fix and retest the
    *existing* codebase before you can get on with your modifications...

  • From antispam@math.uni.wroc.pl@21:1/5 to Don Y on Wed Oct 27 00:20:21 2021
    Don Y <blockedofcourse@foo.invalid> wrote:
    On 10/25/2021 2:32 PM, antispam@math.uni.wroc.pl wrote:
    Don Y <blockedofcourse@foo.invalid> wrote:
    On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
    Disable interrupts while accessing the fifo. you really have to.
    alternatively you'll often get away not using a fifo at all,
    unless you're blocking for a long while in some part of the code.

    Why would you do that. The fifo write pointer is only modified by
    the interrupt handler, the read pointer is only modified by the
    interrupted code. Has been done so for times immemorial.

    The OP's code doesn't differentiate between FIFO full and empty.

    If you read carefully what he wrote you would know that he does.
    The trick he uses is that his indices may point outside buffer:
    empty is equal indices, full is difference equal to buffer

    Doesn't matter as any index can increase by any amount and
    invalidate the "reality" of the buffer's contents (i.e.
    actual number of characters that have been transferred to
    that region of memory).

    AFAIK OP considers this not a problem in his application.
    Of course, if such changes were a problem he would need to
    add test preventing writing to full buffer (he already have
    test preventing reading from empty buffer).

    Buffer size is 128, for example. in is 127, out is 127.
    What's that mean?

    Empty buffer.

    Can you tell me what has happened prior
    to this point in time? Have 127 characters been received?
    Or, 383? Or, 1151?

    Does not matter.

    How many characters have been removed from the buffer?
    (same numeric examples).

    The same as has been stored. Point is that received is
    always greater than or equal to removed and does not exceed
    removed by more than 128. So you can exactly recover the
    difference between received and removed.

    The biggest practical limitation is that of expectations of
    other developers who may inherit (or copy) his code expecting
    the FIFO to be "well behaved".

    Well, personally I would avoid storing to a full buffer. And
    even on a small MCU it is not clear to me whether his "savings"
    are worth it. But his core design is sound.

    Concerning other developers, I always work on the assumption
    that code is "as is" and any claims about what it is doing are of
    limited value unless there is a convincing argument (proof
    or outline of proof) of what it is doing. The fact that code
    worked well in past system(s) is rather unconvincing.
    I have seen small (few lines) pieces of code that contained
    multiple bugs. And that code was in "production" use
    for several years and passed its tests.

    Certainly code like FIFO-s where there are multiple tradeoffs
    and actual code tends to be relatively small deserves
    examination before re-use.

    --
    Waldek Hebisch

  • From antispam@math.uni.wroc.pl@21:1/5 to Don Y on Wed Oct 27 05:22:55 2021
    Don Y <blockedofcourse@foo.invalid> wrote:
    On 10/26/2021 5:20 PM, antispam@math.uni.wroc.pl wrote:
    Don Y <blockedofcourse@foo.invalid> wrote:
    On 10/25/2021 2:32 PM, antispam@math.uni.wroc.pl wrote:
    Don Y <blockedofcourse@foo.invalid> wrote:
    On 10/24/2021 4:14 AM, Dimiter_Popoff wrote:
    Disable interrupts while accessing the fifo. you really have to.
    alternatively you'll often get away not using a fifo at all,
    unless you're blocking for a long while in some part of the code.
    Why would you do that. The fifo write pointer is only modified by
    the interrupt handler, the read pointer is only modified by the
    interrupted code. Has been done so for times immemorial.

    The OP's code doesn't differentiate between FIFO full and empty.

    If you read carefully what he wrote you would know that he does.
    The trick he uses is that his indices may point outside buffer:
    empty is equal indices, full is difference equal to buffer

    Doesn't matter as any index can increase by any amount and
    invalidate the "reality" of the buffer's contents (i.e.
    actual number of characters that have been tranfered to
    that region of memory).

    AFAIK OP considers this not a problem in his application.

    And I don't think I have to test for division by zero -- as
    *my* code is the code that is passing numerator and denominator
    to that operator, right?

    Well, I do not test for zero if I know that the divisor must be
    nonzero. To put it differently, having zero in such a place
    is a bug and there is already enough machinery that
    such a bug will not remain undetected. Having an extra test
    adds no value.

    OTOH is zero is possible, then handling it is part of program
    logic and test is needed to take correct action.

    Can you remember all of the little assumptions you've made in
    any non-trivial piece of code -- a week later? a month later?
    6 months later (when a bug manifests or a feature upgrade
    is requested)?

    Well, my normal practice is that there are no "little assumptions".
    To put it differently, code is structured to make things clear,
    even if this requires more code than some "clever" solution.
    There may be "big assumptions", that is highly nontrivial facts
    used by the code. Some of them are considered "well known",
    with proper naming in code it is easy to recall them years later.
    Some deserve comments/references. In most of my coding I have a
    pretty comfortable situation: for a human it is quite clear
    what is valid and what is invalid. So the code makes a lot of
    effort to handle valid (but possibly quite unusual) cases.

    Do not check the inputs of routines for validity -- assume everything
    is correct (cuz YOU wrote it to be so, right?).

    Well, correct inputs are part of the contract. Some things (like
    array indices being inside bounds) are checked, but in general you can
    expect garbage if you pass incorrect input. Most of my code is of a
    sort where the called routine cannot really check the validity of the
    input (there are complex invariants). Note: here I am talking mostly
    about my non-embedded code (which is the majority of my coding).
    In most of my coding I have a pretty comfortable situation: for a
    human it is quite clear what is valid and what is invalid.
    So the code makes a lot of effort to handle valid (but possibly quite
    unusual) cases. User input is normally checked to give a sensible
    error message, but some things are deemed too tricky/expensive
    to check. Other routines are deemed "system level", and there it is
    up to the user/caller to respect the contract.

My embedded code consists of rather small systems, and normally
there are no explicit validity checks. To clarify: when the system
receives commands it recognizes and handles valid commands.
So there is an implicit check: anything not recognized as valid
is invalid. OTOH frequently there is nothing to do in case
of errors: if there is no display to print an error message,
no persistent store to log the error, and shutting down is not helpful,
then what else would a potential error handler do?

I do not check if a 12-bit ADC really returns numbers in range.
My 'print_byte' routine takes an integer argument and blindly
truncates it to 8 bits without worrying about possible
spurious upper bits. "Safety critical" folks may be worried
by such practice, but my embedded code is fairly non-critical.
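The truncation described above is just a mask; a sketch (the routine name is illustrative, not the actual 'print_byte'):

```c
#include <stdint.h>

/* Blind truncation as described above: any spurious upper bits are
 * masked off and the caller is trusted to pass a value in range.
 * The name 'low_byte' is a hypothetical stand-in for the part of
 * 'print_byte' that does the truncation. */
static uint8_t low_byte(int v)
{
    return (uint8_t)(v & 0xFF);
}
```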

    Do not handle error conditions -- because they can't exist (because
    you wrote the code and feel confident that you've anticipated
    every contingency -- including those for future upgrades).

    Ignore compiler warnings -- surely you know better than a silly
    "generic" program!

    Would you hire someone who viewed your product's quality (and
    your reputation) in this regard?

Well, you do not know what OP's code is doing. I would prefer
my code to be robust and I feel that I am doing reasonably
well here. OTOH, coming back to serial communication, it
is not hard to design a communication protocol such that in
normal operation there is no possibility of buffer
overflow. It would still make sense to add a single line
to, say, drop excess characters. But it does not make
sense to make a big story of the lack of this line. In particular
the issue that OP wanted to discuss is still valid.

Of course, if such changes were a problem he would need to
add a test preventing writing to a full buffer (he already has a
test preventing reading from an empty buffer).

    Buffer size is 128, for example. in is 127, out is 127.
    What's that mean?

    Empty buffer.

    No, it means you can't sort out *if* there have been any characters
    received, based solely on this fact (and, what other facts are there
    to observe?)

Of course you can connect to the system and change values of variables
in a debugger, so specific values mean nothing. I am telling
you what the protocol is. If all parts of the system (including parts
that OP skipped) obey the protocol, then you have the meaning above.
If something misbehaves (say a cosmic ray flipped a bit), it does
not mean that the protocol is incorrect. Simply, _if_ the probability
of misbehaviour is too high you need to fix the system (add
radiation shielding, an appropriate seal to avoid tampering with
internals, extra checks inside, etc.). But whether/what to fix
is for OP to decide.

    Can you tell me what has happened prior
    to this point in time? Have 127 characters been received?
    Or, 383? Or, 1151?

    Does not matter.

    Of course it does! Something has happened that the code MIGHT have
    detected in other circumstances (e.g., if uart_task had been invoked
    more frequently). The world has changed and the code doesn't know it.
    Why write code that only *sometimes* works?

All code works only sometimes. Paraphrasing the famous answer to
Napoleon: first you need a processor. There are a lot of
conditions required for code to work as intended. Granted, I would
not skip a needed check in real code. But this is an obvious
thing to add. You are somewhat painting OP's code as "broken
beyond repair". Well, as the discussion showed, OP had a problem
using "volatile" and that IMHO is much more important to
fix.

    How many characters have been removed from the buffer?
    (same numeric examples).

The same as have been stored. The point is that the number received
is always greater than or equal to the number removed, and does not
exceed it by more than 128. So you can exactly recover the
difference between received and removed.
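With free-running 8-bit indices (as in the OP's driver), unsigned subtraction recovers exactly that difference, provided the invariant 0 <= stored - removed <= 128 holds; a sketch:

```c
#include <stdint.h>

#define RXBUF_SIZE 128u   /* must not exceed 256 with uint8_t indices */

/* Free-running indices as in the OP's driver: they count all characters
 * ever stored/removed, modulo 256; only 'index % RXBUF_SIZE' is used
 * to address the buffer. */
static uint8_t fifo_in;
static uint8_t fifo_out;

/* Unsigned wraparound makes (in - out) yield the true difference as
 * long as the protocol invariant 0 <= stored - removed <= RXBUF_SIZE
 * holds. */
static uint8_t fifo_count(void)
{
    return (uint8_t)(fifo_in - fifo_out);
}
```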

    If it can wrap, then "some data" can look like "no data".
    If "no data", then NOTHING has been received -- from the
    viewpoint of the code.

    Tell me what prevents 256 characters from being received
    after .in (and .out) are initially 0 -- without any
    indication of their presence. What "limits" the difference
    to "128"? Do you see any conditionals in the code that
    do so? Is there some magic in the hardware that enforces
    this?

That is the protocol. How to avoid violation is a different
matter: dropping characters _may_ be a solution. But dropping
characters means that some data is lost, and how to deal
with lost data is a different issue. As is, OP's code will lose
some old data. It is OP's problem to decide which failure
mode is more problematic and how many extra checks are
needed.

    This is how you end up with bugs in your code. The sorts
    of bugs that you can witness -- with your own eyes -- and
    never reproduce (until the code has been released and
    lots of customers' eyes witness it as well).

IME it is issues that you can not predict that catch you.
The above is an obvious issue, and should not be a problem
(unless the designer is seriously incompetent and misjudged
what can happen).

    The biggest practical limitation is that of expectations of
    other developers who may inherit (or copy) his code expecting
    the FIFO to be "well behaved".

Well, personally I would avoid storing to a full buffer. And
even on a small MCU it is not clear to me if his "savings"
are worth it. But his core design is sound.
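That guard is a single conditional on the producer side; a sketch of the ISR half with the check added (names follow the OP's fragment, the drop-newest overrun policy is a hypothetical choice):

```c
#include <stdint.h>

#define RXBUF_SIZE 128u

static struct {
    unsigned char buf[RXBUF_SIZE];
    uint8_t in;     /* written by the RX ISR   */
    uint8_t out;    /* written by the mainloop */
} rxfifo;

/* Producer side (would run in the RX ISR). Returns 0 on success, -1
 * when the character is dropped because the buffer is full. Dropping
 * the newest data is one policy, not the only reasonable one. */
static int fifo_put(unsigned char c)
{
    if ((uint8_t)(rxfifo.in - rxfifo.out) >= RXBUF_SIZE)
        return -1;                          /* full: drop, report overrun */
    rxfifo.buf[rxfifo.in % RXBUF_SIZE] = c;
    rxfifo.in++;
    return 0;
}
```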

Concerning other developers, I always work on the assumption
that code is "as is" and any claims about what it is doing are of
limited value unless there is a convincing argument (a proof
or an outline of a proof) of what it is doing.

    Ever worked on 100KLoC projects? 500KLoC? Do you personally examine
    the entire codebase before you get started?

Of course I do not read all the code before starting. But I accept
the risk that code may turn out to be faulty and I may be forced
to fix or abandon it. My main project has 450K wc lines.
I know that parts are wrong and I am working on fixing that
(which will probably involve substantial rewriting). I worked
a little on gcc and I can tell you that the only sure thing in
such projects is that there are bugs. Of course, despite the
bugs gcc is quite useful. But I also met a Modula 2 compiler
that carefully checked programs for violations of language
rules, but miscompiled nested function calls.

    Do you purchase source
    licenses for every library that you rely upon in your design?
    (or, do you just assume software vendors are infallible?)

Well, for several years I have worked exclusively with open source code.
I see a lot of defects. While my experience with commercial code
is limited, I do not think that commercial code has fewer defects
than open source. In fact, there are reasons to suspect
that there are more defects in commercial code.

    How would you feel if a fellow worker told you "yeah, the previous
    guy had a habit of cutting corners in his FIFO management code"?
    Or, "the previous guy always assumed malloc would succeed and
    didn't even build an infrastructure to address the possibility
    of it failing"

Well, there is a lot of bad code. Sometimes the best solution is simply
to throw it out. In other cases (likely in your malloc scenario above)
there may be a simple workaround (replace malloc by a checking version).

    You could, perhaps, grep(1) for "malloc" or "FIFO" and manually
    examine those code fragments.

Yes, that is one of the possible approaches.

    What about division operators?

I have a C parser. In desperation I could try to search the parse
tree or transform the program. Or, more likely, decide that the
program is broken beyond repair.

    Or, verifying that data types never overflow their limits? Or...

Well, one thing is to look at the structure of the program. Code may
look complicated, but some programs are reasonably testable:
a few random inputs can give some confidence that the "main"
execution path computes correct values. Then you look at whether
you can hit limits. Actually, much of my coding is in
arbitrary precision, so overflow is impossible. Instead the
program may run out of memory. But some parts, for speed,
use fixed precision. If I correctly computed the limits,
overflow is impossible. But this is a big if.

The fact that code
worked well in past system(s) is rather unconvincing.
I have seen small (few-line) pieces of code that contained
multiple bugs. And that code was in "production" use
for several years and passed its tests.

Certainly code like FIFOs, where there are multiple tradeoffs
and the actual code tends to be relatively small, deserves
examination before re-use.

It's not "FIFO code". It's a UART driver. Do you examine every piece
of code that might *contain* a FIFO? How do you know that there *is*
a FIFO in a piece of code -- without manually inspecting it? What if
it is a FIFO mechanism but not explicitly named as a FIFO?

    One wants to be able to move towards the goal of software *components*.
    You don't want to have to inspect the design of every *diode* that
you use; you want to look at its overall specifications and decide
    if those fit your needs.

Sure, I would love to see really reusable components. But IMHO we
are quite far from that. There are some things which are reusable
if you accept modest to severe overhead. For example, things tend
to compose nicely if you dynamically allocate everything and use
garbage collection. But the performance cost may be substantial.
And in an embedded setting garbage collection may be unacceptable.
In some cases I have found that I can get much better
speed by joining things that could be done as a composition of library
operations into a single big routine. In other cases I fixed
bugs by replacing a composition of library routines with a single
routine: there were interactions making the simple composition
incorrect. The correct alternative was a single routine.

As I wrote, my embedded programs are simple and small. But I
use almost no external libraries. Trying some existing libraries,
I have found that some produce rather large programs, linking
in a lot of unneeded stuff. Of course, writing from scratch
will not scale to bigger programs. OTOH, I feel that with
proper tooling it would be possible to retain efficiency and
small code size at least for a large class of microcontroller
programs (but existing tools and libraries do not support this).

    Unlikely that this code will describe itself as "works well enough
    SOME of the time..."

    And, when/if you stumble on such faults, good luck explaining to
    your customer why it's going to take longer to fix and retest the
    *existing* codebase before you can get on with your modifications...

Commercial vendors like to say how good their programs are. But
the market reality is that a program may be quite bad and still sell.

    --
    Waldek Hebisch

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to antispam@math.uni.wroc.pl on Fri Oct 29 15:36:49 2021
    On 10/26/2021 10:22 PM, antispam@math.uni.wroc.pl wrote:
    One wants to be able to move towards the goal of software *components*.
    You don't want to have to inspect the design of every *diode* that
you use; you want to look at its overall specifications and decide
    if those fit your needs.

    Sure, I would love to see really reusable components. But IMHO we
    are quite far from that.

    Do you use the standard libraries? Aren't THEY components?
    You rely on the compiler to decide how to divide X by Y -- instead
    of writing your own division routine. How often do you reimplement
    ?printf() to avoid all of the bloat that typically accompanies it?
(when was the last time you needed ALL of those format specifiers
in an application? And modifiers?)

    There are some things which are reusable
    if you accept modest to severe overhead.

    What you need is components with varying characteristics.
    You can buy diodes with all sorts of current carrying capacities,
    PIVs, package styles, etc. But, they all still perform the
    same function. Why so many different part numbers? Why not
    just use the biggest, baddest diode in ALL your circuits?

    I.e., we readily accept differences in "standard components"
    in other disciplines; why not when it comes to software
    modules?

For example, things tend
to compose nicely if you dynamically allocate everything and use
garbage collection. But the performance cost may be substantial.
And in an embedded setting garbage collection may be unacceptable.
In some cases I have found that I can get much better
speed by joining things that could be done as a composition of library
operations into a single big routine.

    Sure, but now you're tuning a solution to a specific problem.
    I've designed custom chips to solve particular problems.
    But, they ONLY solve those particular problems! OTOH,
    I use lots of OTC components in my designs because those have
    been designed (for the most part) with an eye towards
    meeting a variety of market needs.

In other cases I fixed
bugs by replacing a composition of library routines with a single
routine: there were interactions making the simple composition
incorrect. The correct alternative was a single routine.

As I wrote, my embedded programs are simple and small. But I
use almost no external libraries. Trying some existing libraries,
I have found that some produce rather large programs, linking
in a lot of unneeded stuff.

    Because they try to address a variety of solution spaces without
    trying to be "optimal" for any. You trade flexibility/capability
    for speed/performance/etc.

Of course, writing from scratch
will not scale to bigger programs. OTOH, I feel that with
proper tooling it would be possible to retain efficiency and
small code size at least for a large class of microcontroller
programs (but existing tools and libraries do not support this).

    Templates are an attempt in this direction. Allowing a class of
    problems to be solved once and then tailored to the specific
    application.

    But, personal experience is where you win the most. You write
    your second or third UART driver and start realizing that you
    could leverage a previous design if you'd just thought it out
    more fully -- instead of tailoring it to the specific needs
    of the original application.

    And, as you EXPECT to be reusing it in other applications (as
    evidenced by the fact that it's your third time writing the same
    piece of code!), you anticipate what those *might* need and
    think about how to implement those features "economically".

    It's rare that an application is *so* constrained that it can't
    afford a couple of extra lines of code, here and there. If
    you've considered efficiency in the design of your algorithms,
    then these little bits of inefficiency will be below the noise floor.

    Unlikely that this code will describe itself as "works well enough
    SOME of the time..."

    And, when/if you stumble on such faults, good luck explaining to
    your customer why it's going to take longer to fix and retest the
    *existing* codebase before you can get on with your modifications...

Commercial vendors like to say how good their programs are. But
the market reality is that a program may be quite bad and still sell.

    The same is true of FOSS -- despite the claim that many eyes (may)
    have looked at it (suggesting that bugs would have been caught!)

    From "KLEE: Unassisted and Automatic Generation of High-Coverage
    Tests for Complex Systems Programs":

    KLEE finds important errors in heavily-tested code. It
    found ten fatal errors in COREUTILS (including three
    that had escaped detection for 15 years), which account
    for more crashing bugs than were reported in 2006, 2007
    and 2008 combined. It further found 24 bugs in BUSYBOX, 21
bugs in MINIX, and a security vulnerability in HISTAR -- a
    total of 56 serious bugs.

    Ooops! I wonder how many FOSS *eyes* missed those errors?

    Every time you reinvent a solution, you lose much of the benefit
    of the previous TESTED solution.

  • From antispam@math.uni.wroc.pl@21:1/5 to Don Y on Sun Oct 31 22:54:22 2021
    Don Y <blockedofcourse@foo.invalid> wrote:
    On 10/26/2021 10:22 PM, antispam@math.uni.wroc.pl wrote:
    One wants to be able to move towards the goal of software *components*.
    You don't want to have to inspect the design of every *diode* that
you use; you want to look at its overall specifications and decide
    if those fit your needs.

    Sure, I would love to see really reusable components. But IMHO we
    are quite far from that.

    Do you use the standard libraries?

Yes, I use libraries when appropriate.

    Aren't THEY components?

Well, some folks expect more from components than from
traditional libraries. Some even claim to deliver.
However, libraries have limitations and ATM I see nothing
that fundamentally changes the situation.

    You rely on the compiler to decide how to divide X by Y -- instead
    of writing your own division routine.

Well, normally in C code I rely on compiler-provided division.
To tell the truth, my MCU code uses division sparingly, only
when I can not avoid it. OTOH I also use languages with
multiprecision integers. In one case I use compiler-provided
routines, but I am the provider of a modified compiler and the
modification includes replacement of the division routine. In another
case I override the compiler-supplied division routine with my own
(which in turn sends the real work to an external library).

    How often do you reimplement
    ?printf() to avoid all of the bloat that typically accompanies it?

I did that once (for an OS kernel where the standard library would not
work). If needed I can reuse it. On PCs I am not worried by
bloat due to printf. OTOH, on MCUs I am not sure if I ever used
printf. Rather, printing was done by specialized routines,
either library-provided or my own.

(when was the last time you needed ALL of those format specifiers
in an application? And modifiers?)

    There are some things which are reusable
    if you accept modest to severe overhead.

    What you need is components with varying characteristics.
    You can buy diodes with all sorts of current carrying capacities,
    PIVs, package styles, etc. But, they all still perform the
    same function. Why so many different part numbers? Why not
    just use the biggest, baddest diode in ALL your circuits?

I have heard such electronic analogies many times. But they miss an
important point: there is no way for me to make my own diode;
I am stuck with what is available on the market. And a diode
is a logically pretty simple component, yet we need many kinds.

    I.e., we readily accept differences in "standard components"
    in other disciplines; why not when it comes to software
    modules?

Well, software is _much_ more complicated than physical
engineering artifacts. A physical thing may have 10000 joints,
but if the joints are identical, then this is the moral equivalent of
a simple loop that just iterates a fixed number of times.
At the software level the number of possible pre-composed blocks
is so large that it is infeasible to deliver all of them.
The classic trick is to parametrize. However, even if you
parametrize, there are hundreds of design decisions going
into a relatively small piece of code. If you expose all
design decisions then the user may as well write his/her own
code, because the complexity will be similar. So normally
parametrization is limited and there will be users who
find the hardcoded design choices inadequate.

Another thing is that current tools are rather weak
at supporting parametrization.

For example, things tend
to compose nicely if you dynamically allocate everything and use
garbage collection. But the performance cost may be substantial.
And in an embedded setting garbage collection may be unacceptable.
In some cases I have found that I can get much better
speed by joining things that could be done as a composition of library
operations into a single big routine.

    Sure, but now you're tuning a solution to a specific problem.
    I've designed custom chips to solve particular problems.
    But, they ONLY solve those particular problems! OTOH,
    I use lots of OTC components in my designs because those have
    been designed (for the most part) with an eye towards
    meeting a variety of market needs.

Maybe I made a wrong impression; I think some explanation is in
order here. I am trying to make my code reusable. For my
problems performance is an important part of reusability: our
capability to solve a problem is limited by performance, and with
better performance users can solve bigger problems. I am
re-using the code that I can and I would re-use more if I could,
but there are technical obstacles. Also, while I am
trying to make my code reusable, there are intrusive
design decisions which may interfere with your ability
and willingness to re-use.

In a slightly different spirit: in another thread you wrote
about accessing the disc without the OS file cache. Here I
normally depend on the OS, and OS file caching is a big thing.
It is not perfect, but the OS (OK, at least Linux) is doing
this reasonably well, so I have no temptation to avoid it.
And I appreciate that with the OS cache, performance is
usually much better than it would be "without cache".
OTOH, I routinely avoid stdio for I/O-critical things
(so no printf in I/O-critical code).

In other cases I fixed
bugs by replacing a composition of library routines with a single
routine: there were interactions making the simple composition
incorrect. The correct alternative was a single routine.

As I wrote, my embedded programs are simple and small. But I
use almost no external libraries. Trying some existing libraries,
I have found that some produce rather large programs, linking
in a lot of unneeded stuff.

    Because they try to address a variety of solution spaces without
    trying to be "optimal" for any. You trade flexibility/capability
    for speed/performance/etc.

I think that this is more subtle: libraries frequently force some
way of doing things. Which may be good if you are trying to quickly
roll a solution and are within the capabilities of the library. But
if you need/want a different design, then the library may be too
inflexible to deliver it.

Of course, writing from scratch
will not scale to bigger programs. OTOH, I feel that with
proper tooling it would be possible to retain efficiency and
small code size at least for a large class of microcontroller
programs (but existing tools and libraries do not support this).

    Templates are an attempt in this direction. Allowing a class of
    problems to be solved once and then tailored to the specific
    application.

Yes, templates could help. But they also have problems. One
of them is that (among others) I would like to target the STM8
and I have no C++ compiler for the STM8. My idea is to create a
custom "optimizer/generator" for (annotated) C code.
ATM it is vapourware, but I think it is feasible with
reasonable effort.

    But, personal experience is where you win the most. You write
    your second or third UART driver and start realizing that you
    could leverage a previous design if you'd just thought it out
    more fully -- instead of tailoring it to the specific needs
    of the original application.

    And, as you EXPECT to be reusing it in other applications (as
    evidenced by the fact that it's your third time writing the same
    piece of code!), you anticipate what those *might* need and
    think about how to implement those features "economically".

    It's rare that an application is *so* constrained that it can't
    afford a couple of extra lines of code, here and there. If
    you've considered efficiency in the design of your algorithms,
    then these little bits of inefficiency will be below the noise floor.

Well, I am not talking about a "couple of extra lines". Rather
about IMO substantial fixed overhead. As I wrote, one of my
targets is an STM8 with 8k flash, another is an MSP430 with 16k flash,
another is an STM32 with 16k flash (there are also bigger targets).
One of the libraries/frameworks for STM32, after activating a few
features, pulled in about 16k of code; this is substantial overhead
given how few features I needed. Other folks reported that for
trivial programs vendor-supplied frameworks pulled in close to 30k of
code. That may be fine if you have a bigger device and need the
features, but for smaller MCUs it may be the difference between not
fitting into the device or (without the library) having plenty of
free space.

When I tried it, FreeRTOS for STM32 needed about 8k flash. Which
is fine if you need an RTOS. But ATM my designs run without an RTOS.

I have found libopencm3 to have small overhead. But its routines
do so little that direct register access may give simpler
code.

    Unlikely that this code will describe itself as "works well enough
    SOME of the time..."

    And, when/if you stumble on such faults, good luck explaining to
    your customer why it's going to take longer to fix and retest the
    *existing* codebase before you can get on with your modifications...

Commercial vendors like to say how good their programs are. But
the market reality is that a program may be quite bad and still sell.

    The same is true of FOSS -- despite the claim that many eyes (may)
    have looked at it (suggesting that bugs would have been caught!)

    From "KLEE: Unassisted and Automatic Generation of High-Coverage
    Tests for Complex Systems Programs":

    KLEE finds important errors in heavily-tested code. It
    found ten fatal errors in COREUTILS (including three
    that had escaped detection for 15 years), which account
    for more crashing bugs than were reported in 2006, 2007
    and 2008 combined. It further found 24 bugs in BUSYBOX, 21
bugs in MINIX, and a security vulnerability in HISTAR -- a
    total of 56 serious bugs.

    Ooops! I wonder how many FOSS *eyes* missed those errors?

Open source folks tend to be more willing to talk about bugs.
And the above nicely shows that there are a lot of bugs, most
waiting to be discovered.

    Every time you reinvent a solution, you lose much of the benefit
    of the previous TESTED solution.

The TESTED part works for simple repeatable tasks. But if you have a
complex task it is quite likely that you will be the first
person with a given use case. gcc is a borderline case: if you
throw really new code at it you can expect to see bugs.
The gcc user community is large and there is a reasonable chance that
somebody wrote earlier code which is sufficiently similar to
yours to catch troubles. But there are domains that are at
least as complicated as compilation and have a much smaller
user community. You may find that there is _no_ code
that could be reasonably re-used. Were you ever in a situation
where you looked at how some "standard library" solves a tricky
problem and realized that in fact the library does not solve
the problem?

    --
    Waldek Hebisch

  • From Don Y@21:1/5 to antispam@math.uni.wroc.pl on Sun Oct 31 20:37:25 2021
    On 10/31/2021 3:54 PM, antispam@math.uni.wroc.pl wrote:
    Aren't THEY components?

    Well, some folks expect more from components than from
    traditional libraries. Some evan claim to deliver.
    However, libraries have limitations and ATM I see nothing
    that fundamentally change situation.

    A component is something that you can use as a black box,
    without having to reinvent it. It is the epitome of reuse.

    How often do you reimplement
    ?printf() to avoid all of the bloat that typically accompanies it?

I did that once (for an OS kernel where the standard library would not
work). If needed I can reuse it. On PCs I am not worried by
bloat due to printf. OTOH, on MCUs I am not sure if I ever used
printf. Rather, printing was done by specialized routines,
either library-provided or my own.

    You can also create a ?printf() that you can configure at build time to
    support the modifiers and specifiers that you know you will need.

    Just like you can configure a UART driver to support a FIFO size defined
    at configuration, hardware handshaking, software flowcontrol, the
    high and low water marks for each of those (as they can be different),
    the character to send to request the remote to stop transmitting,
    the character you send to request resumption of transmission, which
    character YOU will recognize as requesting your Tx channel to pause,
    the character (or condition) you will recognize to resume your Tx,
    whether or not you will sample the condition codes in the UART, how
    you read/write the data register, how you read/write the status register,
    etc.

    While these sound like lots of options, they are all relatively
    trivial additions to the code.
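As an example of how cheap such options are, software flow control can be compiled in or out with a single flag; a sketch (all the UART_CFG_* names and the helper are illustrative, not from any real driver):

```c
/* Hypothetical build-time configuration; in a real project these
 * would live in a per-application config header. */
#define UART_CFG_SW_FLOWCONTROL 1       /* compile XON/XOFF handling in */
#define UART_CFG_XOFF_CHAR      0x13    /* DC3: remote asks us to pause  */
#define UART_CFG_XON_CHAR       0x11    /* DC1: remote asks us to resume */

static int uart_tx_paused;              /* consulted by the TX path */

/* Called for each received character; flow-control characters are
 * consumed here, everything else would go to the RX FIFO. */
static void uart_rx_byte(unsigned char c)
{
#if UART_CFG_SW_FLOWCONTROL
    if (c == UART_CFG_XOFF_CHAR) { uart_tx_paused = 1; return; }
    if (c == UART_CFG_XON_CHAR)  { uart_tx_paused = 0; return; }
#endif
    /* ... enqueue c into the RX FIFO (omitted in this sketch) ... */
    (void)c;
}
```

With the flag set to 0 the preprocessor removes the comparisons entirely, so the disabled feature costs nothing at run time.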

(when was the last time you needed ALL of those format specifiers
in an application? And modifiers?)

    There are some things which are reusable
    if you accept modest to severe overhead.

    What you need is components with varying characteristics.
    You can buy diodes with all sorts of current carrying capacities,
    PIVs, package styles, etc. But, they all still perform the
    same function. Why so many different part numbers? Why not
    just use the biggest, baddest diode in ALL your circuits?

I have heard such electronic analogies many times. But they miss an
important point: there is no way for me to make my own diode,

    Sure there is! It is just not an efficient way of spending your
    resources when you have so many OTS offerings available.

    You can design your own processor. Why do you "settle" for an
    OTS device (ANS: because there is so little extra added value
    you will typically gain from rolling your own vs. the "inefficiency"
    of using a COTS offering)

I am stuck with what is available on the market. And a diode
is a logically pretty simple component, yet we need many kinds.

    I.e., we readily accept differences in "standard components"
    in other disciplines; why not when it comes to software
    modules?

Well, software is _much_ more complicated than physical
engineering artifacts. A physical thing may have 10000 joints,
but if the joints are identical, then this is the moral equivalent of
a simple loop that just iterates a fixed number of times.

    This is the argument in favor of components. You'd much rather
    read a comprehensive specification ("datasheet") for a software
    component than have to read through all of the code that implements
    it. What if it was implemented in some programming language in
    which you aren't expert? What if it was a binary "BLOB" and
    couldn't be inspected?

    At the software level the number of possible pre-composed blocks
    is so large that it is infeasible to deliver all of them.

    You don't have to deliver all of them. When you wire a circuit,
    you still have to *solder* connections, don't you? The
    components don't magically glue themselves together...

    The classic trick is to parametrize. However, even if you
    parametrize there are hundreds of design decisions going
    into a relatively small piece of code. If you expose all
    design decisions then the user may as well write his/her own
    code, because the complexity will be similar. So normally
    parametrization is limited and there will be users who
    find the hardcoded design choices inadequate.

    Another thing is that current tools are rather weak
    at supporting parametrization.

    Look at a fleshed-out UART driver and think about how you would
    decompose it into N different variants that could be "compile
    time configurable".
    You'll be surprised as to how easy it is. Even if the actual UART
    hardware differs from instance to instance.

    For example, things tend to compose nicely if you dynamically
    allocate everything and use garbage collection. But the
    performance cost may be substantial. And in an embedded setting
    garbage collection may be unacceptable. In some cases I have
    found that I can get much better speed by joining things that
    could be done as a composition of library operations into a
    single big routine.

    Sure, but now you're tuning a solution to a specific problem.
    I've designed custom chips to solve particular problems.
    But, they ONLY solve those particular problems! OTOH,
    I use lots of OTC components in my designs because those have
    been designed (for the most part) with an eye towards
    meeting a variety of market needs.

    Maybe I made the wrong impression; I think some explanation is
    in order here. I am trying to make my code reusable. For my
    problems performance is an important part of reusability: our
    capability to solve problems is limited by performance, and with
    better performance users can solve bigger problems. I am
    re-using the code that I can, and I would re-use more if I could,
    but there are technical obstacles. Also, while I am
    trying to make my code reusable, there are intrusive
    design decisions which may interfere with your possibility
    and willingness to re-use.

    If you don't know where the design is headed, then you can't
    pick the components that it will need.

    I approach a design from the top (down) and bottom (up). This
    lets me gauge the types of information that I *may* have
    available from the hardware -- so I can sort out how to
    approach those limitations from above. E.g., if I can't
    control the data rate of a comm channel, then I either have
    to ensure I can catch every (complete) message *or* design a
    protocol that lets me detect when I've missed something.

    There are costs to both approaches. If I dedicate resource to
    ensuring I don't miss anything, then some other aspect of the
    design will bear that cost. If I rely on detecting missed
    messages, then I have to put a figure on their relative
    likelihood so my device doesn't fail to provide its desired
    functionality (because it is always missing one or two characters
    out of EVERY message -- and, thus, sees NO messages).

    In a slightly different spirit: in another thread you wrote
    about accessing the disc without the OS file cache. Here I
    normally depend on the OS, and OS file caching is a big thing.
    It is not perfect, but the OS (OK, at least Linux) does
    this reasonably well, so I have no temptation to avoid it.
    And I appreciate that with the OS cache performance is
    usually much better than it would be without it.
    OTOH, I routinely avoid stdio for I/O-critical things
    (so no printf in I/O-critical code).

    My point about the cache was that it is of no value in my case;
    I'm not going to revisit a file once I've seen it the first
    time (so why hold onto that data?)

    In other cases I fixed bugs by replacing a composition of
    library routines with a single routine: there were interactions
    making the simple composition incorrect. The correct alternative
    was a single routine.

    As I wrote, my embedded programs are simple and small. But I
    use almost no external libraries. Trying some existing libraries,
    I have found that some produce rather large programs, linking
    in a lot of unneeded stuff.

    Because they try to address a variety of solution spaces without
    trying to be "optimal" for any. You trade flexibility/capability
    for speed/performance/etc.

    I think that this is more subtle: libraries frequently force some
    way of doing things. Which may be good if you are trying to
    quickly roll a solution and are within the capabilities of the
    library. But if you need/want a different design, then the
    library may be too inflexible to deliver it.

    Use a different diode.

    Of course, writing from scratch
    will not scale to bigger programs. OTOH, I feel that with
    proper tooling it would be possible to retain efficiency and
    small code size at least for a large class of microcontroller
    programs (but existing tools and libraries do not support this).

    Templates are an attempt in this direction. Allowing a class of
    problems to be solved once and then tailored to the specific
    application.

    Yes, templates could help. But they also have problems. One
    of them is that (among others) I would like to target STM8
    and I have no C++ compiler for STM8. My idea is to create
    custom "optimizer/generator" for (annotated) C code.
    ATM it is vapourware, but I think it is feasible with
    reasonable effort.

    But, personal experience is where you win the most. You write
    your second or third UART driver and start realizing that you
    could leverage a previous design if you'd just thought it out
    more fully -- instead of tailoring it to the specific needs
    of the original application.

    And, as you EXPECT to be reusing it in other applications (as
    evidenced by the fact that it's your third time writing the same
    piece of code!), you anticipate what those *might* need and
    think about how to implement those features "economically".

    It's rare that an application is *so* constrained that it can't
    afford a couple of extra lines of code, here and there. If
    you've considered efficiency in the design of your algorithms,
    then these little bits of inefficiency will be below the noise floor.

    Well, I am not talking about a "couple of extra lines". Rather
    about (IMO) a substantial fixed overhead. As I wrote, one of my
    targets is an STM8 with 8k flash, another an MSP430 with 16k
    flash, another an STM32 with 16k flash (there are also bigger
    targets). One of the libraries/frameworks for STM32, after
    activating a few features, pulled in about 16k of code; this is
    substantial overhead given how few features I needed. Other folks
    reported that for trivial programs vendor-supplied frameworks
    pulled in close to 30k

    A "framework" is considerably more than a set of individually
    selectable components. I've designed products with 2KB of code and
    128 bytes of RAM. The "components" were ASM modules instead of
    HLL modules. Each told me how big it was, how much RAM it required,
    how deep the stack penetration when invoked, how many T-states
    (worst case) to execute, etc.

    So, before I designed the hardware, I knew what I would need
    by way of ROM/RAM (before the days of FLASH) and could commit
    the hardware to foil without fear of running out of "space" or
    "time".

    of code. That may be fine if you have a bigger device and need
    the features, but for smaller MCUs it may be the difference
    between not fitting into the device or (without the library)
    having plenty of free space.

    Sure. But a component will have a datasheet that tells you what
    it provides and at what *cost*.

    When I tried it, FreeRTOS for STM32 needed about 8k of flash.
    Which is fine if you need an RTOS. But ATM my designs run
    without an RTOS.

    RTOS is a commonly misused term. Many are more properly called
    MTOSs (they provide no real timeliness guarantees, just
    multitasking primitives).

    IMO, the advantages of writing in a multitasking environment so
    far outweigh the "costs" of an MTOS that it behooves one to consider
    how to shoehorn that functionality into EVERY design.

    When writing in a HLL, there are complications that impose
    constraints on how the MTOS provides its services. But, for small
    projects written in ASM, you can gain the benefits of an MTOS
    for very few bytes of code (and effectively zero RAM).

    I have found libopencm3 to have small overhead. But its routines
    do so little that direct register access may give simpler
    code.

    Unlikely that this code will describe itself as "works well enough
    SOME of the time..."

    And, when/if you stumble on such faults, good luck explaining to
    your customer why it's going to take longer to fix and retest the
    *existing* codebase before you can get on with your modifications...

    Commercial vendors like to say how good their programs are. But
    the market reality is that a program may be quite bad and still
    sell.

    The same is true of FOSS -- despite the claim that many eyes (may)
    have looked at it (suggesting that bugs would have been caught!)

    From "KLEE: Unassisted and Automatic Generation of High-Coverage
    Tests for Complex Systems Programs":

    KLEE finds important errors in heavily-tested code. It
    found ten fatal errors in COREUTILS (including three
    that had escaped detection for 15 years), which account
    for more crashing bugs than were reported in 2006, 2007
    and 2008 combined. It further found 24 bugs in BUSYBOX, 21
    bugs in MINIX, and a security vulnerability in HISTAR: a
    total of 56 serious bugs.

    Ooops! I wonder how many FOSS *eyes* missed those errors?

    Open source folks tend to be more willing to talk about bugs.
    And the above nicely shows that there are a lot of bugs, most
    waiting to be discovered.

    Part of the problem is ownership of the codebase. You are
    more likely to know where your own bugs lie -- and, more
    willing to fix them ("pride of ownership"). When a piece
    of code is shared, over time, there seems to be less incentive
    for folks to tackle big -- often dubious -- issues as the
    "reward" is minimal (i.e., you may not own the code when the bug
    eventually becomes a problem)

    Every time you reinvent a solution, you lose much of the benefit
    of the previous TESTED solution.

    The TESTED part works for simple repeatable tasks. But if you
    have a complex task it is quite likely that you will be the first
    person with a given use case. gcc is a borderline case: if you
    throw really new code at it you can expect to see bugs. The
    gcc user community is large and there is a reasonable chance that
    somebody earlier wrote code which is sufficiently similar to
    yours to catch troubles. But there are domains that are at
    least as complicated as compilation and have a much smaller
    user community. You may find out that there is _no_ code
    that could reasonably be re-used. Were you ever in a situation
    where you looked at how some "standard library" solves a tricky
    problem and realized that in fact the library does not solve
    the problem?

    As I said, your *personal* experience tells you where YOU will
    likely benefit. I did a stint with a company that manufactured
    telecommunications kit. We had all sorts of bizarre interface
    protocols with which we had to contend (e.g., using RLSD as
    a hardware "pacing" signal). So, it was worthwhile to spend
    time developing a robust UART driver (and handler, above it)
    as you *knew* the next project would likely have need of it,
    in some form or other.

    If you're working free-lance and client A needs a BITBLTer
    for his design, you have to decide how likely client B
    (that you haven't yet met) will be to need the same sort
    of module/component.

    For example, I've never (until recently) needed to interface
    to a disk controller in a product. So, I don't have a
    ready-made "component" in my bag-of-tricks. When I look
    at a new project, I "take inventory" of what I am likely to
    need... and compare that to what I know I have "in stock".
    If there's a lot of overlap, then my confidence in my bid
    goes up. If there's a lot of new ground that I'll have to
    cover, then it goes down (and the price goes up!).

    Reuse helps you better estimate new projects, especially as
    projects grow in complexity.

    [There's nothing worse than having to upgrade someone else's
    design that didn't plan for the future. It's as if you
    have to redesign the entire product from scratch --- despite
    the fact that it *seems* to work, "as is" (but not "as desired"!).]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From antispam@math.uni.wroc.pl@21:1/5 to Don Y on Thu Nov 11 04:34:05 2021
    Don Y <blockedofcourse@foo.invalid> wrote:
    On 10/31/2021 3:54 PM, antispam@math.uni.wroc.pl wrote:
    Aren't THEY components?

    Well, some folks expect more from components than from
    traditional libraries. Some even claim to deliver.
    However, libraries have limitations and ATM I see nothing
    that fundamentally changes the situation.

    A component is something that you can use as a black box,
    without having to reinvent it. It is the epitome of reuse.

    How often do you reimplement
    ?printf() to avoid all of the bloat that typically accompanies it?

    I did that once (for an OS kernel where the standard library
    would not work). If needed I can reuse it. On PCs I am not
    worried by bloat due to printf. OTOH, on MCUs I am not sure if I
    ever used printf. Rather, printing was done by specialized
    routines, either library-provided or my own.

    You can also create a ?printf() that you can configure at build
    time to support the modifiers and specifiers that you know you
    will need.
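    As an illustration of the idea (a hedged sketch, not anyone's
    actual code: the MINIPRINTF_HAS_* switches and mini_snprintf()
    name are hypothetical), each conversion can be guarded by a
    build-time switch so that unused conversions cost no flash:

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical build-time configuration: a conversion is compiled in
 * only when its MINIPRINTF_HAS_* switch is set. */
#define MINIPRINTF_HAS_D 1   /* %d supported in this build */
#define MINIPRINTF_HAS_S 1   /* %s supported in this build */

size_t mini_snprintf(char *dst, size_t cap, const char *fmt, ...)
{
    va_list ap;
    size_t n = 0;
    va_start(ap, fmt);
    for (; *fmt != '\0'; fmt++) {
        char tmp[12];
        const char *s;
        if (*fmt != '%') {
            tmp[0] = *fmt; tmp[1] = '\0'; s = tmp;
        } else {
            fmt++;
            if (*fmt == '\0')
                break;                      /* stray trailing '%' */
            switch (*fmt) {
#if MINIPRINTF_HAS_D
            case 'd': {                     /* signed decimal */
                int v = va_arg(ap, int);
                unsigned u = v < 0 ? 0u - (unsigned)v : (unsigned)v;
                char *p = tmp + sizeof tmp - 1;
                *p = '\0';
                do { *--p = (char)('0' + u % 10); u /= 10; } while (u);
                if (v < 0) *--p = '-';
                s = p;
                break;
            }
#endif
#if MINIPRINTF_HAS_S
            case 's': s = va_arg(ap, const char *); break;
#endif
            default:                        /* conversion not built in */
                tmp[0] = '?'; tmp[1] = '\0'; s = tmp;
                break;
            }
        }
        while (*s != '\0' && n + 1 < cap)
            dst[n++] = *s++;
    }
    dst[n] = '\0';
    va_end(ap);
    return n;
}
```

    Turning a switch to 0 removes that case from the build entirely,
    which is the whole point of the configure-at-build-time approach.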

    Just like you can configure a UART driver to support a FIFO size defined
    at configuration, hardware handshaking, software flowcontrol, the
    high and low water marks for each of those (as they can be different),
    the character to send to request the remote to stop transmitting,
    the character you send to request resumption of transmission, which
    character YOU will recognize as requesting your Tx channel to pause,
    the character (or condition) you will recognize to resume your Tx,
    whether or not you will sample the condition codes in the UART, how
    you read/write the data register, how you read/write the status
    register, etc.

    While these sound like lots of options, they are all relatively
    trivial additions to the code.

    (when was the last time you needed ALL of those format specifiers
    in an application? And modifiers?)

    There are some things which are reusable
    if you accept modest to severe overhead.

    What you need is components with varying characteristics.
    You can buy diodes with all sorts of current carrying capacities,
    PIVs, package styles, etc. But, they all still perform the
    same function. Why so many different part numbers? Why not
    just use the biggest, baddest diode in ALL your circuits?
    <snip>
    I am stuck with what is available on the market. And a diode
    is logically a pretty simple component, yet we need many kinds.

    I.e., we readily accept differences in "standard components"
    in other disciplines; why not when it comes to software
    modules?

    Well, software is _much_ more complicated than physical
    engineering artifacts. A physical thing may have 10000 joints,
    but if the joints are identical, then this is the moral
    equivalent of a simple loop that just iterates a fixed number
    of times.

    This is the argument in favor of components. You'd much rather
    read a comprehensive specification ("datasheet") for a software
    component than have to read through all of the code that implements
    it.

    Well, if there is a simple-to-use component that performs what
    you need, then using it is fine. However, for many tasks,
    once a component is flexible enough to cover both your and
    my needs its specification may be longer and trickier
    than the code doing the task at hand.

    What if it was implemented in some programming language in
    which you aren't expert? What if it was a binary "BLOB" and
    couldn't be inspected?

    There are many reasons why existing code cannot be reused.
    Concerning BLOBs, I am trying to avoid them, and to a first
    approximation I am not using them. One (IMO serious)
    problem with BLOBs is that sooner or later they will be
    incompatible with other things (OS/other libraries/my code).
    Very old source code can usually be run on modern systems
    with modest effort. BLOBs would normally require much
    more effort.

    At the software level the number of possible pre-composed blocks
    is so large that it is infeasible to deliver all of them.

    You don't have to deliver all of them. When you wire a circuit,
    you still have to *solder* connections, don't you? The
    components don't magically glue themselves together...

    Yes, one needs to make connections. In fact, in programming
    most work is "making connections". So you want something
    which is simple to connect. In other words, you want all
    parts of your design to play nicely together. With code
    delivered by other folks that is not always the case.

    The classic trick is to parametrize. However, even if you
    parametrize there are hundreds of design decisions going
    into a relatively small piece of code. If you expose all
    design decisions then the user may as well write his/her own
    code, because the complexity will be similar. So normally
    parametrization is limited and there will be users who
    find the hardcoded design choices inadequate.

    Another thing is that current tools are rather weak
    at supporting parametrization.

    Look at a fleshed-out UART driver and think about how you would
    decompose it into N different variants that could be "compile
    time configurable". You'll be surprised as to how easy it is.
    Even if the actual UART hardware differs from instance to
    instance.

    UARTs are simple. And yet some things are tricky: in C, to have
    a "compile time configurable" buffer size you need to use macros.
    It works, but in a sense the UART implementation "leaks" into
    user code.
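    To make that "leak" concrete, here is a minimal sketch (with
    hypothetical names, not from any particular library): the user's
    build must define the size macro before the driver is compiled,
    so a driver implementation detail escapes into user code:

```c
#include <stdint.h>

/* The size must be supplied by the *user's* build (e.g. on the
 * compiler command line); the value here is only a fallback.  This
 * is exactly how the implementation "leaks" into user code. */
#ifndef UART_RXBUF_SIZE
#define UART_RXBUF_SIZE 32              /* should be a power of two */
#endif

typedef struct {
    volatile uint8_t in;                /* advanced by the RX ISR   */
    volatile uint8_t out;               /* advanced by the mainloop */
    unsigned char buf[UART_RXBUF_SIZE];
} uart_rxfifo;

/* Called from the RX ISR with a freshly received character. */
void uart_rx_push(uart_rxfifo *f, unsigned char c)
{
    f->buf[f->in % UART_RXBUF_SIZE] = c;
    f->in++;
}

/* Called from the mainloop; returns -1 when the FIFO is empty. */
int uart_rx_pop(uart_rxfifo *f)
{
    if (f->out == f->in)
        return -1;
    int c = f->buf[f->out % UART_RXBUF_SIZE];
    f->out++;
    return c;
}
```

    In C++ the size could be a template parameter instead; in plain C
    the macro is about the only zero-overhead option, which is the
    complaint above.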

    For example, things tend to compose nicely if you dynamically
    allocate everything and use garbage collection. But the
    performance cost may be substantial. And in an embedded setting
    garbage collection may be unacceptable. In some cases I have
    found that I can get much better speed by joining things that
    could be done as a composition of library operations into a
    single big routine.

    Sure, but now you're tuning a solution to a specific problem.
    I've designed custom chips to solve particular problems.
    But, they ONLY solve those particular problems! OTOH,
    I use lots of OTC components in my designs because those have
    been designed (for the most part) with an eye towards
    meeting a variety of market needs.

    Maybe I made the wrong impression; I think some explanation is
    in order here. I am trying to make my code reusable. For my
    problems performance is an important part of reusability: our
    capability to solve problems is limited by performance, and with
    better performance users can solve bigger problems. I am
    re-using the code that I can, and I would re-use more if I could,
    but there are technical obstacles. Also, while I am
    trying to make my code reusable, there are intrusive
    design decisions which may interfere with your possibility
    and willingness to re-use.

    If you don't know where the design is headed, then you can't
    pick the components that it will need.

    Well, there are routine tasks; for them it is natural to
    re-use existing code. There are new tasks that are "almost"
    routine; then one can come up with a good design at the start.
    But in a sense the "interesting" tasks are those where at the
    start you have only limited understanding. In such a case it is
    hard to know "where the design is headed", except that it is
    likely to change. Of course, the customer may be dissatisfied
    if you tell them "I will look at the problem and maybe I will
    find a solution". But lack of understanding is normal
    in research (at the starting point), and I think that software
    houses also do risky projects, hoping that big wins on successful
    ones will cover the losses on failures.

    I approach a design from the top (down) and bottom (up). This
    lets me gauge the types of information that I *may* have
    available from the hardware -- so I can sort out how to
    approach those limitations from above. E.g., if I can't
    control the data rate of a comm channel, then I either have
    to ensure I can catch every (complete) message *or* design a
    protocol that lets me detect when I've missed something.

    Well, with a UART there will be some fixed transmission rate
    (with the wrong clock frequency the UART would be unable to
    receive anything). I would expect the MCU to be able to receive
    all incoming characters (OK, assuming a hardware UART with the
    driver using a high-priority interrupt). So, detecting that you
    got too much should not be too hard. OTOH, sensibly handling
    excess input is a different issue: if characters are coming
    faster than you can process them, then either your CPU is
    underpowered or there is some failure causing excess
    transmission. In either case the specific application will
    dictate what should be avoided.

    There are costs to both approaches. If I dedicate resource to
    ensuring I don't miss anything, then some other aspect of the
    design will bear that cost. If I rely on detecting missed
    messages, then I have to put a figure on their relative
    likelihood so my device doesn't fail to provide its desired
    functionality (because it is always missing one or two characters
    out of EVERY message -- and, thus, sees NO messages).

    My thinking goes toward using relatively short messages and a
    buffer big enough for two messages. If there is need for
    high speed I would go for continuous messages and DMA
    transfers (using the break interrupt to discover the end of a
    message in the case of variable-length messages). So the device
    should be able to get all messages, and in case of excess message
    traffic a whole message could be dropped (possibly looking
    first for some high-priority messages). Of course, there
    may be some externally mandated message format and/or
    communication protocol making DMA inappropriate.
    Still, assuming interrupts, all characters should reach the
    interrupt handler, possibly causing some extra CPU
    load. The only possibility of unnoticed loss of characters
    would be blocking interrupts for too long. If interrupts can
    be blocked for too long, then I would expect loss of whole
    messages. In such a case the protocol should have something like
    "don't talk to me for the next 100 milliseconds, I will be busy"
    to warn other nodes and request silence. Now, if you
    need to faithfully support silliness like Modbus RTU timeouts,
    then I hope that you are adequately paid...
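    The "buffer big enough for two messages" scheme can be sketched
    as follows (a hypothetical host-side simulation; msg_receive() /
    msg_take() and the slot layout are invented for illustration):
    the receive side fills one slot while the mainloop drains the
    other, and when no slot is free the incoming message is dropped
    *whole* rather than truncated:

```c
#include <string.h>
#include <stdbool.h>

#define MSG_MAX 16

typedef struct {
    char data[MSG_MAX];
    int  len;
    bool full;
} msg_slot;

static msg_slot slots[2];
static int wr_slot;                 /* slot being filled (ISR side) */
static int rd_slot;                 /* slot being read (mainloop)   */
static int dropped;                 /* whole messages discarded     */

/* Conceptually called from the ISR/DMA side with a complete message. */
void msg_receive(const char *msg, int len)
{
    msg_slot *s = &slots[wr_slot];
    if (s->full || len > MSG_MAX) { /* no free slot: drop it whole */
        dropped++;
        return;
    }
    memcpy(s->data, msg, (size_t)len);
    s->len  = len;
    s->full = true;
    wr_slot ^= 1;
}

/* Called from the mainloop; returns the length, or -1 if no message. */
int msg_take(char *out)
{
    msg_slot *s = &slots[rd_slot];
    if (!s->full)
        return -1;
    int len = s->len;
    memcpy(out, s->data, (size_t)len);
    s->full = false;
    rd_slot ^= 1;
    return len;
}
```

    On real hardware the 'full' flags would be the only shared state
    between ISR and mainloop, which keeps the concurrency reasoning
    simple.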

    In a slightly different spirit: in another thread you wrote
    about accessing the disc without the OS file cache. Here I
    normally depend on the OS, and OS file caching is a big thing.
    It is not perfect, but the OS (OK, at least Linux) does
    this reasonably well, so I have no temptation to avoid it.
    And I appreciate that with the OS cache performance is
    usually much better than it would be without it.
    OTOH, I routinely avoid stdio for I/O-critical things
    (so no printf in I/O-critical code).

    My point about the cache was that it is of no value in my case;
    I'm not going to revisit a file once I've seen it the first
    time (so why hold onto that data?)

    Well, the OS "cache" has many functions. One of them is
    read-ahead, another is scheduling of requests to minimize seek
    time. And besides data there is also meta-data. OS functions
    need access to meta-data, and OSes are designed under the
    assumption that there is a decent cache hit rate on meta-data
    accesses.

    In other cases I fixed bugs by replacing a composition of
    library routines with a single routine: there were interactions
    making the simple composition incorrect. The correct alternative
    was a single routine.

    As I wrote, my embedded programs are simple and small. But I
    use almost no external libraries. Trying some existing libraries,
    I have found that some produce rather large programs, linking
    in a lot of unneeded stuff.

    Because they try to address a variety of solution spaces without
    trying to be "optimal" for any. You trade flexibility/capability
    for speed/performance/etc.

    I think that this is more subtle: libraries frequently force some
    way of doing things. Which may be good if you are trying to
    quickly roll a solution and are within the capabilities of the
    library. But if you need/want a different design, then the
    library may be too inflexible to deliver it.

    Use a different diode.

    Well, when needed I use my own library.

    Of course, writing from scratch
    will not scale to bigger programs. OTOH, I feel that with
    proper tooling it would be possible to retain efficiency and
    small code size at least for a large class of microcontroller
    programs (but existing tools and libraries do not support this).

    Templates are an attempt in this direction. Allowing a class of
    problems to be solved once and then tailored to the specific
    application.

    Yes, templates could help. But they also have problems. One
    of them is that (among others) I would like to target STM8
    and I have no C++ compiler for STM8. My idea is to create
    custom "optimizer/generator" for (annotated) C code.
    ATM it is vapourware, but I think it is feasible with
    reasonable effort.

    But, personal experience is where you win the most. You write
    your second or third UART driver and start realizing that you
    could leverage a previous design if you'd just thought it out
    more fully -- instead of tailoring it to the specific needs
    of the original application.

    And, as you EXPECT to be reusing it in other applications (as
    evidenced by the fact that it's your third time writing the same
    piece of code!), you anticipate what those *might* need and
    think about how to implement those features "economically".

    It's rare that an application is *so* constrained that it can't
    afford a couple of extra lines of code, here and there. If
    you've considered efficiency in the design of your algorithms,
    then these little bits of inefficiency will be below the noise floor.

    Well, I am not talking about a "couple of extra lines". Rather
    about (IMO) a substantial fixed overhead. As I wrote, one of my
    targets is an STM8 with 8k flash, another an MSP430 with 16k
    flash, another an STM32 with 16k flash (there are also bigger
    targets). One of the libraries/frameworks for STM32, after
    activating a few features, pulled in about 16k of code; this is
    substantial overhead given how few features I needed. Other folks
    reported that for trivial programs vendor-supplied frameworks
    pulled in close to 30k

    A "framework" is considerably more than a set of individually
    selectable components. I've designed products with 2KB of code and
    128 bytes of RAM. The "components" were ASM modules instead of
    HLL modules. Each told me how big it was, how much RAM it required,
    how deep the stack penetration when invoked, how many T-states
    (worst case) to execute, etc.

    Nice, but I am not sure how practical this would be in modern
    times. I have C code and can reasonably estimate resource use.
    But there are changeable parameters which may enable/disable
    some parts. And size/speed/stack use depends on compiler
    optimizations. So there is variation. And there are traps.
    The linker transitively pulls in dependencies; if there are
    "false" dependencies, they can pull in much more than strictly
    needed. One example of "false" dependencies are (or maybe were)
    C++ VMTs. Namely, any use of an object/class pulled in the VMT,
    which in turn pulled in all ancestors and methods. If unused
    methods referenced other classes, that could easily cascade. In
    both cases the authors of the libraries probably thought that
    the provided "goodies" justified the size (the intended targets
    were larger).

    So, before I designed the hardware, I knew what I would need
    by way of ROM/RAM (before the days of FLASH) and could commit
    the hardware to foil without fear of running out of "space" or
    "time".

    of code. That may be fine if you have a bigger device and need
    the features, but for smaller MCUs it may be the difference
    between not fitting into the device or (without the library)
    having plenty of free space.

    Sure. But a component will have a datasheet that tells you what
    it provides and at what *cost*.

    My 16x2 text LCD routine may pull in the I2C driver. If I2C is
    not needed anyway, this is an additional cost; otherwise the cost
    is shared. The LCD routine also depends on a timer. Both the
    timer and I2C affect MCU initialization. So even in very simple
    situations the total cost is rather complex. And the libraries
    that I tried were presumably not "components" in your sense: you
    had to link the program to learn the total size. Documentation
    mentioned dependencies when they affected correctness, but
    otherwise not. To tell the truth, when a library supports
    hundreds or thousands of different targets (combinations of CPU
    core, RAM/ROM sizes, peripheral configurations) with different
    compilers, then it is hard to make exact statements.

    IMO, in an ideal world, for "standard" MCU functionality we would
    have a configuration tool where the user can specify the needed
    functionality and the tool would generate semi-custom code
    and estimate its resource use. MCU vendor tools attempt to
    offer something like this, but the reports I heard were rather
    unfavourable; in particular it seems that vendors simply
    deliver a thick library that supports "everything", and
    linking to this library causes code bloat.

    When I tried it, FreeRTOS for STM32 needed about 8k of flash.
    Which is fine if you need an RTOS. But ATM my designs run
    without an RTOS.

    RTOS is a commonly misused term. Many are more properly called
    MTOSs (they provide no real timeliness guarantees, just
    multitasking primitives).

    Well, FreeRTOS comes with "no warranty", but AFAICS they make an
    honest effort to have good real-time behaviour. In particular,
    code paths through FreeRTOS from events to user code are of
    bounded and rather short length. User code may still be
    delayed by interrupts/process priorities, but they give a
    reasonable explanation. So it is up to the user to code things
    in a way that gives the needed real-time behaviour, but FreeRTOS
    normally will not spoil it and may help.

    IMO, the advantages of writing in a multitasking environment so
    far outweigh the "costs" of an MTOS that it behooves one to consider
    how to shoehorn that functionality into EVERY design.

    When writing in a HLL, there are complications that impose
    constraints on how the MTOS provides its services. But, for small
    projects written in ASM, you can gain the benefits of an MTOS
    for very few bytes of code (and effectively zero RAM).

    Well, looking at books and articles I did not find a convincing
    argument/example showing that one really needs multitasking for
    small systems. I tend to think rather in terms of a collection
    of coupled finite state machines (or, if you prefer, a Petri
    net). State machines transition in response to events and may
    generate events. Each finite state machine could be a task, but
    it is not clear that it should be. Some transitions are simple
    and need to be fast, and those I would do in interrupt handlers.
    Some others are triggered in a regular way from other machines
    and are naturally handled by function calls. Some need queues.
    The whole thing fits reasonably well in the "super loop"
    paradigm.

    I have found one issue that at first glance "requires"
    multitasking. Namely, when one wants to put the system in
    sleep mode when there is no work, the natural "super loop"
    approach looks like

    if (work_to_do) {
        do_work();
    } else {
        wait_for_interrupt();
    }

    where 'work_to_do' is a flag which may be set by interrupt
    handlers. But there is a nasty race condition: if an interrupt
    comes between the test of 'work_to_do' and 'wait_for_interrupt',
    then despite having work to do the system will go to sleep and
    only wake on the next interrupt (which, depending on the
    specific requirements, may be harmless or a disaster). I was
    unable to find simple code that avoids this race. With a
    multitasking kernel the race vanishes: there is an idle task
    which does nothing but 'wait_for_interrupt', and the OS
    scheduler passes control to worker tasks when there is work to
    do. But when one looks at how the multitasker avoids the race,
    it is clear that the crucial point is doing the control transfer
    via return from interrupt. More precisely, variables are tested
    with interrupts disabled and, after the decision is made, return
    from interrupt transfers control. The important point is that if
    an interrupt comes after the control transfer, the interrupt
    handler will re-do the test before returning to user code. So
    what is needed is a piece of low-level code that uses return
    from interrupt for control transfer, and all interrupt handlers
    need to jump to this code when finished. The rest (usually the
    majority) of the multitasker is not needed...

    Unlikely that this code will describe itself as "works well enough
    SOME of the time..."

    And, when/if you stumble on such faults, good luck explaining to
    your customer why it's going to take longer to fix and retest the
    *existing* codebase before you can get on with your modifications...

    Commercial vendors like to say how good their programs are. But
    the market reality is that a program may be quite bad and still
    sell.

    The same is true of FOSS -- despite the claim that many eyes (may)
    have looked at it (suggesting that bugs would have been caught!)

    From "KLEE: Unassisted and Automatic Generation of High-Coverage
    Tests for Complex Systems Programs":

    KLEE finds important errors in heavily-tested code. It
    found ten fatal errors in COREUTILS (including three
    that had escaped detection for 15 years), which account
    for more crashing bugs than were reported in 2006, 2007
    and 2008 combined. It further found 24 bugs in BUSYBOX, 21
    bugs in MINIX, and a security vulnerability in HISTAR -- a
    total of 56 serious bugs.

    Ooops! I wonder how many FOSS *eyes* missed those errors?

    Open source folks tend to be more willing to talk about bugs.
    And the above nicely shows that there are a lot of bugs, most
    waiting to be discovered.

    Part of the problem is ownership of the codebase. You are
    more likely to know where your own bugs lie -- and, more
    willing to fix them ("pride of ownership"). When a piece
    of code is shared, over time, there seems to be less incentive
    for folks to tackle big -- often dubious -- issues as the
    "reward" is minimal (i.e., you may not own the code when the bug
    eventually becomes a problem)

    Ownership may cause problems: there is a tendency to "solve"
    problems locally, that is, in code that a given person "owns".
    This is good if there is an easy local solution. However, this
    may also lead to ugly workarounds that really do not work
    well, while the problem is easily solvable in a different part
    ("owned" by a different programmer). I have seen such a thing
    several times: looking at the whole codebase, after some effort
    it was possible to do a simple fix, while there were workarounds
    in different ("wrong") places. I had no contact with the
    original authors, but it seems that the workarounds were due to
    "ownership".

    --
    Waldek Hebisch

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Don Y@21:1/5 to antispam@math.uni.wroc.pl on Fri Nov 19 16:21:49 2021
    On 11/10/2021 9:34 PM, antispam@math.uni.wroc.pl wrote:
    Don Y <blockedofcourse@foo.invalid> wrote:

    A classic trick is to parametrize. However, even if you
    parametrize, there are hundreds of design decisions going
    into a relatively small piece of code. If you expose all
    design decisions then the user may as well write his/her own
    code, because the complexity will be similar. So normally
    parametrization is limited and there will be users who
    find the hardcoded design choices inadequate.

    Another thing is that current tools are rather weak
    at supporting parametrization.

    Look at a fleshy UART driver and think about how you would decompose
    it into N different variants that could be "compile time configurable".
    You'll be surprised as to how easy it is. Even if the actual UART
    hardware differs from instance to instance.

    UARTs are simple. And yet some things are tricky: in C, to have
    a "compile time configurable" buffer size you need to use
    macros. It works, but in a sense the UART implementation "leaks"
    into user code.

    You can configure using manifest constants, conditional compilation,
    or even run-time switches. Or, by linking against different
    "support" routines. How and where the configuration "leaks"
    into user code is a function of the configuration mechanisms that
    you decide to employ.

    E.g., You'd likely NOT design your network stack to be tightly integrated
    with your choice of NIC (all else being equal) -- simply because you'd
    want to be able to reuse the stack with some *other* NIC without having
    to rewrite it.

    OTOH, it's not unexpected to want to isolate the caching of ARP results
    in an "application specific" manner as you'll likely know the sorts (and number!) of clients/services with which the device in question will be connecting. So, that (sub)module can be replaced with something most appropriate to the application yet with a "standardized" interface to
    the stack itself (*YOU* define that standard)

    All of these require decisions up-front; you can't expect to be
    able to retrofit an existing piece of code (cheaply) to support
    a more modular/configurable implementation in the future.

    But, personal experience teaches you what you are likely to need
    by way of flexibility/configurability. Most folks tend to work
    in a very narrow set of application domains. Chances are, the
    network stack you design for an embedded product will be considerably
    different than one for a desktop OS. If you plan to straddle
    both domains, then the configurability challenge is greater!

    There are costs to both approaches. If I dedicate resource to
    ensuring I don't miss anything, then some other aspect of the
    design will bear that cost. If I rely on detecting missed
    messages, then I have to put a figure on their relative
    likelihood so my device doesn't fail to provide its desired
    functionality (because it is always missing one or two characters
    out of EVERY message -- and, thus, sees NO messages).

    My thinking goes toward using relatively short messages and
    buffer big enough for two messages.

    You can also design with the intent of parsing messages before
    they are complete and "reducing" them along the way. This is particularly
    important if messages can have varying length *or* there is a possibility
    for the ass end of a message to get dropped (how do you know when the
    message is complete? Imagine THE USER misconfiguring your device
    to expect CRLFs and the traffic only contains newlines; the terminating
    CRLF never arrives!)

    [At the limit case, a message reduces to a concept -- that is
    represented in some application specific manner: "Start the
    motor", "Clear the screen", etc.]

    Barcodes are messages (character sequences) of a sort. I typically
    process a barcode at several *concurrent* levels:
    - an ISR that captures the times of transitions (black->white->black)
    - a task that reduces the data captured by the ISR into "bar widths"
    - a task that aggregates bar widths to form characters
    - a task that parses character sequences to determine valid messages
    - an application layer interpretation (or discard) of that message
    This allows each layer to decide when the data on which it relies
    does not represent a valid barcode and discard some (or all) of it...
    without waiting for a complete message to be present. So, the
    resources that were consumed by that (partial?) message are
    freed earlier.

    As such, there is never a "start time" nor "end time" for a barcode
    message -- because you don't want the user to have to "do something"
    to tell you that he is now going to scan a barcode (otherwise, the
    efficiency of using barcodes is subverted).

    [Think about the sorts of applications that use barcodes; how many
    require the user to tell the device "here comes a barcode, please start
    your decoder algorithm NOW!"]

    As users can abuse the barcode reader (there is nothing preventing them
    from continuously scanning barcodes, in violation of any "protocol"
    that the product may *intend*), you have to tolerate the case where
    the data arrives faster than it can be consumed. *Knowing* where
    (in the event stream) you may have "lost" some data (transitions,
    widths, characters or messages) lets you resync to a less pathological
    event stream later (when the user starts "behaving properly")

    If there is need for high speed I would go for continuous
    messages and DMA transfers (using the break interrupt to
    discover the end of a message in the case of variable length
    messages). So the device should be able to get all messages,
    and in case of excess message traffic a whole message could be
    dropped (possibly looking first for some high priority
    messages). Of course, there may be some externally mandated
    message format and/or communication protocol making DMA
    inappropriate. Still, assuming interrupts, all characters
    should reach the interrupt handler, causing possibly some extra
    CPU load. The only possibility of unnoticed loss of characters
    would be blocking interrupts for too long. If interrupts can
    be blocked for too long, then I would expect loss of whole
    messages. In such a case the protocol should have something
    like "don't talk to me for the next 100 milliseconds, I will
    be busy" to warn other nodes and request silence. Now, if you
    need to faithfully support silliness like Modbus RTU timeouts,
    then I hope that you are adequately paid...

    IMO, the advantages of writing in a multitasking environment so
    far outweigh the "costs" of an MTOS that it behooves one to consider
    how to shoehorn that functionality into EVERY design.

    When writing in a HLL, there are complications that impose
    constraints on how the MTOS provides its services. But, for small
    projects written in ASM, you can gain the benefits of an MTOS
    for very few bytes of code (and effectively zero RAM).

    Well, looking at books and articles I did not find a convincing
    argument/example showing that one really needs multitasking for
    small systems.

    The advantages of multitasking lie in problem decomposition.
    Smaller problems are easier to "get right", in isolation.
    The *challenge* of multitasking is coordinating the interactions
    between these semi-concurrent actors. Experience teaches you how
    to partition a "job".

    I want to blink a light at 1 Hz and check for a button to be
    pressed which will start some action that may be lengthy. I
    can move the light blink into an ISR (which GENERALLY is a ridiculous
    use of that "resource") to ensure the 1Hz timeliness is maintained
    regardless of what the "lengthy" task may be doing, at the time.

    Or, I can break the lengthy task into smaller chunks that
    are executed sequentially with "peeks" at the "light timer"
    between each of those segments.

    sequence1 = sequence2 = sequence3 = sequence4 = 0;
    while (FOREVER) {
    task1:
        switch (sequence1++) {
        case 0: do_task1_step0(); break;
        case 1: do_task1_step1(); break;
        case 2: do_task1_step2(); break;
        ...
        }

        do_light();

    task2:
        switch (sequence2++) {
        case 0: do_task2_step0(); break;
        case 1: do_task2_step1(); break;
        case 2: do_task2_step2(); break;
        ...
        }

        do_light();

    task3:
        switch (sequence3++) {
        case 0: do_task3_step0(); break;
        case 1: do_task3_step1(); break;
        case 2: do_task3_step2(); break;
        ...
        }

        do_light();

        ...
    }

    When you need to do seven (or fifty) other "lengthy actions"
    concurrently (each of which may introduce other "blinking
    lights" or timeliness constraints), its easier (less brittle)
    to put a structure in place that lets those competing actions
    share the processor without requiring the developer to
    micromanage at this level.

    [50 tasks isn't an unusual load in a small system; video arcade
    games from the early 80's -- 8 bit processors, kilobytes of
    ROM+RAM -- would typically treat each object on the screen
    (including bullets!) as a separate process]

    The above example has low overhead for the apparent concurrency.
    But, pushes all of the work onto the developer's lap. He has
    to carefully size each "step" of each "task" to ensure the
    overall system is responsive.

    A nicer approach is to just let an MTOS handle the switching
    between tasks. But, this comes at a cost of additional run-time
    overhead (e.g., arbitrary context switches).

    I tend to think rather in terms of a collection
    of coupled finite state machines (or, if you prefer, a Petri
    net). State machines transition in response to events and may
    generate events. Each finite state machine could be a task, but
    it is not clear that it should be. Some transitions are simple
    and need to be fast, and those I would do in interrupt handlers.
    Some others are triggered in a regular way from other machines
    and are naturally handled by function calls. Some need queues.
    The whole thing fits reasonably well in the "super loop"
    paradigm.

    I use FSMs for UIs and message parsing. They let the structure
    of the code "rise to the top" where it is more visible (to another
    developer) instead of burying it in subroutines and function calls.

    "Event sources" create events which are consumed by FSMs, as
    needed. So, a "power monitor" could generate POWER_FAIL,
    LOW_BATTERY, POWER_RESTORED, etc. events while a "keypad
    decoder" could put out ENTER, CLEAR, ALPHA_M, NUMERIC_5, etc.
    events.

    Because there is nothing *special* about an "event", *ANY* piece of
    code can generate them. Their significance assigns based on where
    they are "placed" (in memory) and who/what can "see" them. So,
    you can use an FSM to parse a message (using "received characters"
    as an ordered stream of events) and "signal" MESSAGE_COMPLETE to
    another FSM that is awaiting "messages" (along with a pointer to the
    completed message)

    From "KLEE: Unassisted and Automatic Generation of High-Coverage
    Tests for Complex Systems Programs":

    KLEE finds important errors in heavily-tested code. It
    found ten fatal errors in COREUTILS (including three
    that had escaped detection for 15 years), which account
    for more crashing bugs than were reported in 2006, 2007
    and 2008 combined. It further found 24 bugs in BUSYBOX, 21
    bugs in MINIX, and a security vulnerability in HISTAR -- a
    total of 56 serious bugs.

    Ooops! I wonder how many FOSS *eyes* missed those errors?

    Open source folks tend to be more willing to talk about bugs.
    And the above nicely shows that there are a lot of bugs, most
    waiting to be discovered.

    Part of the problem is ownership of the codebase. You are
    more likely to know where your own bugs lie -- and, more
    willing to fix them ("pride of ownership"). When a piece
    of code is shared, over time, there seems to be less incentive
    for folks to tackle big -- often dubious -- issues as the
    "reward" is minimal (i.e., you may not own the code when the bug
    eventually becomes a problem)

    Ownership may cause problems: there is a tendency to "solve"
    problems locally, that is, in code that a given person "owns".
    This is good if there is an easy local solution. However, this
    may also lead to ugly workarounds that really do not work
    well, while the problem is easily solvable in a different part
    ("owned" by a different programmer). I have seen such a thing
    several times: looking at the whole codebase, after some effort
    it was possible to do a simple fix, while there were workarounds
    in different ("wrong") places. I had no contact with the
    original authors, but it seems that the workarounds were due to
    "ownership".

    You are *always* at the mercy of the code's owner. Just as folks
    are at YOUR mercy for the code that you (currently) exert ownership
    over. The best compliments you'll receive are from folks who
    inherit your codebase and can appreciate its structure and
    consistency. Conversely, your worst nightmares will be inheriting
    a codebase that was "hacked together", willy-nilly, by some number
    of predecessors with no real concern over their "product" (code).

    E.g., For FOSS projects, ownership isn't just a matter of who
    takes "responsibility" for coordinating/merging diffs into the
    codebase but, also, who has a compatible "vision" for the
    codebase, going forward. You'd not want a radically different
    vision from one owner to the next as this leads to gyrations in
    the codebase that will be seen as instability by its users
    (i.e., other developers).

    I use PostgreSQL in my current design. I have no desire to
    *develop* the RDBMS software -- let folks who understand that
    sort of thing work their own magic on the codebase. I can add
    value *elsewhere* in my designs.

    But, I eventually have to take ownership of *a* version of the
    software as I can't expect the "real owners" to maintain some
    version that *I* find useful, possibly years from now. Once
    I assume ownership of that chosen release, it will be my
    priorities and skillset that drive how it evolves. I can
    choose to cherry pick "fixes" from the main branch and back-port
    them into the version I've adopted. Or, decide to live with
    some particular set of problems/bugs/shortcomings.

    If I am prudent, I will attempt to adopt the "style" of the
    original developers in fitting any changes that I make to
    that codebase. I'd want my changes to "blend in" and seem
    consistent with that which preceded them.

    Folks following the main distribution would likely NOT be interested
    in the changes that I choose to embrace as they'll likely have
    different goals than I. But that doesn't say my ownership is
    "toxic", just that it doesn't suit the needs of (most) others.

    ---

    I've got to bow out of this conversation. I made a commitment to
    release 6 designs to manufacturing before year end. As it stands,
    now, it looks like I'll only have time enough for four of them as
    I got "distracted", spending the past few weeks gallavanting (but
    it was wicked fun!).

    OTOH, it won't be fun starting the new year two weeks "behind"... :<

    [Damn holidays eat into my work time. And, no excuse on my part;
    it's not like I didn't KNOW they were coming!! :< ]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)