• Are packed structs slower AND safe?

    From pozz@21:1/5 to All on Tue Oct 26 13:28:50 2021
    In one of my projects that run on a Cortex-M0+ MCU, I have a few arrays
    of structs. Now I need to increase the size of the arrays, but I'm out
    of RAM, so I'm searching for ways to save some space in RAM.

    One simple way is to pack the structs, for example with

    __attribute__((packed))

    in gcc. I can save some padding bytes (that waste some memory) for each
    element of the array, so the total amount of saved space could be enough.

    I know the use of a packed struct forces the compiler to generate a
    slower code, because of misaligned accesses of its members.
    However, besides having a slower code, is the result correct in any case?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to pozz on Tue Oct 26 13:57:01 2021
    On 26/10/2021 13:28, pozz wrote:
    In one of my projects that run on a Cortex-M0+ MCU, I have a few arrays
    of structs. Now I need to increase the size of the arrays, but I'm out
    of RAM, so I'm searching for ways to save some space in RAM.

    One simple way is to pack the structs, for example with

      __attribute__((packed))

    in gcc. I can save some padding bytes (that waste some memory) for each element of the array, so the total amount of saved space could be enough.

    I know the use of a packed struct forces the compiler to generate a
    slower code, because of misaligned accesses of its members.
    However, besides having a slower code, is the result correct in any case?

    Maybe, maybe not - depending on your assumptions, and the details of the
    code.

    Many processors support unaligned access, meaning the code size will be
    the same even if "packed" results in having unaligned accesses. In this
    case, code size is the same but execution time may be lower for the
    actual access. The Cortex-M4, for example, is fine with unaligned access.

    The Cortex-M0+, on the other hand, does not support it. So instead of a
    single 32-bit load, you can quickly end up with a mess:

    #include <stdint.h>

    typedef struct {
    uint32_t b;
    } su;

    typedef struct {
    uint32_t b;
    } __attribute__((packed)) sp;

    int foou(su * p) { return p-> b; }
    int foop(sp * p) { return p-> b; }

    foou:
    ldr r0, [r0]
    bx lr
    foop:
    ldrb r2, [r0, #1]
    ldrb r1, [r0]
    ldrb r3, [r0, #2]
    lsls r2, r2, #8
    ldrb r0, [r0, #3]
    orrs r1, r2
    lsls r3, r3, #16
    orrs r3, r1
    lsls r0, r0, #24
    orrs r0, r3
    bx lr


    This is big, slow, and may be incorrect if you expected atomic loads and stores.


    I have very rarely seen "packed" used in a way that is actually useful.
    Often when people think they need to pack a struct, they can get better results with more careful re-arrangement of the fields, or perhaps
    "manually" packing with bit-fields or other ways of combining fields.
    Another alternative is that instead of having an array of structs, you
    can split the data up into two or three arrays each containing part of
    the data. (This is often done on big systems with cache, as it can lead
    to massive speed increases.)

    Throwing "packed" onto structs is one of the last places I would look
    for some extra ram space, not one of the first.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to pozz on Tue Oct 26 16:11:00 2021
    On 26/10/2021 15:22, pozz wrote:
    Il 26/10/2021 13:57, David Brown ha scritto:
    [...]
    Another alternative is that instead of having an array of structs, you
    can split the data up into two or three arrays each containing part of
    the data.  (This is often done on big systems with cache, as it can lead
    to massive speed increases.)

    Can you explain better this? Thanks


    Change:

    typedef struct {
    uint32_t counter;
    bool valid;
    } counted_thing;

    counted_thing things[1000];

    into:

    uint32_t thing_counters[1000];
    bool thing_valids[1000];

    That turns an 8000 byte array into two arrays of 4000 bytes and 1000
    bytes respectively.

    No padding, inefficient packing, or wasted space. Indeed, with a bit
    more effort the thing_valids[] array could perhaps be packed into 125 bits.

    This is a big issue in game programming - there has been a move away
    from nice C++ objects that are held in an array, to integrating the
    array handling with the object handling so that the data can be arranged
    in a more cache-friendly manner. It's especially useful when your
    objects have critical data that is accessed a lot (such as position) and
    less critical data that is more rarely used (such as cost or name) -
    separate arrays means you can run through the entire array of object
    positions without filling up your caches with low-priority name data.

    But in your case, you can use it to save space in ram (and possibly make stepping through the data a little more efficient as the stride sizes
    are more likely to be powers of two).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From pozz@21:1/5 to All on Tue Oct 26 15:22:18 2021
    Il 26/10/2021 13:57, David Brown ha scritto:
    [...]
    Another alternative is that instead of having an array of structs, you
    can split the data up into two or three arrays each containing part of
    the data. (This is often done on big systems with cache, as it can lead
    to massive speed increases.)

    Can you explain better this? Thanks

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From pozz@21:1/5 to All on Tue Oct 26 16:22:03 2021
    Il 26/10/2021 16:11, David Brown ha scritto:
    On 26/10/2021 15:22, pozz wrote:
    Il 26/10/2021 13:57, David Brown ha scritto:
    [...]
    Another alternative is that instead of having an array of structs, you
    can split the data up into two or three arrays each containing part of
    the data.  (This is often done on big systems with cache, as it can lead >>> to massive speed increases.)

    Can you explain better this? Thanks


    Change:

    typedef struct {
    uint32_t counter;
    bool valid;
    } counted_thing;

    counted_thing things[1000];

    into:

    uint32_t thing_counters[1000];
    bool thing_valids[1000];

    That turns an 8000 byte array into two arrays of 4000 bytes and 1000
    bytes respectively.

    Now I got your point.


    No padding, inefficient packing, or wasted space. Indeed, with a bit
    more effort the thing_valids[] array could perhaps be packed into 125 bits.

    This is a big issue in game programming - there has been a move away
    from nice C++ objects that are held in an array, to integrating the
    array handling with the object handling so that the data can be arranged
    in a more cache-friendly manner. It's especially useful when your
    objects have critical data that is accessed a lot (such as position) and
    less critical data that is more rarely used (such as cost or name) -
    separate arrays means you can run through the entire array of object positions without filling up your caches with low-priority name data.

    But in your case, you can use it to save space in ram (and possibly make stepping through the data a little more efficient as the stride sizes
    are more likely to be powers of two).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Brett@21:1/5 to pozz on Wed Oct 27 02:40:31 2021
    pozz <pozzugno@gmail.com> wrote:
    In one of my projects that run on a Cortex-M0+ MCU, I have a few arrays
    of structs. Now I need to increase the size of the arrays, but I'm out
    of RAM, so I'm searching for ways to save some space in RAM.

    One simple way is to pack the structs, for example with

    __attribute__((packed))

    in gcc. I can save some padding bytes (that waste some memory) for each element of the array, so the total amount of saved space could be enough.

    I know the use of a packed struct forces the compiler to generate a
    slower code, because of misaligned accesses of its members.
    However, besides having a slower code, is the result correct in any case?


    Post the structure with the variable names changed to Greek gods.
    Add comments to each line with the range of values stored in each variable.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)