• Bit swizzling

    From Rick C. Hodgin@21:1/5 to All on Sat Sep 5 12:05:07 2020
    Are there any algorithms which take a known-at-compile-time sequence
    of bitwise operations on an 8-bit to 64-bit quantity, and optimize
    them down to their minimal set of operations?

    For example, if I have an 8-bit byte and I want to swizzle the bits
    thusly:

    Input: 07 06 05 04 03 02 01 00
    Output: 05 04 07 02 01 03 00 06

    I can easily swizzle the data using a brute force method:

    v = get_value();
    o = 0;
    swizzle1(o, v, 0, 6);
    swizzle1(o, v, 1, 0);
    swizzle1(o, v, 2, 3);
    swizzle1(o, v, 3, 1);
    swizzle1(o, v, 4, 2);
    swizzle1(o, v, 5, 7);
    swizzle1(o, v, 6, 4);
    swizzle1(o, v, 7, 5);

    // Untested, off the top of my head
    void swizzle(unsigned char& o, unsigned char v, int s, int d)
    {
    o |= (((v & (1 << s)) >> s) << d);
    }

And, of course, that algorithm can be optimized based on the relative
values of s and d, and on whether s is 0, etc.

However, there also exists a minimal set of steps to complete that
swizzling, because in the operation above bits 5 and 4 move together.
It could be done using a swizzle2() operation, for example, that
handles 2 bits at a time.

In addition, given the bit-operator abilities that exist on various
CPUs, there are potentially other combinations that exist behind a
single operation, such as bitswap, where the order of bits flips or
mirrors across the center position.

    Are there any existing algorithms which examine the operations that
    must be conducted and then create an optimized / minimal sequence of
    mechanical steps to conduct it given a constrained set of features
    (such as those present on a given CPU)?

    Thank you in advance.

    --
    Rick C. Hodgin

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Kaz Kylheku@21:1/5 to Rick C. Hodgin on Sat Sep 5 20:15:50 2020
    On 2020-09-05, Rick C. Hodgin <rick.c.hodgin@gmail.com> wrote:
    Are there any existing algorithms which examine the operations that
    must be conducted and then create an optimized / minimal sequence of mechanical steps to conduct it given a constrained set of features
    (such as those present on a given CPU)?

    For mapping 8 bit quantities to other 8 bit quantities, you can always
    use a 256 byte look up table.

Of course, it's not practical for larger spaces. Still, it may be
possible to identify opportunities to use multiple smaller look-up tables.

  • From davidlovemore@gmail.com@21:1/5 to Rick C. Hodgin on Sun Sep 6 00:31:53 2020
    On Saturday, September 5, 2020 at 5:45:43 PM UTC+1, Rick C. Hodgin wrote:
    Are there any algorithms which take a known-at-compile-time sequence
    of bitwise operations on an 8-bit to 64-bit quantity, and optimize
    them down to their minimal set of operations?

There are some good resources on bit permutations, including this online calculator:

    http://programming.sirrida.de/calcperm.php

    For:
    Input: 07 06 05 04 03 02 01 00
    Output: 05 04 07 02 01 03 00 06

    It gives:

    x = bit_permute_step_simple(x, 0x55555555, 1); // Bit index complement 0
    x = bit_permute_step_simple(x, 0x33333333, 2); // Bit index complement 1
    x = bit_permute_step_simple(x, 0x0f0f0f0f, 4); // Bit index complement 2

    where

t_bits bit_permute_step_simple(t_bits x, t_bits m, t_uint shift) {
    return ((x & m) << shift) | ((x >> shift) & m);
}

Approx. 12 cycles on a superscalar processor.

Source code for a more sophisticated permutation code generator is also available, linked from that page.

    Some notes here on the online generator:

    http://programming.sirrida.de/bit_perm.html#calculator

  • From Martin Ward@21:1/5 to Rick C. Hodgin on Mon Sep 7 12:08:23 2020
    On 05/09/2020 17:05, Rick C. Hodgin wrote:
    Are there any existing algorithms which examine the operations that
    must be conducted and then create an optimized / minimal sequence of mechanical steps to conduct it given a constrained set of features
    (such as those present on a given CPU)?

    The process you are describing is called "Superoptimization":
    finding the optimal code sequence for one loop-free sequence
    of instructions.

    https://en.wikipedia.org/wiki/Superoptimization

    The term superoptimization was first coined by Alexia Massalin in the
    1987 paper "Superoptimizer: A Look at the Smallest Program":

    http://www.stanford.edu/class/cs343/resources/superoptimizer.pdf

    Back in 1980 there was a discussion in the magazine
"Practical Electronics" regarding the shortest sequence of
    instructions needed to reverse the bits in the accumulator
    for a 6800 system (an operation that is needed in some Fast Fourier
    Transform algorithms).

    One solution (given in June 1980, page 70) consisted of just
    two instructions and used no additional data.
    The instructions were:

    STAA PORTA write to output port
    LDAB PORTB read from input port

    with the output port being wired directly to the input port with
    the bits reversed!

    --
    Martin

Dr Martin Ward | Email: martin@gkc.org.uk | http://www.gkc.org.uk
G.K.Chesterton site: http://www.gkc.org.uk/gkc | Erdos number: 4

  • From Rick C. Hodgin@21:1/5 to Martin Ward on Mon Sep 7 10:55:08 2020
    Thank you, Martin.

    I saw a Google developer talk on C/C++ where they discussed generating
    optimum code for a given logic flow (as dictated by source code).

    I remember thinking, what if your base logic breakout isn't optimum?
    What if the way the developer was designing the algorithm wasn't optimum?
What if the entire premise of what you're trying to do needed to be
first broken down into fundamentals, and then re-assembled in a truly
optimum form?

There are cases where, if you know specific input ranges, you could
substitute the algorithm with something completely unlike the actual
function, yet yield the same results.

One such idea is writing to a port, and reading back from another port,
where the external thing does the processing for you as a co-processor
or custom ASIC.  The actual 6800 algorithm as dictated by the developer
may have been in C code or another language, and would've used N asm
opcodes to conduct the same workload even with an optimizing compiler.
But an intelligent compiler, one that could break down the operations
to fundamentals, understand what's trying to be accomplished, and then
re-engineer the logic to accommodate the abilities which exist (such as
those extra read/write ports), is what's really required.

    In some algorithm cases, the nature of what you're trying to do may be
    handled fully by a completely unexpected series of instructions.

I've long desired for my CAlive language to support that type of
optimization.  Walter Banks and I used to discuss the possibility, and
had he lived we would've continued down that line.  It goes along with
    another fundamental idea I have for processing that I would like to
    explore someday ... if there's time.

    I appreciate your response.

    --
    Rick C. Hodgin



  • From Hans-Peter Diettrich@21:1/5 to All on Tue Sep 8 01:45:41 2020
    Am 07.09.2020 um 16:55 schrieb Rick C. Hodgin:

    One such idea is writing to a port, and reading back from another port,
    where the external thing does the processing for you as a co-processor or custom ASIC.

In a more general approach you connect a ROM between the output and
input port, allowing any mapping of n bits onto n bits. Nowadays a
table in flash memory is more practical.

    DoDi

  • From Tom Crick@21:1/5 to Martin Ward on Tue Sep 8 09:35:29 2020
    On Mon, 7 Sep 2020 at 18:57, Martin Ward <martin@gkc.org.uk> wrote:

    On 05/09/2020 17:05, Rick C. Hodgin wrote:
    Are there any existing algorithms which examine the operations that
    must be conducted and then create an optimized / minimal sequence of mechanical steps to conduct it given a constrained set of features
    (such as those present on a given CPU)?

    The process you are describing is called "Superoptimization":
    finding the optimal code sequence for one loop-free sequence
    of instructions. ...

    Back in the distant past (2009), I did my PhD on superoptimisation —
    provably optimal code generation using answer set programming:

    https://proftomcrick.com/2012/02/18/three-papers-on-superoptimisation/

    Still an area with lots of potential (IMHO)...

    Best wishes,

    Tom

  • From gah4@21:1/5 to Rick C. Hodgin on Thu Sep 10 10:34:18 2020
    On Saturday, September 5, 2020 at 9:45:43 AM UTC-7, Rick C. Hodgin wrote:
    Are there any algorithms which take a known-at-compile-time sequence
    of bitwise operations on an 8-bit to 64-bit quantity, and optimize
    them down to their minimal set of operations?

    For example, if I have an 8-bit byte and I want to swizzle the bits
    thusly:

    Input: 07 06 05 04 03 02 01 00
    Output: 05 04 07 02 01 03 00 06

    There is a lot of work, and many algorithms, for logic optimization,
    or minimization, that, for example, will find the optimal combination
    of NAND and NOR gates to evaluate some logical operation.

    This is used to compile Verilog or VHDL into either FPGA bit files
    or ASIC masks. On the other hand, logic minimization tends to
    believe that you can wire anything to anything else. That is, bit
    swizzle is just wiring. The question you ask doesn't come up.
    (Later on, routing has to get all the wires to the appropriate
    place, but that is true in general, not just for this case.)

    Note that it isn't just adjacent bits, but bits with the same spacing
    in the input and output, such that you can mask with AND, shift,
    and combine with OR. I suspect that with the operations usually
    available: AND, OR, XOR, and shift, it wouldn't be hard to find an
    algorithm for the optimal case. If you add more operations,
    maybe allow also add and multiply, it gets interesting.

  • From Rick C. Hodgin@21:1/5 to All on Thu Sep 10 17:13:24 2020
    On 9/10/20 1:34 PM, gah4 wrote:
    On Saturday, September 5, 2020 at 9:45:43 AM UTC-7, Rick C. Hodgin wrote:
    Are there any algorithms which take a known-at-compile-time sequence
    of bitwise operations on an 8-bit to 64-bit quantity, and optimize
    them down to their minimal set of operations? ...

    There is a lot of work, and many algorithms, for logic optimization,
    or minimization, that, for example, will find the optimal combination
    of NAND and NOR gates to evaluate some logical operation. ...

    I have a preliminary algorithm that works, but it's still not fully
    optimized and is a little clunky.

    I want a way to abstract the logic and work with it there.

    I haven't put in any additional time on this algorithm yet, due to
    some life things happening. But I plan to come back to it.

    --
    Rick C. Hodgin
