• 8086 register allocation

    From Alexei A. Frounze@21:1/5 to All on Sun May 9 14:28:39 2021
    For the fun of it I've implemented a greedy bottom-up local register
    allocator for the intel 8086 CPU, which is known for its non-uniform
    use of registers in ALU instructions and memory operands (https://github.com/alexfru/regal86).

    The motivation for it is that I haven't seen any such register
    allocator in e.g. the "Dragon" book and may other texts on compilers
    I've read or skimmed. And even today I can't quite come up with a
    proper web search request that would return info on such allocators.
    I've found one paper that very briefly talks about an allocator like
    this for the IBM 370. This gives a sense that this stuff is very
    yesterday in terms of both CPU architectures and modern compiler
    algorithms since many seem to delegate this particular aspect of
    register allocation to graph (pre)coloring and modern CPUs are more
    uniform in terms of register use (or there are simply more registers
    to use).

    Are there any other texts that introduce and explain such register
    allocators? Compiler Construction by William M. Waite and Gerhard Goos
    (section 10.2.2 "Targeting") is a bit too short on the matter. But
    people somehow have done things like this in the past.

    Thanks,
    Alex
    [I don't recall anything in a textbook about it. x86 register
    allocators were pretty ad-hoc due to all of the special cases
    where an operand had to be in a specific register. -John]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Fernando@21:1/5 to All on Mon May 10 04:46:31 2021
    Hi Alex,

    I read part of your code. Very nice! The core algorithm (disregarding the x86 specific part) seems like 'local register allocation' using Belady's heuristic for spilling. I have a class about it here:

    https://youtu.be/XmNXeNCGSIA

    There has been some academic work about models for register allocation for x86. Two papers follow below:

    * Register Allocation by Puzzle Solving, https://llvm.org/pubs/2008-06-PLDI-PuzzleSolving.pdf

    * A generalized algorithm for graph-coloring register allocation, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.10.3076&rep=rep1&type=pdf

    Regards,

    Fernando

    For the fun of it I've implemented a greedy bottom-up local register allocator for the intel 8086 CPU, which is known for its non-uniform
    use of registers in ALU instructions and memory operands (https://github.com/alexfru/regal86).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From gah4@21:1/5 to alexf...@gmail.com on Mon May 10 14:49:27 2021
    On Sunday, May 9, 2021 at 5:04:18 PM UTC-7, alexf...@gmail.com wrote:
    For the fun of it I've implemented a greedy bottom-up local register allocator for the intel 8086 CPU, which is known for its non-uniform
    use of registers in ALU instructions and memory operands (https://github.com/alexfru/regal86).

    Reminds me of what, as far as I know, is also an unsolved problem:
    register allocation for the x87.

    First the 8087 was designed not knowing if it should be a stack or
    register machine, so stack registers are addressable. (Even though
    the addresses keep changing.)

    It was also designed to have a virtual stack, which would spill to memory
    on overflow, and back on underflow. That sounds nice, but it seems that
    no-one tried to write the interrupt routine before the hardware was built,
    and that it actually isn't possible. It seems that it isn't possible to get some
    of the state bits set, such that it all works like a seamless virtual stack.

    The 80287 is the same processing logic with a different bus interface
    (and separate clock). As well as I know, things were redesigned later,
    but it seems that there still is no virtual stack.

    There are description of some compilers that give up on compiling any
    code that needs more than eight registers.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Hans-Peter Diettrich@21:1/5 to All on Tue May 11 03:35:11 2021
    On 5/10/21 11:49 PM, gah4 wrote:

    It was also designed to have a virtual stack, which would spill to memory
    on overflow, and back on underflow. That sounds nice, but it seems that no-one tried to write the interrupt routine before the hardware was built, and that it actually isn't possible. It seems that it isn't possible to get some
    of the state bits set, such that it all works like a seamless virtual stack.

    How efficient is a virtual stack? Did anybody try to use ordinary
    (integral...) registers with interrupt driven spilling?

    A stack machine is convenient for calculations. Before the stack
    overflows the compiler can save intermediate results, as with any other architecture of limited register count.

    DoDi
    [Normal stack machines have the top few entries in registers and do the spilling to memory in hardware. The x87 stack has 8 registers, which is
    a lot for a stack machine, but the spilling was broken. You can address
    into the stack but you can't really use it as a register machine. -John]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From gah4@21:1/5 to All on Mon May 10 21:19:27 2021
    (Snip on x87 register allocation)

    [Normal stack machines have the top few entries in registers and do the spilling to memory in hardware. The x87 stack has 8 registers, which is
    a lot for a stack machine, but the spilling was broken. You can address
    into the stack but you can't really use it as a register machine. -John]

    It was some years ago, but I read the story about the gcc code generator,
    which is designed for register machines. So, they don't quite treat it
    as a register machine, but not a stack machine, either.

    At any point in the code, there should be a known (constant) number
    of items on the stack, so you can address the registers using that number.
    I think that is what Intel expected, when they wrote that you can use
    it as a register machine. It would be less fun in assembly, though.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Hans-Peter Diettrich@21:1/5 to Hans-Peter Diettrich on Tue May 11 09:35:17 2021
    On 5/11/21 3:35 AM, Hans-Peter Diettrich wrote:

    A stack machine is convenient for calculations. Before the stack
    overflows the compiler can save intermediate results, as with any other architecture of limited register count.

    DoDi
    [Normal stack machines have the top few entries in registers and do the spilling to memory in hardware.  The x87 stack has 8 registers, which is
    a lot for a stack machine, but the spilling was broken.   You can address into the stack but you can't really use it as a register machine. -John]

    FORTH coders know how inconvenient for humans is accessing "local
    variables" in such a stack. But a compiler can track the content of the
    stack.

    I had some problems in understanding "spilling". After re-reading the
    80287 instruction set I found that only ST(0) is usable for memory
    load/store operations, making it hard to code load/store the other end
    of the register stack. Thanks, John, for the kick... ;-)

    DoDi

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Koenig@21:1/5 to gah4@u.washington.edu on Tue May 11 06:44:07 2021
    gah4 <gah4@u.washington.edu> schrieb:

    [80x87]

    There are description of some compilers that give up on compiling any
    code that needs more than eight registers.

    Even worse. If I remember correctly, Turbo Pascal 3 simply
    generated wrong code if the number of stack slots was exceeded.
    The user just had to make sure that the formulas were not too
    complex.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)