• STEP compiler generator

    From gah4@21:1/5 to All on Tue Jun 14 00:00:35 2022
    The program called STEP, that I worked on as part of a
    summer undergraduate project is described here:

    http://www.bitsavers.org/pdf/stanford/slac/The_STEP_Processor.pdf

    According to its manual, it is a:

    "Recursive descent compiler generator",

    and so possibly an answer to the previous question about compiler
    generators.

    In one program it has a parser generator, interpreter for the
    generated parser, replacement procedure (what Bison calls action)
    compiler, and interpreter for compiled replacement procedures.

    STEP originated from the same group that did Mortran, and might be
    considered the next generation. It is designed as a macro processor
    that can fully parse its arguments.

    It can be used as a more ordinary macro processor, though with error
    checking on arguments, but to use it as a compiler generator, you
    write one macro with the whole input program as its argument. It is
    then parsed, while generating the appropriate output.

    As usual for compilers (and macro processors) it is written using
    itself, to allow for a language that is a structured, and otherwise
    improved language based on Fortran 66. It is well designed for writing
    source to source compilers, as source language translators. Less well
    designed for generating assembly code or machine code.

    It took a little while to get it running again, as the version I
    started with had some bugs that the version I had so many years ago
    did not have. (Partly because I wanted it to run with subscript checks
    on.)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From gah4@21:1/5 to All on Fri Jun 17 15:13:33 2022
    (I wrote)

    http://www.bitsavers.org/pdf/stanford/slac/The_STEP_Processor.pdf

    I wasn't expecting a lot of discussion, but I thought someone might
    say something.

    I was thinking, though, about how it might be done today.

    Taking the core parser generator from Bison, that is without the I/O
    routines so it could be included in another program for a start.

    Then a new program to use the generated parsing table, similar to the
    way STEP interprets its generated parser.

    A new language to write actions instead of generating C code.
    Bison parsers depend on the output being processed by a C compiler.
    I suspect that most Bison users use only a small subset of C in their
    actions, so one could define a subset, or a new language for writing
    actions. Either one would allow for a simple compiler and the ability
    to interpret its output.

    One thing about STEP, is that as each rule and replacement procedure
    is compiled, is linked with the running system, and is immediately
    available for use. That is unlike the usual use of Bison, where the
    whole thing is compiled at the end.
    [Reminds me of IMP72 where you could put new syntax rules in your program
    and either make them a macro or call internal routines. It seemed like a
    good idea at the time but what it meant was that no two IMP programs were written in the same language and they tended to be unreadable. -John]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Christopher F Clark@21:1/5 to All on Wed Jun 22 00:20:59 2022
    I'm not quite certain what problem you are trying to explain how to solve. Thus, this reply has been delayed and may be off track.

    In one program it has a parser generator, interpreter for the
    generated parser, replacement procedure (what Bison calls action)
    compiler, and interpreter for compiled replacement procedures.

    This sounds very much like the model used in Racket, where one
    incrementally defines a new language which gets interpreted down to scheme
    and executed as scheme (and Racket itself is written in scheme, in that way
    if I understand correctly). This is all done by what the lisp people call "hygenic macros" which are ways of manipulating an AST that has been represented as S-expressions.

    With a C interpreter (and there are such things) and a lexer and parser generator written in C, one could essentially do the same thing, the same
    way.

    However, as our routinely wise moderator points out, the result is an idiosyncratic language that is one of a kind and no one but the author
    really understands. This is truly how to build a tower of Babel.

    More importantly, if you are trying to solve the problem of writing more of
    a compiler as something one can generate, you haven't actually solved any interesting problem. Your code will still be imperative. You haven't introduced any new model that actually makes some part of the compilation process easier to reason about.

    Compare this to the visitor pattern, which ANTLR generates. This separates walking the parser tree (or AST) from the actions to be performed. That
    may not seem like a big step, but it does remove some of the cognitive
    load. It can be taken farther, but the pattern itself (documented before
    ANTLR implemented it in the "Gang of Four" book on Design Patterns). And
    those patterns do make various common actions easier to reason about,
    because the pattern has taken and standardized it, thus encapsulating that
    part of the cognitive load. This could be taken farther, but it is a good step.

    Attribute grammars are another way of reducing cognitive load, if you make
    the attribute expressions separate and independent. That means in each attribute expression you are only thinking about one issue. Only when the attribute expressions intersect or are dependent upon each other does the reasoning become more complex.

    -------

    Now, a different interpretation of what you are trying to achieve is some
    kind of portability. Starting with either a lisp/scheme or C interpreter,
    you would have something portable and relatively easy to convert into some other programming language. Because you would still be interpreting it, it would be a complete bootstrapping effort, where the output of the
    compilation was a translation of the original language to a different
    language.

    However, the reason translations from one programming language to another
    is difficult is not about the ordinary code. That part is easy. It's
    about the semantic edge cases and things like I/O where the semantics are buried in some runtime library. If two numbers added together overflow,
    what happens? What happens if you index off the edge of an array? What
    data types convert into each other and what is the result of the
    conversion? There are myriads of questions at that level, which determine
    what programs actually do and which ones are legal. That's where UNCOL projects die.

    ------

    And, both of those guesses about what STEP does that is unique might be wrong......

    -- ******************************************************************************

    Chris Clark email:
    christopher.f.clark@compiler-resources.com
    Compiler Resources, Inc. Web Site: http://world.std.com/~compres
    23 Bailey Rd voice: (508) 435-5016
    Berlin, MA 01503 USA twitter: @intel_chris ------------------------------------------------------------------------------

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From gah4@21:1/5 to All on Tue Jun 21 18:53:40 2022
    On Tuesday, June 21, 2022 at 4:24:10 PM UTC-7, Christopher F Clark wrote:

    (snip, I wrote)
    In one program it has a parser generator, interpreter for the
    generated parser, replacement procedure (what Bison calls action)
    compiler, and interpreter for compiled replacement procedures.

    This sounds very much like the model used in Racket, where one
    incrementally defines a new language which gets interpreted down to scheme and executed as scheme (and Racket itself is written in scheme, in that way if I understand correctly). This is all done by what the lisp people call "hygenic macros" which are ways of manipulating an AST that has been represented as S-expressions.

    I would have to look at in detail, but it does sound similar.

    With a C interpreter (and there are such things) and a lexer and parser generator written in C, one could essentially do the same thing, the same way.

    I think I have heard rumors about C interpreters, but never saw one.

    However, as our routinely wise moderator points out, the result is an idiosyncratic language that is one of a kind and no one but the author
    really understands. This is truly how to build a tower of Babel.

    Well, you can say that about C with some uses of the preprocessor.

    (There is an old story, maybe untrue, about a Pascal user doing:

    #define BEGIN {
    #define END }

    I believe the idea is to (mostly) not do that, at least not more
    than one does with the C preprocessor.

    I was remembering in another group, about how Pascal has numeric
    statement labels, and one of the features of Knuth's WEB, used to
    write TeX, is macros to give them nice names. You could do that
    with other languages, and with a very simple processor, maybe
    even sed.

    Hopefully it makes the code more readable, but maybe not.

    In any case, it was designed around the time of Mortran, and one
    suggested use is an improved Mortran processor. Fully parsing input
    allows for more appropriate error messages, for one.

    More importantly, if you are trying to solve the problem of writing more of
    a compiler as something one can generate, you haven't actually solved any interesting problem. Your code will still be imperative. You haven't introduced any new model that actually makes some part of the compilation process easier to reason about.

    Mostly I believe it is useful as a source to source compiler, and
    especially one that can be written fast, maybe to be used once.

    (snip)

    Attribute grammars are another way of reducing cognitive load, if you make the attribute expressions separate and independent. That means in each attribute expression you are only thinking about one issue. Only when the attribute expressions intersect or are dependent upon each other does the reasoning become more complex.

    Now, a different interpretation of what you are trying to achieve is some kind of portability. Starting with either a lisp/scheme or C interpreter,
    you would have something portable and relatively easy to convert into some other programming language. Because you would still be interpreting it, it would be a complete bootstrapping effort, where the output of the
    compilation was a translation of the original language to a different language.

    Yes, I believe you could take much of bison, and much of a C
    interpreter, and put them together into one program.

    However, the reason translations from one programming language to another
    is difficult is not about the ordinary code. That part is easy. It's
    about the semantic edge cases and things like I/O where the semantics are buried in some runtime library.

    When I was in high school, I was interested in having a BASIC to Fortran converter. I had some BASIC programs that were interesting.

    Then I read the BNF description of HP-2000 BASIC, and, without knowing
    anything else about compilers, thought about how to write a recursive
    descent compiler.

    When I was in college, the one class I really wanted to take, was
    the compiler class. Putting EE classes and the compiler class on
    my suggested schedule got me into applied physics.

    In any case, with such a processor one can fully parse the input,
    and do some translations that would otherwise be difficult.

    But yes, I suspect that many languages would not be so easy.

    If two numbers added together overflow,
    what happens? What happens if you index off the edge of an array? What
    data types convert into each other and what is the result of the
    conversion? There are myriads of questions at that level, which determine what programs actually do and which ones are legal. That's where UNCOL projects die.

    Reminds me, a few times I have complained on comp.lang.fortran
    about the removal of REAL variables in DO loops. They were added,
    and then removed. But for translation from BASIC, especially by
    hand, it is nice to have them. (Many BASIC systems only have
    floating point variables.)

    The current STEP only has the output processor that formats
    for fixed-form Fortran. The manual says that there is another one,
    but I don't see it. Might not be hard to write, though.

    Otherwise, some of the difference between STEP and Bison
    is the difference in input representation. For one, STEP generates
    a recursive descent compiler, so there are specific ways to write
    the macros to avoid head recursion. STEP was written close to
    the time of yacc, so there might not have been a lot of cross
    fertilization.
    [For something much worse than BEGIN and END, the Bourne shell was
    written with macros that made C look remarkably like Algol 68.
    See https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/sh
    For a C interpreter see http://www.softintegration.com/products/
    The eel extension language in emacs clone Epsilon is very much
    like C also.
    -John]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From gah4@21:1/5 to Christopher F Clark on Tue Jun 21 21:37:02 2022
    On Tuesday, June 21, 2022 at 4:24:10 PM UTC-7, Christopher F Clark wrote:

    I'm not quite certain what problem you are trying to explain how to solve. Thus, this reply has been delayed and may be off track.

    Yes, I didn't expect so much interest in 45 year old software.

    (Though last year I had IBM's ECAP running, which is closer to 60
    years old. It is Fortran IV, translated from Fortran II.)

    So, yes, the core of Bison, the processor for generated parsers,
    and C interpreter together in one program would be about right.

    In case no-one thought of it before, it is now thought of.

    I believe STEP was someone's PhD project. Maybe someone else
    needs a project.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)