Forum: >>> Magnum BBS <<<

STEP compiler generator

From gah4@21:1/5 to All on Tue Jun 14 00:00:35 2022

The program called STEP, that I worked on as part of a
summer undergraduate project is described here:

http://www.bitsavers.org/pdf/stanford/slac/The_STEP_Processor.pdf

According to its manual, it is a:

"Recursive descent compiler generator",

and so possibly an answer to the previous question about compiler
generators.

In one program it has a parser generator, interpreter for the
generated parser, replacement procedure (what Bison calls action)
compiler, and interpreter for compiled replacement procedures.

STEP originated from the same group that did Mortran, and might be
considered the next generation. It is designed as a macro processor
that can fully parse its arguments.

It can be used as a more ordinary macro processor, though with error
checking on arguments, but to use it as a compiler generator, you
write one macro with the whole input program as its argument. It is
then parsed, while generating the appropriate output.

As usual for compilers (and macro processors) it is written using
itself, to allow for a language that is a structured, and otherwise
improved language based on Fortran 66. It is well designed for writing
source to source compilers, as source language translators. Less well
designed for generating assembly code or machine code.

It took a little while to get it running again, as the version I
started with had some bugs that the version I had so many years ago
did not have. (Partly because I wanted it to run with subscript checks
on.)

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From gah4@21:1/5 to All on Fri Jun 17 15:13:33 2022

(I wrote)

http://www.bitsavers.org/pdf/stanford/slac/The_STEP_Processor.pdf

I wasn't expecting a lot of discussion, but I thought someone might
say something.

I was thinking, though, about how it might be done today.

Taking the core parser generator from Bison, that is without the I/O
routines so it could be included in another program for a start.

Then a new program to use the generated parsing table, similar to the
way STEP interprets its generated parser.

A new language to write actions instead of generating C code.
Bison parsers depend on the output being processed by a C compiler.
I suspect that most Bison users use only a small subset of C in their
actions, so one could define a subset, or a new language for writing
actions. Either one would allow for a simple compiler and the ability
to interpret its output.

One thing about STEP, is that as each rule and replacement procedure
is compiled, is linked with the running system, and is immediately
available for use. That is unlike the usual use of Bison, where the
whole thing is compiled at the end.
[Reminds me of IMP72 where you could put new syntax rules in your program
and either make them a macro or call internal routines. It seemed like a
good idea at the time but what it meant was that no two IMP programs were written in the same language and they tended to be unreadable. -John]

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Christopher F Clark@21:1/5 to All on Wed Jun 22 00:20:59 2022

I'm not quite certain what problem you are trying to explain how to solve. Thus, this reply has been delayed and may be off track.

In one program it has a parser generator, interpreter for the
generated parser, replacement procedure (what Bison calls action)
compiler, and interpreter for compiled replacement procedures.

This sounds very much like the model used in Racket, where one
incrementally defines a new language which gets interpreted down to scheme
and executed as scheme (and Racket itself is written in scheme, in that way
if I understand correctly). This is all done by what the lisp people call "hygenic macros" which are ways of manipulating an AST that has been represented as S-expressions.

With a C interpreter (and there are such things) and a lexer and parser generator written in C, one could essentially do the same thing, the same
way.

However, as our routinely wise moderator points out, the result is an idiosyncratic language that is one of a kind and no one but the author
really understands. This is truly how to build a tower of Babel.

More importantly, if you are trying to solve the problem of writing more of
a compiler as something one can generate, you haven't actually solved any interesting problem. Your code will still be imperative. You haven't introduced any new model that actually makes some part of the compilation process easier to reason about.

Compare this to the visitor pattern, which ANTLR generates. This separates walking the parser tree (or AST) from the actions to be performed. That
may not seem like a big step, but it does remove some of the cognitive
load. It can be taken farther, but the pattern itself (documented before
ANTLR implemented it in the "Gang of Four" book on Design Patterns). And
those patterns do make various common actions easier to reason about,
because the pattern has taken and standardized it, thus encapsulating that
part of the cognitive load. This could be taken farther, but it is a good step.

Attribute grammars are another way of reducing cognitive load, if you make
the attribute expressions separate and independent. That means in each attribute expression you are only thinking about one issue. Only when the attribute expressions intersect or are dependent upon each other does the reasoning become more complex.

-------

Now, a different interpretation of what you are trying to achieve is some
kind of portability. Starting with either a lisp/scheme or C interpreter,
you would have something portable and relatively easy to convert into some other programming language. Because you would still be interpreting it, it would be a complete bootstrapping effort, where the output of the
compilation was a translation of the original language to a different
language.

However, the reason translations from one programming language to another
is difficult is not about the ordinary code. That part is easy. It's
about the semantic edge cases and things like I/O where the semantics are buried in some runtime library. If two numbers added together overflow,
what happens? What happens if you index off the edge of an array? What
data types convert into each other and what is the result of the
conversion? There are myriads of questions at that level, which determine
what programs actually do and which ones are legal. That's where UNCOL projects die.

------

And, both of those guesses about what STEP does that is unique might be wrong......

-- ******************************************************************************

Chris Clark email:
christopher.f.clark@compiler-resources.com
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris ------------------------------------------------------------------------------

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From gah4@21:1/5 to All on Tue Jun 21 18:53:40 2022

On Tuesday, June 21, 2022 at 4:24:10 PM UTC-7, Christopher F Clark wrote:

(snip, I wrote)

In one program it has a parser generator, interpreter for the
generated parser, replacement procedure (what Bison calls action)
compiler, and interpreter for compiled replacement procedures.

This sounds very much like the model used in Racket, where one
incrementally defines a new language which gets interpreted down to scheme and executed as scheme (and Racket itself is written in scheme, in that way if I understand correctly). This is all done by what the lisp people call "hygenic macros" which are ways of manipulating an AST that has been represented as S-expressions.

I would have to look at in detail, but it does sound similar.

With a C interpreter (and there are such things) and a lexer and parser generator written in C, one could essentially do the same thing, the same way.

I think I have heard rumors about C interpreters, but never saw one.

However, as our routinely wise moderator points out, the result is an idiosyncratic language that is one of a kind and no one but the author
really understands. This is truly how to build a tower of Babel.

Well, you can say that about C with some uses of the preprocessor.

(There is an old story, maybe untrue, about a Pascal user doing:

#define BEGIN {
#define END }

I believe the idea is to (mostly) not do that, at least not more
than one does with the C preprocessor.

I was remembering in another group, about how Pascal has numeric
statement labels, and one of the features of Knuth's WEB, used to
write TeX, is macros to give them nice names. You could do that
with other languages, and with a very simple processor, maybe
even sed.

Hopefully it makes the code more readable, but maybe not.

In any case, it was designed around the time of Mortran, and one
suggested use is an improved Mortran processor. Fully parsing input
allows for more appropriate error messages, for one.

More importantly, if you are trying to solve the problem of writing more of
a compiler as something one can generate, you haven't actually solved any interesting problem. Your code will still be imperative. You haven't introduced any new model that actually makes some part of the compilation process easier to reason about.

Mostly I believe it is useful as a source to source compiler, and
especially one that can be written fast, maybe to be used once.

(snip)

Attribute grammars are another way of reducing cognitive load, if you make the attribute expressions separate and independent. That means in each attribute expression you are only thinking about one issue. Only when the attribute expressions intersect or are dependent upon each other does the reasoning become more complex.

Now, a different interpretation of what you are trying to achieve is some kind of portability. Starting with either a lisp/scheme or C interpreter,
you would have something portable and relatively easy to convert into some other programming language. Because you would still be interpreting it, it would be a complete bootstrapping effort, where the output of the
compilation was a translation of the original language to a different language.

Yes, I believe you could take much of bison, and much of a C
interpreter, and put them together into one program.

However, the reason translations from one programming language to another
is difficult is not about the ordinary code. That part is easy. It's
about the semantic edge cases and things like I/O where the semantics are buried in some runtime library.

When I was in high school, I was interested in having a BASIC to Fortran converter. I had some BASIC programs that were interesting.

Then I read the BNF description of HP-2000 BASIC, and, without knowing
anything else about compilers, thought about how to write a recursive
descent compiler.

When I was in college, the one class I really wanted to take, was
the compiler class. Putting EE classes and the compiler class on
my suggested schedule got me into applied physics.

In any case, with such a processor one can fully parse the input,
and do some translations that would otherwise be difficult.

But yes, I suspect that many languages would not be so easy.

If two numbers added together overflow,
what happens? What happens if you index off the edge of an array? What
data types convert into each other and what is the result of the
conversion? There are myriads of questions at that level, which determine what programs actually do and which ones are legal. That's where UNCOL projects die.

Reminds me, a few times I have complained on comp.lang.fortran
about the removal of REAL variables in DO loops. They were added,
and then removed. But for translation from BASIC, especially by
hand, it is nice to have them. (Many BASIC systems only have
floating point variables.)

The current STEP only has the output processor that formats
for fixed-form Fortran. The manual says that there is another one,
but I don't see it. Might not be hard to write, though.

Otherwise, some of the difference between STEP and Bison
is the difference in input representation. For one, STEP generates
a recursive descent compiler, so there are specific ways to write
the macros to avoid head recursion. STEP was written close to
the time of yacc, so there might not have been a lot of cross
fertilization.
[For something much worse than BEGIN and END, the Bourne shell was
written with macros that made C look remarkably like Algol 68.
See https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/sh
For a C interpreter see http://www.softintegration.com/products/
The eel extension language in emacs clone Epsilon is very much
like C also.
-John]

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From gah4@21:1/5 to Christopher F Clark on Tue Jun 21 21:37:02 2022

On Tuesday, June 21, 2022 at 4:24:10 PM UTC-7, Christopher F Clark wrote:

I'm not quite certain what problem you are trying to explain how to solve. Thus, this reply has been delayed and may be off track.

Yes, I didn't expect so much interest in 45 year old software.

(Though last year I had IBM's ECAP running, which is closer to 60
years old. It is Fortran IV, translated from Fortran II.)

So, yes, the core of Bison, the processor for generated parsers,
and C interpreter together in one program would be about right.

In case no-one thought of it before, it is now thought of.

I believe STEP was someone's PhD project. Maybe someone else
needs a project.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

Who's Online

System Info

Sysop:	Keyop
Location:	Huddersfield, West Yorkshire, UK
Users:	300
Nodes:	16 (2 / 14)
Uptime:	77:01:28
Calls:	6,716
Files:	12,247
Messages:	5,357,726

STEP compiler generator

Who's Online

System Info