• Are there "compiler generators"?

    From Roger L Costello@21:1/5 to All on Sat May 28 22:27:53 2022
    Hi Folks,

    There are lexer generators. Flex is a lexer generator.

    There are parser generators. Bison is a parser generator.

    Are there compiler generators?

    Page 52 of the book "Crafting a Compiler with C" says this in the chapter titled "Scanning--Theory and Practice":

    Programming a scanner generator is an example of nonprocedural programming. That is, unlike ordinary programming, which we call procedural, we do not tell a scanner generator "how" to scan but simply "what" we want scanned. This is a higher-level approach and in many ways a more natural one. Much recent
    research in computer science is directed toward nonprocedural programming styles. (Database query languages and Prolog, a "logic" programming language, are nonprocedural.) Nonprocedural programming is most successful in limited domains, such as scanning, where the range of implementation decisions that must be automatically made is limited. Nonetheless, a long-standing (and as
    yet unrealized) goal of computer scientists is to generate an entire compiler from a specification of the properties of the source language and target computer.
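
    As a rough illustration of the "what, not how" style (a toy sketch, not from the book), a declarative scanner can be little more than a table of token names and patterns handed to a generic driver. The token names and patterns below are invented; a real scanner generator such as Flex compiles the specification into a DFA rather than dispatching on regular expressions at run time.

    ```python
    # Declarative scanning: the "what" is the table, the "how" lives in the driver.
    import re

    TOKEN_SPEC = [               # invented tokens, for illustration only
        ("NUMBER", r"\d+"),
        ("IDENT",  r"[A-Za-z_]\w*"),
        ("PLUS",   r"\+"),
        ("SKIP",   r"\s+"),      # whitespace, discarded
    ]

    MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

    def scan(text):
        """Yield (token name, lexeme) pairs driven purely by the table above."""
        for m in MASTER.finditer(text):
            if m.lastgroup != "SKIP":
                yield m.lastgroup, m.group()

    print(list(scan("x + 42")))  # [('IDENT', 'x'), ('PLUS', '+'), ('NUMBER', '42')]
    ```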

    That was written in 1991. Is it still true in 2022--there are no compiler generators?

    /Roger
    [There are certainly programs that will generate a combined lexer and parser but there's a lot more to a compiler. -John]

  • From Anton Ertl@21:1/5 to Roger L Costello on Sun May 29 06:45:21 2022
    Roger L Costello <costello@mitre.org> writes:
    Are there compiler generators?

    Apart from scanner generators and parser generators, there are also:

    * generators of attribute grammar evaluators (e.g., Ox). They use an
    attribute grammar as input.

    * generators of instruction selectors (e.g., burg and friends). Burg
    uses a tree grammar as input.

    * generators of virtual machine interpreters and other code dealing
    with virtual machine instructions (e.g., vmgen). Vmgen uses a
    description of the virtual machine instructions as inputs.

    * There is also some work on generating type checkers.

    All of these could be classified as non-procedural, although they tend
    to include some procedural code.
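
    For a feel of what, e.g., an attribute-grammar evaluator boils down to, here is a toy sketch (my own, not Ox output): a synthesized "type" attribute computed bottom-up from per-production rules. The node shapes and rules are invented for illustration; a tool like Ox generates this kind of traversal from a declarative attribute grammar.

    ```python
    # Synthesized-attribute evaluation over an AST, driven by per-production rules.
    RULES = {
        # (production, child attribute tuple) -> attribute of the parent
        ("Add", ("int", "int")):     "int",
        ("Add", ("float", "float")): "float",
        ("Add", ("int", "float")):   "float",
        ("Add", ("float", "int")):   "float",
    }

    def type_of(node):
        """Compute the synthesized 'type' attribute bottom-up."""
        kind = node[0]
        if kind == "IntLit":
            return "int"
        if kind == "FloatLit":
            return "float"
        child_types = tuple(type_of(c) for c in node[1:])
        try:
            return RULES[(kind, child_types)]
        except KeyError:
            raise TypeError(f"no rule for {kind}{child_types}")

    tree = ("Add", ("IntLit", 1), ("FloatLit", 2.0))
    print(type_of(tree))   # float
    ```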

    You need to add substantial amounts of glue code to integrate the
    pieces generated by these generators into a compiler, especially for
    languages that do not quite fit in the generators' molds. Plus, there
    are pieces such as symbol tables and register allocation that are not
    covered by these generators.

    I have also seen a paper by a French group (don't remember the names)
    in the early 1990s where a generator could generate a complete Pascal
    compiler from a specification. My impression, however, was that the
    generator could only generate compilers for languages that are
    relatively close to Pascal, and I saw no good way to make it much more flexible.

    - anton
    --
    M. Anton Ertl
    anton@mips.complang.tuwien.ac.at
    http://www.complang.tuwien.ac.at/anton/

  • From Jan Ziak@21:1/5 to Roger L Costello on Sat May 28 23:52:54 2022
    On Sunday, May 29, 2022 at 4:16:21 AM UTC+2, Roger L Costello wrote:
    Hi Folks,

    There are lexer generators. Flex is a lexer generator.

    There are parser generators. Bison is a parser generator.

    Are there compiler generators? ...

    If you like reading about something which might be practical in a
    hundred years: https://people.idsia.ch/~juergen/goedelmachine.html
    (1st paper in the series: https://arxiv.org/abs/cs/0309048)

    Some practical examples available today, having some similarity to the above method, are PyPy and GraalVM.

    There is also http://www.general-game-playing.de for example.

    -atom

  • From Robin Vowels@21:1/5 to All on Sun May 29 13:34:28 2022
    "There's "A Compiler Generator" by McKeeman, Horning, and Wortman.

    ----- Original Message -----
    From: "Roger L Costello" <costello@mitre.org>
    Sent: Sunday, May 29, 2022 8:27 AM


    There are lexer generators. Flex is a lexer generator.

    There are parser generators. Bison is a parser generator.

    Are there compiler generators?

  • From Thomas Koenig@21:1/5 to Roger L Costello on Sun May 29 09:14:51 2022
    Roger L Costello <costello@mitre.org> writes:
    Is it still true in 2022--there are no compiler
    generators?

    What a compiler does can be roughly subdivided
    into the following tasks:

    a) Lexing, resulting in a stream of tokens
    b) Parsing, resulting in an abstract syntax tree
    c) Conversion into intermediate form, resulting in
    SSA (usually, these days)
    d) Optimization of the SSA, resulting in modified SSA
    e) Conversion into machine language, resulting in some
    internal representation of the machine language
    f) Optimization of the machine language
    g) Output to later stages (in assembler or in intermediate
    form)

    Roughly speaking a)-c) is the front end, d) is the middle end and
    e)-f) is the back end.
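
    Written as a sketch (every stage function here is a placeholder, not part of any real compiler), the decomposition is just a pipeline:

    ```python
    # The stage decomposition above as plain function composition.
    # All stage functions are placeholders supplied by the caller.

    def compile_source(src, lex, parse, to_ssa, optimize_ssa, select, optimize_mc, emit):
        tokens = lex(src)            # (a) lexing
        ast    = parse(tokens)       # (b) parsing
        ssa    = to_ssa(ast)         # (c) lowering to an intermediate form
        ssa    = optimize_ssa(ssa)   # (d) middle-end optimisation
        mc     = select(ssa)         # (e) instruction selection
        mc     = optimize_mc(mc)     # (f) machine-level optimisation
        return emit(mc)              # (g) output to later stages

    # Trivial identity stages, just to show the shape:
    ident = lambda x: x
    print(compile_source("source text", *[ident] * 7))
    ```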

    As our moderator writes,

    [There are certainly programs that will generate a combined lexer and parser but there's a lot more to a compiler. -John]

    a) and b) are well covered. For c), you would need a machine-
    implementable description of the language semantics, and there is
    no such thing in widespread use. d) is a main component of today's
    compilers and consists of hand-crafted code. e)-g) can be covered by
    the machine descriptions that gcc and LLVM have (it is debatable
    whether these can be called generators).

    So, assuming you can use gcc or LLVM, the main gap is c): you
    would need a formal semantic description, and it would have to
    be written in a way that it can be translated automatically.
    That might be doable if you design your own language with that goal.

    Defining a format for language semantics which is useful for a
    reasonable subset of the multitude of languages in current use, and
    which is machine-translatable (an input to a "semantic front end
    generator", if you will), will be _extremely_ hard. Maybe starting
    with the formal definition of Modula-2 and then thinking in two
    directions (how a semantic front end generator could be built from
    that description, and how that format could be extended to cover
    other languages) could be the start of an approach.

  • From Fernando@21:1/5 to Roger L Costello on Sun May 29 05:00:47 2022
    Hi Roger.

    At least in theory, the second of the three Futamura projections would generate
    a compiler. That is a way to do partial program evaluation (https://en.wikipedia.org/wiki/Partial_evaluation). There have been research papers about it, demonstrating that one can indeed generate compilers out of partial evaluation, e.g.:

    * An experiment in partial evaluation: the generation of a compiler generator. ND Jones et al. 1985
    * A compiler generator produced by a self-applicable specializer can have a surprisingly natural and understandable structure. SA Romanenko, 1988
    * Practical second Futamura projection: partial evaluation for
    high-performance language interpreters. F Latifi, 2019
    * Etc

    Regards,

    Fernando

    On Saturday, May 28, 2022 at 11:16:21 PM UTC-3, Roger L Costello wrote:
    Hi Folks,

    There are lexer generators. Flex is a lexer generator.
    There are parser generators. Bison is a parser generator.
    Are there compiler generators?

  • From Martin Ward@21:1/5 to Roger L Costello on Sun May 29 12:00:12 2022
    On 28/05/2022 23:27, Roger L Costello wrote:
    There are lexer generators. Flex is a lexer generator.

    There are parser generators. Bison is a parser generator.

    Are there compiler generators?

    Suppose a program F(x, y) takes two inputs and produces
    an output. Suppose input x is known at compile time.
    Then we can optimise the program F(x, .) by precomputing
    all the computations that depend on x to give a program
    G(.) which can be applied to y to produce the output F(x, y):

    G(y) = F(x, y)

    This optimisation process is called "partial evaluation"
    or "specialisation".

    An interpreter is a program F(p, d) which takes
    a program p and some input data d and produces
    the result of executing p on d.

    In theory a partial evaluator (a "specialiser") can be applied
    to an interpreter and the source code for a program to give
    an executable which is a specialised version of the interpreter,
    which only runs the given source code, does not require
    the source code to be supplied, and runs faster than
    the original combination of interpreter and source code.

    To take this to the next level: specialising the specialiser
    for the interpreter generates a compiler for the interpreted
    language. So this is a type of "compiler generator".

    Finally, specialising the specialiser for itself gives a tool
    that can convert any interpreter into an equivalent compiler.
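
    Here is a toy sketch of the first projection. The mini-language, the interpreter, and the hand-rolled specialiser below are all invented for illustration; real partial evaluators are far more sophisticated, and the second and third projections additionally require the specialiser to be self-applicable.

    ```python
    # Toy first Futamura projection: specialising an interpreter F(p, d)
    # with respect to a fixed program p yields G(d), a "compiled" p.

    def interpret(program, data):
        """F(p, d): run a list of ('add', n) / ('mul', n) steps on data."""
        acc = data
        for op, n in program:
            if op == "add":
                acc += n
            elif op == "mul":
                acc *= n
        return acc

    def specialise(program):
        """Partial evaluation by hand: fold the program into residual Python source."""
        expr = "d"
        for op, n in program:
            expr = f"({expr} + {n})" if op == "add" else f"({expr} * {n})"
        code = f"def compiled(d):\n    return {expr}\n"
        namespace = {}
        exec(code, namespace)          # G = F specialised on p
        return namespace["compiled"], code

    p = [("add", 3), ("mul", 2)]
    G, source = specialise(p)
    assert G(10) == interpret(p, 10) == 26
    print(source)   # the residual "compiled" code for p
    ```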

    https://en.wikipedia.org/wiki/Partial_evaluation

    Futamura, Y. (1999). "Partial Evaluation of Computation Process—An
    Approach to a Compiler-Compiler". Higher-Order and Symbolic Computation.
    12 (4): 381–391. CiteSeerX 10.1.1.10.2747. doi:10.1023/A:1010095604496.
    S2CID 12673078.

    In practice, there is a lot more to writing a compiler than just
    partially evaluating an interpreter.

    (The difference between theory and practice is that in theory there
    is no difference between theory and practice, but in practice there is.)

    --
    Martin

    Dr Martin Ward | Email: martin@gkc.org.uk | http://www.gkc.org.uk
    G.K.Chesterton site: http://www.gkc.org.uk/gkc | Erdos number: 4

  • From gah4@21:1/5 to Roger L Costello on Sun May 29 23:29:58 2022
    On Saturday, May 28, 2022 at 7:16:21 PM UTC-7, Roger L Costello wrote:

    There are lexer generators. Flex is a lexer generator.

    There are parser generators. Bison is a parser generator.

    Are there compiler generators?

    My old favorite, and probably still favorite, compiler book is:

    "Retargetable C Compiler, A: Design and Implementation"

    (It is more likely to be your favorite if you are interested in a C compiler.)

    LCC uses, more or less, a code generator generator. You supply the
    instruction combinations to do operations that are needed, and it
    uses dynamic programming to select the optimal code.

    It is usual to generate assembly code for an assembler.

    Mostly that leaves the middle end, especially optimization,
    without its own generator.

  • From Anton Ertl@21:1/5 to Martin Ward on Mon May 30 07:35:16 2022
    Martin Ward <martin@gkc.org.uk> writes:
    In practice, there is a lot more to writing a compiler than just
    partially evaluating an interpreter.

    In the 1990s I heard several talks about compiler generation in that
    way, and it was an old topic by then, with still no practical results,
    so my impression was that it never was going to be practical.

    But recently I heard great things about optimizing Truffle
    interpreters which eventually results in a compiler for the
    interpreted language. You probably still have to hold it right, but
    it's certainly much better than what I expected.

    I asked one of the people involved if there had been a breakthrough,
    but he could not name one, and said that there were just a lot of
    little problems to be solved.

    - anton
    --
    M. Anton Ertl
    anton@mips.complang.tuwien.ac.at
    http://www.complang.tuwien.ac.at/anton/

  • From Hans-Peter Diettrich@21:1/5 to Thomas Koenig on Mon May 30 14:53:59 2022
    On 5/29/22 11:14 AM, Thomas Koenig wrote:
    Roughly speaking a)-c) is the front end, d) is the middle end and
    e)-f) is the back end.

    According to Coco/R, a "compiler" can be anything that deals with formal
    input. A generator of binary programs is only one kind of compiler;
    another is a pretty printer.

    I mention this because IMO there is much more demand for non-programming language processors (HTML, XML...) than for programming language
    compilers. I'd appreciate a word of our esteemed mod about the term
    "compiler" in this group.

    DoDi
    [We chose the name comp.compilers in the 1980s and it's not going to change, but I've always considered anything related to analysis or translation of computer
    languages to be on topic. -John]

  • From Kaz Kylheku@21:1/5 to Roger L Costello on Mon May 30 20:20:44 2022
    On 2022-05-28, Roger L Costello <costello@mitre.org> wrote:
    Hi Folks,

    There are lexer generators. Flex is a lexer generator.

    There are parser generators. Bison is a parser generator.

    Are there compiler generators?

    Writing a compiler isn't something that, as a whole, requires code
    generation.

    The reason is that the bulk of it can be handled by libraries.

    Say you want to write a working compiler with minimal effort.

    You have some input syntax, and for that there are scanner and parser generators with which you write your front end.

    What helps you avoid writing the middle, and the back-end from scratch?
    More code generation of some kind?

    Quite simply, some compiler construction components which have APIs
    that you use. Ideally, here is roughly how it goes: in
    your parser's actions, you use the APIs to generate the right
    intermediate representation of the code, and then hand it off to some
    function that handles the remaining processing. The resulting compiler
    then fits into an existing toolchain framework as a first-class
    citizen.
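
    A sketch of that shape is below. The IRBuilder API is entirely hypothetical, not any real library's interface; in a real compiler the calls made in gen_expr would live inside the parser's semantic actions.

    ```python
    # Library-driven compilation: parser actions call a builder API to
    # construct an IR, then hand the result to a back-end entry point.

    class IRBuilder:
        def __init__(self):
            self.instructions = []
            self._next = 0

        def _fresh(self):
            self._next += 1
            return f"%t{self._next}"

        def const(self, value):
            t = self._fresh()
            self.instructions.append(f"{t} = const {value}")
            return t

        def add(self, a, b):
            t = self._fresh()
            self.instructions.append(f"{t} = add {a}, {b}")
            return t

    def gen_expr(node, b):
        """What a parser action would do per phrase: emit IR, return a value name."""
        if isinstance(node, int):
            return b.const(node)
        op, lhs, rhs = node            # e.g. ("+", 1, ("+", 2, 3))
        return b.add(gen_expr(lhs, b), gen_expr(rhs, b))

    b = IRBuilder()
    result = gen_expr(("+", 1, ("+", 2, 3)), b)
    print("\n".join(b.instructions), "\n=>", result)
    ```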

    There isn't any part of that process which screams "this needs code generation". Really, doing the pattern matching to handle your source
    syntax and translate it into the intermediate representation is the main
    part of the system that benefits from code generation, and
    that's about it.

    What could help you would be a specialized parser construction language
    which has some constructs that can be used in the actions to ease the
    translation to the intermediate form. That would then be closer to a
    "compiler generator", perhaps: something that not only writes the
    parser, where you write procedural code for every phrase structure rule,
    but something which has a rewrite notation that lets you write the
    source->intermediate translation scheme declaratively, using higher
    level concepts from that particular intermediate rep rather than, say,
    low-level C calls into its API.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

  • From Hans-Peter Diettrich@21:1/5 to Hans-Peter Diettrich on Tue May 31 12:57:18 2022
    On 5/30/22 2:53 PM, Hans-Peter Diettrich wrote:

    analysis or translation of computer languages to be on topic. -John]

    What are those "computer" languages? I'd prefer "formal" languages
    (Chomsky...) instead. E.g. Meta§ also was used for DNA analysis.

    DoDi
    [I was going to say artificial languages but I don't think we have anything useful to say about Esperanto. In practice it hasn't been very hard to keep discussions more or less on topic. -John]

  • From gah4@21:1/5 to Hans-Peter Diettrich on Tue May 31 16:55:22 2022
    On Tuesday, May 31, 2022 at 8:20:08 AM UTC-7, Hans-Peter Diettrich wrote:
    On 5/30/22 2:53 PM, Hans-Peter Diettrich wrote:

    analysis or translation of computer languages to be on topic. -John]
    What are those "computer" languages? I'd prefer "formal" languages (Chomsky...) instead. E.g. Meta§ also was used for DNA analysis.

    [I was going to say artificial languages but I don't think we have anything useful to say about Esperanto. In practice it hasn't been very hard to keep discussions more or less on topic. -John]

    Definitely interpreters and macro processors have been discussed,
    though some might not call them compilers. Also text processors
    like TeX.

    I am wondering, though, about (human) language translators.
    It seems that many use non-deterministic AI systems, and so are
    fundamentally different from most of what is discussed here.

    If you parse Esperanto with a Flex/Bison parser then it should
    be fine here. If you parse Fortran with a deep neural net, then
    maybe not.
    [That's essentially what I've been thinking. In the 1950s and 1960s
    there was a lot of work trying to do human language translation using
    formal methods and it worked very poorly, sort of adequate for
    translating technical manuals, not for anything else. The breakthrough
    was when someone at Google realized that the Canadian parliament's
    Hansard, the transcript of debates, had high quality parallel
    French/English translations going back a century and they fed it
    into their machine learning systems. -John]

  • From Christopher F Clark@21:1/5 to All on Wed Jun 1 14:07:46 2022
    Although there are many wonderful examples in replies already, I want
    to mention the Eli "system":
    http://eli-project.sourceforge.net/

    It was (is) an attempt to do exactly that. It did so by cobbling
    together different tools that handled different parts of a compiler. One in
    particular is worth mentioning (Oil), which handled static type systems:
    which types were compatible and when coercions were applicable. I
    never found the overall system to be that usable because it didn't
    really unify the tools to work well together, so the notations
    tended to stay divergent.

    In fact, that is partially why in our (Compiler Resources) version of
    Yacc++ we integrated the lexer and used Yacc-like notation for it. We
    actually did it because using one notation simply makes sense, but
    seeing a tool which used multiple non-integrated notations simply
    reinforced that opinion.

    -----

    But, the main point is that compilers are a big area with lots of
    little parts; some of them have tools and others don't. More
    importantly, much of the development in compilers since 1991 has been
    in hand-written parts. I don't think there is a single production-
    quality compiler for C++ using anything but a hand-written parser.
    That to me is a shame. What we have instead of tools are more
    "frameworks", e.g. gcc and the gnu-compiler-collection, CLANG, even the
    MIPS suite (Chow's work). LLVM (and possibly MLIR) are bright spots
    suggesting that the trend might be reversing. It's a pendulum, swinging
    one way, then the other.

    ------

    And finally, to Dr Diettrich's and our moderator's point: I think that
    compilers, hand-written or tool-generated, are all appropriate,
    including things like TeX and HTML etc. That would also include IDEs.
    However, AI systems that aren't self-explanatory, and for which we don't
    know exactly what they recognize, are probably not. That's a different
    technology. Fortunately, it sounds like there are AI researchers
    attempting to make AI systems that do "explain" what their reasoning
    is. I might include them, especially if they help us advance our own
    understanding. The following article talks about it:

    https://www.quantamagazine.org/machine-scientists-distill-the-laws-of-physics-from-raw-data-20220510/

    ------

    Sorry that I glommed three replies into one. It's just that I am
    still in a rush to get some software finished for my "day" job
    (despite having once retired, but then un-retired to do interesting
    stuff and some not so interesting), and taking time to write these
    replies definitely impacts that.

    Best regards,
    Chris

    --
    Chris Clark                  email: christopher.f.clark@compiler-resources.com
    Compiler Resources, Inc.     Web Site: http://world.std.com/~compres
    23 Bailey Rd                 voice: (508) 435-5016
    Berlin, MA 01503 USA         twitter: @intel_chris

  • From Roger L Costello@21:1/5 to All on Wed Jun 1 11:23:43 2022
    Page 161 of the book "Introducing Compiling Techniques" by J.P. Bennett says:
    ---------------------
    Code generator generators

    Just as there are parser generators, so there are code generator
    generators. In general they all use the same idea, to match patterns
    of instructions generated by the front end to patterns of instructions available to the target machine.

    IBURG

    Iburg generates a fast tree parser. It takes a grammar describing the
    patterns to be matched, with actions that generate the target code.
    The grammar is augmented by the 'cost' of the code generated, and is
    in general ambiguous. The Iburg generated parser finds the lowest cost
    parse of a given sentence. Typically the cost will be the execution
    time of the code, but for a compiler concerned with compact code
    generation, it could be code size.

    The idea of using a grammar to describe target code, and finding the
    lowest cost parse, in order to generate the best code predates Iburg.
    An early system due to Susan Graham used YACC productions to achieve
    the same end.

    ---------------------

    Wow!

    So a compiler can be generated declaratively by using a set of
    declarative generator tools, e.g., Flex for lexical analysis, Bison
    for syntax/semantic analysis, and Iburg for code generation.
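
    To make the Iburg idea above concrete, here is a toy sketch of cost-based tree matching, with the ambiguity resolved by picking the cheapest alternative. The tree shapes, costs, and "instructions" are invented, and a real BURS tool such as Iburg precomputes its matching tables instead of searching at compile time as this does.

    ```python
    # Toy cost-based instruction selection over expression trees.
    # Trees: ("REG", name), ("CONST", n), ("ADD", l, r), ("MUL", l, r).
    import itertools

    _temps = itertools.count(1)

    def match(node):
        """Return (cost, code, name): the cheapest way to get node into a register."""
        kind = node[0]
        if kind == "REG":
            return 0, [], node[1]
        if kind == "CONST":
            t = f"r{next(_temps)}"
            return 1, [f"li {t}, {node[1]}"], t

        l_cost, l_code, l = match(node[1])
        candidates = []

        # Special, cheaper pattern: ADD(reg, CONST) covered by an immediate add.
        if kind == "ADD" and node[2][0] == "CONST":
            t = f"r{next(_temps)}"
            candidates.append((l_cost + 1, l_code + [f"addi {t}, {l}, {node[2][1]}"], t))

        # Generic pattern: OP(reg, reg), with the constant loaded separately if needed.
        r_cost, r_code, r = match(node[2])
        op = {"ADD": "add", "MUL": "mul"}[kind]
        t = f"r{next(_temps)}"
        candidates.append((l_cost + r_cost + 1, l_code + r_code + [f"{op} {t}, {l}, {r}"], t))

        return min(candidates, key=lambda c: c[0])   # the lowest-cost parse wins

    tree = ("ADD", ("MUL", ("REG", "a"), ("REG", "b")), ("CONST", 4))
    cost, code, result = match(tree)
    print(f"cost {cost}, result in {result}")
    print("\n".join(code))
    # Picks "mul r1, a, b; addi r2, r1, 4" (cost 2) over loading 4 into a
    # register and using a three-register add (cost 3).
    ```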

    Has anyone used this combination of tools to create a whole compiler?

    /Roger
    [I expect that someone has used lex, yacc, and iburg in the same
    compiler sometime in the past 30 years. But that doesn't mean that
    they combine into a compiler generator any more than a saw, a hammer,
    and a paintbrush combine into a house generator. They're tools, each
    does what it does. -John]

  • From Anton Ertl@21:1/5 to Roger L Costello on Wed Jun 1 18:05:13 2022
    Roger L Costello <costello@mitre.org> writes:
    So a compiler can be generated declaratively by using a set of
    declarative generator tools, e.g., Flex for lexical analysis, Bison
    for syntax/semantic analysis, and Iburg for code generation.

    Has anyone used this combination of tools to create a whole compiler?

    The students taking my compiler course build such compilers: one
    compiler per student, implementing a different small programming
    language every year. They also use Ox (an attribute grammar evaluator generator) for getting more structure into the in-between part.
    Instead of iburg, they can also use burg, which has the same
    interface.

    - anton
    --
    M. Anton Ertl
    anton@mips.complang.tuwien.ac.at
    http://www.complang.tuwien.ac.at/anton/

  • From gah4@21:1/5 to Roger L Costello on Wed Jun 1 14:02:02 2022
    On Wednesday, June 1, 2022 at 10:24:49 AM UTC-7, Roger L Costello wrote:

    (snip)

    So a compiler can be generated declaratively by using a set of
    declarative generator tools, e.g., Flex for lexical analysis, Bison
    for syntax/semantic analysis, and Iburg for code generation.

    (snip, our moderator mentioned)
    [I expect that someone has used lex, yacc, and iburg in the same
    compiler sometime in the past 30 years. But that doesn't mean that
    they combine into a compiler generator any more than a saw, a hammer,
    and a paintbrush combine into a house generator. They're tools, each
    does what it does. -John]

    There are now 3D printed houses, such as using computer controlled
    concrete pouring devices. One could accept a computerized design
    for a house, and then generate it in concrete. (I think you still need
    to do the interior finishing the old way.)

    One could write a system around flex, bison, and iburg that would
    accept as input a description of the language and of the output
    machine instructions, and that would call flex, bison, and iburg as
    needed. The description could be at a slightly higher level,
    or be flex/bison input with a minimal wrapper.
    (Maybe not so much optimization, as that tends to be more
    specialized. It could be useful as a first compiler for
    a new machine.)

    Many imperative languages are similar enough in structure
    that it might almost work. It might be especially interesting
    for those wanting to make small changes to existing
    languages.

    Well, I would choose a higher-level language than the usual C
    for the output of the lexer and parser: a language with just the
    operations needed for a target compiler.

    One could, then, use the code generator not only for the output
    from the generated compiler, but for the compiler itself.
    (Assume it is for a new machine, with no compilers yet.)

  • From minforth@arcor.de@21:1/5 to Roger L Costello on Tue Jun 7 07:22:53 2022
    Roger L Costello wrote on Wednesday, June 1, 2022 at 19:24:49 UTC+2:
    So a compiler can be generated declaratively by using a set of
    declarative generator tools, e.g., Flex for lexical analysis, Bison
    for syntax/semantic analysis, and Iburg for code generation.

    Has anyone used this combination of tools to create a whole compiler?

    This is a hot AI research field. 'Deep compilers' touch topics like
    least-cost parsing and optimization of other compilation steps.
    But AFAIU there is no such holistic thing as automatic complete
    compiler construction a la
    Compiler = f(grammar, software-infrastructure, target-hardware)

    Some state of the art overview: https://github.com/zwang4/awesome-machine-learning-in-compilers

    OTOH, code generators based on graphical input are already old hat.

  • From Thomas Koenig@21:1/5 to minf...@arcor.de on Tue Jun 7 18:34:12 2022
    minf...@arcor.de <minforth@arcor.de> writes:

    But AFAIU there is no such holistic thing as automatic complete
    compiler construction a la
    Compiler = f(grammar, software-infrastructure, target-hardware)

    This function is missing "semantics" as an argument, which is probably
    the biggest hurdle.

  • From mac@21:1/5 to All on Thu Jun 9 14:12:51 2022
    Are there compiler generators?

    Our esteemed moderator no doubt remembers PQCC
    (https://en.wikipedia.org/wiki/PQCC), the Production-Quality Compiler-Compiler,
    back in the 80s.

    Our idea of a production-quality compiler has evolved since then, both in
    language semantics and in code quality.
