• What attributes of a programming language simplify its implementation?

    From Christopher F Clark@21:1/5 to All on Fri Sep 30 12:46:28 2022
    I answered this question on Quora, but I think it is relevant to this
    community (and I know I'll get discussion as a result).

    What attributes of a programming language simplify its implementation?

    1. Simple semantics. That's it. Simple semantics. (Simple meaning
    whatever is easy to implement. Not mathematical elegance. Not
    consistency.)

    How do you get there?

    Have a very simple set of types. BASIC had numbers, strings, and arrays.
    Don't worry about type conversions and floating point versus integer.
    Sweep that all under the rug. Whatever your implementation does, that's
    what it does. (Even simpler is what a lot of shells do: you have just
    "strings", and if the strings happen to be numbers when you pass them to
    the "add function", the + operator, it does arithmetic. If they aren't,
    whatever it does is the definition.)
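
    A minimal sketch of that strings-only "+" in C. The names parse_number
    and shell_add are hypothetical, and concatenating non-numeric operands
    is just an arbitrary choice standing in for "whatever it does":

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 and stores the value if s is entirely a decimal integer. */
static int parse_number(const char *s, long *value)
{
    char *end;
    *value = strtol(s, &end, 10);
    return end != s && *end == '\0';
}

/* Hypothetical "+" for a strings-only language: writes the result to out.
   Numeric-looking operands are added; anything else is concatenated,
   simply because that is what this implementation happens to do.
   Overflow of the sum is not checked. */
void shell_add(const char *a, const char *b, char *out, size_t out_size)
{
    long x, y;
    assert(out_size > 0);
    if (parse_number(a, &x) && parse_number(b, &y))
        snprintf(out, out_size, "%ld", x + y);
    else
        snprintf(out, out_size, "%s%s", a, b);
}
```

    With a 32-byte buffer, shell_add("2", "40", ...) stores "42", while
    shell_add("2", "x", ...) stores "2x". Nothing was designed here; the
    behavior for non-numbers is simply what fell out of the code.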

    Do an interpreter rather than a compiler. Don't try to get "efficient"
    machine code. Just get code that works, for your simple cases. See the paragraph above. Whatever your interpreter does, that's what it does.

    Don't get fancy. The original C compilers were almost like BASIC, just
    slightly more complex. And even though they were compilers, not
    interpreters, you got whatever code they generated. It just happened
    (well, actually a lot of theory went into making it "just happen") to
    easily match the machine/assembly language of the machines of that era.
    Even the stuff that was added to C was often done so to keep the
    implementation simple. Header files are a good example. They let you put
    together slightly more complex programs, but they only work if the
    programmer uses them right. If you have inconsistent, conflicting header
    files, you get "undefined behavior", a code word for "whatever the
    implementor decided to do". Maybe (if you are lucky) you get an error,
    but maybe you get code that just doesn't work.
    ------------------------------

    But static typing. No. It doesn't help. Simplicity of implementation wants
    you to throw away all those types. What static typing gives you is reliable
    and well-defined programs, not a simple implementation.

    Ahead of time compilation, same thing. Does not make the implementation
    easier. It has other attributes, but simplicity of implementation is not
    necessarily one of them. (In some cases it can be simpler, but not
    always. An interpreter is almost always simpler than any compiler for the
    same amount of functionality.)
    ------------------------------

    *Edit added:*

    By the way, that's how many introductory compiler classes are structured.
    Take a relatively simple language (C or Pascal are popular choices, Lisp
    dialects are even simpler) and then throw things out. One type, "int",
    which is a fixed-width (e.g. 32-bit) signed integer, no conversions.
    Allow only one function, "main". Allow only one arithmetic operation,
    "add" (+). Allow only one comparison, "equal" (==). If you are generating
    code rather than doing an interpreter, pick the simplest architecture you
    can (e.g. MIPS) and then only allow constants of 16 bits so you don't
    need hi/lo. Now you have a simple enough language that a student can
    likely get it working in one semester (or even one quarter).
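
    To give a feel for how small such a cut-down language can be, here is a
    sketch of an interpreter for just its expressions: one type (a 32-bit
    int), one arithmetic operation (+), one comparison (==). The grammar and
    all names (eval_tiny, sum, number) are assumptions made for this
    illustration, not taken from any particular course:

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

/* Cursor into the source text; all names here are hypothetical. */
static const char *cur;

static void skip_spaces(void)
{
    while (*cur == ' ')
        cur++;
}

/* number := digit+   (no sign; constants are assumed to fit in int32_t) */
static int32_t number(void)
{
    int32_t n = 0;
    skip_spaces();
    assert(isdigit((unsigned char)*cur));   /* anything else is "undefined" */
    while (isdigit((unsigned char)*cur))
        n = n * 10 + (*cur++ - '0');
    return n;
}

/* sum := number ('+' number)* */
static int32_t sum(void)
{
    int32_t total = number();
    for (skip_spaces(); *cur == '+'; skip_spaces()) {
        cur++;
        /* add as unsigned so overflow wraps on typical two's-complement
           machines instead of being undefined behavior in C itself */
        total = (int32_t)((uint32_t)total + (uint32_t)number());
    }
    return total;
}

/* expr := sum ('==' sum)?   A comparison yields 1 or 0. */
int32_t eval_tiny(const char *source)
{
    int32_t left;
    cur = source;
    left = sum();
    skip_spaces();
    if (cur[0] == '=' && cur[1] == '=') {
        cur += 2;
        return left == sum();
    }
    return left;
}
```

    Here eval_tiny("40+2") yields 42 and eval_tiny("1 + 2 == 3") yields 1.
    Trailing garbage after a complete expression is silently ignored, which
    is exactly the kind of semantics that gets swept under the rug.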

    Believe it or not, that's actually how a lot of "real" compilers are
    written. You do a "spike", that is, pick one *exceptionally* simple case
    and get it working end-to-end. Then you build around that. If something
    looks hard, you do a new spike that makes that issue as simple as
    possible and get that working.
    ------------------------------

    Even C++ was built that way. It started with a working C compiler as a
    base(*). Then Stroustrup added, feature by feature (probably using C
    macros) the things he wanted to make it object-oriented, to make it "C with classes". He didn't start with multiple-inheritance and templates and the
    STL. You can even see the results of that in the design of C++.

    I suspect the weird way that constructors take parameters as
    ctor_name(arg1, arg2, arg3) comes from that. Ctors were probably initially turned into macros and that was C's syntax for macros. The fact that it
    makes certain declarations ambiguous wasn't noticed because in the "spike"
    they worked as intended. The complexity of the other case (how you
    sometimes can't tell a function declaration from a constructor call) was ignored until later.

    Similarly, the fact that you need to use "new" and "delete" instead of
    "malloc" and "free". The same thing. In a spike that made it easy. Fixing malloc and free to know when things had ctors and initializing them
    properly would have been more work. Adding new functions that did so was easier. Thus simplicity of implementation ruled and the complexity for
    users was not factored in.

    I could go on. Even later, when C++ had a standards committee, things
    were added one feature at a time. The STL didn't exist until after C++
    had templates. The move semantics rules were a patch to fix up a case
    where things that were initially simple didn't do what users wanted. But
    again, they were done as a "spike": add only one feature at a time. And
    sometimes, one has to add new features or specifications to fix up the
    interaction of the features which slowly accreted.

    *) And starting with a C compiler as a base gave Stroustrup a simple model
    to start with. Writing C code is easier than writing assembly code, even
    for a PDP-11. Again, simplify as much as possible to make one's
    implementation easy.

    Lots of "lisp" interpreters are written in lisp, because that's an easy way
    to express lisp's semantics. You then have a small program written in lisp, that you need to hand-implement. Once that program works, you bootstrap
    your way up to the whole interpreter you want.

    When we did a Jovial compiler at my first job, we started with PL/I macros
    that gave us a subset of Jovial that we needed. We didn't worry about the
    cases where the PL/I semantics weren't exactly the same as Jovial, we
    weren't going to use those features anyway. Again, sweep any hard semantics under the rug and don't worry about them. Make your implementation simple
    and accept whatever semantics it gives you. Label anything that doesn't
    work the way you want in your implementation, "undefined behavior".
    ------------------------------

    By the way, Richard P Gabriel famously wrote about this, coining the
    phrase "Worse is better". Here
    <https://en.wikipedia.org/wiki/Worse_is_better> is a link to a
    Wikipedia article derived from his ideas.

    --
    ******************************************************************************
    Chris Clark                  email: christopher.f.clark@compiler-resources.com
    Compiler Resources, Inc.     Web Site: http://world.std.com/~compres
    23 Bailey Rd                 voice: (508) 435-5016
    Berlin, MA 01503 USA         twitter: @intel_chris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Martin Ward@21:1/5 to Christopher F Clark on Sat Oct 1 15:56:49 2022
    On 30/09/2022 12:46, Christopher F Clark wrote:
    What attributes of a programming language simplify its implementation.

    1. Simple semantics. That's it. Simple semantics. (Simple meaning
    whatever is easy to implement. Not mathematical elegance. Not
    consistency.)

    sweep any hard semantics under the rug and don't worry about them.
    ... Label anything that doesn't
    work the way you want in your implementation, "undefined behavior".

    This might be OK for a throw away student project (but I still
    think students ought to understand the importance of elegance),
    but for a production compiler/language that is going
    to be used by lots of programmers for lots of projects,
    it is a classic example of optimizing the wrong thing.

    The tiny amount of time you saved with incomplete and inconsistent
    behaviour is lost many times over as programmers spend hours debugging
    weird behaviour, working around the missing or inconsistent semantics
    and writing convoluted code to avoid undefined behaviour.

    "Sweep any hard semantics under the rug": where the hackers can find
    it and exploit the inevitable security holes created by the semantics
    that your simple implementation happens to give you (that you labelled
    as "undefined behaviour").

    Make every single programmer who uses your compiler do extra work in
    every program they write, just so that you can save a little bit of
    work in the design and implementation of your compiler because you
    don't care about mathematical elegance or consistency.

    C is filled with

    By the way Richard P Gabriel famously wrote about this, coining the phrase "Worse is better".

    Gabriel argued that "Worse is better" produced more *successful*
    software than the MIT approach. This is true, of course, but success
    of bad software is a bad thing, not a good thing. Highly successful
    bad software has been filling the columns of comp.risks ever since it
    began.

    What does this C code print:

    #include <stdio.h>

    int main(void)
    {
        unsigned int plus_one = 1;
        int minus_one = -1;

        if (plus_one < minus_one)
            printf("1 < -1");
        else
            printf("boring");
        return 0;
    }
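
    For readers who want to check the answer: under C's usual arithmetic
    conversions, the int operand of a mixed int/unsigned int comparison is
    converted to unsigned int, so -1 becomes UINT_MAX and the snippet above
    prints "1 < -1". The sketch below makes that implicit conversion
    explicit; the function names less_mixed and less_math are hypothetical:

```c
#include <assert.h>
#include <limits.h>

/* The comparison as the compiler performs it: b is converted to
   unsigned int first, so a negative b turns into a very large value. */
int less_mixed(unsigned int a, int b)
{
    unsigned int converted = (unsigned int)b;   /* implicit in the snippet */
    assert(b >= 0 || converted > (unsigned int)INT_MAX);
    return a < converted;
}

/* A comparison that respects the mathematical values instead:
   nothing unsigned is ever less than a negative number. */
int less_math(unsigned int a, int b)
{
    return b >= 0 && a < (unsigned int)b;
}
```

    less_mixed(1u, -1) returns 1 (true), while less_math(1u, -1) returns 0.
    Many compilers can warn about the mixed form (for example with a
    sign-comparison warning enabled), but the language itself accepts it.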

    --
    Martin

    Dr Martin Ward | Email: martin@gkc.org.uk | http://www.gkc.org.uk G.K.Chesterton site: http://www.gkc.org.uk/gkc | Erdos number: 4

  • From gah4@21:1/5 to All on Sat Oct 1 17:05:56 2022
    On Saturday, October 1, 2022 at 12:28:25 PM UTC-7, Martin Ward wrote:

    (snip)

    It seems to me that there are two questions, more or less
    syntax and semantics. Maybe what the language allows you to ask,
    and what happens when you do it.

    I have always found Fortran had strange restrictions on what you
    were allowed to do. Some because it made the compiler easier
    to write (especially in the early days), but also because someone
    thought you shouldn't do that.

    REAL DO variables were added in Fortran 77, and then removed
    not so much later.

    On the other hand, PL/I is pretty good at allowing things, even if
    there isn't much reason. I did this one in high school, not knowing
    if it would actually work:

    DCL (I, J, K, L) CHAR(100);
    J='1';
    K='100';
    L='1';
    DO I=J TO K BY L;
       PUT LIST(I, SQRT(I));
    END;

    I suspect no designers of PL/I ever expected someone to try it,
    but the ability is there, and compilers do it.

    Now, it turns out that you have to add a few blanks to K, as the
    loop comparison is done as a string compare. (I didn't guess
    that until finding that the loop didn't end.)
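
    The effect can be imitated in C. This is only a sketch of the general
    principle (right-justified, equal-width strings compare like numbers;
    left-justified ones do not), not of PL/I's actual conversion rules. The
    field width of 6 and the names to_field and past_limit are assumptions
    chosen for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { FIELD_WIDTH = 6 };   /* assumed width, chosen only for illustration */

/* Right-justify n in a blank-padded field, as a numeric-to-string
   conversion might; out must hold FIELD_WIDTH + 1 bytes. */
void to_field(int n, char out[FIELD_WIDTH + 1])
{
    assert(n >= 0 && n <= 999999);
    snprintf(out, FIELD_WIDTH + 1, "%*d", FIELD_WIDTH, n);
}

/* Nonzero if the loop variable has passed the limit, judged purely by
   character-by-character string comparison. */
int past_limit(const char *loop_var, const char *limit)
{
    return strcmp(loop_var, limit) > 0;
}
```

    With the loop variable holding "   101", past_limit(i, "100") is false
    because a blank sorts before '1', so the loop would never end. Against
    the padded limit "   100" it is true, and the loop stops where a numeric
    comparison would have stopped it.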

    C lets you do some things that it probably shouldn't, though.

    Unlike many languages, the whole definition of PL/I was written
    before writing the first compiler. (Not that all features were
    implemented in the first compiler.)
    [PL/I was a remarkably good language considering what a rush job it was but
    it has plenty of odd things, e.g.

    DCL (I, J, K) CHAR(3);
    I = 1;
    J = 2;
    K = I+J;

    What does K contain? Nope, it contains three spaces because the 1 and 2
    are converted to ' 1' and ' 2', they're converted back to integer,
    added, converted back to a default size integer string
    like ' 3' and string assignment truncates from the right. -John]
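
    A rough C imitation of only the last step in that note: the sum is
    converted to a right-justified string wider than the target, and the
    assignment keeps the leftmost three characters. The width of 8 is an
    assumption for illustration, not PL/I's real default, and assign_char3
    is a hypothetical name:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { CONVERT_WIDTH = 8 };   /* assumed conversion width, illustration only */

/* Assign an integer to a 3-character string variable: convert it to a
   right-justified string, then keep only the leftmost 3 characters. */
void assign_char3(int value, char out[4])
{
    char wide[CONVERT_WIDTH + 1];
    assert(value >= 0 && value <= 9999999);
    snprintf(wide, sizeof wide, "%*d", CONVERT_WIDTH, value);
    memcpy(out, wide, 3);      /* truncation drops the right-hand end */
    out[3] = '\0';
}
```

    Calling assign_char3(3, k) leaves k holding three spaces: the digit
    sits at the right-hand end of the wide string and is exactly what the
    truncation throws away.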

  • From Christopher F Clark@21:1/5 to All on Sun Oct 2 01:21:48 2022
    I absolutely agree with Martin Ward's response:

    it is a classic example of optimizing the wrong thing.

    While, as he says, it might be useful in simplifying an assignment in a
    course where you are trying to teach the fundamentals, it is the wrong
    answer for just about any other usage. Even for something throwaway one
    is doing for oneself, it is probably the wrong approach. Those throwaway
    things often live longer than expected and are used far more widely. You
    are trading a moment's convenience for a lifetime of pain and regret.

    As one of my mentors said, "I can get you something real fast if the answer doesn't have to be right". It can be easy to get also.


  • From gah4@21:1/5 to our moderator on Sun Oct 2 00:11:53 2022
    On Saturday, October 1, 2022 at 6:34:58 PM UTC-7, gah4 wrote:

    (snip)

    (our moderator wrote)

    [PL/I was a remarkably good language considering what a rush job it was but it has plenty of odd things, e.g.

    DCL (I, J, K) CHAR(3);
    I = 1;
    J = 2;
    K = I+J;

    What does K contain? Nope, it contains three spaces because the 1 and 2
    are converted to ' 1' and ' 2', they're converted back to integer,
    added, converted back to a default size integer string
    like ' 3' and string assignment truncates from the right. -John]

    Yes.

    In the DO loop example, the first try was with K='100';
    but the second had three blanks before the 100. That way the
    string comparison works.

    And the SQRT is done in double precision.

  • From Robin Vowels@21:1/5 to All on Mon Oct 3 12:34:14 2022
    From: "gah4" <gah4@u.washington.edu>
    Sent: Sunday, October 02, 2022 11:05 AM


    On the other hand, PL/I is pretty good at allowing things, even if
    there isn't much reason. I did this one in high school, not knowing
    if it would actually work:

    DCL (I, J, K, L) CHAR(100);
    J='1';
    K='100';
    L='1';
    DO I=J TO K BY L;
    PUT LIST(I, SQRT(I));
    END;

    I suspect no designers of PL/I ever expected someone to try it,
    but the ability is there, and compilers do it.

    Now, it turns out that you have to add a few blanks to K, as the
    loop comparison is done as a string compare. (I didn't guess
    that until finding that the loop didn't end.)

    C lets you do some things that it probably shouldn't, though.

    Unlike many languages, the whole definition of PL/I was written
    before writing the first compiler.

    That's not true. The preprocessor was designed after the first
    release. As well as that, some features of output were designed after
    the first release.

    (Not that all features were implemented in the first compiler.)
    [PL/I was a remarkably good language considering what a rush job it was but it has plenty of odd things, e.g.

    DCL (I, J, K) CHAR(3);
    I = 1;
    J = 2;
    K = I+J;

    This will not work either.
    In the first place, the lengths of the strings are too small to accommodate
    the converted integer constants, 1 and 2. The STRINGSIZE condition is raised at run-time.
    In the second place, the length of K is too short to accommodate
    the sum of I and J.
    In the third place, the STRINGSIZE condition is raised at run-time
    for the assignments to I, J, and K.
    Apart from that, the compiler gives compile-time messages that,
    in each of the three assignments, the string variables are too short to accommodate the values that are to be assigned to them.

    What does K contain? Nope, it contains three spaces because the 1 and 2
    are converted to ' 1' and ' 2', they're converted back to integer,
    added, converted back to a default size integer string
    like ' 3' and string assignment truncates from the right. -John]

    Again, not quite. Even if such a program were allowed to run,
    the STRINGSIZE condition is raised. What happens after that depends on
    what the programmer does to handle the condition.
    [The IBM manuals say that STRINGSIZE is normally disabled. So you
    can check for truncation if you want, but by default it won't
    and you'll get the three spaces. -John]

  • From robin51@dodo.com.au@21:1/5 to Robin Vowels on Mon Oct 3 15:59:52 2022
    On 2022-10-03 12:34, Robin Vowels wrote:
    From: "gah4" <gah4@u.washington.edu>
    Sent: Sunday, October 02, 2022 11:05 AM

    Unlike many languages, the whole definition of PL/I was written
    before writing the first compiler.

    That's not true. The preprocessor was designed after the first
    release. As well as that, some features of output were designed after
    the first release.

    (Not that all features were implemented in the first compiler.)
    [PL/I was a remarkably good language considering what a rush job it
    was but it has plenty of odd things, e.g.

    DCL (I, J, K) CHAR(3);
    I = 1;
    J = 2;
    K = I+J;

    This will not work either.
    In the first place, the lengths of the strings are too small to
    accommodate the converted integer constants, 1 and 2. The STRINGSIZE
    condition is raised at run-time.
    In the second place, the length of K is too short to accommodate
    the sum of I and J.
    In the third place, the STRINGSIZE condition is raised at run-time
    for the assignments to I, J, and K.
    Apart from that, the compiler gives compile-time messages that,
    in each of the three assignments, the string variables are too short to accommodate the values that are to be assigned to them.

    What does K contain? Nope, it contains three spaces because the 1 and 2
    are converted to ' 1' and ' 2', they're converted back to integer,
    added, converted back to a default size integer string
    like ' 3' and string assignment truncates from the right. -John]

    Again, not quite. Even if such a program were allowed to run,
    the STRINGSIZE condition is raised. What happens after that depends on
    what the programmer does to handle the condition.
    [The IBM manuals say that STRINGSIZE is normally disabled. So you
    can check for truncation if you want, but by default it won't
    and you'll get the three spaces. -John]

    You're forgetting about the three compile-time messages, warning
    that all three strings will be truncated.

    Yes, STRINGSIZE**, like STRINGRANGE***, SUBSCRIPTRANGE, SIZE,
    FIXEDOVERFLOW*, etc., are not enabled by default.
    Originally, these were not enabled because it took extra instructions
    to implement the test on S/360.
    However, it was patently evident that not having them enabled
    wasted considerable time and effort in detecting programming errors.
    I, for one, always enable these conditions. The extra instructions
    and extra execution time are usually trivial and are unimportant.
    ______
    * except for FIXEDOVERFLOW, for which such errors could be detected by
    hardware on S/360, and produced an interrupt if enabled.
    ** STRINGSIZE was not in the early specifications for PL/I-F,
    but was found from practice to be as important as the others
    because truncation without warning could lead to errors.
    *** STRINGRANGE was not in the early specifications for PL/I-F,
    but was comparable to SUBSCRIPTRANGE in checking for out-of-bound
    position references. It was found to be essential for detecting
    programming errors.
    [I agree that adding two numbers and getting three spaces was bad
    practice, and there were ways to avoid shooting yourself in the foot.
    My point was that there were situations where each individual step was reasonable, but the combination was absurd. Those are hard to
    completely avoid. PL/I had a lot of them for entirely understandable
    reasons. That didn't mean it was a hard language, rather that you
    had to understand what you were doing. -John]

  • From gah4@21:1/5 to rob...@dodo.com.au on Mon Oct 3 12:28:56 2022
    On Monday, October 3, 2022 at 11:15:17 AM UTC-7, rob...@dodo.com.au wrote:

    (snip on some use of PL/I)

    Yes, STRINGSIZE**, like STRINGRANGE***, SUBSCRIPTRANGE, SIZE,
    FIXEDOVERFLOW*, etc., are not enabled by default.
    Originally, these were not enabled because it took extra instructions
    to implement the test on S/360.

    (and our moderator wrote)

    [I agree that adding two numbers and getting three spaces was bad
    practice, and there were ways to avoid shooting yourself in the foot.
    My point was that there were situations where each individual step was reasonable, but the combination was absurd. Those are hard to
    completely avoid. PL/I had a lot of them for entirely understandable
    reasons. That didn't mean it was a hard language, rather that you
    had to understand what you were doing. -John]

    The reason for mentioning PL/I was because, in comparing to Fortran
    it is easy to see where Fortran was designed for ease of implementation
    (even though the designers will disagree), and PL/I for ease of use
    (even though it is easy to find counterexamples.)

    PL/I, in all the places where it makes any sense, allows for the
    completely general form of expression. To do that, it allows for
    some conversions which give surprising results.

    Fortran instead restricts you from doing things that make
    sense, to stop you from doing things that don't.

    In the case of ENTRY (a rare language feature by now),
    Fortran EQUIVALENCEs the different return values, where
    PL/I allows for the appropriate conversion.

    PL/I has (only) generic intrinsic functions. You can give any data
    type, including CHAR, to SQRT. Fortran added generic intrinsic
    functions in Fortran 77, though not quite completely. Yet you still
    can't use SQRT on an integer type. It can't be that hard to
    implement the conversion to floating point, but it might cause
    other changes to the language.

    In Fortran 77, they added the ability to use floating point data
    type in DO loops. Fortran 90 declared that ability obsolescent, and
    Fortran 95 removed it.
    It can't be all that hard to implement, and there are problems
    in using it, but they aren't all that bad.

    Since PL/I allows the same type of expression everywhere, compilers
    only need to implement it once. Fortran has complicated rules on
    what kind of expression you can use where. Even though it
    simplifies each one, compilers have to implement them all, and
    use each one in the right place. Users have a hard time
    remembering which one goes where.

    In the end, Fortran rules meant to simplify the implementation
    actually make it harder.

  • From Thomas Koenig@21:1/5 to Christopher F Clark on Sat Oct 8 22:44:34 2022
    Christopher F Clark <christopher.f.clark@compiler-resources.com> schrieb:
    I answered this question on Quora, but I think it is relevant to this community (and I know I'll get discussion as a result)..

    What attributes of a programming language simplify its implementation.

    1. Simple semantics. That's it. Simple semantics. (Simple meaning
    whatever is easy to implement. Not mathematical elegance. Not
    consistency.)

    How do you get there?

    If ease of language implementation is the primary concern, then
    one could use a stack-based language. Easy to write an interpreter
    or compiler for, hard to write in the language itself, so it will
    likely be very unpopular (but popularity wasn't in the list of
    requirements).
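
    As a rough indication of how little machinery a stack-based language
    needs, here is a sketch of a postfix evaluator in C. The word set
    (unsigned decimal literals, + and *) and the name rpn_eval are
    assumptions for this illustration, not any existing language:

```c
#include <assert.h>
#include <ctype.h>

enum { STACK_MAX = 64 };

/* Evaluate a space-separated postfix program such as "1 2 + 3 *".
   Words: unsigned decimal literals, '+' and '*'.
   Anything else, or a stack underflow/overflow, trips an assert. */
long rpn_eval(const char *src)
{
    long stack[STACK_MAX];
    int depth = 0;

    while (*src) {
        if (*src == ' ') {
            src++;
        } else if (isdigit((unsigned char)*src)) {
            long n = 0;
            while (isdigit((unsigned char)*src))
                n = n * 10 + (*src++ - '0');
            assert(depth < STACK_MAX);
            stack[depth++] = n;             /* literal: push */
        } else if (*src == '+' || *src == '*') {
            long a, b;
            assert(depth >= 2);
            b = stack[--depth];             /* operator: pop two, push one */
            a = stack[--depth];
            stack[depth++] = (*src == '+') ? a + b : a * b;
            src++;
        } else {
            assert(!"unknown word");
            return 0;
        }
    }
    assert(depth == 1);                     /* exactly one result remains */
    return stack[0];
}
```

    rpn_eval("1 2 + 3 *") returns 9. There is no parser to speak of, no
    precedence and no syntax tree: the program text is executed word by
    word as it is read, which is the implementation simplicity at issue.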

    Have a very simple set of types. BASIC had numbers, strings, and arrays. Don't worry about type conversions and floating point versus integer. Sweep that all under the rug.

    You cannot "sweep it under the rug", you have to define the semantics somewhere. It is possible to define the semantics ad-hoc and not to
    document them (which you seem to be advocating). That is a recipe
    for problems later.

    Whatever your implementation does, that's what it
    does. (Even simpler is what a lot of shells do, you have just "strings" and if the strings happen to be a number when you pass them to the "add function", + operator, it does arithmetic. If they aren't it, whatever it does is the definition.)

    That strikes me as a bad idea if the language is supposed to be
    used for something in the real world. Ill-defined semantics are
    a disservice to potential users (but not laying traps for the user
    was not on the list of requirements, either).

    [...]
    [Sounds like we're on our way to reinventing Forth. It had (still has) famously tiny implementations. -John]

  • From minforth@arcor.de@21:1/5 to Thomas Koenig on Mon Nov 14 05:14:29 2022
    Thomas Koenig schrieb am Sonntag, 9. Oktober 2022 um 02:20:13 UTC+2:
    Christopher F Clark <christoph...@compiler-resources.com> schrieb:
    I answered this question on Quora, but I think it is relevant to this community (and I know I'll get discussion as a result)..

    What attributes of a programming language simplify its implementation.
    ...

    If ease of language implementation is the primary concern, then
    one could use a stack-based language. Easy to write an interpreter
    or compiler for, hard to write in the language itself, so it will
    likely be very unpopular (but popularity wasn't in the list of
    requirements).
    Have a very simple set of types. BASIC had numbers, strings, and arrays. Don't worry about type conversions and floating point versus integer. Sweep that all under the rug.
    You cannot "sweep it under the rug", you have to define the semantics somewhere. It is possible to define the semantics ad-hoc and not to
    document them (which you seem to be advocating). That is a recipe
    for problems later.
    Whatever your implementation does, that's what it
    does. (Even simpler is what a lot of shells do, you have just "strings" and if the strings happen to be a number when you pass them to the "add function", + operator, it does arithmetic. If they aren't it, whatever it does is the definition.)
    That strikes me as a bad idea if the language is supposed to be
    used for something in the real world. Ill-defined semantics are
    a disservice to potential users (but not laying traps for the user
    was not on the list of requirements, either).

    [...]
    [Sounds like we're on our way to reinventing Forth. It had (still has) famously tiny implementations. -John]

    Reinventing old wheels is not much fun. But use Forth as your toolbox
    to make your own DSL and you can go _very_ far without diving into
    all those dragon books and gigabyte compilers and toolsets.

  • From gah4@21:1/5 to minf...@arcor.de on Tue Nov 15 06:09:52 2022
    On Tuesday, November 15, 2022 at 2:31:08 AM UTC-8, minf...@arcor.de wrote:

    (snip)

    [...]
    [Sounds like we're on our way to reinventing Forth. It had (still has) famously tiny implementations. -John]

    Reinventing old wheels is not much fun. But use Forth as your toolbox
    to make your own DSL and you can go _very_ far without diving into
    all those dragon books and gigabyte compilers and toolsets.

    Using lex/yacc or flex/bison, you can do it without going all that
    deep into the books, or completely understanding them.

    You can write C programs mostly without knowing how C compilers
    work, and also for most other languages.

    It then depends on how you define "simplify".

    In the case of small embedded processors, where the size of
    all the code is important, then you have to count code generated
    by compilers and parser generators.

    But most often, it is how much work it is for you.
