Forum: >>> Magnum BBS <<<

What attributes of a programming language simplify its implementation?

From Christopher F Clark@21:1/5 to All on Fri Sep 30 12:46:28 2022

I answered this question on Quora, but I think it is relevant to this
community (and I know I'll get discussion as a result).

What attributes of a programming language simplify its implementation?

1. Simple semantics. That's it. Simple semantics. (Simple meaning
whatever is easy to implement. Not mathematical elegance. Not
consistency.)

How do you get there?

Have a very simple set of types. BASIC had numbers, strings, and arrays.
Don't worry about type conversions and floating point versus integer. Sweep
that all under the rug. Whatever your implementation does, that's what it
does. (Even simpler is what a lot of shells do: you have just "strings", and
if the strings happen to be numbers when you pass them to the "add
function" (the + operator), it does arithmetic. If they aren't, whatever it
does is the definition.)

Write an interpreter rather than a compiler. Don't try to get "efficient"
machine code. Just get code that works for your simple cases. See the
paragraph above: whatever your interpreter does, that's what it does.

Don't get fancy. The original C compilers were almost like BASIC, just
slightly more complex. And even though they were compilers, not
interpreters, you got whatever code they generated. It just happened (well,
actually a lot of theory went into making it "just happen") to easily match
the machine/assembly language of the machines of that era. Even the stuff
that was added to C was often added to keep the implementation simple.
Header files are a good example. They let you put together slightly more
complex programs, but they only work if the programmer uses them right. If
you have inconsistent, conflicting header files, you get "undefined
behavior", a code word for "whatever the implementor decided to do".
Maybe (if you are lucky) you get an error, but maybe you get code that just
doesn't work.
------------------------------

But static typing. No. It doesn't help. Simplicity of implementation wants
you to throw away all those types. What static typing gives you is reliable
and well-defined programs, not a simple implementation.

Ahead-of-time compilation, same thing. It does not make the implementation
easier. It has other attributes, but simplicity of implementation is not
necessarily one of them. (In some cases it can be simpler, but not always.
An interpreter is almost always simpler than a compiler for the same
amount of functionality.)
------------------------------

*Edit added:*

By the way, that's how many introductory compiler classes are structured.
Take a relatively simple language (C or Pascal are popular choices; Lisp
dialects are even simpler) and then throw things out. One type, "int",
which is a fixed-width (e.g. 32-bit) signed integer, with no conversions.
Allow only one function, "main". Allow only one arithmetic operation, "add"
(+). Allow only one comparison, "equal" (==). If you are generating code
rather than writing an interpreter, pick the simplest architecture you can
(e.g. MIPS) and then only allow 16-bit constants, so you never need the
hi/lo split (lui plus ori) that 32-bit constants require. Now you have a
simple enough language that a student can likely get it working in one
semester (or even one quarter).

Believe it or not, that's actually how a lot of "real" compilers are
written. You do a "spike": pick one *exceptionally* simple case and
get it working end-to-end. Then you build around that. If something looks
hard, you do a new spike that makes that issue as simple as possible and
get that working.
------------------------------

Even C++ was built that way. It started with a working C compiler as a
base(*). Then Stroustrup added, feature by feature (probably using C
macros) the things he wanted to make it object-oriented, to make it "C with classes". He didn't start with multiple-inheritance and templates and the
STL. You can even see the results of that in the design of C++.

I suspect the weird way that constructors take parameters as
ctor_name(arg1, arg2, arg3) comes from that. Ctors were probably initially turned into macros and that was C's syntax for macros. The fact that it
makes certain declarations ambiguous wasn't noticed because in the "spike"
they worked as intended. The complexity of the other case (how you
sometimes can't tell a function declaration from a constructor call) was ignored until later.

Similarly, the fact that you need to use "new" and "delete" instead of
"malloc" and "free" has the same origin. In a spike, that made it easy.
Fixing malloc and free to know when things had ctors, and to initialize them
properly, would have been more work. Adding new functions that did so was
easier. Thus simplicity of implementation ruled, and the complexity for
users was not factored in.

I could go on. Even later, when C++ had a standards committee, things were
added one feature at a time. The STL didn't exist until after C++ had
templates. The move semantics rules were a patch to fix up a case where
things that were initially simple didn't do what users wanted. But again,
they were done as "spikes", adding only one feature at a time. And sometimes
one has to add new features or specifications to fix up the interaction of
the features, which slowly accreted.

(*) Starting with a C compiler as a base gave Stroustrup a simple model
to start with. Writing C code is easier than writing assembly code, even
for a PDP-11. Again, simplify as much as possible to make one's
implementation easy.

Lots of Lisp interpreters are written in Lisp, because that's an easy way
to express Lisp's semantics. You then have a small program written in Lisp
that you need to hand-implement. Once that program works, you bootstrap
your way up to the whole interpreter you want.

When we did a Jovial compiler at my first job, we started with PL/I macros
that gave us a subset of Jovial that we needed. We didn't worry about the
cases where the PL/I semantics weren't exactly the same as Jovial, we
weren't going to use those features anyway. Again, sweep any hard semantics
under the rug and don't worry about them. Make your implementation simple
and accept whatever semantics it gives you. Label anything that doesn't
work the way you want in your implementation, "undefined behavior".
------------------------------

By the way, Richard P. Gabriel famously wrote about this, coining the phrase "Worse is better". Here <https://en.wikipedia.org/wiki/Worse_is_better> is a link to a Wikipedia article derived from his ideas.
-- ******************************************************************************

Chris Clark email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Martin Ward@21:1/5 to Christopher F Clark on Sat Oct 1 15:56:49 2022

On 30/09/2022 12:46, Christopher F Clark wrote:

What attributes of a programming language simplify its implementation.

1. Simple semantics. That's it. Simple semantics. (Simple meaning
whatever is easy to implement. Not mathematical elegance. Not
consistency.)

sweep any hard semantics under the rug and don't worry about them.
... Label anything that doesn't
work the way you want in your implementation, "undefined behavior".

This might be OK for a throw away student project (but I still
think students ought to understand the importance of elegance),
but for a production compiler/language that is going
to be used by lots of programmers for lots of projects,
it is a classic example of optimizing the wrong thing.

The tiny amount of time you saved with incomplete and inconsistent
behaviour is lost many times over as programmers spend hours debugging
weird behaviour, working around the missing or inconsistent semantics
and writing convoluted code to avoid undefined behaviour.

"Sweep any hard semantics under the rug": where the hackers can find
it and exploit the inevitable security holes created by the semantics
that your simple implementation happens to give you (that you labelled
as "undefined behaviour").

Make every single programmer who uses your compiler do extra work in
every program they write, just so that you can save a little bit of
work in the design and implementation of your compiler because you
don't care about mathematical elegance or consistency.

C is filled with

By the way Richard P Gabriel famously wrote about this, coining the phrase "Worse is better".

Gabriel argued that "Worse is better" produced more *successful*
software than the MIT approach. This is true, of course, but success
of bad software is a bad thing, not a good thing. Highly successful
bad software has been filling the columns of comp.risks ever since it
began.

What does this C code print:

#include <stdio.h>

int main(void)
{
    unsigned int plus_one = 1;
    int minus_one = -1;

    if (plus_one < minus_one)
        printf("1 < -1");
    else
        printf("boring");
    return 0;
}

--
Martin

Dr Martin Ward | Email: martin@gkc.org.uk | http://www.gkc.org.uk G.K.Chesterton site: http://www.gkc.org.uk/gkc | Erdos number: 4

From gah4@21:1/5 to All on Sat Oct 1 17:05:56 2022

On Saturday, October 1, 2022 at 12:28:25 PM UTC-7, Martin Ward wrote:

(snip)

It seems to me that there are two questions, roughly syntax and
semantics: what the language allows you to ask, and what happens
when you do it.

I have always found Fortran had strange restrictions on what you
were allowed to do. Some because it made the compiler easier
to write (especially in the early days), but also because someone
thought you shouldn't do that.

REAL DO variables were added in Fortran 77, and then removed
not so much later.

On the other hand, PL/I is pretty good at allowing things, even if
there isn't much reason. I did this one in high school, not knowing
if it would actually work:

DCL (I, J, K, L) CHAR(100);
J='1';
K='100';
L='1';
DO I=J TO K BY L;
PUT LIST(I, SQRT(I));
END;

I suspect no designers of PL/I ever expected someone to try it,
but the ability is there, and compilers do it.

Now, it turns out that you have to add a few blanks to K, as the
loop comparison is done as a string compare. (I didn't guess
that until finding that the loop didn't end.)

C lets you do some things that it probably shouldn't, though.

Unlike many languages, the whole definition of PL/I was written
before writing the first compiler. (Not that all features were
implemented in the first compiler.)
[PL/I was a remarkably good language considering what a rush job it was, but
it has plenty of odd things, e.g.

DCL (I, J, K) CHAR(3);
I = 1;
J = 2;
K = I+J;

What does K contain? Nope, it contains three spaces because the 1 and 2
are converted to ' 1' and ' 2', they're converted back to integer,
added, converted back to a default size integer string
like ' 3' and string assignment truncates from the right. -John]

From Christopher F Clark@21:1/5 to All on Sun Oct 2 01:21:48 2022

I absolutely agree with Martin Ward's response:

it is a classic example of optimizing the wrong thing.

While, as he says, it might be useful in simplifying an assignment in a
course where you are trying to teach the fundamentals, it is the wrong
answer for just about any other usage. Even for something throwaway one is doing for oneself it is probably the wrong approach. Those throwaway
things often live longer than expected and are used far more widely. You are trading a moment's convenience for a lifetime of pain and regret.

As one of my mentors said, "I can get you something real fast if the answer doesn't have to be right". It can be easy to get also.

From gah4@21:1/5 to our moderator on Sun Oct 2 00:11:53 2022

On Saturday, October 1, 2022 at 6:34:58 PM UTC-7, gah4 wrote:

(snip)

(our moderator wrote)

[PL/I was a remarkably good language considering what a rush job it was but it has plenty of odd things, e.g.

DCL (I, J, K) CHAR(3);
I = 1;
J = 2;
K = I+J;

What does K contain? Nope, it contains three spaces because the 1 and 2
are converted to ' 1' and ' 2', they're converted back to integer,
added, converted back to a default size integer string
like ' 3' and string assignment truncates from the right. -John]

Yes.

In the DO loop example, the first try was with K='100';
but the second had three blanks before the 100. That way the
string comparison works.

And the SQRT is done in double precision.

From Robin Vowels@21:1/5 to All on Mon Oct 3 12:34:14 2022

From: "gah4" <gah4@u.washington.edu>
Sent: Sunday, October 02, 2022 11:05 AM

(snip)

Unlike many languages, the whole definition of PL/I was written
before writing the first compiler.

That's not true. The preprocessor was designed after the first
release. As well as that, some features of output were designed after
the first release.

(Not that all features were implemented in the first compiler.)
[PL/I was a remarkably good language considering what a rush job it was but it has plenty of odd things, e.g.

DCL (I, J, K) CHAR(3);
I = 1;
J = 2;
K = I+J;

This will not work either.
In the first place, the lengths of the strings are too small to accommodate
the converted integer constants, 1 and 2. The STRINGSIZE condition is raised at run-time.
In the second place, the length of K is too short to accommodate
the sum of I and J.
In the third place, the STRINGSIZE condition is raised at run-time
for the assignments to I, J, and K.
Apart from that, the compiler gives compile-time messages that,
in each of the three assignments, the string variables are too short to accommodate the values that are to be assigned to them.

What does K contain? Nope, it contains three spaces because the 1 and 2
are converted to ' 1' and ' 2', they're converted back to integer,
added, converted back to a default size integer string
like ' 3' and string assignment truncates from the right. -John]

Again, not quite. Even if such a program were allowed to run,
the STRINGSIZE condition is raised. What happens after that depends on
what the programmer does to handle the condition.
[The IBM manuals say that STRINGSIZE is normally disabled. So you
can check for truncation if you want, but by default it isn't checked,
and you'll get the three spaces. -John]

From robin51@dodo.com.au@21:1/5 to Robin Vowels on Mon Oct 3 15:59:52 2022

On 2022-10-03 12:34, Robin Vowels wrote:

(snip)

Again, not quite. Even if such a program were allowed to run,
the STRINGSIZE condition is raised. What happens after that depends on
what the programmer does to handle the condition.
[The IBM manuals say that STRINGSIZE is normally disabled. So you
can check for truncation if you want, but by default it won't
and you'll get the three spaces. -John]

You're forgetting about the three compile-time messages, warning
that all three strings will be truncated.

Yes, STRINGSIZE,** like STRINGRANGE***, SUBSCRIPTRANGE, SIZE,
FIXEDOVERFLOW*, etc.,
are not enabled by default.
Originally, these were not enabled because it took extra instructions
to implement the test on S/360.
However, it was patently evident that not having them enabled
wasted considerable time and effort in detecting programming errors.
I, for one, always enable these conditions. The extra instructions
and extra execution time are usually trivial and are unimportant.
______
* except for FIXEDOVERFLOW, for which such errors could be detected by
hardware on S/360, and produced an interrupt if enabled.
** STRINGSIZE was not in the early specifications for PL/I-F,
but was found from practice to be as important as the others
because truncation without warning could lead to errors.
*** STRINGRANGE was not in the early specifications for PL/I-F,
but was comparable to SUBSCRIPTRANGE in checking for out-of-bound
position references. It was found to be essential for detecting
programming errors.
[I agree that adding two numbers and getting three spaces was bad
practice, and there were ways to avoid shooting yourself in the foot.
My point was that there were situations where each individual step was reasonable, but the combination was absurd. Those are hard to
completely avoid. PL/I had a lot of them for entirely understandable
reasons. That didn't mean it was a hard language, rather that you
had to understand what you were doing. -John]

From gah4@21:1/5 to rob...@dodo.com.au on Mon Oct 3 12:28:56 2022

On Monday, October 3, 2022 at 11:15:17 AM UTC-7, rob...@dodo.com.au wrote:

(snip on some use of PL/I)

Yes, STRINGSIZE,** like STRINGRANGE***, SUBSCRIPTRANGE, SIZE,
FIXEDOVERFLOW*, etc
are not enabled by default.
Originally, these were not enabled because it took extra instructions
to implement the test on S/360.

(and our moderator wrote)

[I agree that adding two numbers and getting three spaces was bad
practice, and there were ways to avoid shooting yourself in the foot.
My point was that there were situations where each individual step was reasonable, but the combination was absurd. Those are hard to
completely avoid. PL/I had a lot of them for entirely understandable
reasons. That didn't mean it was a hard language, rather that you
had to understand what you were doing. -John]

The reason for mentioning PL/I was because, in comparing to Fortran
it is easy to see where Fortran was designed for ease of implementation
(even though the designers will disagree), and PL/I for ease of use
(even though it is easy to find counterexamples.)

PL/I, in all the places where it makes any sense, allows for the
completely general form of expression. To do that, it allows for
some conversions which give surprising results.

Fortran instead restricts you from doing things that make
sense, to stop you from doing things that don't.

In the case of ENTRY (a rare language feature by now),
Fortran EQUIVALENCEs the different return values, where
PL/I allows for the appropriate conversion.

PL/I has (only) generic intrinsic functions. You can give any data
type, including CHAR, to SQRT. Fortran added generic intrinsic
functions in Fortran 77, though not quite completely. Yet you still
can't use SQRT on an integer type. It can't be that hard to
implement the conversion to floating point, but it might cause
other changes to the language.

In Fortran 77, they added the ability to use a floating-point
type for DO loop variables. Fortran 90 declared that obsolescent, and
Fortran 95 removed it. It can't be all that hard to implement, and there
are problems in using it, but they aren't all that bad.

Since PL/I allows the same type of expression everywhere, compilers
only need to implement it once. Fortran has complicated rules on
what kind of expression you can use where. Even though it
simplifies each one, compilers have to implement them all, and
use each one in the right place. Users have a hard time
remembering which one goes where.

In the end, Fortran rules meant to simplify the implementation
actually make it harder.

From Thomas Koenig@21:1/5 to Christopher F Clark on Sat Oct 8 22:44:34 2022

Christopher F Clark <christopher.f.clark@compiler-resources.com> wrote:

I answered this question on Quora, but I think it is relevant to this community (and I know I'll get discussion as a result).

What attributes of a programming language simplify its implementation.

1. Simple semantics. That's it. Simple semantics. (Simple meaning
whatever is easy to implement. Not mathematical elegance. Not
consistency.)

How do you get there?

If ease of language implementation is the primary concern, then
one could use a stack-based language. Easy to write an interpreter
or compiler for, hard to write in the language itself, so it will
likely be very unpopular (but popularity wasn't in the list of
requirements).

Have a very simple set of types. BASIC had numbers, strings, and arrays. Don't worry about type conversions and floating point versus integer. Sweep that all under the rug.

You cannot "sweep it under the rug"; you have to define the semantics
somewhere. It is possible to define the semantics ad hoc and not to
document them (which you seem to be advocating). That is a recipe

Whatever your implementation does, that's what it
does. (Even simpler is what a lot of shells do: you have just "strings", and if the strings happen to be numbers when you pass them to the "add function" (the + operator), it does arithmetic. If they aren't, whatever it does is the definition.)

That strikes me as a bad idea if the language is supposed to be
used for something in the real world. Ill-defined semantics are
a disservice to potential users (but not laying traps for the user
was not on the list of requirements, either).

[...]
[Sounds like we're on our way to reinventing Forth. It had (still has) famously tiny implementations. -John]

From minforth@arcor.de@21:1/5 to Thomas Koenig on Mon Nov 14 05:14:29 2022

On Sunday, October 9, 2022 at 02:20:13 UTC+2, Thomas Koenig wrote:

(snip)
[Sounds like we're on our way to reinventing Forth. It had (still has) famously tiny implementations. -John]

Reinventing old wheels is not much fun. But use Forth as your toolbox
to make your own DSL and you can go _very_ far without diving into
all those dragon books and gigabyte compilers and toolsets.

From gah4@21:1/5 to minf...@arcor.de on Tue Nov 15 06:09:52 2022

On Tuesday, November 15, 2022 at 2:31:08 AM UTC-8, minf...@arcor.de wrote:

(snip)

[...]
[Sounds like we're on our way to reinventing Forth. It had (still has) famously tiny implementations. -John]

Reinventing old wheels is not much fun. But use Forth as your toolbox
to make your own DSL and you can go _very_ far without diving into
all those dragon books and gigabyte compilers and toolsets.

Using lex/yacc or flex/bison, you can do it without going all that
deep into the books, or completely understanding them.

You can write C programs mostly without knowing how C compilers
work, and also for most other languages.

It then depends on how you define "simplify".

In the case of small embedded processors, where the size of
all the code is important, then you have to count code generated
by compilers and parser generators.

But most often, it is how much work it is for you.
