This is to follow up on comments from another thread. It's prompted by a picture of a calculator which Bart posted:
https://upload.wikimedia.org/wikipedia/commons/9/9c/Hp35s_Calculator.jpg
I think the issue being discussed was what operators should be provided
by a language. I'd suggest almost none!
Don't get me wrong! AISI there should be standard operator /symbols/
such as +, *, >>, etc (and possibly some reserved operator words such as "and" and "or") and they should have precedences and associativities but
that they should be semantically meaningless. Instead, they would be
given meaning by the data types to which they are applied.
That's not as strange as it may sound. In
A + B
the meaning of + will depend on whether A and B are ints or floats - or, possibly, strings or some user-defined type.
If the meaning depends on the types of the operands then it doesn't have
to be built in but can be defined by code /for the requisite type/. For example, if A and B are floats and there's a definition for the type
'float' which includes the following
type float
function "+"(float self, float other)
... code for float addition ...
then the code therein could carry out the necessary floating-point
addition. (I'm being deliberately vague about how as this topic is more
about invocation.)
The first point is that type float can be an external module and not
part of the language. The same is possibly true also of signed and
unsigned integers!?!
IOW the /language/ would define the precedences and associativity of
operator symbols or standard names but the meaning of those operators
would be defined /externally/.
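By way of illustration only, here is a rough sketch in C++, used purely as a familiar notation (the hypothetical language would have its own syntax): the language fixes only that "+" is a binary operator with a certain precedence, while the meaning for a given type lives in library code.
#include <cstdio>
// A "library" numeric type; nothing about it is built into the language.
struct Fixed {
    long long thousandths;   // value scaled by 1000
};
// The meaning of A + B for two Fixed operands is supplied here, by the
// type's module, not by the compiler.
Fixed operator+(Fixed a, Fixed b) {
    return Fixed{a.thousandths + b.thousandths};
}
int main() {
    Fixed a{1500}, b{2250};                  // 1.500 and 2.250
    Fixed c = a + b;                         // parsed by the language, given meaning by the library
    std::printf("%lld\n", c.thousandths);    // prints 3750
}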
The second point is that named prefix operations such as ABS and SIN -
and many like them - would not need to be part of the language. For one thing, they could be defined in external modules and, for another, they
would appear best as functions rather than as operators. For example, in
abs(y)
the abs function referred to would depend on whether y was an int or a
float, and a call such as
sin(x)
would work for x being a float but the name "sin" would be undefined if
x were an integer.
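Again purely as an illustration, in C++ with made-up names (and note that C++'s implicit int-to-double conversion blurs the point slightly, as the comments say):
#include <cstdio>
#include <cmath>
// 'abs' and 'sin' as ordinary overloaded library functions.
int    my_abs(int x)    { return x < 0 ? -x : x; }
double my_abs(double x) { return x < 0 ? -x : x; }
double my_sin(double x) { return std::sin(x); }   // defined for floats only
int main() {
    std::printf("%d\n", my_abs(-3));     // resolves to the int version
    std::printf("%f\n", my_abs(-3.5));   // resolves to the double version
    std::printf("%f\n", my_sin(1.0));    // fine
    // my_sin(1) would still compile in C++ via int-to-double conversion;
    // in the language sketched above, "sin" would simply be undefined for ints.
}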
Going back to the calculator, aside from common operations surely most
of the more obscure actions one could find on a calculator such as statistical and financial operations would be most clearly expressed in
code by
op_name(args)
Further, that 'named prefix' form allows any number of functions to be apply-able to the types of its args. Possibly most of an advanced calculator's functions are unary, anyway, so the prefix form would be
more appropriate.
With function form there's no ambiguity about how any operations are
applied and the names give a clue as to the meaning. For example,
factorial(x)
gives a big clue as to what it might do....!
In sum, perhaps it would be better to build common operators into type libraries and not make them part of the language itself.
On 21/11/2021 18:55, James Harris wrote:
In sum, perhaps it would be better to build common operators into
type libraries and not make them part of the language itself.
You need a more complicated implementation, and one that implements
more advanced optimisations, which are going to be slower, to get
code of the same standard as you can get easily when things are
built-in.
On 21/11/2021 19:45, Bart wrote:
On 21/11/2021 18:55, James Harris wrote:
In sum, perhaps it would be better to build common operators into
type libraries and not make them part of the language itself.
I agree with Charles!
You need a more complicated implementation, and one that implements
more advanced optimisations, which are going to be slower, to get
code of the same standard as you can get easily when things are
built-in.
Doesn't follow. The library can have access to parts of the language that are not available to ordinary users, inc [eg] escapes
into assembler, inlining, access to the compiler's internal structures,
..., and can be optimised when the compiler is built.
The point is to keep the language definition simple,
not cluttered [as are, eg, Pascal
and C] by having to explain as part of the language what "add-ops" are
and how they relate to "mult-ops" or logical operators.
You are re-inventing Algol 68 again.
On 22/11/2021 20:34, Andy Walker wrote:
On 21/11/2021 19:45, Bart wrote:
On 21/11/2021 18:55, James Harris wrote:
In sum, perhaps it would be better to build common operators into
type libraries and not make them part of the language itself.
You need a more complicated implementation, and one that implements
more advanced optimisations, which are going to be slower, to get
code of the same standard as you can get easily when things are
built-in.
Doesn't follow. The library can have access to parts of the
language that are not available to ordinary users, inc [eg] escapes
into assembler, inlining, access to the compiler's internal structures,
..., and can be optimised when the compiler is built.
This is my point: you need:
* Inline assembly
* Inlining
* User-defined operator overloads
* The equivalent of C++'s constexpr
* A gcc-class optimiser
...
Too many advanced compiler features, just to keep the language 'small'.
Yet the user will see the small core features plus the ones implemented
via language features; they will still see a big language.
Where's the advantage, and who benefits?
One problem I still have is essentially overloading (that old
chestnut) and how to distinguish bound from unbound functions. (Bound
would be defined in a library for the type of their first argument.
Unbound would not. Yet calls to them might look remarkably similar....)
On 2021-11-23 17:39, James Harris wrote:
One problem I still have is essentially overloading (that old
chestnut) and how to distinguish bound from unbound functions. (Bound
would be defined in a library for the type of their first argument.
Unbound would not. Yet calls to them might look remarkably similar....)
The terms are "method" vs "free function."
A dedicated argument makes no sense, obviously. It is motivated by the
inability to implement multiple dispatch and serves as an excuse.
One answer is to distinguish per type: a free function would be defined on the whole
class, which makes them distinguishable, as the type would be different.
Most languages allow free functions with the same type; arguably they
should not [unless inside the private implementation]. This again is problematic with modules and multiple dispatch.
So, basically, the issue is wide open to debate and there is no workable concept in sight.
On 23/11/2021 17:44, Dmitry A. Kazakov wrote:
On 2021-11-23 17:39, James Harris wrote:
One problem I still have is essentially overloading (that old
chestnut) and how to distinguish bound from unbound functions. (Bound
would be defined in a library for the type of their first argument.
Unbound would not. Yet calls to them might look remarkably similar....)
The terms are "method" vs "free function."
Maybe, although I think of methods as going along with classes, and
there are no classes (at least no inheritance) in what I have in mind.
One answer is per types. Free function would be defined on the whole
class, that makes them distinguishable as the type would be different.
What about the syntax of invocation? In
a = F(b, c)
it might be that F is bound to the type of its first argument, b, or it
might be that the programmer has defined F as an unbound function with
two parameters. There's nothing in the source to indicate which is
meant, and that seems to be a problem.
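As a point of comparison only (not a proposal), C++ sidesteps this particular ambiguity by making the two call forms look different:
#include <cstdio>
struct T2 {
    int v;
    int F(int c) const { return v + c; }      // "bound": defined with the type
};
int F(const T2& b, int c) { return b.v * c; } // free/unbound function
int main() {
    T2 b{10};
    std::printf("%d\n", b.F(3));   // 13: can only be the member
    std::printf("%d\n", F(b, 3));  // 30: can only be the free function
}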
On 22/11/2021 23:08, Bart wrote:
[...]
On 22/11/2021 20:34, Andy Walker wrote:
Doesn't follow. The library can have access to parts of the
language that are not available to ordinary users, inc [eg] escapes
into assembler, inlining, access to the compiler's internal structures,
..., and can be optimised when the compiler is built.
This is my point: you need:
* Inline assembly
I don't see why Andy says there should be escapes to assembly and I'd question your list, too.
You need a more complicated implementation, and one that implements
more advanced optimisations, which are going to be slower, to get
code of the same standard as you can get easily when things are
built-in.
Doesn't follow. The library can have access to parts of the
language that are not available to ordinary users, inc [eg] escapes
into assembler, inlining, access to the compiler's internal structures,
..., and can be optimised when the compiler is built.
This is my point: you need:
* Inline assembly
* Inlining
* User-defined operator overloads
* The equivalent of C++'s constexpr
* A gcc-class optimiser
...
Too many advanced compiler features, just to keep the language 'small'.
Yet the user will see the small core features plus the ones
implemented via language features; they will still see a big
language.
Where's the advantage, and who benefits?
The point is to keep the language definition simple,
What, like Algol68? They seemed to have gone out of their way to make
the definition incomprehensible!
(My a68g.exe is 2.8MB; my nearest language is 0.5MB. Just saying...)
On 2021-11-23 21:38, James Harris wrote:
On 23/11/2021 17:44, Dmitry A. Kazakov wrote:
On 2021-11-23 17:39, James Harris wrote:
What about the syntax of invocation? In
a = F(b, c)
it might be that F is bound to the type of its first argument, b, or
it might be that the programmer has defined F as an unbound function
with two parameters. There's nothing in the source to indicate which
is meant, and that seems to be a problem.
There is no problem. Let us have three types:
T1 a
T2 b
T3 c
Now, let F not be a method of b. This is what you mean, right?
The first argument of F would be class T2 rather than simple T2.
It
means that b will be "converted" to the class. [Under the model that b
keeps the type tag it is null conversion.] Then inside F it will
dispatch to the methods of T2.
Thus, regardless of any visibility, the semantics of F regarding b could not change.
Methods persist and class-wide operations are always reduced to method
calls, which makes it safe.
Free functions of simple types in the presence of classes of any kind are
asking for trouble.
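If I read that right, the following C++ sketch (using ordinary virtual functions, so the terminology is not Dmitry's) shows the kind of thing meant by a class-wide operation always going back to the methods:
#include <cstdio>
struct T2 {
    virtual void print() const { std::puts("T2"); }
    virtual ~T2() = default;
};
struct T2Child : T2 {
    void print() const override { std::puts("T2Child"); }
};
// "Class-wide" free function: it accepts T2 and anything derived from it,
// and its effect always reduces to a method call on the actual type.
void describe(const T2& b) { b.print(); }
int main() {
    T2 x;
    T2Child y;
    describe(x);   // prints "T2"
    describe(y);   // prints "T2Child"
}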
On 23/11/2021 21:34, Dmitry A. Kazakov wrote:
On 2021-11-23 21:38, James Harris wrote:
On 23/11/2021 17:44, Dmitry A. Kazakov wrote:
On 2021-11-23 17:39, James Harris wrote:
...
What about the syntax of invocation? In
a = F(b, c)
it might be that F is bound to the type of its first argument, b, or
it might be that the programmer has defined F as an unbound function
with two parameters. There's nothing in the source to indicate which
is meant, and that seems to be a problem.
There is no problem. Let us have three types:
T1 a
T2 b
T3 c
Now, let F not be a method of b. This is what you mean, right?
I can't be sure whether you meant to say "not" in that but the idea is
that the definition of type T2 would include at least one function
called F as in
type T2
function F
The first argument of F would be class T2 rather than simple T2.
I don't know about that. AFAICS the first argument of F would be of
/type/ T2. That's all.
What to do? I don't know. AFAICS both forms of function definition have
their uses - and programmers are used to both forms.
Am open to suggestions!
The trouble is that some functions are naturally free/unbound, i.e. they either don't have any parameters or they are not naturally bound to the
type of their first parameter, so in a call such as
F(x)
there's no way to tell whether F is bound or unbound.
On 2021-11-24 11:06, James Harris wrote:
On 23/11/2021 21:34, Dmitry A. Kazakov wrote:
On 2021-11-23 21:38, James Harris wrote:
On 23/11/2021 17:44, Dmitry A. Kazakov wrote:
On 2021-11-23 17:39, James Harris wrote:
...
What about the syntax of invocation? In
a = F(b, c)
it might be that F is bound to the type of its first argument, b, or
it might be that the programmer has defined F as an unbound function
with two parameters. There's nothing in the source to indicate which
is meant, and that seems to be a problem.
There is no problem. Let us have three types:
T1 a
T2 b
T3 c
Now, let F not be a method of b. This is what you mean, right?
I can't be sure whether you meant to say "not" in that but the idea is
that the definition of type T2 would include at least one function
called F as in
type T2
function F
You are talking about [misfortunate] lexical elements of your language.
The question is the semantics of these.
I guessed that this construct
would imply that F is a method in its T2 argument/result, and that F declared outside would not be one, i.e. it would be a free function.
The first argument of F would be class T2 rather than simple T2.
I don't know about that. AFAICS the first argument of F would be of
/type/ T2. That's all.
That means a plain free function, which has all problems expected. You
get what you asked for.
What to do? I don't know. AFAICS both forms of function definition
have their uses - and programmers are used to both forms.
Am open to suggestions!
I already explained that. If F were class-wide in b, it would have no
such problems because it would always go back to the methods. Any type-specific argument or result, or if you want alternative
terminology, any *contravariant* argument, is a trouble in presence of classes of whatever origin. Drop classes and you will have no problems.
Keep classes (implicit conversions included) and you will have problems
with contravariance. OK?
The trouble is that some functions are naturally free/unbound, i.e.
they either don't have any parameters or they are not naturally bound
to the type of their first parameter so in a call such as
F(x)
there's no way to tell whether F is bound or unbound.
Why should anybody care? It says: do F on x. The effect is determined by
the type of x. Qualifiers like "bound" are implementation details at
best, otherwise meaningless.
On 24/11/2021 11:18, Dmitry A. Kazakov wrote:
That means a plain free function, which has all problems expected. You
get what you asked for.
It is not possible to make all functions bound. For example,
time_now()
or
emit_newline()
cannot be bound to the types of their first parameters because neither
has a first parameter.
I already explained that. If F were class-wide in b, it would have no
such problems because it would always go back to the methods. Any
type-specific argument or result, or if you want alternative
terminology, any *contravariant* argument, is a trouble in presence of
classes of whatever origin. Drop classes and you will have no
problems. Keep classes (implicit conversions included) and you will
have problems with contravariance. OK?
Unfortunately, what you explained doesn't seem to match because you are
using terms (such as contravariance) associated with subtyping when
there is no subtyping in the problem! And it's left me mystified as to
how all those terms relate to the issue at hand - if at all. :-(
When you explain things it might help if you were to do so in terms of
the mechanics rather than in terms of buzzwords/jargon.
Why should anybody care? It tells do F on x. The effect is determined
by the type of x. Qualifiers like "bound" are implementation details
at best, otherwise meaningless.
Well, if you saw
F(x)
where would you as a programmer look to find where F was defined?
Similarly, if F were to be defined in two places as in
type T2
function F....
and
function(T2 F, ....)
which one should the F(x) in
T2 x
F(x)
invoke?
so even if the same name were defined in both ways there would always be
a way to specify one or the other - with the bound form being easiest to specify.
On 22/11/2021 23:08, Bart wrote:
You need a more complicated implementation, and one that implements
more advanced optimisations, which are going to be slower, to get
code of the same standard as you can get easily when things are
built-in.
Doesn't follow. The library can have access to parts of the
language that are not available to ordinary users, inc [eg] escapes
into assembler, inlining, access to the compiler's internal structures,
..., and can be optimised when the compiler is built.
This is my point: you need:
* Inline assembly
* Inlining
* User-defined operator overloads
[You don't need user-defined overloads unless your language
includes them. In which case you need them because of the language,
not because you've implemented them in a library.]
* The equivalent of C++'s constexpr
* A gcc-class optimiser
...
Too many advanced compiler features, just to keep the language 'small'.
If you're going to have these things, you need them anyway.
There is no clear-cut boundary between what compilers do and what
goes into a standard library. It doesn't matter whether it's the
compiler that decides that "A + B" turns into "load A, load B, add"
or the library that decides that.
In either case, syntax analysis
produces a node in the parse tree consisting of an operator and its
two [in this case] operands, or some near equivalent to that. How
that gets processed further is a matter of taste.
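A toy rendering of that idea in C++ (illustrative only; no real compiler is this simple): the parser produces one node for "A + B", and whether it is lowered by built-in code or handed to a library hook is a separate decision.
#include <cstdio>
#include <string>
struct Node {          // the parse-tree node for a binary operator
    char op;           // '+', '*', ...
    std::string lhs, rhs;
};
// Option 1: the compiler lowers the node itself.
void lower_builtin(const Node& n) {
    std::printf("load %s\nload %s\nadd\n", n.lhs.c_str(), n.rhs.c_str());
}
// Option 2: the node is handed to library-supplied code for the operand type.
void lower_via_library(const Node& n) {
    std::printf("call \"%c\"(%s, %s)\n", n.op, n.lhs.c_str(), n.rhs.c_str());
}
int main() {
    Node n{'+', "A", "B"};
    lower_builtin(n);
    lower_via_library(n);
}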
What, like Algol68? They seemed to have gone out of their way to make
the definition incomprehensible!
That is because some people went out of their way to claim
that it is incomprehensible,
and others more recently [no names, no
pack drill] still go out of their way to ask me what a semicolon
means instead of looking it up themselves.
The chart in Algol Bulletin gets the A68 syntax onto one side; the
A68R User Guide gets it, in larger type and with lots of white
space, into well under two sides; substantially less than usual representations of C, Pascal, Ada, ....
(My a68g.exe is 2.8MB; my nearest language is 0.5MB. Just saying...)
My copy of "a68g" [2.8.4] is 1183284 bytes; it has crept
up over the decades from the 504749 bytes of 1.4.0, the earliest
A68G version I have. Algol itself hasn't changed, and A68G "then"
was missing only a few minor features, so I expect the increase
comes primarily from (a) much more extensive checking, (b) extra
language features such as partial parametrisation and the Torrix
extensions, and (c) additional library features such as Plotutils, PostgreSQL, linear algebra, and many others. I expect that if
your language included such things, it too would grow from 0.5MB
to around 1MB.
On 2021-11-24 16:35, James Harris wrote:
On 24/11/2021 11:18, Dmitry A. Kazakov wrote:
I already explained that. If F were class-wide in b, it would have no
such problems because it would always go back to the methods. Any
type-specific argument or result, or if you want alternative
terminology, any *contravariant* argument, is a trouble in presence
of classes of whatever origin. Drop classes and you will have no
problems. Keep classes (implicit conversions included) and you will
have problems with contravariance. OK?
Unfortunately, what you explained doesn't seem to match because you are
using terms (such as contravariance) associated with subtyping when
there is no subtyping in the problem! And it's left me mystified as to
how all those terms relate to the issue at hand - if at all. :-(
As I said, no classes, no problems. Your "bondage" stuff loses sense
without classes. It makes no difference how and where you declare
subprograms back in FORTRAN-IV. You want FORTRAN-IV to be FORTRAN-IV! You want
more, learn the meaning of the "buzz words"...
When you explain things it might help if you were to do so in terms of
the mechanics rather than in terms of buzzwords/jargon.
All the mechanics were explained 100 times over. You just skipped it all, 100 times.
Similarly, if F were to be defined in two places as in
type T2
function F....
and
function(T2 F, ....)
I have no idea what the second is supposed to mean, a cat walking the keyboard?
The real world does not happen in jargon but in transformations of data
and data structures.
Similarly, if F were to be defined in two places as in
type T2
function F....
and
function(T2 F, ....)
I have no idea what the second is supposed to mean, a cat walking the
keyboard?
Sorry, it was pseudo C syntax. The type name T2 preceding the identifier
F was meant to indicate that F would be of type T2. The point of the two fragments of code was that in BOTH of them the first argument to F would
be of type T2. The problem, then, is to say which F would be invoked by
F(x)
where x was of type T2.
On 2021-11-26 18:21, James Harris wrote:
Similarly, if F were to be defined in two places as in
type T2
function F....
and
function(T2 F, ....)
I have no idea what the second is supposed to mean, a cat walking the
keyboard?
Sorry, it was pseudo C syntax. The type name T2 preceding the
identifier F was meant to indicate that F would be of type T2. The
point of the two fragments of code was that in BOTH of them the first
argument to F would be of type T2. The problem, then, is to say which
F would be invoked by
F(x)
where x was of type T2.
The call is ambiguous when F of both declarations is directly visible.
An important difference between a method and a free function is that
a method cannot be hidden. If you see T2 or any object of it, you also see all methods of T2.
* User-defined operator overloads
[You don't need user-defined overloads unless your language
includes them. In which case you need them because of the language,
not because you've implemented them in a library.]
If you want to have an overloaded 'abs' operator or function
available to the user of the language (so I'm not specifying how),
then if it's not built-in, it will need the ability to do
user-overloads, plus inlining if you want it as efficient.
Too many advanced compiler features, just to keep the language 'small'.
If you're going to have these things, you need them anyway.
There is no clear-cut boundary between what compilers do and what
goes into a standard library. It doesn't matter whether it's the
compiler that decides that "A + B" turns into "load A, load B, add"
or the library that decides that.
It matters very much. My static language is self-hosted, and can
build itself entirely from source fast enough that I could do so ten
times every second if I was so inclined.
To me that is important; not that I need to do that 600 times a
minute, but it means an effectively zero edit-build-run cycle for similar-sized applications.
But I did recently glimpse it again at the back of that PDF. The
first thing you see is it talking about Protonotion, Metanotion, and Hypernotion.
A few pages further on, it's on about Moid, Mood and Mu.
You can't help but think that this is a language that takes itself too
seriously! (I have also considered that it might all have been an
elaborate joke.)
An interesting exercise would be to take the language and create a
new definition without all those fancy terms.
AFAICS in Algol68, ";" and "," can both be used to separate things,
but "A;B" means do A then B, but "A,B" do A and B, in either order or concurrently.
Whether that applies to A[i,j,k] or F(a,b,c), I don't know.
However, it doesn't like F(a;b;c) when three arguments are expected.
So there is something more to it.
(Don't bother to explain; I'm never going to get it. I'm just not
surprised that Wirth left the Algol68 (or 'new Algol') group and
produced his own simpler language.)
Indeed. The question is what to do about the ambiguity. Options:
1. Pick the F function found in the type.
2. Pick the F function which is 'unbound'.
3. If the two functions are defined in different scopes pick the one
which is 'closer in' to where F is invoked.
4. Reject the expression because of the ambiguity.
I do not know which approach would be best.
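For comparison, within a single overload set C++ takes roughly option 4: an equally good pair of candidates makes the call ill-formed (members and free functions don't collide there, because the call syntax differs). A small illustration with placeholder names:
void F(int, double) {}
void F(double, int) {}
int main() {
    // F(1, 1);   // rejected: call of overloaded F is ambiguous
    F(1, 2.0);    // fine: exact match for F(int, double)
}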
On 24/11/2021 18:19, Bart wrote:
* User-defined operator overloads
So somewhere in your source there is code that says "A + B"
does not mean "pass A and B to a function called +" but instead notes
this as a special sort of node in the parse tree [or near equivalent]
and says "Aha! This compiles into ...". The effect is the same
whether this is built-in to the parser or is some "magic" in the
library. Which version is faster depends on how it is done; there
are pros and cons.
To me that is important; not that I need to do that 600 times a
minute, but it means an effectively zero edit-build-run cycle for
similar-sized applications.
No, it means an effectively zero build-run part of that
cycle. If it was ten times slower, it would have negligible
effect on the whole cycle,
unless it takes you less than a few
seconds to note the results of the run, work out what has gone
wrong [if anything], decide what you need to change, go to the
right place in the code, and type in the change.
[A68:]
But I did recently glimpse it again at the back of that PDF. The
first thing you see is it talking about Protonotion, Metanotion, and
Hypernotion.
Normal people largely skip that bit; it's just explaining
how two-level grammars work.
A few pages further on, it's on about Moid, Mood and Mu.
Actually about "MOID", etc. That's the metalanguage, the
higher of the two levels. It's how you get an infinite set of
lower-level syntax rules, escape the tyranny of BNF, and check
as a matter of syntax that variables are declared, etc.
You can't help but think that this is a language that takes itself too
seriously! (I have also considered that it might all have been an
elaborate joke.)
You can't please everyone. A "MOID" is either a "MODE"
[type] or "void"; what should it be called, given that you want
to use it several hundred times?
So somewhere in your source there is code that says "A + B"
does not mean "pass A and B to a function called +" but instead notes
this as a special sort of node in the parse tree [or near equivalent]
and says "Aha! This compiles into ...". The effect is the same
whether this is built-in to the parser or is some "magic" in the
library. Which version is faster depends on how it is done; there
are pros and cons.
Imagine if you were creating the bytecode for an interpreter.
Bytecode execution is already slow enough, you will surely want a
dedicated instruction for a such a common operation, rather than
involve a call/return sequence plus whatever other instructions are
needed.
Well, the bytecode for my interpreter [and many others!] has such an
ADD instruction. So does the VM used by my compiler, [...].
Why not specify it straightaway instead of going around the houses?
So I switched to native, and got 20 times faster.
Why would I want to make it slower?
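For what it's worth, a toy version of the dispatch loop being described (illustrative C++ only, not anyone's actual interpreter): the ADD opcode is one switch case, with no call/return traffic per addition.
#include <cstddef>
#include <cstdio>
#include <vector>
enum Op { PUSH, ADD, PRINT, HALT };
int main() {
    std::vector<int> code = {PUSH, 2, PUSH, 3, ADD, PRINT, HALT};
    std::vector<int> stack;
    for (std::size_t pc = 0; ; ) {
        switch (code[pc++]) {
        case PUSH: stack.push_back(code[pc++]); break;
        case ADD: {                       // dedicated instruction, no call/return
            int b = stack.back(); stack.pop_back();
            stack.back() += b;
            break;
        }
        case PRINT: std::printf("%d\n", stack.back()); break;   // prints 5
        case HALT:  return 0;
        }
    }
}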
With near-instant compilation you develop different ways of working.
I might edit, build and re-run just to the spacing right or fix some
annoying detail, which could take multiple builds a few seconds
apart.
[A68:]
But I did recently glimpse it again at the back of that PDF. The
first thing you see is it talking about Protonotion, Metanotion, and
Hypernotion.
Normal people largely skip that bit; it's just explaining
how two-level grammars work.
(Did you mean Normally rather than Normal?)
A few pages further on, it's on about Moid, Mood and Mu.
Actually about "MOID", etc. That's the metalanguage, the
higher of the two levels. It's how you get an infinite set of
lower-level syntax rules, escape the tyranny of BNF, and check
as a matter of syntax that variables are declared, etc.
Except that it isn't a matter of syntax; that's too constraining.
(But didn't you say its syntax could be described in 1-2 pages?)
You can't please everyone. A "MOID" is either a "MODE"
[type] or "void"; what should it be called, given that you want
to use it several hundred times?
A MODE. Especially if as you say, a MODE/VOID is used in 100s of
places.
If it's necessary to discriminate at some point between VOID and any
other MODE, then do so more locally, or with a less silly-sounding
name.
On 27/11/2021 01:21, Bart wrote:
[I wrote:]
So somewhere in your source there is code that says "A + B"
does not mean "pass A and B to a function called +" but instead notes
this as a special sort of node in the parse tree [or near equivalent]
and says "Aha! This compiles into ...". The effect is the same
whether this is built-in to the parser or is some "magic" in the
library. Which version is faster depends on how it is done; there
are pros and cons.
Imagine if you were creating the bytecode for an interpreter.
Bytecode execution is already slow enough, you will surely want a
dedicated instruction for a such a common operation, rather than
involve a call/return sequence plus whatever other instructions are
needed.
Of course it is common for back-end interpreters to have "add" instructions. That makes it less, rather than more,
important for the front end to do anything special with it. You
seem to be hung up in thinking that a library cannot do anything
but turn "A + B" into two parameters and a function call. If it
is integrated into the system, it has privileged access to the
innards of the compiler, the operating system, ....
Well, the bytecode for my interpreter [and many others!] has such an
ADD instruction. So does the VM used by my compiler, [...].
Why not specify it
So I switched to native, and got 20 times faster.
Why would I want to make it slower?
No reason why you should; I was merely pointing out that whether a compilation and run takes 0.01s [as a large majority of
mine do using Algol 68G, with most of the rest < 1s] or 1s makes
no interesting difference to the complete edit-compile-run cycle.
But you seem to be assuming that putting things into libraries is
/ipso facto/ significantly slower than building them in to the
syntax. It may be; or it could be faster; or it could make
absolutely no measurable difference, depending on implementation
details and other factors.
Except that it isn't a matter of syntax; that's too constraining.
It /is/ a matter of syntax in the RR; in what sense is
it constraining? If you use undeclared variables, do you not
want the compiler to detect the error?
Or do you prefer the
early versions of Fortran or C where the compiler assumes that
an undeclared "j" [for example] is an integer variable?
It's necessary throughout the RR. All part of being
careful with use of language. You find "MOID" silly; most
of us just find it a little quirky,
OK, you can understand that internally, a language+implementation can
do things as it pleases, just so long as the user can do A+B and it
does what is expected of it.
Which means /my/ implementation can choose to handle it as it likes
too.
Yet not long ago you seemed so keen to implement things like:
A +:= B
via operator-overload mechanisms expressed within the language
itself.
And more recently that you wanted new comparison operators to
be defined the same way.
In that latter case, the benefits of the language allowing such new
features without bothering the language or changing the compiler,
seemed to be outweighed by the poor quality and limitations of the
result (and its non-standardness!)
But it seems my approach is perfectly valid too.
I've just used gcc to build a 4-line hello.c program; it took over 5
seconds even on my somewhat faster new PC, less than 1 line per
second!
Except that it isn't a matter of syntax; that's too constraining.
It /is/ a matter of syntax in the RR; in what sense is
it constraining? If you use undeclared variables, do you not
want the compiler to detect the error?
But it's nothing to do with syntax!
Syntax is just the shape of the
code. Here is the kind of syntax we've been using:
A + B
That's a perfectly valid expression in most normal languages. Except
in Algol68, because A and B haven't been declared?
It's far simpler to separate these different ideas: the shape of the
code; the resolving of the names (which may detect they are
undefined); the code generation, execution etc.
In my dynamic language, it doesn't care they might be undefined until runtime.
Instead, Algol68 chooses the most complex approach of all, one that is hard
to understand and difficult to implement.
Or do you prefer the
early versions of Fortran or C where the compiler assumes that
an undeclared "j" [for example] is an integer variable?
If that's what the language said, then that's what it will be. If I
execute this complete program of my 'Q' language:
print j
it says:
<Void>
(That's not an error, but most other uses are.)
It's not exactly the end of the world. In Algol68:
INT j;
print(j)
gives a runtime error. So j might be defined, but it hasn't done much
good! I guess the RR doesn't make correct /initialisation/ part of
the syntax?
[...] What I can confidently say is that, to me,
[the RR] is almost completely useless for any purpose.
On 28/11/2021 12:52, Bart wrote:
I've just used gcc to build a 4-line hello.c program; it took over 5
seconds even on my somewhat faster new PC, less than 1 line per
second!
Perhaps you should switch to plain old "cc"? Oh,
wait, on my machine that is a symlink to "gcc". On my home
PC, thirteen years old and nothing fancy, it compiles and
runs a "hello.c" program in 0.04s, only 3x slower than A68G.
[...] What I can confidently say is that, to me,
[the RR] is almost completely useless for any purpose.
That's fine. You're not writing an A68 compiler.
But it also means that you're equally almost completely
unqualified to comment on any matter of A68.
Here however are some benchmarks for 'fannkuch(10)':
Secs Lang Types Exec
A68G 50 A68 Static Interp? Win or WSL
A68G -optimise -- A68 (WSL: Failed)
qq -asmopt 2.2 Q Dynamic Interp (mildly accelerated)
gcc -O3 0.22 C Static Native
mm -opt 0.22 M Static Native
My stuff doesn't do too badly speedwise does it?
I couldn't get a68g -optimise to work, even on WSL; a shame, as that 50-second timing to interpret /static/ code is poor.
On 29/11/2021 00:53, Bart wrote:
Here however are some benchmarks for 'fannkuch(10)':
Secs Lang Types Exec
A68G 50 A68 Static Interp? Win or WSL
A68G -optimise -- A68 (WSL: Failed)
qq -asmopt 2.2 Q Dynamic Interp (mildly accelerated)
gcc -O3 0.22 C Static Native
mm -opt 0.22 M Static Native
My stuff doesn't do too badly speedwise does it?
I couldn't get a68g -optimise to work, even on WSL, a shame as that
50-second timing to interpret /static/ code is poor.
$ cat hello.c
#include <stdio.h>
int main() {printf("Hello World!");}
$ time gcc hello.c && ./a.out
real 0m0.038s
user 0m0.034s
sys 0m0.004s
Hello World!$
$ cat hello.a68
BEGIN print("Hello World!") END
$ time a68g hello.a68
Hello World!
real 0m0.016s
user 0m0.008s
sys 0m0.008s
$
That's Ubuntu on a modern Intel 68-bit hardware.
On 29/11/2021 12:30, Charles Lindsey wrote:
On 29/11/2021 00:53, Bart wrote:
Here however are some benchmarks for 'fannkuch(10)':
Secs Lang Types Exec
A68G 50 A68 Static Interp? Win or WSL
A68G -optimise -- A68 (WSL: Failed)
qq -asmopt 2.2 Q Dynamic Interp (mildly accelerated)
gcc -O3 0.22 C Static Native
mm -opt 0.22 M Static Native
My stuff doesn't do too badly speedwise does it?
I couldn't get a68g -optimise to work, even on WSL, a shame as that 50-second
timing to interpret /static/ code is poor.
$ cat hello.c
#include <stdio.h>
int main() {printf("Hello World!");}
$ time gcc hello.c && ./a.out
real 0m0.038s
user 0m0.034s
sys 0m0.004s
Hello World!$
$ cat hello.a68
BEGIN print("Hello World!") END
$ time a68g hello.a68
Hello World!
real 0m0.016s
user 0m0.008s
sys 0m0.008s
$
That's Ubuntu on a modern Intel 68-bit hardware.
I'm still stuck on 64 bits unfortunately.
Regarding the above benchmark, the A68 code for that is given below.
So that I can properly fill in my table, I'd be grateful if you could run this
on your machine both with and without -optimise, as that flag doesn't work on my
WSL. Then I can estimate the runtime on my machine.
(I assume it transpiles to C then runs that C. The errors are within the A68 headers.)
-----------------------------------------------------------------------------------
INT res2;
PROC fannkuch=(INT n)INT: BEGIN
[n]INT p,q,s;
INT sign, maxflips, sum, q1, flips, qq, t, sx, i,j;
sign:=1;
maxflips:=0;
sum:=0;
FOR i TO n DO
p[i]:=i;
q[i]:=i;
s[i]:=i
OD;
DO
q1:=p[1];
IF q1/=1 THEN
FOR i FROM 2 TO n DO q[i]:=p[i] OD;
flips:=1;
DO
qq:=q[q1];
IF qq=1 THEN
sum+:=sign*flips;
IF flips>maxflips THEN
maxflips:=flips
FI;
exit1
FI;
q[q1]:=q1;
IF q1>=4 THEN
i:=2; j:=q1-1;
WHILE
t:=q[i]; q[i]:=q[j]; q[j]:=t;
i+:=1;
j-:=1;
i<j
DO SKIP OD
FI;
q1:=qq;
flips+:=1
OD;
exit1: SKIP
FI;
IF sign=1 THEN
t:=p[2]; p[2]:=p[1]; p[1]:=t;
sign:=-1
ELSE
t:=p[2]; p[2]:=p[3]; p[3]:=t;
sign:=1;
FOR i FROM 3 TO n DO
sx:=s[i];
IF sx/=1 THEN s[i]:=sx-1; exit2 FI;
IF i=n THEN
res2:=maxflips;
return
FI;
s[i]:=i;
t:=p[1];
FOR j TO i DO
p[j]:=p[j+1]
OD;
p[i+1]:=t
OD;
exit2: SKIP
FI
OD;
return:
sum
END;
INT n=10;
INT res;
res:=fannkuch(n);
print(("Fannkuch(",n,") = ",res," ", res2,newline)) -----------------------------------------------------------------------------------
Output should be:
Fannkuch( +10) = +73196 +38
On 29/11/2021 13:33, Bart wrote:
On 29/11/2021 12:30, Charles Lindsey wrote:
On 29/11/2021 00:53, Bart wrote:
Here however are some benchmarks for 'fannkuch(10)':
Secs Lang Types Exec
A68G 50 A68 Static Interp? Win or WSL
A68G -optimise -- A68 (WSL: Failed)
qq -asmopt 2.2 Q Dynamic Interp (mildly accelerated)
gcc -O3 0.22 C Static Native
mm -opt 0.22 M Static Native
Regarding the above benchmark, the A68 code for that is given below.
So that I can properly fill in my table, I'd be grateful if you could
run this on your machine both with and without -optimise, as that flag
doesn't work on my WSL. Then I can estimate the runtime on my machine.
(I assume it transpiles to C then runs that C. The errors are within
the A68 headers.)
$ time a68g --nooptimise fannkuch.a68
Fannkuch( +10) = +73196 +38
real 0m40.654s
user 0m40.644s
sys 0m0.005s
$
$
$ time a68g --optimise fannkuch.a68
Fannkuch( +10) = +73196 +38
real 0m13.150s
user 0m10.383s
sys 0m0.119s
$
On 22/11/2021 20:34, Andy Walker wrote:
On 21/11/2021 19:45, Bart wrote:
On 21/11/2021 18:55, James Harris wrote:
In sum, perhaps it would be better to build common operators into
type libraries and not make them part of the language itself.
I agree with Charles!
You need a more complicated implementation, and one that implements
more advanced optimisations, which are going to be slower, to get
code of the same standard as you can get easily when things are
built-in.
Doesn't follow. The library can have access to parts of the
language that are not available to ordinary users, inc [eg] escapes
into assembler, inlining, access to the compiler's internal structures, ..., and can be optimised when the compiler is built.
This is my point: you need:
* Inline assembly
* Inlining
* User-defined operator overloads
* The equivalent of C++'s constexpr
* A gcc-class optimiser
...
Too many advanced compiler features, just to keep the language 'small'.
Bart <bc@freeuk.com> wrote:
* Inline assembly
* Inlining
* User-defined operator overloads
* The equivalent of C++'s constexpr
* A gcc-class optimiser
...
I use a language like this. You do need inlining and it is useful
only due to overloading (_not_ limited to operators). However, the
other positions on your list above are not needed. More precisely,
there is a small core language which implements basic types and
corresponding operators. Basic library types simply offer a "view"
of basic types and operators from the core language. Due to inlining
there is no runtime loss of efficiency compared to using the core
language directly. But there are also fancy library types
which are infeasible to have as part of the core language.
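A rough C++ rendering of that "library view of a core type" idea (type and names invented for illustration): the wrapper's "+" just forwards to the core add and inlines away.
#include <cstdio>
struct Metres {          // an ordinary library type, not a built-in
    long value;
};
inline Metres operator+(Metres a, Metres b) {   // forwards to the core '+'
    return Metres{a.value + b.value};
}
int main() {
    Metres d = Metres{1200} + Metres{34};
    std::printf("%ld\n", d.value);   // 1234; after inlining this is a plain add
}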
Too many advanced compiler features, just to keep the language 'small'.
You did not get it: the purpose is to have a manageable implementation
of a _very large_ language. Low-level, error-prone parts are
small. Library types can be written by ordinary application
programmers.
Overloading can be implemented with quite small extra code.
Inlining is also easy to implement if you have an internal
representation of larger code fragments (as opposed to
generating machine code directly after recognizing a given
syntactic construct). Both put limits on the maximal possible
compiler speed: with overloading you have a more complex
symbol table and need more lookups. An intermediate representation
takes time to build (on top of the time needed to generate code).
Also, inlining allows large object code from small sources,
so if you measure time per source line, compiler speed will
likely be worse than the speed of a compiler that forces you to
write each source line separately. Still, the compiler can
be reasonably fast.
Anyway, the increase in compiler complexity is relatively
small. Even for small languages, in the total balance the
approach using overloading may give you a saving in
effort spent on implementation. Just as a simple
example let me mention Pascal. In Pascal there is
a bunch of "builtin" operations (mostly on files).
Most of them are too complex to do by inline code,
so practical compilers have a runtime library implementing
them. But the compiler still needs to parse those
"builtins" and convert them to runtime calls.
In a sense this is a trivial task, but with a realistic
compiler structure you need several lines of compiler
code to generate the corresponding runtime call. A bunch
of older languages like Cobol, Fortran, PL/I had
special "builtin" operations, some of them many
more than Pascal.
Let me add that there are more radical approaches:
there are languages in which "users" can redefine the
scanner and/or parser. So users can add new operators on
the fly and define their meaning. As
a possible example consider your 'stringimage'
extension: in an extensible language the user simply
declares that 'stringimage' is a new keyword that
should invoke its handler. The handler scans the
following program text, extracts the filename,
reads the file and tells the compiler to use the resulting
string as a string constant. The whole thing is
probably 10 lines for the handler and a single
line to register the new keyword. If a language
allows syntax extensions, then instead of overloading
you can just define new operators, so overloading
strictly speaking is not necessary. Similarly,
you have some way for syntax handlers to generate
code, which is similar to inlining, but strictly
speaking inlining is not needed.
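None of the following is a real compiler API; the registry, handler signature and behaviour are invented purely to show the shape of the "register a keyword" idea, written in C++ for concreteness.
#include <fstream>
#include <functional>
#include <iterator>
#include <map>
#include <sstream>
#include <string>
// A handler receives the program text following its keyword and returns the
// replacement text the compiler should see instead.
using Handler = std::function<std::string(const std::string&)>;
std::map<std::string, Handler> keyword_handlers;
// The hypothetical 'stringimage' handler: take the file name that follows the
// keyword, read the file, and hand back its contents as a string literal
// (escaping omitted for brevity).
std::string stringimage_handler(const std::string& rest) {
    std::istringstream in(rest);
    std::string filename;
    in >> filename;
    std::ifstream f(filename);
    std::string contents((std::istreambuf_iterator<char>(f)),
                         std::istreambuf_iterator<char>());
    return '"' + contents + '"';
}
int main() {
    keyword_handlers["stringimage"] = stringimage_handler;   // the one-line registration
}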
work. AFAIK, no language I have ever used expects "A[i;j;k]" to
be an array element or "F(a;b;c)" to be the call of a function
with three parameters.
Far from clear. Simpler to describe? People mocked the 236
pages of the RR [most of it comments, examples or the system
library] when it first came out -- until they found that proper
descriptions of other languages were at least as long, and
nowhere near as precise.
The A68G timing suggests it is not generating C code for directly
executing the program, but just generating C code that does the
interpretation more efficiently.
On 01/12/2021 00:07, Bart wrote:
[various benchmark timings:]
The A68G timing suggests it is not generating C code for directly
executing the program, but just generating C code that does the
interpretation more efficiently.
AIUI, A68G doesn't "generate C code" at all, unless you go
down the "--optimise" route
[in which case you might do better with
an actual compiler, such as "algol68toc"]. Otherwise it
builds a tree and then "walks" it, doing a /lot/ of checking as it
goes.
Anyway, this got me looking at the supplied "fannkuch"
program, which looks as though it's imported from some other
language, as the Algol is very unidiomatic. Simply tidying
the code speeds it up by around 10%. I saved a further ~15%
by replacing "sign" by a Boolean, replacing explicit array
elements by declaring and using "ref int"s to point to them
and similar, copying arrays in toto rather than element by
element, and so on. That got the time down from just under
40s to just under 30s. I didn't touch the actual algorithm.
... [in which case you might do better with
an actual compiler, such as "algol68toc"]
I tried to run that but it's a 16-bit executable that no longer runs
under Win64.
All my fannkuch versions are derived from an original in Lua.
I'm using it as a benchmark for testing compilers [...].
Partly for that reason, they need to comprise largely the same code.
On 05/12/2021 12:22, Bart wrote:
[I wrote:]
... [in which case you might do better with
an actual compiler, such as "algol68toc"]
I tried to run that but it's a 16-bit executable that no longer runs
under Win64.
Ah, unlucky. If you're that interested, perhaps you need
to download the source and build a 64-bit executable.
You could even use VirtualBox and start a FreeDOS virtual machine, and
enjoy the nostalgia :-)
On 05/12/2021 12:22, Bart wrote:
[I wrote:]
... [in which case you might do better with
an actual compiler, such as "algol68toc"]
I tried to run that but it's a 16-bit executable that no longer runs
under Win64.
Ah, unlucky. If you're that interested, perhaps you need
to download the source and build a 64-bit executable.
[...]
All my fannkuch versions are derived from an original in Lua.
I'm using it as a benchmark for testing compilers [...].
Partly for that reason, they need to comprise largely the same code.
Well, that's a rather grey area. Is "a := b" largely
the same code as "for i to n do a[i] := b[i] od"?
The former
is simply not possible in many languages, but it [or a standard
"copyarray" procedure] is likely to be implemented significantly
more efficiently than the loop [without sufficiently aggressive optimisation]. Similarly, for the use of a built-in Boolean
type vs an integer to be tested for sign. Similarly, when there
are issues of "return;" vs "run off the end of the procedure"
or of "loop and a half" constructs. As noted in my PP, simply
tidying the "fannkuch" code into idiomatic Algol saved around
25% of the time [without changing the underlying algorithm].
No doubt if you started with some Algol and translated that
directly into C or Lua, you would find that you could save a
similar amount by switching some of the resulting ugly C/Lua
into more idiomatic code.
[Similarly, and relevant to a nearby thread, the
difference between 0-based and 1-based arrays is more than
is often supposed. It affects which comparison operator you
use, and whether you pre- or post-increment or decrement, and
then you find yourself pulling the code around to suit.]
Yes, these are useful tweaks, but there's such a huge gap between
A68G and the next language, that the basic problem still exists: it
is slow!
A68 really needs someone to produce an equivalent transpiler, an
update of that algol68toc program.
(If I liked it more, and understood it a LOT more, then I'd have a go myself.)
Yes, the original algorithm was 1-based, and all versions I made are
1-based even when the language was 0-based (by allocating an extra
element and ignoring element 0).
I would expect an optimising compiler to iron out such minor
differences.
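For concreteness, the extra-element trick in a 0-based language might look like this (illustrative C++):
#include <cstdio>
#include <vector>
int main() {
    const int n = 10;
    std::vector<int> p(n + 1);             // p[0] allocated but deliberately ignored
    for (int i = 1; i <= n; ++i) p[i] = i; // the algorithm keeps its 1-based indexing
    std::printf("%d %d\n", p[1], p[n]);    // prints "1 10"
}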
On 24/11/2021 00:04, Andy Walker wrote:
If you're going to have these things, you need them anyway.
There is no clear-cut boundary between what compilers do and what
goes into a standard library. It doesn't matter whether it's the
compiler that decides that "A + B" turns into "load A, load B, add"
or the library that decides that.
It matters very much. My static language is self-hosted, and can build
itself entirely from source fast enough that I could do so ten times
every second if I was so inclined.
To me that is important; not that I need to do that 600 times a minute,
but it means an effectively zero edit-build-run cycle for similar-sized applications.
That would be harder to achieve using the wrong language choices; it
would also take longer, and the result would build apps more slowly.
On 08/12/2021 00:01, Bart wrote:
Yes, these are useful tweaks, but there's such a huge gap between
A68G and the next language, that the basic problem still exists: it
is slow!
Meanwhile, may I suggest a little test for you? Instead
of "fannkuch(10)", please get your collection of programs and
compilers to produce "fannkuch(2)". Any compilers that return
"2, 2" [without comment and from code equivalent to what you
supplied] should be kept well away from any serious computation.
I would expect an optimising compiler to iron out such minor
differences.
I suspect you're being optimistic if you expect even a good optimiser to detect that [in a reasonably large and complicated
program] element 0 is never accessed, or systemically to reduce all
indexes by 1. But you never know.
Why so slow? Apparently it was all to do with templates.
When I said the wrong language design choices could slow things down, I didn't even know it could be by that much.
I think I'm happy with my own choices at the minute (a 40Kloc project
takes 0.1 seconds on one core).
On 2021-12-08 22:29, Bart wrote:
Why so slow? Apparently it was all to do with templates.
Likely, GCC is very poor with them. But your language has nothing to
express polymorphism in either form. So you are not in a position to criticize.
When I said the wrong language design choices could slow things down,
I didn't even know it could be by that much.
Yes, a modern medium-sized project is extremely large. In a language
without polymorphism it would take years to design and the code would be tenfold bigger and complex.
I think I'm happy with my own choices at the minute (a 40Kloc project
takes 0.1 seconds on one core).
It is a false equivalence. You simply cannot write equivalent code of
same size.
On 09/12/2021 08:46, Dmitry A. Kazakov wrote:
Yes, a modern medium-sized project is extremely large. In a language
without polymorphism it would take years to design and the code would be
tenfold bigger and complex.
Exactly. Developer time is more important than computer time.
It is a false equivalence. You simply cannot write equivalent code of
same size.
Indeed.
You could also look at these examples and be impressed with how much functionality you can get from a mere 20 Kloc of code with C++, where
the same useful work in C or BOL (Bart's Own Language) might take 50
times as much.
(This is, of course, a big reason why languages like Python are popular.)
On 09/12/2021 07:46, Dmitry A. Kazakov wrote:
On 2021-12-08 22:29, Bart wrote:
Why so slow? Apparently it was all to do with templates.
Likely, GCC is very poor with them. But your language has nothing to
express polymorphism in either form. So you are not in a position to criticize.
I expect my dynamic language can do polymorphism.
But is it really that necessary inside a compiler?
If I did such a project (say 100-1000Kloc) I would probably use dynamic scripting code for most of it.
Since most code is not speed-critical. Especially when you just want a
test run.
You need ask a different question: do you really expect a compiler for
any kind of language, which is only 20-25,000 lines of user-code, to
take MINUTES to build on today's machines? (And on multiple cores!)
Somebody is doing something wrong.
Yes, the C++ compiler could also be more inefficient than need be, but
people can only use the C++ compilers that exist.
On 09/12/2021 08:28, David Brown wrote:
On 09/12/2021 08:46, Dmitry A. Kazakov wrote:
Yes, a modern medium-sized project is extremely large. In a language
without polymorphism it would take years to design and the code would be tenfold bigger and complex.
Exactly. Developer time is more important than computer time.
Exactly. That is why having developers twiddle their thumbs for five
minutes every time they compile is a joke.
It is a false equivalence. You simply cannot write equivalent code of
same size.
Indeed.
You could also look at these example and be impressed with how much
functionality you can get from a mere 20 Kloc of code with C++, where
the same useful work in C or BOL (Bart's Own Language) might take 50
times as much.
(This is, of course, a big reason why languages like Python are popular.)
Scripting languages have all the benefits of C++ and then some. Except
for execution speed.
But the C++ in these examples is so slow to build, that a scripting
language is likely to be faster.
On 09/12/2021 12:01, Bart wrote:
On 09/12/2021 08:28, David Brown wrote:
On 09/12/2021 08:46, Dmitry A. Kazakov wrote:
Yes, a modern medium-sized project is extremely large. In a language
without polymorphism it would take years to design and the code would be tenfold bigger and complex.
Exactly. Developer time is more important than computer time.
Exactly. That is why having developers twiddle their thumbs for five
minutes every time they compile is a joke.
When the alternative is spending weeks or months writing the code by
hand rather than using templates?
Scripting languages have all the benefits of C++ and then some. Except
for execution speed.
Spoken like someone who does not know C++, or have any concept of
scripting languages.
There are different languages with different balances of features and emphases, and that's good. But you'd have to be completely ignorant
to think that there are any scripting languages with /all/ the benefits
of C++.
But the C++ in these examples is so slow to build, that a scripting
language is likely to be faster.
That would seem highly unlikely.
On 2021-12-09 11:50, Bart wrote:
On 09/12/2021 07:46, Dmitry A. Kazakov wrote:
On 2021-12-08 22:29, Bart wrote:
Why so slow? Apparently it was all to do with templates.
Likely, GCC is very poor with them. But your language has nothing to
express polymorphism in either form. So you are not in a position to
criticize.
I expect my dynamic language can do polymorphism.
I expect it does not, because...
But is it really that necessary inside a compiler?
you obviously do not understand it.
If I did such a project (say 100-1000Kloc) I would probably use
dynamic scripting code for most of it.
Again a failure to see actual challenges of medium to large software collaboration projects.
Since most code is not speed-critical. Especially when you just want a
test run.
Yep, a test run of a production line automation system, a banking
system,
a luggage delivery system written in BOL, would love to see it,
from a safe distance of course...
You need ask a different question: do you really expect a compiler for
any kind of language, which is only 20-25,000 lines of user-code, to
take MINUTES to build on today's machines? (And on multiple cores!)
Somebody is doing something wrong.
Yes, lack of experience/education on modern language-assisted software design.
On your part we see an active denial of very existence of such problems. Things you see important are just silly from the point of view of a practicing software developer. Things you reject are key tools for
managing complexity of modern software.
Get me an example of polymorphism, as there seem to be various kinds,
and I'll see if I can express that in my language.
But, on top of all the various issues involved in bigger projects, you
don't want build times to be unnecessarily slow for no good reason.
For that, you need someone whose job it is to oversee that part, to
identify such problems, and to do something to fix it if possible.
On your part we see an active denial of very existence of such
problems. Things you see important are just silly from the point of
view of a practicing software developer. Things you reject are key
tools for managing complexity of modern software.
OK. Carry on waiting 5 minutes between each run.
Hopefully it will be
for a more worthwhile reason than over-eager, inappropriate use of
templates.
That's not "wrong"; with the code [as supplied to me in theMeanwhile, my I suggest a little test for you? InsteadWhat should it produce for fannkuch(2)?
of "fannkuch(10)", please get your collection of programs and
compilers to produce "fannkuch(2)". Any compilers that return
"2, 2" [without comment and from code equivalent to what you
supplied] should be kept well away from any serious computation.
I downloaded versions in 4 languages from the original benchmarks
site. Three of them (Lua, Python, Julia) go wrong in giving a program
error.
The fourth, in C, completes, but fannkuch(2) returns -1. I don't know
if that is correct or not, but it sounds like it might be undefined.
The presence of these lines:
swap(p[2],p[3])
signx:=1
for i:=3 to n do
plus perm1[2] in the 0-based Python, and p[3] in Julia, suggests n
needs to be at least 3.
Element 0 can stay there; it's not doing any harm with just 3 arrays.
I would expect an optimising compiler to iron out such minor
differences.
I suspect you're being optimistic if you expect even a good
optimiser to detect that [in a reasonably large and complicated
program] element 0 is never accessed, or systematically to reduce all
indexes by 1. But you never know.
Being 1-based may simply mean an extra offset on some address modes;
is that what you think is causing a slow-down? With local arrays,
that offset should just be absorbed into the stack offset of the
array.
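
(For what it's worth, a rough C++ illustration of that point; the function
names are made up purely for illustration. The constant -1 that a 1-based
index introduces is normally folded into the address computation, so any
cost difference should be marginal at most.)

    // 0-based access: loads a[i]
    int get0(const int *a, int i) { return a[i]; }

    // 1-based access: loads a[i-1]; the -1 typically becomes a constant
    // displacement in the address mode (or is absorbed into the base
    // offset), so the generated code is essentially the same.
    int get1(const int *a, int i) { return a[i - 1]; }
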
On 2021-12-09 14:39, Bart wrote:
Get me an example of polymorphism, as there seem to be various kinds,
and I'll see if I can express that in my language.
This again shows a lack of understanding. The types of polymorphism are
irrelevant as they all serve the same goal. The differences between
ad-hoc, parametric and dynamic polymorphism are in their
capabilities/limitations to express certain types of classes.
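
(For concreteness, a minimal C++ sketch of the three kinds being named here;
the functions and types are invented purely for illustration, not taken from
anyone's project.)

    #include <cstdio>

    // Ad-hoc polymorphism: separate overloads chosen at compile time.
    int    twice(int x)    { return 2 * x; }
    double twice(double x) { return 2.0 * x; }

    // Parametric polymorphism: one definition for any suitable T.
    template<typename T>
    T biggest(T a, T b) { return a > b ? a : b; }

    // Dynamic (inclusion) polymorphism: the call is dispatched at run time.
    struct Shape { virtual double area() const = 0; virtual ~Shape() {} };
    struct Square : Shape {
        double s;
        explicit Square(double side) : s(side) {}
        double area() const override { return s * s; }
    };

    int main() {
        std::printf("%d %g\n", twice(3), twice(1.5));   // ad-hoc
        std::printf("%d\n", biggest(4, 7));             // parametric
        const Shape &sh = Square(2.0);
        std::printf("%g\n", sh.area());                 // dynamic
    }
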
But, on top of all the various issues involved in bigger projects, you
don't want build times to be unnecessarily slow for no good reason.
Nobody cares about build times,
Yes, you do not know how testing and software quality assurance work
either.
Yeah, like nobody cares about stopping at a red light!
On your part we see an active denial of the very existence of such
problems. Things you see as important are just silly from the point of
view of a practicing software developer. Things you reject are key
tools for managing the complexity of modern software.
OK. Carry on waiting 5 minutes between each run.
Yep, like the James Webb telescope, 30 years before the first and only
run...
Again, properties of advanced programming languages such as static
verification, modularization, separation of interface and implementation,
reuse and generic programming are focused on risk analysis and on
understanding that 50% of cases are fundamentally untestable, 50% of
the rest are economically, technically or time-wise infeasible to test, and
50% of the rest of the rest require in-depth analysis of the requirements,
the underlying physical processes and the implementation algorithms in
order to test properly.
On 09/12/2021 14:49, Dmitry A. Kazakov wrote:
<snip>
On 2021-12-09 14:39, Bart wrote:
On your part we see an active denial of the very existence of such
problems. Things you see as important are just silly from the point of
view of a practicing software developer. Things you reject are key
tools for managing the complexity of modern software.
OK. Carry on waiting 5 minutes between each run.
Yep, like the James Webb telescope, 30 years before the first and only run...
Again, properties of advanced programming languages such as static
verification, modularization, separation of interface and implementation,
reuse and generic programming are focused on risk analysis and on
understanding that 50% of cases are fundamentally untestable, 50% of
the rest are economically, technically or time-wise infeasible to test, and
50% of the rest of the rest require in-depth analysis of the requirements,
the underlying physical processes and the implementation algorithms in
order to test properly.
Do you think it is reasonable to spend 5 minutes across 8 cores doing that
on a 25,000-line program on every routine build?
I don't. Most people would agree that that is over the top. Even with
those 'polymorphic' features (which, if the cost is to slow things down
by 100 times, are non-viable).
Perhaps you genuinely think that's how long such things should take.
On 04/12/2021 09:32, antispam@math.uni.wroc.pl wrote:
Bart <bc@freeuk.com> wrote:
* Inline assemby
* Inlining
* User-defined operator overloads
* The equivalent of C++'s constexpr
* A gcc-class optimiser
...
I use a language like this. You do need inlining and it is useful
only due to overloading (_not_ limited to operators). However, the
other positions on your list above are not needed. More precisely,
there is a small core language which implements basic types and
corresponding operators. Basic library types simply offer a "view"
of basic types and operators from the core language. Due to inlining
there is no runtime loss of efficiency compared to using the core
language directly. But there are also fancy library types
which are infeasible to have as part of the core language.
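
(A rough C++ analogue of that idea, purely illustrative: a "library" type
defined as a thin view over a core type, whose operators just forward to the
built-in ones and so cost nothing once inlined. The Metres type is made up
for the example.)

    // A library-level type wrapping a core machine type. With the
    // forwarding functions inlined, using Metres should generate the
    // same code as using a raw double directly.
    struct Metres {
        double value;
    };

    inline Metres operator+(Metres a, Metres b) { return {a.value + b.value}; }
    inline bool   operator<(Metres a, Metres b) { return a.value < b.value; }

    inline Metres shorter(Metres a, Metres b) { return a < b ? a : b; }
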
Too many advanced compiler features, just to keep the language 'small'.
You did not get it: the purpose is to have a manageable implementation
of a _very large_ language.
Source code for gcc was 45,000 source files last I looked. I don't even
know if that's just for C, or for C++ too.
That's a thousand times a larger scale than what I do. So I guess I'm in
less need of such methods.
Low-level error prone parts are
small. Library types can be written by ordinary application
programmers.
Overloading can be implemented with quite a small amount of extra code.
Inlining is also easy to implement if you have an internal
representation of larger code fragments (as opposed to
generating machine code directly after recognizing a given
syntactic construct). Both put limits on the maximal possible
compiler speed: with overloading you have a more complex
symbol table and need more lookups. An intermediate representation
takes time to build (on top of the time needed to generate code).
Also, inlining allows large object code from small sources,
so if you measure time per source line, compiler speed will
likely be worse than the speed of a compiler that forces you to
write each source line separately. Still, the compiler can
be reasonably fast.
Anyway, the increase in compiler complexity is relatively
small. Even for small languages, in the total balance, the
approach using overloading may give you a saving in
effort spent on implementation.
I could do overloading. I have an approach I've developed, I just
haven't got round to it (there are a dozen more useful things ahead
of it, which I haven't done yet either!).
But this is purely user-defined overloading of existing operators, with
user-defined types, or applying extra operations to standard types.
It is not for implementing the core language, which I like to handle by
dedicated code in the compiler, keeping it tight and fast, and allowing such
enhancements as the above to be optional (and on a waiting list).
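
(For comparison, a small C++ example of the sort of thing meant by
user-defined overloading of an existing operator on a user-defined type;
the Complex type here is made up for illustration only.)

    // User code overloads an existing operator for its own type;
    // the built-in meanings of + for int, double etc. are untouched.
    struct Complex { double re, im; };

    Complex operator+(Complex a, Complex b) {
        return {a.re + b.re, a.im + b.im};
    }

    // Usage:  Complex c = Complex{1, 2} + Complex{3, 4};   // {4, 6}
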
For example, there is already a backend IL instruction, MUL.I64, to do
signed multiply. The main part, MUL, is attached to the binary * op
during parsing. The .I64 part is assigned two passes further on.
With only user-defined overload mechanisms, it's a lot harder to
make those associations. For one thing, those IL instructions are not
exposed in the language.
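
(A hedged sketch, not Bart's actual compiler: one common way to make that
association is to record a generic opcode at parse time and refine it with a
type suffix in a later pass. The names below are invented for illustration.)

    #include <stdexcept>

    // Generic operation recorded by the parser when it sees '*' or '+'.
    enum class Op { Mul, Add };

    // Typed IL opcodes chosen later, once operand types are known.
    enum class ILOp { MUL_I64, MUL_F64, ADD_I64, ADD_F64 };

    enum class Type { I64, F64 };

    // A later pass maps (generic op, operand type) to the typed opcode.
    ILOp select(Op op, Type t) {
        switch (op) {
        case Op::Mul: return t == Type::I64 ? ILOp::MUL_I64 : ILOp::MUL_F64;
        case Op::Add: return t == Type::I64 ? ILOp::ADD_I64 : ILOp::ADD_F64;
        }
        throw std::logic_error("unhandled op");
    }
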
Just as a simple
example let me mention Pascal. In Pascal there is
a bunch of "builtin" operations (mostly on files).
Most of them are too complex to do by inline code,
so practical compilers have a runtime library implementing
them. But the compiler still needs to parse those
"builtins" and convert them to runtime calls.
In a sense this is a trivial task, but with a realistic
compiler structure you need several lines of compiler
code to generate the corresponding runtime call.
I have a similar set of operations for I/O. They are implemented as a
set of function calls, designed to be called implicitly from code, so
they have funny names. So:
print a,b
is equivalent to:
m$print_startcon()
m$print_i64_nf(a) # when a,b are i64
m$print_i64_nf(b)
m$print_end
(The bracketing between start/end calls allows separator logic within
the whole print statement; and allows calls to functions within the
print list that might themselves use print, to files etc.)
So, print is already implemented in user code. I just do not see the advantage of replacing 110 or 50 lines of code within the compiler's
code generator for print, with a likely more complicated solution.
(110 lines of code for static language which needs to generate different calls for various types; 50 lines for dynamic language.)
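
(Not the actual compiler code, but a rough C++-flavoured sketch of what those
lines do: the code generator walks the print list and emits one runtime call
per item, chosen by the item's type. Only the i64 call names appear above;
the others here are guesses for illustration.)

    #include <string>
    #include <vector>
    #include <cstdio>

    // Toy item descriptions standing in for typed AST nodes.
    enum class Ty { I64, F64, Str };
    struct Item { Ty type; std::string expr; };

    // Emit the start/item/end call sequence for:  print a, b, ...
    void gen_print(const std::vector<Item> &items) {
        std::puts("call m$print_startcon()");
        for (const auto &it : items) {
            // call names other than the i64 one are invented here
            const char *fn = it.type == Ty::I64 ? "m$print_i64_nf"
                           : it.type == Ty::F64 ? "m$print_r64_nf"
                           :                      "m$print_str_nf";
            std::printf("call %s(%s)\n", fn, it.expr.c_str());
        }
        std::puts("call m$print_end()");
    }
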
A bunch
of older languages like Cobol, Fortran and PL/I had
special "builtin" operations, some of them many
more than Pascal.
Let me add that there are more radical approaches:
there are languages in which "users" can redefine
the scanner and/or parser. So users can add new operators
on the fly and define their meaning. As
a possible example consider your 'stringimage'
extension:
Do you mean my 'strinclude', which incorporates a text file (or
text/binary file in dynamic code) as string data within the program?
in an extensible language the user simply
declares that 'stringimage' is a new keyword that
should invoke its handler.
That itself is not so easy. Keywords are built-in, scopeless, and known program-wide.
With a user-defined keyword, could two modules define that
keyword differently?
Could another part of the program use that same
name as an ordinary variable?
If so then it's not known as a new keyword after the next pass after
parsing. So it will need to have a suitable 'shape' to be parsed as /something/. (My language also has out-of-order definitions.)
The handler looks at the
following program text, extracts the filename,
reads the file and tells the compiler to use the resulting
string as a string constant.
How? String literals are detected fairly early, they have a certain AST
node, length etc. At what pass will this be done?
The whole thing is
probably 10 lines for the handler and a single
line to register the new keyword.
In a language that
allows syntax extensions, instead of overloading
you can just define new operators, so overloading,
strictly speaking, is not necessary. Similarly
you have some way for syntax handlers to generate
code, which is similar to inlining, but strictly
speaking inlining is not needed.
Do you have an example of such a language where implementing my
'strimport' is 10 lines of code? For example, C++. Or the one you
mentioned. If so, how does it look?
In my static compiler, 'strinclude' needs a handful of lines to define
enums and names, plus these lines in the parser:
when kstrincludesym then
    lex()
    p:=createunit1(j_strinclude,readterm2())
Followed by this code later on to create a suitable string literal:
proc tx_strinclude(unit p,a)=
    int fileno
    tpass(a)                 # process the operand; must yield a string constant
    if a.tag<>j_const or not a.isastring then
        txerror("strincl/not string")
    fi
    fileno := moduletable[p.moduleno].fileno
    fileno := getfile(a.svalue, path:sourcefilepaths[fileno], issupport:1)
    a.svalue := sourcefiletext[fileno]
    a.slength := sourcefilesizes[fileno]
    deleteunit(p,a)          # replace strincl node with new string
end
The fiddly bit is deciding where to look for that file (unless an
absolute path is given, it will be relative to this module).
The file involved counts as a support file, which means that when an
application is combined into an amalgamated file, such files are
included. (But they need to be marked because sometimes a support file will
have the same name as an actual source module.)
Also, when reading from an amalgamated file, it will look inside that
for the support file.
I think that 20 lines of code for 'strinclude' inside a compiler is reasonable.
Bart <bc@freeuk.com> wrote:
Source code for gcc was 45,000 source files last I looked. I don't even
know if that's just for C, or for C++ too.
That's a thousand times a larger scale than what I do. So I guess I'm in
less need of such methods.
gcc is in a different game: "how many % of runtime efficiency can we
gain if we throw 10 times more resources at compilation?".
(110 lines of code for static language which needs to generate different
calls for various types; 50 lines for dynamic language.)
OK, if you do not need more "builtin functions", then the ad-hoc approach
is fine. GNU Pascal supports hundreds of "builtin functions" (when
you count differences due to different types of arguments).
That itself is not so easy. Keywords are built-in, scopeless, and known
program-wide.
no, no, no.
With a user-defined keyword, could two modules define that
keyword differently?
Yes, keywords are scoped and recognized only within their scope.
More precisely, each compilation stream has its own keywords which
can be added or retracted at essentially any time.
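
(I don't know that language's internals, but a minimal C++ sketch of
"keywords scoped per compilation stream, added and retracted at any time"
might look like this; all the names are assumptions for illustration.)

    #include <functional>
    #include <map>
    #include <string>

    struct Compiler;   // forward declaration; stands in for compiler state

    // A handler consumes the text following its keyword and returns
    // replacement source (e.g. the contents of a file as a string literal).
    using Handler = std::function<std::string(Compiler &, std::string rest)>;

    // One keyword table per compilation stream, so two modules can bind
    // the same name differently, or not bind it at all.
    struct Stream {
        std::map<std::string, Handler> keywords;

        void add(const std::string &name, Handler h) { keywords[name] = h; }
        void retract(const std::string &name)        { keywords.erase(name); }

        // The scanner consults the table: known keywords dispatch to their
        // handler, anything else is treated as an ordinary identifier.
        bool is_keyword(const std::string &name) const {
            return keywords.count(name) != 0;
        }
    };
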
Do you have an example of such a language where implementing my
'strimport' is 10 lines of code? For example, C++. Or the one you
mentioned. If so, how does it look?
Your 'stringimage' can be done as a Lisp macro:
;;; ---<cut here>-------------
(defun slurp_file (fn)
  (with-open-file (f fn)
    (let ((seq (make-string (file-length f))))
      (read-sequence seq f)
      (coerce seq 'string)))
)
(defmacro string_include (f)
  (slurp_file f))
;;; ---<cut here>-------------
Note: in the source file a use of 'string_include' looks like a function
call (almost any Lisp construct looks like a function call).
However, since it is a Lisp macro, it is executed at compile
time and the result is handled like part of the source.
I think that 20 lines of code for 'strinclude' inside a compiler is
reasonable.
Sure. However, note the difference: you could add 'strinclude' because
you are the compiler author. With an extensible language any user can add
the constructs that she/he needs.
Bart <bc@freeuk.com> wrote:
Perhaps you genuinely think that's how long such things should take.
Well, health experts say that one should take regular breaks from
programming. 5 minutes is somewhat short, but better than nothing
(10 minutes would probably be optimal).
On 10/12/2021 23:28, antispam@math.uni.wroc.pl wrote:
<snip>
Bart <bc@freeuk.com> wrote:
That itself is not so easy. Keywords are built-in, scopeless, and known
program-wide.
no, no, no.
With a user-defined keyword, could two modules define that
keyword differently?
Yes, keywords are scoped and recognized only within their scope.
More precisely, each compilation stream has its own keywords which
can be added or retracted at essentially any time.
Those aren't what I call keywords! More like user-defined syntax. But
with custom syntax, you don't want it changing in different parts of the program.
Or are you thinking about Lisp?
Do you have an example of such a language where implementing my
'strimport' is 10 lines of code? For example, C++. Or the one you
mentioned. If so, how does it look?
Your 'stringimage' can be done as a Lisp macro:
;;; ---<cut here>-------------
(defun slurp_file (fn)
  (with-open-file (f fn)
    (let ((seq (make-string (file-length f))))
      (read-sequence seq f)
      (coerce seq 'string)))
)
(defmacro string_include (f)
  (slurp_file f))
;;; ---<cut here>-------------
Note: in the source file a use of 'string_include' looks like a function
call (almost any Lisp construct looks like a function call).
However, since it is a Lisp macro, it is executed at compile
time and the result is handled like part of the source.
So this approach can also be used by C++'s constexpr feature? (Assuming
there are ways to read files into strings.)
I notice you say 'compile-time'; does Lisp have a compilation stage with
some intermediate or binary output? I thought it was run from source.
Actually I thought that that was the big deal about it, that you can manipulate Lisp code as though it was runtime data.