• Common operators

    From James Harris@21:1/5 to All on Sun Nov 21 18:55:02 2021
    This is to follow up on comments from another thread. It's prompted by a picture of a calculator which Bart posted:

    https://upload.wikimedia.org/wikipedia/commons/9/9c/Hp35s_Calculator.jpg

    I think the issue being discussed was what operators should be provided
    by a language. I'd suggest almost none!

    Don't get me wrong! AISI there should be standard operator /symbols/
    such as +, *, >>, etc (and possibly some reserved operator words such as
    "and" and "or") and they should have precedences and associativities but
    that they should be semantically meaningless. Instead, they would be
    given meaning by the data types to which they are applied.

    That's not as strange as it may sound. In

    A + B

    the meaning of + will depend on whether A and B are ints or floats - or, possibly, strings or some user-defined type.

    If the meaning depends on the types of the operands then it doesn't have
    to be built in but can be defined by code /for the requisite type/. For example, if A and B are floats and there's a definition for the type
    'float' which includes the following

      type float
        function "+"(float self, float other)
          ... code for float addition ...

    then the code therein could carry out the necessary floating-point
    addition. (I'm being deliberately vague about how as this topic is more
    about invocation.)
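
    To make that concrete, here is a rough sketch of the same idea in Python
    terms - purely illustrative, and the class name Float is invented for the
    example: the language fixes the symbol "+" and its precedence, while the
    type supplies the meaning through a method looked up on the operands.

      class Float:
          def __init__(self, value):
              self.value = float(value)

          def __add__(self, other):            # "code for float addition"
              return Float(self.value + other.value)

          def __repr__(self):
              return "Float(%r)" % self.value

      a, b = Float(1.5), Float(2.25)
      print(a + b)     # Float(3.75) -- the meaning of + came from the type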

    The first point is that type float can be an external module and not
    part of the language. The same is possibly true also of signed and
    unsigned integers!?!

    IOW the /language/ would define the precedences and associativity of
    operator symbols or standard names but the meaning of those operators
    would be defined /externally/.

    The second point is that named prefix operations such as ABS and SIN -
    and many like them - would not need to be part of the language. For one
    thing, they could be defined in external modules and, for another, they
    would appear best as functions rather than as operators. For example, in

    abs(y)

    the abs function referred to would depend on whether y was an int or a
    float, and a call such as

    sin(x)

    would work for x being a float but the name "sin" would be undefined if
    x were an integer.
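
    A possible sketch of that dispatch-by-argument-type in Python, using
    functools.singledispatch; the names my_abs and my_sin are hypothetical,
    chosen only to avoid clashing with the built-ins:

      from functools import singledispatch
      import math

      @singledispatch
      def my_abs(y):                  # no implementation for unknown types
          raise TypeError("abs undefined for " + type(y).__name__)

      @my_abs.register(int)
      def _(y):
          return y if y >= 0 else -y

      @my_abs.register(float)
      def _(y):
          return math.fabs(y)

      @singledispatch
      def my_sin(x):                  # "sin" undefined unless registered
          raise TypeError("sin undefined for " + type(x).__name__)

      @my_sin.register(float)
      def _(x):
          return math.sin(x)

      print(my_abs(-3), my_abs(-2.5))   # 3 2.5
      print(my_sin(0.5))                # 0.4794255...
      # my_sin(2) raises TypeError: sin undefined for int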



    Going back to the calculator: aside from the common operations, surely
    most of the more obscure actions one finds on a calculator, such as the
    statistical and financial operations, would be most clearly expressed in
    code by

    op_name(args)

    Further, that 'named prefix' form allows any number of functions to be
    applicable to the types of their arguments. Possibly most of an advanced
    calculator's functions are unary, anyway, so the prefix form would be
    more appropriate.

    With the function form there's no ambiguity about how operations are
    applied, and the names give a clue as to the meaning. For example,

    factorial(x)

    gives a big clue as to what it might do....!



    In sum, perhaps it would be better to build common operators into type libraries and not make them part of the language itself.


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to James Harris on Sun Nov 21 19:45:50 2021
    On 21/11/2021 18:55, James Harris wrote:
    This is to follow up on comments from another thread. It's prompted by a picture of a calculator which Bart posted:

    https://upload.wikimedia.org/wikipedia/commons/9/9c/Hp35s_Calculator.jpg

    I think the issue being discussed was what operators should be provided
    by a language. I'd suggest almost none!

    Don't get me wrong! AISI there should be standard operator /symbols/
    such as +, *, >>, etc (and possibly some reserved operator words such as "and" and "or") and they should have precedences and associativities but
    that they should be semantically meaningless. Instead, they would be
    given meaning by the data types to which they are applied.

    That's not as strange as it may sound. In

      A + B

    the meaning of + will depend on whether A and B are ints or floats - or, possibly, strings or some user-defined type.

    If the meaning depends on the types of the operands then it doesn't have
    to be built in but can be defined by code /for the requisite type/. For example, if A and B are floats and there's a definition for the type
    'float' which includes the following

      type float
        function "+"(float self, float other)
          ... code for float addition ...

    then the code therein could carry out the necessary floating-point
    addition. (I'm being deliberately vague about how as this topic is more
    about invocation.)

    The first point is that type float can be an external module and not
    part of the language. The same is possibly true also of signed and
    unsigned integers!?!

    IOW the /language/ would define the precedences and associativity of
    operator symbols or standard names but the meaning of those operators
    would be defined /externally/.

    The second point is that named prefix operations such as ABS and SIN -
    and many like them - would not need to be part of the language. For one thing, they could be defined in external modules and, for another, they
    would appear best as functions rather than as operators. For example, in

      abs(y)

    the abs function referred to would depend on whether y was an int or a
    float, and a call such as

      sin(x)

    would work for x being a float but the name "sin" would be undefined if
    x were an integer.



    Going back to the calculator, aside from common operations surely most
    of the more obscure actions one could find on a calculator such as statistical and financial operations would be most clearly expressed in
    code by

      op_name(args)

    Further, that 'named prefix' form allows any number of functions to be applicable to the types of their arguments. Possibly most of an advanced calculator's functions are unary, anyway, so the prefix form would be
    more appropriate.

    With function form there's no ambiguity about how any operations are
    applied and the names give a clue as to the meaning. For example,

      factorial(x)

    gives a big clue as to what it might do....!



    In sum, perhaps it would be better to build common operators into type libraries and not make them part of the language itself.

    I think I've made my view known on this!

    You need a more complicated implementation, and one that implements more advanced optimisations, which are going to be slower, to get code of the
    same standard as you can get easily when things are built-in.

    The point of my calculator link was that, while it was a programmable calculator that could conceivably be used to implement ABS (which is an
    easy function actually), they still provided an ABS button anyway.

    Now, it's possible that there is an OEM maker of calculators that takes delivery of unbranded programmable calculators where many buttons are
    blank. And they decide what 'built-in' functions should be provided on
    those buttons. The devices are supplied with those already programmed.

    If languages took the same approach, then that's fine. The user won't
    know any different (except it might be slower). And they will be using apparently the same named language as other people, so they can easily
    share code.

    Except that languages don't like to draw that line and make those OEM facilities available to everyone; each person creates their own language
    with its own unwieldy set of half-implemented features.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Charles Lindsey@21:1/5 to James Harris on Mon Nov 22 14:26:06 2021
    On 21/11/2021 18:55, James Harris wrote:
    This is to follow up on comments from another thread. It's prompted by a picture
    of a calculator which Bart posted:

    https://upload.wikimedia.org/wikipedia/commons/9/9c/Hp35s_Calculator.jpg

    I think the issue being discussed was what operators should be provided by a language. I'd suggest almost none!

    Don't get me wrong! AISI there should be standard operator /symbols/ such as +,
    *, >>, etc (and possibly some reserved operator words such as "and" and "or") and they should have precedences and associativities but that they should be semantically meaningless. Instead, they would be given meaning by the data types
    to which they are applied.

    That's not as strange as it may sound. In

      A + B

    the meaning of + will depend on whether A and B are ints or floats - or, possibly, strings or some user-defined type.

    If the meaning depends on the types of the operands then it doesn't have to be
    built in but can be defined by code /for the requisite type/. For example, if A
    and B are floats and there's a definition for the type 'float' which includes the following

    You are re-inventing Algol 68 again.

    --
    Charles H. Lindsey ---------At my New Home, still doing my own thing------
    Tel: +44 161 488 1845 Web: https://www.clerew.man.ac.uk Email: chl@clerew.man.ac.uk Snail-mail: Apt 40, SK8 5BF, U.K.
    PGP: 2C15F1A9 Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andy Walker@21:1/5 to Bart on Mon Nov 22 20:34:14 2021
    On 21/11/2021 19:45, Bart wrote:
    On 21/11/2021 18:55, James Harris wrote:
    In sum, perhaps it would be better to build common operators into
    type libraries and not make them part of the language itself.

    I agree with Charles!

    You need a more complicated implementation, and one that implements
    more advanced optimisations, which are going to be slower, to get
    code of the same standard as you can get easily when things are
    built-in.

    Doesn't follow. The library can have access to parts of the
    language that are not available to ordinary users, inc [eg] escapes
    into assembler, inlining, access to the compiler's internal structures,
    ..., and can be optimised when the compiler is built. The point is to
    keep the language definition simple, not cluttered [as are, eg, Pascal
    and C] by having to explain as part of the language what "add-ops" are
    and how they relate to "mult-ops" or logical operators.

    --
    Andy Walker, Nottingham.
    Andy's music pages: www.cuboid.me.uk/andy/Music
    Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Grieg

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to Andy Walker on Mon Nov 22 23:08:00 2021
    On 22/11/2021 20:34, Andy Walker wrote:
    On 21/11/2021 19:45, Bart wrote:
    On 21/11/2021 18:55, James Harris wrote:
    In sum, perhaps it would be better to build common operators into
    type libraries and not make them part of the language itself.

        I agree with Charles!

    You need a more complicated implementation, and one that implements
    more advanced optimisations, which are going to be slower, to get
    code of the same standard as you can get easily when things are
    built-in.

        Doesn't follow.  The library can have access to parts of the language that are not available to ordinary users, inc [eg] escapes
    into assembler, inlining, access to the compiler's internal structures,
    ..., and can be optimised when the compiler is built.

    This is my point: you need:

    * Inline assembly
    * Inlining
    * User-defined operator overloads
    * The equivalent of C++'s constexpr
    * A gcc-class optimiser
    ...

    Too many advanced compiler features, just to keep the language 'small'.

    Yet the user will see the small core features plus the ones implemented
    via language features; they will still see a big language.

    Where's the advantage, and who benefits?


      The point is to
    keep the language definition simple,

    What, like Algol68? They seemed to have gone out of their way to make
    the definition incomprehensible!

    (My a68g.exe is 2.8MB; my nearest language is 0.5MB. Just saying...)

    not cluttered [as are, eg, Pascal
    and C] by having to explain as part of the language what "add-ops" are
    and how they relate to "mult-ops" or logical operators.


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Charles Lindsey on Tue Nov 23 15:50:56 2021
    On 22/11/2021 14:26, Charles Lindsey wrote:

    ...


    You are re-inventing Algol 68 again.


    I am not reinventing it ... but if I am going down some of the same
    lines then that's probably a good thing.

    I have yet to reply to another post where you said the same, and I think
    my reply to that may be similar to what Algol means by returning an
    object rather than its value so maybe the similarities continue!



    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Bart on Tue Nov 23 16:39:53 2021
    On 22/11/2021 23:08, Bart wrote:
    On 22/11/2021 20:34, Andy Walker wrote:
    On 21/11/2021 19:45, Bart wrote:
    On 21/11/2021 18:55, James Harris wrote:

    In sum, perhaps it would be better to build common operators into
    type libraries and not make them part of the language itself.

    ...

    You need a more complicated implementation, and one that implements
    more advanced optimisations, which are going to be slower, to get
    code of the same standard as you can get easily when things are
    built-in.

         Doesn't follow.  The library can have access to parts of the
    language that are not available to ordinary users, inc [eg] escapes
    into assembler, inlining, access to the compiler's internal structures,
    ..., and can be optimised when the compiler is built.

    This is my point: you need:

    * Inline assembly
    * Inlining
    * User-defined operator overloads
    * The equivalent of C++'s constexpr
    * A gcc-class optimiser

    I don't see why Andy says there should be escapes to assembly and I'd
    question your list, too.

    I've still got work to do on this, but here's how I see it starting.
    AFAIK it's a novel approach, so it may need some synapse rewiring. :-)

    A 'type library' would include code for each operator of that type. For instance imagine that an expression such as

    A + B

    has A and B of the same type as each other. Let's say it's float. In the
    type library for float there will be code for

    <float> + <float> => <float>

    The library would be responsible for generating the right IR. For
    example, if we assume that 'wider' floats can hold all the values of
    narrower ones, and (to make this not too simple) we further assume that
    the language specification says that float addition is performed to the
    resolution of the wider of the two operands, then the IR generation
    might be along the lines of the following.

      w = max(A._width, B._width)       # width of the wider operand
      IR.widen(A1, A, w)                # widen A to w (a no-op if already w)
      IR.widen(B1, B, w)                # widen B to w (a no-op if already w)
      IR.declare(float, w, Result)      # declare Result as a float of width w
      IR.float_add(Result, A1, B1)      # Result := A1 + B1

    What that is intended to do is to work out the maximum width of A and B,
    widen the narrower one (if any) to that width, declare a variable called
    Result which has that width, then carry out the addition into Result.

    Note where this sits in the compile process. The calls to IR routines
    would populate the parse tree.

    Note, also, that the tree would include noops such as those to widen a
    64-bit float to another 64-bit float but such noops would disappear as
    part of optimisation.

    Above all, note that the code is at a high level and not particularly
    complicated. All it's really doing is adding operations to the parse tree.
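
    As an illustration only, here is a toy Python version of such a
    type-library hook - the IR class and float_plus are invented names, not
    part of any real compiler:

      class IR:
          def __init__(self):
              self.nodes = []                        # the growing tree/IR

          def widen(self, dest, src, width):         # may be a no-op
              self.nodes.append(("widen", dest, src, width))

          def declare(self, typename, width, name):
              self.nodes.append(("declare", typename, width, name))

          def float_add(self, dest, a, b):
              self.nodes.append(("fadd", dest, a, b))

      def float_plus(ir, A, B):
          """Type-library code for <float> + <float> => <float>."""
          w = max(A["width"], B["width"])
          ir.widen("A1", A["name"], w)
          ir.widen("B1", B["name"], w)
          ir.declare("float", w, "Result")
          ir.float_add("Result", "A1", "B1")
          return "Result"

      ir = IR()
      float_plus(ir, {"name": "A", "width": 32}, {"name": "B", "width": 64})
      print(ir.nodes)   # widen, widen, declare, fadd (no-op widens removable later)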

    ...

    Too many advanced compiler features, just to keep the language 'small'.

    For sure, the compiler would have the above complication. But the
    handling of all data types could be written in the same way.


    Yet the user will see the small core features plus the ones implemented
    via language features; they will still see a big language.

    Not at all. I see it as a benefit for a programmer to be able to
    distinguish between language and libraries, and for the language to be
    kept small.


    Where's the advantage, and who benefits?

    The model is different, I grant you, but conceptually simple. And it's
    more consistent. All data types - those inbuilt, those provided in
    standard libraries, and those the programmer writes - would be treated
    in the same way.

    Further, the IR-generating code is simple enough that anyone who wanted
    to see the details of how an operation would be implemented can go read
    it! Maybe it would be enough to satisfy the language lawyers. :-)

    I should say the above is work in progress. I am not committed to it
    yet. One problem I still have is essentially overloading (that old
    chestnut) and how to distinguish bound from unbound functions. (Bound
    would be defined in a library for the type of their first argument.
    Unbound would not. Yet calls to them might look remarkably similar....)

    Flak jacket at the ready!


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dmitry A. Kazakov@21:1/5 to James Harris on Tue Nov 23 18:44:42 2021
    On 2021-11-23 17:39, James Harris wrote:

    One problem I still have is essentially overloading (that old
    chestnut) and how to distinguish bound from unbound functions. (Bound
    would be defined in a library for the type of their first argument.
    Unbound would not. Yet calls to them might look remarkably similar....)

    The terms are "method" vs "free function."

    A dedicated argument makes no sense, obviously. It is motivated by the
    inability to implement multiple dispatch and serves as an excuse.

    One answer is per type. A free function would be defined on the whole
    class; that makes them distinguishable, as the type would be different.

    Most languages allow free functions with the same type; arguably they
    should not [unless inside the private implementation]. This again is
    problematic with modules and multiple dispatch.

    So, basically, the issue is wide open to debate and there is no workable concept in sight.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Dmitry A. Kazakov on Tue Nov 23 20:38:30 2021
    On 23/11/2021 17:44, Dmitry A. Kazakov wrote:
    On 2021-11-23 17:39, James Harris wrote:

    One problem I still have is essentially overloading (that old
    chestnut) and how to distinguish bound from unbound functions. (Bound
    would be defined in a library for the type of their first argument.
    Unbound would not. Yet calls to them might look remarkably similar....)

    The terms are "method" vs "free function."

    Maybe, although I think of methods as going along with classes, and
    there are no classes (at least no inheritance) in what I have in mind.


    Dedicated argument makes no sense, obviously. It is motivated by
    inability to implement multiple dispatch, serves as an excuse.

    I don't think I'd want multiple dispatch. As discussed elsewhere it
    cannot scale well. At most, ISTM, one could select a routine based on
    two types: that of one argument and that of the result, though in most
    cases one would use only the type of one argument.

    For sure, if something more complicated is required, a routine could do
    what I think you called double dispatch, switching between different
    implementations based on the types of any further arguments it wanted
    (i.e. any arguments after the first). You had concerns that it would be
    slow, but AFAICS it would involve two dispatches, each about as fast as
    possible, and it would scale much better than multiple dispatch.
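
    A rough Python sketch of that two-step idea (names invented for the
    example): the outer dispatch is on the type of the first argument, and
    the routine it selects then switches on the type of the second.

      from functools import singledispatch

      @singledispatch
      def combine(a, b):
          raise TypeError("combine undefined for " + type(a).__name__)

      @combine.register(int)
      def _(a, b):
          # second ("inner") dispatch, keyed on the type of b
          if isinstance(b, int):
              return a + b
          if isinstance(b, str):
              return str(a) + b
          raise TypeError("combine(int, %s) undefined" % type(b).__name__)

      @combine.register(str)
      def _(a, b):
          return a + str(b)

      print(combine(1, 2))      # 3     -- (int, int)
      print(combine(1, "x"))    # '1x'  -- (int, str)
      print(combine("n=", 5))   # 'n=5' -- (str, anything printable)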


    One answer is per types. Free function would be defined on the whole
    class, that makes them distinguishable as the type would be different.

    What about the syntax of invocation? In

    a = F(b, c)

    it might be that F is bound to the type of its first argument, b, or it
    might be that the programmer has defined F as an unbound function with
    two parameters. There's nothing in the source to indicate which is
    meant, and that seems to be a problem.


    Most languages allow free functions with the same type, arguably they
    should not [unless inside the private implementation]. This again is problematic with modules and multiple dispatch.

    So, basically, the issue is wide open to debate and there is no workable concept in sight.

    Understood. It's reassuring to hear you say so...!


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dmitry A. Kazakov@21:1/5 to James Harris on Tue Nov 23 22:34:01 2021
    On 2021-11-23 21:38, James Harris wrote:
    On 23/11/2021 17:44, Dmitry A. Kazakov wrote:
    On 2021-11-23 17:39, James Harris wrote:

    One problem I still have is essentially overloading (that old
    chestnut) and how to distinguish bound from unbound functions. (Bound
    would be defined in a library for the type of their first argument.
    Unbound would not. Yet calls to them might look remarkably similar....)

    The terms are "method" vs "free function."

    Maybe, although I think of methods as going along with classes, and
    there are no classes (at least no inheritance) in what I have in mind.

    You have it in the form of implicit type conversions. Say you convert float
    to double and have a function Foo defined on double and accepting float.
    This Foo is a sort of method for the class {float, double}.

    In C this is consistent because C does this *always*. Unless you do the
    same, you would have a massive problem: whether a conversion applies
    could depend on the actual visibility context.

    One answer is per types. Free function would be defined on the whole
    class, that makes them distinguishable as the type would be different.

    What about the syntax of invocation? In

      a = F(b, c)

    it might be that F is bound to the type of its first argument, b, or it
    might be that the programmer has defined F as an unbound function with
    two parameters. There's nothing in the source to indicate which is
    meant, and that seems to be a problem.

    There is no problem. Let us have three types:

    T1 a
    T2 b
    T3 c

    Now, let F be not a method of b. This is what you mean, right?

    The first argument of F would be class T2 rather than simple T2. It
    means that b will be "converted" to the class. [Under the model that b
    keeps the type tag it is null conversion.] Then inside F it will
    dispatch to the methods of T2.

    Thus, regardless of any visibility, the semantics of F regarding b could
    not change.

    Methods persist and class-wide operations are always reduced to method
    calls, which makes it safe.
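
    A minimal sketch of that in Python terms (type names invented for
    illustration): the class-wide F accepts anything derived from T2 and
    always goes back to the overridable method, so its behaviour cannot
    depend on the visibility context.

      class T2:
          def describe(self):
              return "T2"

      class T2Child(T2):             # another member of the class rooted at T2
          def describe(self):
              return "T2Child"

      def F(b, c):
          # class-wide: accepts T2 or anything derived from it, and always
          # reduces to the (dispatching) method
          return "F saw " + b.describe()

      print(F(T2(), None))           # F saw T2
      print(F(T2Child(), None))      # F saw T2Child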

    Free functions of simple types in the presence of classes of any kind
    are asking for trouble.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andy Walker@21:1/5 to James Harris on Tue Nov 23 23:04:19 2021
    On 23/11/2021 16:39, James Harris wrote:
    On 22/11/2021 23:08, Bart wrote:
    On 22/11/2021 20:34, Andy Walker wrote:
         Doesn't follow.  The library can have access to parts of the
    language that are not available to ordinary users, inc [eg] escapes
    into assembler, inlining, access to the compiler's internal structures,
    ..., and can be optimised when the compiler is built.
    This is my point: you need:
    * Inline assembly
    [...]
    I don't see why Andy says there should be escapes to assembly and I'd question your list, too.

    Andy didn't say there should be, but that there can be. IOW,
    a [system or standard] library can do things that ordinary user code
    can't; in particular it can know more than ordinary users about the
    compiler internals. Whatever Bart expects his compiler to do about
    [eg] standard operators can equally be done by the library. It's a
    matter of taste whether the library is tightly integrated with the
    compiler proper, or is more loosely connected [cf whether a multi-
    pass compiler consists of a chain of completely separate programs
    or is a collection of modules linked together into one executable].
    There are pros and cons either way.

    [For "assembly" read your IR if you prefer. It's just that
    I was brought up with languages where you could use assembler codes
    directly in your program. (Eg, A68R, the Malvern compiler, had
    "CODE ... EDOC" brackets within which you could use ICL 1900 op codes, augmented by symbolic access to your program's variables.) C was the
    first language I used extensively in which this was not possible, but
    instead you had to write (PDP 11) assembler modules which then got
    linked in to your C.]

    --
    Andy Walker, Nottingham.
    Andy's music pages: www.cuboid.me.uk/andy/Music
    Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Handel

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andy Walker@21:1/5 to Bart on Wed Nov 24 00:04:52 2021
    On 22/11/2021 23:08, Bart wrote:
    You need a more complicated implementation, and one that implements
    more advanced optimisations, which are going to be slower, to get
    code of the same standard as you can get easily when things are
    built-in.
         Doesn't follow.  The library can have access to parts of the
    language that are not available to ordinary users, inc [eg] escapes
    into assembler, inlining, access to the compiler's internal structures,
    ..., and can be optimised when the compiler is built.
    This is my point: you need:
    * Inline assembly
    * Inlining
    * User-defined operator overloads

    [You don't need user-defined overloads unless your language
    includes them. In which case you need them because of the language,
    not because you've implemented them in a library.]

    * The equivalent of C++'s constexpr
    * A gcc-class optimiser
    ...
    Too many advanced compiler features, just to keep the language 'small'.

    If you're going to have these things, you need them anyway.
    There is no clear-cut boundary between what compilers do and what
    goes into a standard library. It doesn't matter whether it's the
    compiler that decides that "A + B" turns into "load A, load B, add"
    or the library that decides that. In either case, syntax analysis
    produces a node in the parse tree consisting of an operator and its
    two [in this case] operands, or some near equivalent to that. How
    that gets processed further is a matter of taste.
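
    For illustration only (invented names, not any particular compiler): the
    node produced by syntax analysis is the same whether a built-in rule or
    a library-registered handler turns it into code.

      from dataclasses import dataclass

      @dataclass
      class BinOp:                   # what syntax analysis produces
          op: str
          left: str
          right: str

      handlers = {}                  # filled in by "the library"

      def register(op, typ):
          def deco(fn):
              handlers[(op, typ)] = fn
              return fn
          return deco

      @register("+", "float")
      def add_floats(node):
          return ["load " + node.left, "load " + node.right, "add"]

      def lower(node, operand_type):
          return handlers[(node.op, operand_type)](node)

      print(lower(BinOp("+", "A", "B"), "float"))
      # ['load A', 'load B', 'add']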

    Yet the user will see the small core features plus the ones
    implemented via language features; they will still see a big
    language.

    It isn't a "big language" if the syntax is small and simple.
    The problem for users is not usually checking through the list of
    library routines to see if there's one to multiply matrices or to
    open a window, but is knowing precisely how to write loops or to
    declare arrays of pointers to procedures.

    Where's the advantage, and who benefits?

    Logical separation; careful users.

    The point is to
    keep the language definition simple,
    What, like Algol68? They seemed to have gone out of their way to make
    the definition incomprehensible!

    That is because some people went out of their way to claim
    that it is incomprehensible, and others more recently [no names, no
    pack drill] still go out of their way to ask me what a semicolon
    means instead of looking it up themselves. The Watt-Peck-Sintzoff
    chart in Algol Bulletin gets the A68 syntax onto one side; the
    A68R User Guide gets it, in larger type and with lots of white
    space, into well under two sides; substantially less than usual representations of C, Pascal, Ada, ....

    (My a68g.exe is 2.8MB; my nearest language is 0.5MB. Just saying...)
    My copy of "a68g" [2.8.4] is 1183284 bytes; it has crept
    up over the decades from the 504749 bytes of 1.4.0, the earliest
    A68G version I have. Algol itself hasn't changed, and A68G "then"
    was missing only a few minor features, so I expect the increase
    comes primarily from (a) much more extensive checking, (b) extra
    language features such as partial parametrisation and the Torrix
    extensions, and (c) additional library features such as Plotutils,
    PostgreSQL, linear algebra, and many others. I expect that if
    your language included such things, it too would grow from 0.5MB
    to around 1MB.

    --
    Andy Walker, Nottingham.
    Andy's music pages: www.cuboid.me.uk/andy/Music
    Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Handel

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Dmitry A. Kazakov on Wed Nov 24 10:06:40 2021
    On 23/11/2021 21:34, Dmitry A. Kazakov wrote:
    On 2021-11-23 21:38, James Harris wrote:
    On 23/11/2021 17:44, Dmitry A. Kazakov wrote:
    On 2021-11-23 17:39, James Harris wrote:

    ...

    What about the syntax of invocation? In

       a = F(b, c)

    it might be that F is bound to the type of its first argument, b, or
    it might be that the programmer has defined F as an unbound function
    with two parameters. There's nothing in the source to indicate which
    is meant, and that seems to be a problem.

    There is no problem. Let we have three types:

       T1 a
       T2 b
       T3 c

    Now, let F be not a method of b. This is what you mean, right?

    I can't be sure whether you meant to say "not" in that but the idea is
    that the definition of type T2 would include at least one function
    called F as in

      type T2
        function F



    The first argument of F would be class T2 rather than simple T2.

    I don't know about that. AFAICS the first argument of F would be of
    /type/ T2. That's all.

    The problem I have is that F could be defined within type T2 (i.e.
    'bound' to T2) as in

      type T2
        function F(....)

    or it could be a plain ('unbound') function as in

    function F(T2 x, ....)

    I am uncomfortable about having a function called F potentially findable
    in two completely different places.

    What to do? I don't know. AFAICS both forms of function definition have
    their uses - and programmers are used to both forms.

    Am open to suggestions!


    It
    means that b will be "converted" to the class. [Under the model that b
    keeps the type tag it is null conversion.] Then inside F it will
    dispatch to the methods of T2.

    Thus regardless on any visibility the semantics F regarding b could not change.

    Methods persist and class-wide operations are always reduced to method
    calls, which makes it safe.

    Free functions of simple types in the presence of classes of any kind
    are asking for trouble.


    The trouble is that some functions are naturally free/unbound, i.e. they
    either don't have any parameters or they are not naturally bound to the
    type of their first parameter so in a call such as

    F(x)

    there's no way to tell whether F is free/unbound or bound.


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dmitry A. Kazakov@21:1/5 to James Harris on Wed Nov 24 12:18:55 2021
    On 2021-11-24 11:06, James Harris wrote:
    On 23/11/2021 21:34, Dmitry A. Kazakov wrote:
    On 2021-11-23 21:38, James Harris wrote:
    On 23/11/2021 17:44, Dmitry A. Kazakov wrote:
    On 2021-11-23 17:39, James Harris wrote:

    ...

    What about the syntax of invocation? In

       a = F(b, c)

    it might be that F is bound to the type of its first argument, b, or
    it might be that the programmer has defined F as an unbound function
    with two parameters. There's nothing in the source to indicate which
    is meant, and that seems to be a problem.

    There is no problem. Let we have three types:

        T1 a
        T2 b
        T3 c

    Now, let F be not a method of b. This is what you mean, right?

    I can't be sure whether you meant to say "not" in that but the idea is
    that the definition of type T2 would include at least one function
    called F as in

      type T2
        function F

    You are talking about [unfortunate] lexical elements of your language.
    The question is the semantics of these. I guessed that this construct
    would imply that F is a method in its T2 argument/result, and that an F
    declared outside would not be, i.e. it would be a free function.

    The first argument of F would be class T2 rather than simple T2.

    I don't know about that. AFAICS the first argument of F would be of
    /type/ T2. That's all.

    That means a plain free function, which has all problems expected. You
    get what you asked for.

    What to do? I don't know. AFAICS both forms of function definition have
    their uses - and programmers are used to both forms.

    Am open to suggestions!

    I already explained that. If F were class-wide in b, it would have no
    such problems because it would always go back to the methods. Any
    type-specific argument or result, or if you want alternative
    terminology, any *contravariant* argument, is trouble in the presence of
    classes of whatever origin. Drop classes and you will have no problems.
    Keep classes (implicit conversions included) and you will have problems
    with contravariance. OK?

    The trouble is that some functions are naturally free/unbound, i.e. they either don't have any parameters or they are not naturally bound to the
    type of their first parameter so in a call such as

      F(x)

    there's no way to tell whether F is free/bound or non-free/unbound.

    Why should anybody care? It says: do F on x. The effect is determined by
    the type of x. Qualifiers like "bound" are implementation details at
    best, otherwise meaningless.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Dmitry A. Kazakov on Wed Nov 24 15:35:51 2021
    On 24/11/2021 11:18, Dmitry A. Kazakov wrote:
    On 2021-11-24 11:06, James Harris wrote:
    On 23/11/2021 21:34, Dmitry A. Kazakov wrote:
    On 2021-11-23 21:38, James Harris wrote:
    On 23/11/2021 17:44, Dmitry A. Kazakov wrote:
    On 2021-11-23 17:39, James Harris wrote:

    ...

    What about the syntax of invocation? In

       a = F(b, c)

    it might be that F is bound to the type of its first argument, b, or
    it might be that the programmer has defined F as an unbound function
    with two parameters. There's nothing in the source to indicate which
    is meant, and that seems to be a problem.

    There is no problem. Let we have three types:

        T1 a
        T2 b
        T3 c

    Now, let F be not a method of b. This is what you mean, right?

    I can't be sure whether you meant to say "not" in that but the idea is
    that the definition of type T2 would include at least one function
    called F as in

       type T2
         function F

    You are talking about [unfortunate] lexical elements of your language.
    The question is the semantics of these.

    I was illustrating semantics. With pseudocode. Nothing to do with
    lexical elements.


    I guessed that this construct
    would imply that F is a method in T2 argument/result and when F declared outside would be not, i.e. a free function.

    Yes.


    The first argument of F would be class T2 rather than simple T2.

    I don't know about that. AFAICS the first argument of F would be of
    /type/ T2. That's all.

    That means a plain free function, which has all problems expected. You
    get what you asked for.

    It is not possible to make all functions bound. For example,

    time_now()

    or

    emit_newline()

    cannot be bound to the types of their first parameters because neither
    has a first parameter.


    What to do? I don't know. AFAICS both forms of function definition
    have their uses - and programmers are used to both forms.

    Am open to suggestions!

    I already explained that. If F were class-wide in b, it would have no
    such problems because it would always go back to the methods. Any type-specific argument or result, or if you want alternative
    terminology, any *contravariant* argument, is a trouble in presence of classes of whatever origin. Drop classes and you will have no problems.
    Keep classes (implicit conversions included) and you will have problems
    with contravariance. OK?

    Unfortunately, what you explained doesn't seem to match because you are
    using terms (such as contravariance) associated with subtyping when
    there is no subtyping in the problem! And it's left me mystified as to
    how all those terms relate to the issue at hand - if at all. :-(

    When you explain things it might help if you were to do so in terms of
    the mechanics rather than in terms of buzzwords/jargon.


    The trouble is that some functions are naturally free/unbound, i.e.
    they either don't have any parameters or they are not naturally bound
    to the type of their first parameter so in a call such as

       F(x)

    there's no way to tell whether F is free/unbound or bound.

    Why should anybody care? It tells do F on x. The effect is determined by
    the type of x. Qualifiers like "bound" are implementation details at
    best, otherwise meaningless.


    Well, if you saw

    F(x)

    where would you as a programmer look to find where F was defined?

    Similarly, if F were to be defined in two places as in

      type T2
        function F....

    and

    function(T2 F, ....)

    which one should the F(x) in

    T2 x
    F(x)

    invoke?

    As it happens, I am thinking that perhaps I should just accept that
    programmers would want to use both forms but then what happens if the
    same name is defined in both ways? Does one take precedence? Should the compiler (or the linker if from separate modules) flag an error?

    It may be worth bearing in mind that the bound form can always be
    accessed via the type as in

    T2.F

    or

    (typeof x).F

    By contrast, I suppose that the unbound form could be accessed by means
    of whatever contains it such as

      namespace N
        function F

    N.F

    so even if the same name were defined in both ways there would always be
    a way to specify one or the other - with the bound form being easiest to specify.
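
    A small sketch of that qualified access in Python terms (T2 and N are the
    invented names from the examples above): the short name F may be
    ambiguous, but the qualified forms are not.

      class T2:
          @staticmethod
          def F(x):                # the bound F, defined with the type
              return "bound T2.F"

      class N:                     # standing in for "namespace N"
          @staticmethod
          def F(x):                # the unbound F, living in a namespace
              return "free N.F"

      x = T2()
      print(T2.F(x))               # explicitly the type-bound F
      print(type(x).F(x))          # "(typeof x).F" -- same F, via the value
      print(N.F(x))                # explicitly the namespace's free F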


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dmitry A. Kazakov@21:1/5 to James Harris on Wed Nov 24 17:08:32 2021
    On 2021-11-24 16:35, James Harris wrote:
    On 24/11/2021 11:18, Dmitry A. Kazakov wrote:

    That means a plain free function, which has all problems expected. You
    get what you asked for.

    It is not possible to make all functions bound. For example,

      time_now()

    or

      emit_newline()

    cannot be bound to the types of their first parameters because neither
    has a first parameter.

    emit_newline() does not have problems either as there is no class involved.

    time_now() has a result, which has a type, and if this type is a member
    of a class, you have all the problems back. The class could be like

    { Calendar_Zoned_Time, UTC_Time, Monotonic_Real_Time }

    I already explained that. If F were class-wide in b, it would have no
    such problems because it would always go back to the methods. Any
    type-specific argument or result, or if you want alternative
    terminology, any *contravariant* argument, is a trouble in presence of
    classes of whatever origin. Drop classes and you will have no
    problems. Keep classes (implicit conversions included) and you will
    have problems with contravariance. OK?

    Unfortunately, what you explained doesn't seem to match because you are
    using terms (such as contravariance) associated with subtyping when
    there is no subtyping in the problem! And it's left me mystified as to
    how all those terms relate to the issue at hand - if at all. :-(

    As I said: no classes, no problems. Your "bondage" stuff loses its sense
    without classes. It makes no difference how and where you declare
    subprograms back in FORTRAN-IV. You want FORTRAN-IV to be FORTRAN-IV! If
    you want more, learn the meaning of the "buzzwords"...

    When you explain things it might help if you were to do so in terms of
    the mechanics rather than in terms of buzzwords/jargon.

    All the mechanics were explained 100 times over. You just skipped it
    all, 100 times.

    Why should anybody care? It tells do F on x. The effect is determined
    by the type of x. Qualifiers like "bound" are implementation details
    at best, otherwise meaningless.

    Well, if you saw

      F(x)

    where would you as a programmer look to find where F was defined?

    Right-mouse click -> "go to definition". It was a stupid question, really.

    Similarly, if F were to be defined in two places as in

      type T2
        function F....

    and

      function(T2 F, ....)

    I have no idea what the second is supposed to mean, a cat walking the
    keyboard?

    which one should the F(x) in

      T2 x
      F(x)

    invoke?

    The one declared on the type T2, obviously. Sorry, this question looks
    even more stupid to me.

    [...]

    so even if the same name were defined in both ways there would always be
    a way to specify one or the other - with the bound form being easiest to specify.

    Free functions are trouble in the presence of classes; that was established.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to Andy Walker on Wed Nov 24 18:19:12 2021
    On 24/11/2021 00:04, Andy Walker wrote:
    On 22/11/2021 23:08, Bart wrote:
    You need a more complicated implementation, and one that implements
    more advanced optimisations, which are going to be slower, to get
    code of the same standard as you can get easily when things are
    built-in.
         Doesn't follow.  The library can have access to parts of the
    language that are not available to ordinary users, inc [eg] escapes
    into assembler, inlining, access to the compiler's internal structures,
    ..., and can be optimised when the compiler is built.
    This is my point: you need:
    * Inline assembly
    * Inlining
    * User-defined operator overloads

        [You don't need user-defined overloads unless your language
    includes them.  In which case you need them because of the language,
    not because you've implemented them in a library.]

    If you want to have an overloaded 'abs' operator or function available
    to the user of the language (so I'm not specifying how), then, if it's
    not built-in, the implementation will need the ability to do
    user-defined overloads, plus inlining if you want it to be as efficient.


    * The equivalent of C++'s constexpr
    * A gcc-class optimiser
    ...
    Too many advanced compiler features, just to keep the language 'small'.

        If you're going to have these things, you need them anyway.
    There is no clear-cut boundary between what compilers do and what
    goes into a standard library.  It doesn't matter whether it's the
    compiler that decides that "A + B" turns into "load A, load B, add"
    or the library that decides that.

    It matters very much. My static language is self-hosted, and can build
    itself entirely from source fast enough that I could do so ten times
    every second if I was so inclined.

    To me that is important; not that I need to do that 600 times a minute,
    but it means an effectively zero edit-build-run cycle for similar-sized applications.

    That would be harder to achieve using the wrong language choices; it
    would also take longer, and the result would build apps more slowly.


      In either case, syntax analysis
    produces a node in the parse tree consisting of an operator and its
    two [in this case] operands, or some near equivalent to that.  How
    that gets processed further is a matter of taste.


    Fine. So you don't care exactly how the language makes standard language features appear as though they are built-in. So there's no problem.

    What, like Algol68? They seemed to have gone out of their way to make
    the definition incomprehensible!

        That is because some people went out of their way to claim
    that it is incomprehensible,

    I hadn't looked at the report for years, and had forgotten what it was like.

    But I did recently glimpse it again at the back of that PDF. The first
    thing you see is it talking about Protonotion, Metanotion, and Hypernotion.

    A few pages further on, it's on about Moid, Mood and Mu.

    You can't help but think that this is a language that takes itself too
    seriously! (I have also considered that it might all have been an
    elaborate joke.)

    An interesting exercise would be to take the language and create a new definition without all those fancy terms.

    The language, as far as I can see, doesn't offer abilities that are particularly unusual or hard to understand, certainly amongst the
    current crop of languages. It should therefore have a more down-to-earth definition.


    and others more recently [no names, no
    pack drill] still go out of their way to ask me what a semicolon
    means instead of looking it up themselves.

    Commas have the usual expected use in languages that employ them.

    AFAICS in Algol68, ";" and "," can both be used to separate things, but
    "A;B" means do A then B, while "A,B" means do A and B, in either order or
    concurrently.

    Whether that applies to A[i,j,k] or F(a,b,c), I don't know.

    However, it doesn't like F(a;b;c) when three arguments are expected. So
    there is something more to it.

    (Don't bother to explain; I'm never going to get it. I'm just not
    surprised that Wirth left the Algol68 (or 'new Algol') group and
    produced his own simpler language.)


      The Watt-Peck-Sintzoff
    chart in Algol Bulletin gets the A68 syntax onto one side;  the
    A68R User Guide gets it, in larger type and with lots of white
    space, into well under two sides;  substantially less than usual representations of C, Pascal, Ada, ....

    (My a68g.exe is 2.8MB; my nearest language is 0.5MB. Just saying...)
        My copy of "a68g" [2.8.4] is 1183284 bytes;  it has crept
    up over the decades from the 504749 bytes of 1.4.0, the earliest
    A68G version I have.  Algol itself hasn't changed, and A68G "then"
    was missing only a few minor features, so I expect the increase
    comes primarily from (a) much more extensive checking, (b) extra
    language features such as partial parametrisation and the Torrix
    extensions, and (c) additional library features such as Plotutils, PostgreSQL, linear algebra, and many others.  I expect that if
    your language included such things, it too would grow from 0.5MB
    to around 1MB.

    Yeah I think this has come up before.

    The reality is that the languages (either of mine, both 0.5MB) are too
    different to do a meaningful comparison. A68 has lots of features I
    don't, and I have quite a few that are missing in A68.

    One point of similarity however is the use of a single, self-contained executable.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Dmitry A. Kazakov on Fri Nov 26 17:21:08 2021
    On 24/11/2021 16:08, Dmitry A. Kazakov wrote:
    On 2021-11-24 16:35, James Harris wrote:
    On 24/11/2021 11:18, Dmitry A. Kazakov wrote:

    ...

    I already explained that. If F were class-wide in b, it would have no
    such problems because it would always go back to the methods. Any
    type-specific argument or result, or if you want alternative
    terminology, any *contravariant* argument, is a trouble in presence
    of classes of whatever origin. Drop classes and you will have no
    problems. Keep classes (implicit conversions included) and you will
    have problems with contravariance. OK?

    Unfortunately, what you explained doesn't seem to match because you are
    using terms (such as contravariance) associated with subtyping when
    there is no subtyping in the problem! And it's left me mystified as to
    how all those terms relate to the issue at hand - if at all. :-(

    As I said, no classes no problems. Your "bondage" stuff loses sense
    without classes. It makes no difference how you declare subprograms and
    where back in FORTRAN-IV. You want FORTRAN-IV be FORTRAN-IV! You want
    more, learn the meaning of the "buzz words"...

    I have tried to understand your use of buzzwords but quite apart from
    you expecting others to know Ada lingo, in some cases I remember finding
    that you use buzzwords differently from other people. So if you want to obfuscate a meaning you are doing exactly the right thing.

    The real world does not happen in jargon but in transformations of data
    and data structures. It should always be possible to explain a concept
    in terms of data structures or algorithms. Binding a name to a function
    is a case in point and the process could be described by steps, not jargon.


    When you explain things it might help if you were to do so in terms of
    the mechanics rather than in terms of buzzwords/jargon.

    All mechanics was explained 100 times over. You just skipped it all 100 times.

    Rubbish. You may have used buzzwords 100 times but that does not
    constitute an explanation.

    ...

    Similarly, if F were to be defined in two places as in

       type T2
         function F....

    and

       function(T2 F, ....)

    I have no idea what the second is supposed to mean, a cat walking the keyboard?

    Sorry, it was pseudo C syntax. The type name T2 preceding the identifier
    F was meant to indicate that F would be of type T2. The point of the two fragments of code was that in BOTH of them the first argument to F would
    be of type T2. The problem, then, is to say which F would be invoked by

    F(x)

    where x was of type T2.


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dmitry A. Kazakov@21:1/5 to James Harris on Fri Nov 26 21:05:05 2021
    On 2021-11-26 18:21, James Harris wrote:

    The real world does not happen in jargon but in transformations of data
    and data structures.

    ... precisely described by that jargon.

    Similarly, if F were to be defined in two places as in

       type T2
         function F....

    and

       function(T2 F, ....)

    I have no idea what the second is supposed to mean, a cat walking the
    keyboard?

    Sorry, it was pseudo C syntax. The type name T2 preceding the identifier
    F was meant to indicate that F would be of type T2. The point of the two fragments of code was that in BOTH of them the first argument to F would
    be of type T2. The problem, then, is to say which F would be invoked by

      F(x)

    where x was of type T2.

    The call is ambiguous when F of both declarations is directly visible.

    An important difference between a method and a free function is that a
    method cannot be hidden. If you see T2, or any object of it, you also see
    all the methods of T2.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Dmitry A. Kazakov on Fri Nov 26 21:56:25 2021
    On 26/11/2021 20:05, Dmitry A. Kazakov wrote:
    On 2021-11-26 18:21, James Harris wrote:

    ...

    Similarly, if F were to be defined in two places as in

       type T2
         function F....

    and

       function(T2 F, ....)

    I have no idea what the second is supposed to mean, a cat walking the
    keyboard?

    Sorry, it was pseudo C syntax. The type name T2 preceding the
    identifier F was meant to indicate that F would be of type T2. The
    point of the two fragments of code was that in BOTH of them the first
    argument to F would be of type T2. The problem, then, is to say which
    F would be invoked by

       F(x)

    where x was of type T2.

    The call is ambiguous when F of both declarations is directly visible.

    Indeed. The question is what to do about the ambiguity. Options:

    1. Pick the F function found in the type.

    2. Pick the F function which is 'unbound'.

    3. If the two functions are defined in different scopes pick the one
    which is 'closer in' to where F is invoked.

    4. Reject the expression because of the ambiguity.

    I do not know which approach would be best.
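
    For what it's worth, here is a toy sketch (invented names and data
    structures, not a proposal) of what option 4 might look like inside a
    resolver: collect every visible candidate for F and reject the call if
    there is more than one.

      def resolve(name, arg_type, type_bound, free_funcs):
          candidates = []
          if name in type_bound.get(arg_type, {}):
              candidates.append(("bound", type_bound[arg_type][name]))
          if name in free_funcs:
              candidates.append(("free", free_funcs[name]))
          if not candidates:
              raise NameError("%s is undefined for %s" % (name, arg_type))
          if len(candidates) > 1:
              raise TypeError("call to %s is ambiguous: %s"
                              % (name, ", ".join(k for k, _ in candidates)))
          return candidates[0][1]

      # toy symbol tables
      type_bound = {"T2": {"F": lambda x: "bound T2.F"}}
      free_funcs = {"F": lambda x: "free F"}

      print(resolve("F", "T2", type_bound, {})("x"))     # bound T2.F
      print(resolve("F", "T2", {}, free_funcs)("x"))     # free F
      # resolve("F", "T2", type_bound, free_funcs) raises "ambiguous"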



    An important difference between a method and a free function is that a
    method cannot be hidden. If you see T2, or any object of it, you also see
    all the methods of T2.

    OK.


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andy Walker@21:1/5 to Bart on Fri Nov 26 22:00:41 2021
    On 24/11/2021 18:19, Bart wrote:
    * User-defined operator overloads
         [You don't need user-defined overloads unless your language
    includes them.  In which case you need them because of the language,
    not because you've implemented them in a library.]
    If you want to have an overloaded 'abs' operator or function
    available to the user of the language (so I'm not specifying how),
    then if it's not built-in, it will need the ability to do
    user-overloads, plus inlining if you want it as efficient.

    "Available to the user" is not the same as "user-defined".
    There are typically lots of things in a library that the user not
    merely does not need to define but often cannot define within the
    formal language. This typically includes basic types such as "int",
    privileged calls to the OS, environment enquiries, generic types
    [eg when "print" can take any type as parameter(s)], ....

    [...]
    Too many advanced compiler features, just to keep the language 'small'.
         If you're going to have these things, you need them anyway.
    There is no clear-cut boundary between what compilers do and what
    goes into a standard library.  It doesn't matter whether it's the
    compiler that decides that "A + B" turns into "load A, load B, add"
    or the library that decides that.
    It matters very much. My static language is self-hosted, and can
    build itself entirely from source fast enough that I could do so ten
    times every second if I was so inclined.

    So somewhere in your source there is code that says "A + B"
    does not mean "pass A and B to a function called +" but instead notes
    this as a special sort of node in the parse tree [or near equivalent]
    and says "Aha! This compiles into ...". The effect is the same
    whether this is built-in to the parser or is some "magic" in the
    library. Which version is faster depends on how it is done; there
    are pros and cons.

    To me that is important; not that I need to do that 600 times a
    minute, but it means an effectively zero edit-build-run cycle for similar-sized applications.

    No, it means an effectively zero build-run part of that
    cycle. If it was ten times slower, it would have negligible
    effect on the whole cycle, unless it takes you less than a few
    seconds to note the results of the run, work out what has gone
    wrong [if anything], decide what you need to change, go to the
    right place in the code, and type in the change. To which you
    can [or should!] add the time needed to run regression tests,
    and documentation. [I'm not saying that speed of compilation is
    unimportant, of course.]

    [A68:]
    But I did recently glimpse it again at the back of that PDF. The
    first thing you see is it talking about Protonotion, Metanotion, and Hypernotion.

    Normal people largely skip that bit; it's just explaining
    how two-level grammars work.

    A few pages further on, it's on about Moid, Mood and Mu.

    Actually about "MOID", etc. That's the metalanguage, the
    higher of the two levels. It's how you get an infinite set of
    lower-level syntax rules, escape the tyranny of BNF, and check
    as a matter of syntax that variables are declared, etc.

    You can't help but think that this is a language that takes itself too
    seriously! (I have also considered that it might all have been an
    elaborate joke.)

    You can't please everyone. A "MOID" is either a "MODE"
    [type] or "void"; what should it be called, given that you want
    to use it several hundred times? You want something short that
    conveys the right general impression. No matter what you choose,
    it will seem too jokey to some, too up itself to others.

    An interesting exercise would be to take the language and create a
    new definition without all those fancy terms.

    Feel free. You could start with the brief syntax in the
    "A68R User's Guide" or in [eg] "Introductory Algol 68 Programming".
    But beware that the C standard and other similar documents are
    several times longer, and nowhere near as definitive as the RR.

    [...]
    AFAICS in Algol68, ";" and "," can both be used to separate things:
    "A;B" means do A then B, while "A,B" means do A and B, in either order or concurrently.
    Whether that applies to A[i,j,k] or F(a,b,c), I don't know.
    However, it doesn't like F(a;b;c) when three arguments are expected.
    So there is something more to it.

    Of course there is more to it! You have supplied a serial
    clause where a parameter pack is required. You can't expect just
    to slap down random sequences of characters without understanding
    what they are supposed to be -- in any language! -- and have it
    work. AFAIK, no language I have ever used expects "A[i;j;k]" to
    be an array element or "F(a;b;c)" to be the call of a function
    with three parameters. I don't see any interesting distinction
    between the way ";" and "," are used in Algol and the way they
    are used in Pascal or C; ";" primarily to end a "statement" or
    similar, and"," primarily to break up lists.

    (Don't bother to explain; I'm never going to get it. I'm just not
    surprised that Wirth left the Algol68 (or 'new Algol') group and
    produced his own simpler language.)

    Well, simpler to compile, for sure. Simpler for new
    programmers to learn or for experienced programmers to use?
    Far from clear. Simpler to describe? People mocked the 236
    pages of the RR [most of it comments, examples or the system
    library] when it first came out -- until they found that proper
    descriptions of other languages were at least as long, and
    nowhere near as precise.

    --
    Andy Walker, Nottingham.
    Andy's music pages: www.cuboid.me.uk/andy/Music
    Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Marpurg

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dmitry A. Kazakov@21:1/5 to James Harris on Fri Nov 26 23:16:19 2021
    On 2021-11-26 22:56, James Harris wrote:

    Indeed. The question is what to do about the ambiguity. Options:

    1. Pick the F function found in the type.

    2. Pick the F function which is 'unbound'.

    3. If the two functions are defined in different scopes pick the one
    which is 'closer in' to where F is invoked.

    4. Reject the expression because of the ambiguity.

    I do not know which approach would be best.

    Only #4 is valid.

    What I tried to explain to you with "buzz" words is that the solution is
    to prevent some ambiguities from appearing by limiting free functions.
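
    (To make the ambiguity concrete, here is a minimal hypothetical sketch
    in C++ -- T2 is the type from the thread, everything else is invented.
    A "bound" F found via the type and an "unbound" free F both fit the
    same call once the two spellings are unified:)

    #include <iostream>

    struct T2 {
        int F() const { return 1; }       // "bound": found via the type
    };

    int F(const T2&) { return 2; }        // "unbound" free function

    int main() {
        T2 x;
        // If a language let "x.F()" and "F(x)" be spelled the same way
        // (uniform call syntax), both candidates above would match and
        // option 4 -- reject as ambiguous -- would kick in. C++ escapes
        // only because the two spellings differ:
        std::cout << x.F() << " " << F(x) << "\n";   // prints "1 2"
    }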

    There are also other language design requirements regarding the
    consistency and unambiguity of a program, especially for similar-looking
    code in different contexts, or for literally the same generic code
    instantiated in different contexts.

    But I do not want to go into it, because it will be another spat about
    "buzz" words.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to Andy Walker on Sat Nov 27 01:21:03 2021
    On 26/11/2021 22:00, Andy Walker wrote:
    On 24/11/2021 18:19, Bart wrote:
    * User-defined operator overloads

        So somewhere in your source there is code that says "A + B"
    does not mean "pass A and B to a function called +" but instead notes
    this as a special sort of node in the parse tree [or near equivalent]
    and says "Aha!  This compiles into ...".  The effect is the same
    whether this is built-in to the parser or is some "magic" in the
    library.  Which version is faster depends on how it is done;  there
    are pros and cons.

    Imagine if you were creating the bytecode for an interpreter. Bytecode execution is already slow enough that you will surely want a dedicated
    instruction for such a common operation, rather than involve a
    call/return sequence plus whatever other instructions are needed.

    Well, the bytecode for my interpreter [and many others!] has such an ADD instruction. So does the VM used by my compiler, and it knows what it is
    early on. Very early in fact; this table entry links 'addsym', a lexer
    token, with 'kadd', the VM op for ADD:

    (addsym, "+", bin_op, kadd, kaddto, 4, 1),

    Why not specify it straightaway instead of going around the houses?

    (The '4' is a precedence level, and the '1' indicates it can start an expression. The 'kaddto' is an auxiliary bytecode for when used as
    '+:='; that is built-in too.)
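
    (A hedged illustration of the point in C++, with invented opcode names
    rather than the real VM's: a dedicated ADD opcode does the work inside
    the dispatch loop, where a generic '+' would need a lookup, a call
    frame and a return.)

    #include <cstdio>
    #include <vector>

    enum Op { PUSH, ADD, CALL, HALT };            // ADD is a dedicated opcode

    struct Ins { Op op; long arg; };

    long run(const std::vector<Ins>& code) {
        std::vector<long> stack;
        for (const Ins& ins : code) {
            switch (ins.op) {
            case PUSH: stack.push_back(ins.arg); break;
            case ADD: {                           // one opcode, no call frame
                long b = stack.back(); stack.pop_back();
                stack.back() += b;
                break;
            }
            case CALL: /* look up "+", build a frame, dispatch, return... */ break;
            case HALT: return stack.back();
            }
        }
        return 0;
    }

    int main() {
        // A + B with A=2, B=3, using the dedicated opcode
        std::vector<Ins> prog = { {PUSH, 2}, {PUSH, 3}, {ADD, 0}, {HALT, 0} };
        std::printf("%ld\n", run(prog));          // prints 5
    }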


    To me that is important; not that I need to do that 600 times a
    minute, but it means an effectively zero edit-build-run cycle for
    similar-sized applications.

        No, it means an effectively zero build-run part of that
    cycle.  If it was ten times slower, it would have negligible
    effect on the whole cycle,

    For a few years, I worked with an interpreted compiler. That was usable;
    most compiles /of a module/ took 0.5 seconds or less.

    (That it was usable at all was also thanks to the simple compilation
    techniques I favoured.)

    But that was with separate compilation. I wanted the advantages of a whole-program compiler, so it had to be faster if I didn't want the
    annoyance of waiting several seconds for each build.

    So I switched to native, and got 20 times faster.

    Why would I want to make it slower? It means the approach is scalable to programs 10 times the size (it might take 1-2 seconds) before build
    delays become intolerable, when I'd need some more speedups.

    That is more interesting than inventing ways for the compiler to get slower.

    (Yes I know that many people with much bigger projects and much slower compilers have to wait minutes. That is partly up to them, but I feel
    sorry for the people working with Rust now. I would switch language to
    gain productivity and a livelier environment.)

    unless it takes you less than a few
    seconds to note the results of the run, work out what has gone
    wrong [if anything], decide what you need to change, go to the
    right place in the code, and type in the change.

    With near-instant compilation you develop different ways of working. I
    might edit, build and re-run just to get the spacing right or fix some
    annoying detail, which could take multiple builds a few seconds apart.

    Also, doing nothing for several seconds interrupts your concentration.

    [A68:]
    But I did recently glimpse it again at the back of that PDF. The
    first thing you see is it talking about Protonotion, Metanotion, and
    Hypernotion.

        Normal people largely skip that bit;  it's just explaining
    how two-level grammars work.

    (Did you mean Normally rather than Normal?)


    A few pages further on, it's on about Moid, Mood and Mu.

        Actually about "MOID", etc.  That's the metalanguage, the
    higher of the two levels.  It's how you get an infinite set of
    lower-level syntax rules, escape the tyranny of BNF, and check
    as a matter of syntax that variables are declared, etc.

    Except that it isn't a matter of syntax; that's too constraining. (But
    didn't you say its syntax could be described in 1-2 pages?)



    You can't help but think that this is a language that takes itself too
    seriously! (I have also considered that it might all have been an
    elaborate joke.)

        You can't please everyone.  A "MOID" is either a "MODE"
    [type] or "void";  what should it be called, given that you want
    to use it several hundred times?

    A MODE. Especially if, as you say, a MODE/VOID is used in 100s of places.

    If it's necessary to discriminate at some point between VOID and any
    other MODE, then do so more locally, or with a less silly-sounding name.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andy Walker@21:1/5 to All on Sat Nov 27 22:19:43 2021
    On 27/11/2021 01:21, Bart wrote:
    [I wrote:]
    So somewhere in your source there is code that says "A + B"
    does not mean "pass A and B to a function called +" but instead notes
    this as a special sort of node in the parse tree [or near equivalent]
    and says "Aha! This compiles into ...". The effect is the same
    whether this is built-in to the parser or is some "magic" in the
    library. Which version is faster depends on how it is done; there
    are pros and cons.
    Imagine if you were creating the bytecode for an interpreter.
    Bytecode execution is already slow enough that you will surely want a
    dedicated instruction for such a common operation, rather than
    involve a call/return sequence plus whatever other instructions are
    needed.

    ??? Of course it is common for back-end interpreters to
    have "add" instructions. That makes it less, rather than more,
    important for the front end to do anything special with it. You
    seem to be hung up in thinking that a library cannot do anything
    but turn "A + B" into two parameters and a function call. If it
    is integrated into the system, it has privileged access to the
    innards of the compiler, the operating system, ....

    Well, the bytecode for my interpreter [and many others!] has such an
    ADD instruction. So does the VM used by my compiler, [...].
    Why not specify it straightaway instead of going around the houses?

    You can. But it's not "around the houses" if it's a
    library that emits that bytecode rather than some "magic" in
    the syntax analysis. It's just a matter of choice where in the
    processing that emitting takes place. There is also merit in
    having the syntax analysis simpler and more general. There is
    also merit in leaving optimisations to a back end. There is room
    for different models! Your requirement for a monolith is perhaps
    different from [eg] Gnu's requirement for lots of front ends for
    a variety of languages feeding into a common module feeding into
    lots of back ends for different hardware.

    [...]
    So I switched to native, and got 20 times faster.
    Why would I want to make it slower?

    No reason why you should; I was merely pointing out that
    whether a compilation and run takes 0.01s [as a large majority of
    mine do using Algol 68G, with most of the rest < 1s] or 1s makes
    no interesting difference to the complete edit-compile-run cycle.
    But you seem to be assuming that putting things into libraries is
    /ipso facto/ significantly slower than building them in to the
    syntax. It may be; or it could be faster; or it could make
    absolutely no measurable difference, depending on implementation
    details and other factors.

    [...]
    With near-instant compilation you develop different ways of working.
    I might edit, build and re-run just to get the spacing right or fix some
    annoying detail, which could take multiple builds a few seconds
    apart.

    Sure; I do that. It is still the case that my cycles
    are almost entirely taken up by the edit phase.

    [...]
    [A68:]
    But I did recently glimpse it again at the back of that PDF. The
    first thing you see is it talking about Protonotion, Metanotion, and
    Hypernotion.
    Normal people largely skip that bit; it's just explaining
    how two-level grammars work.
    (Did you mean Normally rather than Normal?)

    No, I meant "normal". Perhaps you're one of those strange
    people who read encyclopaedias by starting at "Aachen" and going
    through to "zygote"? Most people skip things they don't understand
    and perhaps come back to them when better placed.

    A few pages further on, it's on about Moid, Mood and Mu.
    Actually about "MOID", etc. That's the metalanguage, the
    higher of the two levels. It's how you get an infinite set of
    lower-level syntax rules, escape the tyranny of BNF, and check
    as a matter of syntax that variables are declared, etc.
    Except that it isn't a matter of syntax; that's too constraining.

    It /is/ a matter of syntax in the RR; in what sense is
    it constraining? If you use undeclared variables, do you not
    want the compiler to detect the error? Or do you prefer the
    early versions of Fortran or C where the compiler assumes that
    an undeclared "j" [for example] is an integer variable?

    (But didn't you say its syntax could be described in 1-2 pages?)

    Yes, in the same way that most standard languages can
    be described [eg] by a few pages of EBNF supplemented by lots
    of pages of semantics. The RR chooses instead to have lots
    of pages of syntax and rather few of semantics. Where to put
    the boundary is a grey area. But, broadly speaking, it's the
    semantics that lead to imprecision and debates about what
    something means.

    [...]
    You can't please everyone. A "MOID" is either a "MODE"
    [type] or "void"; what should it be called, given that you want
    to use it several hundred times?
    A MODE. Especially if as you say, a MODE/VOID is used in 100s of
    places.

    "MODE" [without "void" -- not "VOID"] is also used all over
    the place; so, for that matter, is "void" without "MODE".

    If it's necessary to discriminate at some point between VOID and any
    other MODE, then do so more locally, or with a less silly-sounding
    name.

    It's necessary throughout the RR. All part of being
    careful with use of language. You find "MOID" silly; most
    of us just find it a little quirky, and surely better than
    [eg] "MODEORVOID".

    --
    Andy Walker, Nottingham.
    Andy's music pages: www.cuboid.me.uk/andy/Music
    Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Godfrey

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to Andy Walker on Sun Nov 28 12:52:18 2021
    On 27/11/2021 22:19, Andy Walker wrote:
    On 27/11/2021 01:21, Bart wrote:
    [I wrote:]
         So somewhere in your source there is code that says "A + B"
    does not mean "pass A and B to a function called +" but instead notes
    this as a special sort of node in the parse tree [or near equivalent]
    and says "Aha!  This compiles into ...".  The effect is the same
    whether this is built-in to the parser or is some "magic" in the
    library.  Which version is faster depends on how it is done;  there
    are pros and cons.
    Imagine if you were creating the bytecode for an interpreter.
    Bytecode execution is already slow enough, you will surely want a
    dedicated instruction for a such a common operation, rather than
    involve a call/return sequence plus whatever other instructions are
    needed.

            ???  Of course it is common for back-end interpreters to have "add" instructions.  That makes it less, rather than more,
    important for the front end to do anything special with it.  You
    seem to be hung up in thinking that a library cannot do anything
    but turn "A + B" into two parameters and a function call.  If it
    is integrated into the system, it has privileged access to the
    innards of the compiler, the operating system, ....

    OK, you can understand that internally, a language+implementation can do
    things as it pleases, just so long as the user can do A+B and it does
    what is expected of it.

    Which means /my/ implementation can choose to handle it as it likes too.

    Yet not long ago you seemed so keen to implement things like:

    A +:= B

    via operator-overload mechanisms expressed within the language itself.
    And more recently that you wanted new comparison operators to be defined
    the same way.

    In that latter case, the benefit of allowing such new features without
    changing the language or the compiler seemed to be outweighed by the
    poor quality and limitations of the result (and its non-standardness!)

    But it seems my approach is perfectly valid too.


    Well, the bytecode for my interpeter [and many others!] has such an
    ADD instruction. So does the VM used by my compiler, [...].
    Why not specify it

    So I switched to native, and got 20 times faster.
    Why would I want to make it slower?

            No reason why you should;  I was merely pointing out that whether a compilation and run takes 0.01s [as a large majority of
    mine do using Algol 68G, with most of the rest < 1s] or 1s makes
    no interesting difference to the complete edit-compile-run cycle.
    But you seem to be assuming that putting things into libraries is
    /ipso facto/ significantly slower than building them in to the
    syntax.  It may be;  or it could be faster;  or it could make
    absolutely no measurable difference, depending on implementation
    details and other factors.

    I've just used gcc to build a 4-line hello.c program; it took over 5
    seconds even on my somewhat faster new PC, less than 1 line per second!

    That's from cold; subsequent runs take 0.25 seconds; still only 20 lines
    per second. Such products are cumbersome because they are so large, and overheads mean even tiny programs take an annoying amount of time.

    On normal-sized programs, it might do 14Klps unoptimised, or 3Klps at -O3.

    The A68G manages up to a respectable 20Klps, although not spectacular
    for an /interpreter/.

    The products I make can manage 500Klps for my static language, and 1Mlps
    for my interpreter.

    With the interpreter, it needs to compile an entire program ahead of time
    into bytecode, before it can start execution. It uses a bundled library
    which is [currently] over 10K lines. At A68G speeds, every program would
    take 0.5 seconds start-up time before it even started on the user's application.

    (That library largely provides graphics and image processing capabilities.)

    A typical application of 25Kloc would take over a second start-up, an
    annoying latency. Fortunately, at its actual speed, start-up delays only
    start to be noticeable when programs reach 100K lines, and then the delay
    is minimal (around 1/10th of a second).

    Now start to compare products that use tracing JIT-compilers, which
    introduce enough overheads that on the wrong kind of application (not
    enough localised bottlenecks), they can be slower than regular
    interpreters with longer start-up times.


    Except that it isn't a matter of syntax; that's too constraining.

            It /is/ a matter of syntax in the RR;  in what sense is
    it constraining?  If you use undeclared variables, do you not
    want the compiler to detect the error?

    But it's nothing to do with syntax! Syntax is just the shape of the
    code. Here is the kind of syntax we've been using:

    A + B

    That's a perfectly valid expression in most normal languages. Except in Algol68, because A and B haven't been declared?

    It's far simpler to separate these different ideas: the shape of the
    code; the resolving of the names (which may detect they are undefined);
    the code generation, execution etc.

    In my dynamic language, it doesn't care they might be undefined until
    runtime.

    Instead, Algol68 chooses the most complex approach of all, one that is hard to understand and difficult to implement. You can tell it was created by academics!

    Or do you prefer the
    early versions of Fortran or C where the compiler assumes that
    an undeclared "j" [for example] is an integer variable?

    If that's what the language said, then that's what it will be. If I
    execute this complete program of my 'Q' language:

    print j

    it says:

    <Void>

    (That's not an error, but most other uses are.)

    It's not exactly the end of the world. In Algol68:

    INT j;
    print(j)

    gives a runtime error. So j might be defined, but it hasn't done much
    good! I guess the RR doesn't make correct /initialisation/ part of the
    syntax?

            It's necessary throughout the RR.  All part of being
    careful with use of language.  You find "MOID" silly;  most
    of us just find it a little quirky,

    The whole thing is quirky. What I can confidently say is that, to me,
    that whole document is almost completely useless for any purpose.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andy Walker@21:1/5 to Bart on Sun Nov 28 23:10:04 2021
    On 28/11/2021 12:52, Bart wrote:
    OK, you can understand that internally, a language+implementation can
    do things as it pleases, just so long as the user can do A+B and it
    does what is expected of it.
    Which means /my/ implementation can choose to handle it as it likes
    too.

    Of course it can.

    Yet not long ago you seemed so keen to implement things like:
       A +:= B
    via operator-overload mechanisms expressed within the language
    itself.

    ??? You again fail to distinguish between how something
    is defined, and how it is implemented. "A +:= B" is defined by
    the RR, /together with/ [as in many language specifications] a
    general rule [end of RR10.1] that the definition describes the
    effect, and implementations are free to use more efficient
    methods. In the instant case, the RR explains what "A +:= B"
    is supposed to do; you can't really expect it to give the
    machine code for your computer, that's up to the compiler.

    And more recently that you wanted new comparison operators to
    be defined the same way.

    If you have a new operator, how would you propose it be
    defined, in general, other than within the language? We can't
    go around re-revising the language specification every time
    someone thinks of a new useful operator, nor every time someone
    invents new hardware with different machine code.

    In that latter case, the benefit of allowing such new features
    without changing the language or the compiler seemed to be
    outweighed by the poor quality and limitations of the
    result (and its non-standardness!)

    Two different things. As noted during the debate, I
    personally have no requirement for "A < B < C"; I merely point
    out that I can at least /define/ what that means within A68.
    If I point out some useful operator that your language is
    missing [eg, the ability to write "A + B" where "A" and "B"
    are graphs or games or functions], you have to change your
    compiler [and its associated documentation]; I just write
    and annotate some more A68. "Poor quality" is a matter of
    opinion; you're like the man who, seeing his neighbour's
    dog playing chess, complained that it wasn't playing very
    well. "Limitations" are your lack of imagination. "Non-
    standardness" happens every time you write a new function
    in your program; it is up to good programmers to give new
    functions sufficiently descriptive names [or comments].
    The same applies to operators and types.

    But it seems my approach is perfectly valid too.

    Of course it is, if that's all you want. But it
    doesn't work for standardised languages used by lots of
    people, and perhaps across lots of hardware.

    I've just used gcc to build a 4-line hello.c program; it took over 5
    seconds even on my somewhat faster new PC, less than 1 line per
    second!

    Perhaps you should switch to plain old "cc"? Oh,
    wait, on my machine that is a symlink to "gcc". On my home
    PC, thirteen years old and nothing fancy, it compiles and
    runs a "hello.c" program in 0.04s, only 3x slower than A68G.

    [...]
    Except that it isn't a matter of syntax; that's too constraining.
             It /is/ a matter of syntax in the RR;  in what sense is
    it constraining?  If you use undeclared variables, do you not
    want the compiler to detect the error?
    But it's nothing to do with syntax!

    In the RR, it /is/ to do with syntax. ...

    Syntax is just the shape of the
    code. Here is the kind of syntax we've been using:
       A + B
    That's a perfectly valid expression in most normal languages. Except
    in Algol68, because A and B haven't been declared?

    ... If there is no declaration for "A" and "B", then it
    may be syntactically correct in your language, but it surely is
    not valid? In A68, there is no route from the concept "program"
    to "... letter a symbol, plus symbol. letter b symbol, ..." if
    they are not declared. Whether the compiler reports that as a
    syntax error or more helpfully as "identifier A not declared" is
    a matter of QoI.

    It's far simpler to separate these different ideas: the shape of the
    code; the resolving of the names (which may detect they are
    undefined); the code generation, execution etc.

    So much simpler that the C, C++, Ada, ... standards are
    much bigger, much harder to navigate, and leave far too much
    ill-defined. Of course, if your language doesn't /have/ a
    standard, that is indeed much simpler. But as soon as C moved
    away from the one implementation at Bell Labs, and it was no
    longer possible for everyone to ask DMR what something meant,
    it all became more complicated. /You/ grumble about the result
    in practically every exchange here with DB.

    Anyway, /you/ see them as "these different ideas". The
    RR sees at least the first two as a unified whole; there is a
    book, "Grammars for Programming Languages" by Cleaveland and
    Uzgalis which explains how to integrate all the rest also into
    one single grammar. You might think that's going too far?

    In my dynamic language, it doesn't care that they might be undefined until runtime.

    That's fine; it just means that your language is not
    the same as C or Algol or ... in this respect. Leaving things
    to runtime has its own pros and cons; you seem very unwilling
    to accept that some issues are "on the one hand ..., on the
    other ..." so that there is a balance to be struck -- which
    sometimes comes down one side, sometimes the other.

    Instead, Algol68 chooses the most complex approach of all, one that is hard
    to understand and difficult to implement.

    You aren't expected to learn A68 from the RR, but from
    a suitable textbook. It's not hard to learn, unless you are
    claiming that you are substantially more stupid than the many
    thousands of students who learned A68 as their first language
    in [mainly] the 1970s. It's certainly harder than Pascal to
    implement; but you write compilers once and use them many
    times, so the difficulty is amortised over thousands of uses.

    Or do you prefer the
    early versions of Fortran or C where the compiler assumes that
    an undeclared "j" [for example] is an integer variable?
    If that's what the language said, then that's what it will be. If I
    execute this complete program of my 'Q' language:
        print j
    it says:
        <Void>
    (That's not an error, but most other uses are.)

    In a compiled language, it's a disaster waiting to
    happen, which is why it was dropped from Fortran and C.
    Again, the balance of risk is different in different
    environments. Serious [esp life-critical] code should not
    depend on things that can easily happen by accident, eg by
    a simple typo.

    It's not exactly the end of the world. In Algol68:
        INT j;
        print(j)
    gives a runtime error. So j might be defined, but it hasn't done much
    good! I guess the RR doesn't make correct /initialisation/ part of
    the syntax?

    It's undefined by the RR, and in general is not
    computable at compile time. Requiring initialisation along
    with the declaration is inefficient [think eg of a large
    array where the required values are derived, eg, from user
    input, so that filling in zeroes, for example, would be
    wasted]. A68G, as you have found, treats it as a run-time
    error; yes, there's a speed penalty, but it also picks up
    lots of programming errors.

    [...] What I can confidently say is that, to me,
    [the RR] is almost completely useless for any purpose.

    That's fine. You're not writing an A68 compiler.
    But it also means that you're equally almost completely
    unqualified to comment on any matter of A68.

    --
    Andy Walker, Nottingham.
    Andy's music pages: www.cuboid.me.uk/andy/Music
    Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Ravel

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to Andy Walker on Mon Nov 29 00:53:33 2021
    On 28/11/2021 23:10, Andy Walker wrote:
    On 28/11/2021 12:52, Bart wrote:

    I've just used gcc to build a 4-line hello.c program; it took over 5
    seconds even on my somewhat faster new PC, less than 1 line per
    second!

        Perhaps you should switch to plain old "cc"?  Oh,
    wait, on my machine that is a symlink to "gcc".  On my home
    PC, thirteen years old and nothing fancy, it compiles and
    runs a "hello.c" program in 0.04s, only 3x slower than A68G.

    On my WSL, 'gcc hello.c && ./a.out' takes 0.25 seconds from 'cold', and
    then 0.1 seconds.

    However, I use Windows. DB will say that Windows is slow at file
    operations, so it is not gcc's fault.

    I would say it /is/ the app's fault if it has a complex enough structure
    that it struggles to start up each time, faults that may be hidden on
    forgiving OSes, which is why I sometimes like to test across both.

    [...]  What I can confidently say is that, to me,
    [the RR] is almost completely useless for any purpose.

        That's fine.  You're not writing an A68 compiler.
    But it also means that you're equally almost completely
    unqualified to comment on any matter of A68.


    So they wanted to be elitist; that's fine. They can keep their language!
    But people can voice their opinions about it too.

    I just selected some choice bits of syntax and adapted them for my own purposes. However, thanks to the exchanges here, I have a mind to tweak
    my syntax to break any remaining Algol68 connections.

    I can already write 'end', 'endif' etc in place of 'fi', 'esac', 'od',
    the keywords that tend to give it away.

    My language, if I were to produce a reference to it, would definitely
    not be gobbledygook; I'd see to that. But that's unlikely to happen
    since, as certain people like to point out, it's a one-person language;
    the benefits would be minimal.

    It would be like producing a guide to my own house.

    Here however are some benchmarks for 'fannkuch(10)':

                       Secs  Lang  Types    Exec

     A68G              50    A68   Static   Interp?   Win or WSL
     A68G -optimise    --    A68                      (WSL: Failed)
     qq -asmopt         2.2  Q     Dynamic  Interp    (mildly accelerated)
     gcc -O3            0.22 C     Static   Native
     mm -opt            0.22 M     Static   Native

    My stuff doesn't do too badly speedwise does it?

    I couldn't get a68g -optimise to work at all, even on WSL; a shame, as
    that 50-second timing to interpret /static/ code is poor.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Charles Lindsey@21:1/5 to Bart on Mon Nov 29 12:30:06 2021
    On 29/11/2021 00:53, Bart wrote:


    Here however are some benchmarks for 'fannkuch(10)':

                       Secs  Lang  Types    Exec

      A68G             50    A68   Static   Interp?   Win or WSL
      A68G -optimise   --    A68                     (WSL: Failed)
      qq -asmopt        2.2  Q     Dynamic  Interp   (mildly accelerated)
      gcc -O3           0.22 C     Static   Native
      mm -opt           0.22 M     Static   Native

    My stuff doesn't do too badly speedwise does it?

    I couldn't get a68g -optimise to work at all, even on WSL; a shame, as that 50-second timing to interpret /static/ code is poor.

    $ cat hello.c
    #include <stdio.h>
    int main() {printf("Hello World!");}
    $ time gcc hello.c && ./a.out

    real 0m0.038s
    user 0m0.034s
    sys 0m0.004s
    Hello World!$

    $ cat hello.a68
    BEGIN print("Hello World!") END
    $ time a68g hello.a68
    Hello World!

    real 0m0.016s
    user 0m0.008s
    sys 0m0.008s
    $

    That's Ubuntu on modern Intel 68-bit hardware.

    --
    Charles H. Lindsey ---------At my New Home, still doing my own thing------
    Tel: +44 161 488 1845 Web: https://www.clerew.man.ac.uk Email: chl@clerew.man.ac.uk Snail-mail: Apt 40, SK8 5BF, U.K.
    PGP: 2C15F1A9 Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to Charles Lindsey on Mon Nov 29 13:33:48 2021
    On 29/11/2021 12:30, Charles Lindsey wrote:
    On 29/11/2021 00:53, Bart wrote:


    Here however are some benchmarks for 'fannkuch(10)':

                        Secs  Lang  Types    Exec

       A68G             50    A68   Static   Interp?   Win or WSL
       A68G -optimise   --    A68                     (WSL: Failed)
       qq -asmopt        2.2  Q     Dynamic  Interp   (mildly accelerated)
       gcc -O3           0.22 C     Static   Native
       mm -opt           0.22 M     Static   Native

    My stuff doesn't do too badly speedwise does it?

    I couldn't get a68g -optimise to work at all, even on WSL; a shame, as that
    50-second timing to interpret /static/ code is poor.

    $ cat hello.c
    #include <stdio.h>
    int main() {printf("Hello World!");}
    $ time gcc hello.c && ./a.out

    real    0m0.038s
    user    0m0.034s
    sys    0m0.004s
    Hello World!$

    $ cat hello.a68
    BEGIN print("Hello World!") END
    $ time a68g hello.a68
    Hello World!

    real    0m0.016s
    user    0m0.008s
    sys    0m0.008s
    $

    That's Ubuntu on a modern Intel 68-bit hardware.

    I'm still stuck on 64 bits unfortunately.

    The purpose of the 'hello world' test was to demonstrate the start-up
    overheads of a big compiler like gcc.

    So even on my SSD-based PC, that overhead will add 0.25 seconds to
    building any program.

    People like to demonstrate how starting gcc on Linux takes zero time; I
    don't know why that is. Maybe large Windows PE/DLL files are slower to
    process; maybe much of what is needed for gcc on Linux is already running.

    However, I work on Windows, and that timing is real. My C compiler also
    runs on Windows, and that doesn't have a problem:

    C:\c>tm bcc hello
    Compiling hello.c to hello.exe
    TM: 0.03

    (Note that running even an empty program on this OS will take 0.02
    seconds anyway.)

    Regarding the above benchmark, the A68 code for that is given below.

    So that I can properly fill in my table, I'd be grateful if you could
    run this on your machine both with and without -optimise, as that flag
    doesn't work on my WSL. Then I can estimate the runtime on my machine.

    (I assume it transpiles to C and then runs that C. The errors I get with
    -optimise come from within the A68 headers.)

    -----------------------------------------------------------------------------------
    INT res2;

    PROC fannkuch=(INT n)INT: BEGIN
        [n]INT p,q,s;
        INT sign, maxflips, sum, q1, flips, qq, t, sx, i,j;

        sign:=1;
        maxflips:=0;
        sum:=0;

        FOR i TO n DO
            p[i]:=i;
            q[i]:=i;
            s[i]:=i
        OD;

        DO
            q1:=p[1];
            IF q1/=1 THEN
                FOR i FROM 2 TO n DO q[i]:=p[i] OD;
                flips:=1;
                DO
                    qq:=q[q1];
                    IF qq=1 THEN
                        sum+:=sign*flips;
                        IF flips>maxflips THEN
                            maxflips:=flips
                        FI;
                        exit1
                    FI;

                    q[q1]:=q1;
                    IF q1>=4 THEN
                        i:=2; j:=q1-1;
                        WHILE
                            t:=q[i]; q[i]:=q[j]; q[j]:=t;
                            i+:=1;
                            j-:=1;
                            i<j
                        DO SKIP OD
                    FI;
                    q1:=qq;
                    flips+:=1
                OD;
    exit1: SKIP
            FI;

            IF sign=1 THEN
                t:=p[2]; p[2]:=p[1]; p[1]:=t;
                sign:=-1
            ELSE
                t:=p[2]; p[2]:=p[3]; p[3]:=t;
                sign:=1;
                FOR i FROM 3 TO n DO
                    sx:=s[i];
                    IF sx/=1 THEN s[i]:=sx-1; exit2 FI;
                    IF i=n THEN
                        res2:=maxflips;
                        return
                    FI;
                    s[i]:=i;
                    t:=p[1];

                    FOR j TO i DO
                        p[j]:=p[j+1]
                    OD;

                    p[i+1]:=t
                OD;
    exit2: SKIP
            FI
        OD;
    return:
        sum
    END;

    INT n=10;
    INT res;
    res:=fannkuch(n);

    print(("Fannkuch(",n,") = ",res," ", res2,newline))
    -----------------------------------------------------------------------------------


    Output should be:

    Fannkuch(        +10) =      +73196         +38

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Charles Lindsey@21:1/5 to Bart on Tue Nov 30 20:45:45 2021
    On 29/11/2021 13:33, Bart wrote:
    On 29/11/2021 12:30, Charles Lindsey wrote:
    On 29/11/2021 00:53, Bart wrote:


    Here however are some benchmarks for 'fannkuch(10)':

                        Secs  Lang  Types    Exec

       A68G             50    A68   Static   Interp?   Win or WSL
       A68G -optimise   --    A68                     (WSL: Failed)
       qq -asmopt        2.2  Q     Dynamic  Interp   (mildly accelerated)
       gcc -O3           0.22 C     Static   Native
       mm -opt           0.22 M     Static   Native

    My stuff doesn't do too badly speedwise does it?

    I couldn't get a68g -optimise to work at all, even on WSL; a shame, as that 50-second
    timing to interpret /static/ code is poor.

    $ cat hello.c
    #include <stdio.h>
    int main() {printf("Hello World!");}
    $ time gcc hello.c && ./a.out

    real    0m0.038s
    user    0m0.034s
    sys    0m0.004s
    Hello World!$

    $ cat hello.a68
    BEGIN print("Hello World!") END
    $ time a68g hello.a68
    Hello World!

    real    0m0.016s
    user    0m0.008s
    sys    0m0.008s
    $

    That's Ubuntu on modern Intel 68-bit hardware.

    I'm still stuck on 64 bits unfortunately.


    Regarding the above benchmark, the A68 code for that is given below.

    So that I can properly fill in my table, I'd be grateful if you could run this
    on your machine both with and without -optimise, as that flag doesn't work on my
    WSL. Then I can estimate the runtime on my machine.

    (I assume it transpiles to C and then runs that C. The errors I get with -optimise come from within the A68 headers.)

    -----------------------------------------------------------------------------------
    INT res2;

    PROC fannkuch=(INT n)INT: BEGIN
        [n]INT p,q,s;
        INT sign, maxflips, sum, q1, flips, qq, t, sx, i,j;

        sign:=1;
        maxflips:=0;
        sum:=0;

        FOR i TO n DO
            p[i]:=i;
            q[i]:=i;
            s[i]:=i
        OD;

        DO
            q1:=p[1];
            IF q1/=1 THEN
                FOR i FROM 2 TO n DO q[i]:=p[i] OD;
                flips:=1;
                DO
                    qq:=q[q1];
                    IF qq=1 THEN
                        sum+:=sign*flips;
                        IF flips>maxflips THEN
                                maxflips:=flips
                        FI;
                        exit1
                    FI;

                    q[q1]:=q1;
                    IF q1>=4 THEN
                        i:=2; j:=q1-1;
                        WHILE
                            t:=q[i]; q[i]:=q[j]; q[j]:=t;
                            i+:=1;
                            j-:=1;
                            i<j
                        DO SKIP OD
                    FI;
                    q1:=qq;
                    flips+:=1
                OD;
    exit1: SKIP
            FI;

            IF sign=1 THEN
                t:=p[2]; p[2]:=p[1]; p[1]:=t;
                sign:=-1
            ELSE
                t:=p[2]; p[2]:=p[3]; p[3]:=t;
                sign:=1;
                FOR i FROM 3 TO n DO
                    sx:=s[i];
                    IF sx/=1 THEN s[i]:=sx-1; exit2 FI;
                    IF i=n THEN
                        res2:=maxflips;
                        return
                    FI;
                    s[i]:=i;
                    t:=p[1];

                    FOR j TO i DO
                        p[j]:=p[j+1]
                    OD;

                    p[i+1]:=t
                OD;
    exit2: SKIP
            FI
        OD;
    return:
        sum
    END;

    INT n=10;
    INT res;
    res:=fannkuch(n);

    print(("Fannkuch(",n,") = ",res," ", res2,newline)) -----------------------------------------------------------------------------------


    Output should be:

    Fannkuch(        +10) =      +73196         +38

    $ time a68g --nooptimise fannkuch.a68
    Fannkuch(        +10) =      +73196         +38

    real 0m40.654s
    user 0m40.644s
    sys 0m0.005s
    $
    $
    $ time a68g --optimise fannkuch.a68
    Fannkuch(        +10) =      +73196         +38

    real 0m13.150s
    user 0m10.383s
    sys 0m0.119s
    $

    --
    Charles H. Lindsey ---------At my New Home, still doing my own thing------
    Tel: +44 161 488 1845 Web: https://www.clerew.man.ac.uk Email: chl@clerew.man.ac.uk Snail-mail: Apt 40, SK8 5BF, U.K.
    PGP: 2C15F1A9 Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to Charles Lindsey on Wed Dec 1 00:07:20 2021
    On 30/11/2021 20:45, Charles Lindsey wrote:
    On 29/11/2021 13:33, Bart wrote:
    On 29/11/2021 12:30, Charles Lindsey wrote:
    On 29/11/2021 00:53, Bart wrote:


    Here however are some benchmarks for 'fannkuch(10)':

                        Secs  Lang  Types    Exec

       A68G             50    A68   Static   Interp?   Win or WSL
       A68G -optimise   --    A68                     (WSL: Failed)
       qq -asmopt        2.2  Q     Dynamic  Interp   (mildly accelerated)
       gcc -O3           0.22 C     Static   Native
       mm -opt           0.22 M     Static   Native

    Regarding the above benchmark, the A68 code for that is given below.

    So that I can properly fill in my table, I'd be grateful if you could
    run this on your machine both with and without -optimise, as that flag
    doesn't work on my WSL. Then I can estimate the runtime on my machine.

    (I assume it transpiles to C then runs that C. The errors are within
    the A68 headers.)

    $ time a68g --nooptimise fannkuch.a68
    Fannkuch(        +10) =      +73196         +38

    real    0m40.654s
    user    0m40.644s
    sys    0m0.005s
    $
    $
    $ time a68g --optimise fannkuch.a68
    Fannkuch(        +10) =      +73196         +38

    real    0m13.150s
    user    0m10.383s
    sys    0m0.119s
    $


    Thanks. My table is now:

                      Secs   Lang    Types    Code

    A68G              50     A68     Static   Interp?  (WSL; Win=52)
    A68G -optimise    16     A68     Static   ?        (Estimated)
    CPython           12.3   Python  Dynamic           (WSL; Win=19)
    qq -asmopt         2.2   Q       Dynamic  Interp
    PyPy               0.65  Python  Dynamic  JIT
    LuaJIT             0.45  Lua     Dynamic  JIT
    gcc -O3            0.22  C       Static   Native
    mm -opt            0.22  M       Static   Native

    The A68G timing suggests it is not generating C code for directly
    executing the program, but just generating C code that does the
    interpretation more efficiently.

    That's OK, it just means A68G is not a compiler, but it is still
    somewhat sluggish given that the language is statically typed.

    I've added timings for CPython plus two JIT programs. Although these do
    very well, JIT timings tend to be misleading. They're good at small
    benchmarks but not so good at real programs.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From antispam@math.uni.wroc.pl@21:1/5 to Bart on Sat Dec 4 09:32:53 2021
    Bart <bc@freeuk.com> wrote:
    On 22/11/2021 20:34, Andy Walker wrote:
    On 21/11/2021 19:45, Bart wrote:
    On 21/11/2021 18:55, James Harris wrote:
    In sum, perhaps it would be better to build common operators into
    type libraries and not make them part of the language itself.

        I agree with Charles!

    You need a more complicated implementation, and one that implements
    more advanced optimisations, which are going to be slower, to get
    code of the same standard as you can get easily when things are
    built-in.

        Doesn't follow.  The library can have access to parts of the
    language that are not available to ordinary users, inc [eg] escapes
    into assembler, inlining, access to the compiler's internal structures, ..., and can be optimised when the compiler is built.

    This is my point: you need:

    * Inline assembly
    * Inlining
    * User-defined operator overloads
    * The equivalent of C++'s constexpr
    * A gcc-class optimiser
    ...

    I use a language like this. You do need inlining, and it is useful
    only due to overloading (_not_ limited to operators). However, the
    other positions on your list above are not needed. More precisely,
    there is a small core language which implements basic types and the
    corresponding operators. Basic library types simply offer a "view"
    of the basic types and operators from the core language. Due to inlining
    there is no runtime loss of efficiency compared to using the core
    language directly. But there are also fancy library types
    which are infeasible to have as part of the core language.
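
    (A rough C++ analogy of the idea, hedged and with invented names --
    the language in question is not C++: a library type that is just a
    "view" of a core type, whose '+' forwards to the core operation and
    vanishes after inlining.)

    #include <cstdio>

    // Library-defined "view" of the core integer type.
    struct Int { long v; };

    // The overloaded '+' only forwards to the core addition; after
    // inlining it costs the same as using 'long' directly.
    inline Int operator+(Int a, Int b) { return Int{a.v + b.v}; }

    int main() {
        Int a{2}, b{3};
        Int c = a + b;                 // compiles down to a single integer add
        std::printf("%ld\n", c.v);     // prints 5
    }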

    Too many advanced compiler features, just to keep the language 'small'.

    You did not get it: the purpose is to have a manageable implementation
    of a _very large_ language. The low-level, error-prone parts are
    small. Library types can be written by ordinary application
    programmers.

    Overloading can be implemented with quite a small amount of extra code.
    Inlining is also easy to implement if you have an internal
    representation of larger code fragments (as opposed to
    generating machine code directly after recognizing a given
    syntactic construct). Both put limits on the maximum possible
    compiler speed: with overloading you have a more complex
    symbol table and need more lookups. An intermediate representation
    takes time to build (on top of the time needed to generate code).
    Also, inlining allows large object code from small sources,
    so if you measure time per source line, compiler speed will
    likely be worse than that of a compiler which forces you to
    write each source line separately. Still, the compiler can
    be reasonably fast.

    Anyway, the increase in compiler complexity is relatively
    small. Even for small languages, on the total balance the
    approach using overloading may give you a saving in
    effort spent on implementation. Just as a simple
    example let me mention Pascal. In Pascal there is
    a bunch of "builtin" operations (mostly on files).
    Most of them are too complex to do with inline code,
    so practical compilers have a runtime library implementing
    them. But the compiler still needs to parse those
    "builtins" and convert them to runtime calls.
    In a sense this is a trivial task, but with a realistic
    compiler structure you need several lines of compiler
    code to generate the corresponding runtime call. A bunch
    of older languages like Cobol, Fortran, and PL/I had
    special "builtin" operations, some of them many
    more than Pascal.

    Let me add that there are more radical approaches:
    there are languages in which "users" can redefine the
    scanner and/or parser. So users can add new operators
    on the fly and define their meaning. As
    a possible example consider your 'stringimage'
    extension: in an extensible language the user simply
    declares that 'stringimage' is a new keyword that
    should invoke its handler. The handler looks at the
    following program text, extracts a filename,
    reads the file and tells the compiler to use the resulting
    string as a string constant. The whole thing is
    probably 10 lines for the handler and a single
    line to register the new keyword. In a language that
    allows syntax extensions, instead of overloading
    you can just define new operators, so overloading
    strictly speaking is not necessary. Similarly,
    you have some way for syntax handlers to generate
    code, which is similar to inlining, but strictly
    speaking inlining is not needed.
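
    (A hedged sketch of the shape of such a handler, in C++ with an
    entirely invented "compiler" interface -- no particular extensible
    language is being quoted here.)

    #include <fstream>
    #include <functional>
    #include <map>
    #include <sstream>
    #include <string>

    // Invented compiler-side node and handler interface.
    struct Node { std::string kind, text; };
    using Handler = std::function<Node(std::istream&)>;

    std::map<std::string, Handler> keyword_table;

    // The handler: read the file name that follows the keyword, load the
    // file, and hand it back to the compiler as a string constant node.
    Node stringimage_handler(std::istream& src) {
        std::string filename;
        src >> filename;                      // next token is the file name
        std::ifstream f(filename);
        std::ostringstream text;
        text << f.rdbuf();
        return Node{"string_const", text.str()};
    }

    int main() {
        // The single registration line.
        keyword_table["stringimage"] = stringimage_handler;
    }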

    --
    Waldek Hebisch

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to antispam@math.uni.wroc.pl on Sat Dec 4 11:51:55 2021
    On 04/12/2021 09:32, antispam@math.uni.wroc.pl wrote:
    Bart <bc@freeuk.com> wrote:

    * Inline assembly
    * Inlining
    * User-defined operator overloads
    * The equivalent of C++'s constexpr
    * A gcc-class optimiser
    ...

    I use a language like this. You do need inlining, and it is useful
    only due to overloading (_not_ limited to operators). However, the
    other positions on your list above are not needed. More precisely,
    there is a small core language which implements basic types and the
    corresponding operators. Basic library types simply offer a "view"
    of the basic types and operators from the core language. Due to inlining
    there is no runtime loss of efficiency compared to using the core
    language directly. But there are also fancy library types
    which are infeasible to have as part of the core language.

    Too many advanced compiler features, just to keep the language 'small'.

    You did not get it: the purpose is to have a manageable implementation
    of a _very large_ language.

    Source code for gcc was 45,000 source files last I looked. I don't even
    know if that's just for C, or for C++ too.

    That's a thousand times larger in scale than what I do. So I guess I'm in
    less need of such methods.

    The low-level, error-prone parts are
    small. Library types can be written by ordinary application
    programmers.

    Overloading can be implemented with quite a small amount of extra code.
    Inlining is also easy to implement if you have an internal
    representation of larger code fragments (as opposed to
    generating machine code directly after recognizing a given
    syntactic construct). Both put limits on the maximum possible
    compiler speed: with overloading you have a more complex
    symbol table and need more lookups. An intermediate representation
    takes time to build (on top of the time needed to generate code).
    Also, inlining allows large object code from small sources,
    so if you measure time per source line, compiler speed will
    likely be worse than that of a compiler which forces you to
    write each source line separately. Still, the compiler can
    be reasonably fast.

    Anyway, the increase in compiler complexity is relatively
    small. Even for small languages, on the total balance the
    approach using overloading may give you a saving in
    effort spent on implementation.

    I could do overloading. I have an approach I've developed; I just
    haven't got round to it (there are a dozen things more useful ahead
    of it, which I haven't done yet either!).

    But this is purely user-defined overloading of existing operators, with user-defined types, or applying extra operations to standard types.

    It is not for implementing the core language, which I like to handle with dedicated code in the compiler, keeping it tight and fast, and allowing enhancements such as the above to be optional (and on a waiting list).

    For example, there is already a backend IL instruction, MUL.I64, to do
    signed multiply. The main part, MUL, is attached to the binary * op
    during parsing. The .I64 part is assigned two passes further on.

    With only user-defined overload mechanisms, it's a lot harder to
    make those associations. For one thing, those IL instructions are not
    exposed in the language.

    Just as a simple
    example let me mention Pascal. In Pascal there is
    a bunch of "builtin" operations (mostly on files).
    Most of them are too complex to do with inline code,
    so practical compilers have a runtime library implementing
    them. But the compiler still needs to parse those
    "builtins" and convert them to runtime calls.
    In a sense this is a trivial task, but with a realistic
    compiler structure you need several lines of compiler
    code to generate the corresponding runtime call.

    I have a similar set of operations for I/O. They are implemented as a
    set of function calls, designed to be called implicitly from code, so
    have funny names. So:

    print a,b

    is equivalent to:

    m$print_startcon()
    m$print_i64_nf(a) # when a,b are i64
    m$print_i64_nf(b)
    m$print_end

    (The bracketing between start/end calls allows separator logic within
    the whole print statement; and allows calls to functions within the
    print list that might themselves use print, to files etc.)
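
    (A hedged guess at the mechanism, sketched in C++ with invented names
    -- these are not the real m$ routines: keeping per-statement state on
    a stack is what lets a function called from inside the print list run
    its own print, possibly to a file, without disturbing the outer one.)

    #include <cstdio>
    #include <vector>

    struct PrintState { std::FILE* dest; bool first; };
    std::vector<PrintState> printstack;          // one entry per active print

    void print_start(std::FILE* dest = stdout) {
        printstack.push_back({dest, true});
    }

    void print_i64(long long x) {
        PrintState& s = printstack.back();
        if (!s.first) std::fputc(' ', s.dest);   // separator logic lives here
        s.first = false;
        std::fprintf(s.dest, "%lld", x);
    }

    void print_end() {
        std::fputc('\n', printstack.back().dest);
        printstack.pop_back();
    }

    long long f() {                              // a nested print, to stderr
        print_start(stderr);
        print_i64(99);
        print_end();
        return 7;
    }

    int main() {
        print_start();                           // print 1, f(), 2
        print_i64(1);
        print_i64(f());
        print_i64(2);
        print_end();                             // stdout gets "1 7 2"
    }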

    So, print is already implemented in user code. I just do not see the
    advantage of replacing 110 or 50 lines of code within the compiler's
    code generator for print, with a likely more complicated solution.

    (110 lines of code for static language which needs to generate different
    calls for various types; 50 lines for dynamic language.)

    A bunch
    of older languages like Cobol, Fortran, and PL/I had
    special "builtin" operations, some of them many
    more than Pascal.

    Let me add that there are more radical approaches:
    there are languages in which "users" can redefine the
    scanner and/or parser. So users can add new operators
    on the fly and define their meaning. As
    a possible example consider your 'stringimage'
    extension:

    Do you mean my 'strinclude' which incorporates a text file (or
    text/binary file in dynamic code), as string data within the program?

    in an extensible language the user simply
    declares that 'stringimage' is a new keyword that
    should invoke its handler.

    That itself is not so easy. Keywords are built-in, scopeless, and known program-wide. With a user-defined keyword, could two modules define that
    keyword differently? Could another part of the program use that same
    name as an ordinary variable?

    If so then it's not known as a new keyword after the next pass after
    parsing. So it will need to have a suitable 'shape' to be parsed as /something/. (My language also has out-of-order definitions.)


    The handler looks at the
    following program text, extracts a filename,
    reads the file and tells the compiler to use the resulting
    string as a string constant.

    How? String literals are detected fairly early; they have a certain AST
    node, length etc. At what pass will this be done?

    The whole thing is
    probably 10 lines for the handler and a single
    line to register the new keyword.

    In a language that
    allows syntax extensions, instead of overloading
    you can just define new operators, so overloading
    strictly speaking is not necessary. Similarly,
    you have some way for syntax handlers to generate
    code, which is similar to inlining, but strictly
    speaking inlining is not needed.

    Do you have an example of such a language where implementing my
    'strimport' is 10 lines of code? For example, C++. Or the one you
    mentioned. If so, how does it look?

    In my static compiler, 'strinclude' needs a handful of lines to define
    enums and names, plus these lines in the parser:

    when kstrincludesym then
        lex()
        p:=createunit1(j_strinclude,readterm2())

    Followed by this code later on to create a suitable string literal:

    proc tx_strinclude(unit p,a)=
        int fileno

        tpass(a)
        if a.tag<>j_const or not a.isastring then
            txerror("strincl/not string")
        fi

        fileno := moduletable[p.moduleno].fileno
        fileno := getfile(a.svalue, path:sourcefilepaths[fileno],
                          issupport:1)

        a.svalue := sourcefiletext[fileno]
        a.slength := sourcefilesizes[fileno]

        deleteunit(p,a)        # replace strincl node with new string
    end

    The fiddly bit is deciding where to look for that file (unless an
    absolute path is given, it will be relative to this module).

    The file involved counts as a support file, which means that when an
    application is combined into an amalgamated file, such files are
    included. (But they need to be marked, because sometimes a support file will
    have the same name as an actual source module.)

    Also, when reading from an amalgamated file, it will look inside that
    for the support file.

    I think that 20 lines of code for 'strinclude' inside a compiler is
    reasonable.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From antispam@math.uni.wroc.pl@21:1/5 to Andy Walker on Sat Dec 4 11:16:22 2021
    Andy Walker <anw@cuboid.co.uk> wrote:
    work. AFAIK, no language I have ever used expects "A[i;j;k]" to
    be an array element or "F(a;b;c)" to be the call of a function
    with three parameters.

    In a language that I use:

    (28) -> sin(1;2;3.0)

       (28)  0.1411200080_598672221
                                        Type: Float
    (29) -> v := new(1, 1)$Vector(Integer)

       (29)  [1]
                                        Type: Vector(Integer)
    (30) -> v(3;2;1)

       (30)  1
                                        Type: PositiveInteger

    In the first case three expressions are computed, but only the last one
    provides the value of the resulting compound expression, which in turn
    is used as the argument to 'sin'. Similarly, 'v(3;2;1)' uses the last
    value as the array index.
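
    (C and C++ have a rough analogue in the comma operator: inside an extra
    pair of parentheses the expressions are evaluated left to right and only
    the last value is used, so the call below behaves like sin(3.0).)

        #include <cmath>
        #include <cstdio>

        int main() {
            // The inner parentheses make this a single argument built with the
            // comma operator: 1 and 2 are evaluated and discarded, 3.0 is passed.
            double r = std::sin((1, 2, 3.0));
            std::printf("%f\n", r);    // 0.141120, matching the session above
        }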

    Far from clear. Simpler to describe? People mocked the 236
    pages of the RR [most of it comments, examples or the system
    library] when it first came out -- until they found that proper
    descriptions of other languages were at least as long, and
    nowhere near as precise.

    I may have already mentioned it: the Extended Pascal standard
    (ISO 10206) has 230 pages (so almost as long). It describes
    a rather large language. I found the description to be rather
    precise: it uses a _slightly_ less formal style than the Algol
    report, but terms are given precise definitions. A little
    sample:

    : A module A shall be designated as supplying a module B if A
    : supplies the module-heading or moduleblock of B. A module A
    : shall be designated as supplying a main-program-block if the
    : module supplies the block of the main-program-block. A module
    : A shall be designated as supplying a module-heading, module-block,
    : or block, B, either if B contains an applied occurrence of an
    : interface-identifier having a defining occurrence contained by
    : the module-heading of A, or if A supplies a module that supplies
    : B.
    :
    : No module shall supply its module-heading.

    This may not be very friendly to lay persons; in plainer
    language it says that while cyclic imports of module bodies
    are allowed, cyclic imports of interface parts ('module-heading'
    in standardese) are not allowed. To understand it you need
    to read the introductory part, which explains that the semantics
    is defined on the parse tree and that grammar rules have labels
    allowing unambiguous reference to parts of the parse tree. In
    particular, 'module-heading' is a part of a module defined by
    the syntax rules. Similarly, 'applied occurrence' and 'defining
    occurrence' are precisely defined earlier.

    This is 22 years after the Algol 68 report, but at least in
    the presentation aspect they did _much_ better work than
    the Algol group. In particular, the words and terms they use
    actually make sense, so without a firm grasp of the whole
    standard you can get a reasonable approximation to the meaning
    just by taking the meaning of words from everyday programming
    practice. This allows you to read definitions and refine
    your understanding of the meaning. After a few iterations
    you converge and get the whole thing. Not so with
    Algol 68...

    --
    Waldek Hebisch

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andy Walker@21:1/5 to Bart on Sat Dec 4 21:38:54 2021
    On 01/12/2021 00:07, Bart wrote:
    [various benchmark timings:]
    The A68G timing suggests it is not generating C code for directly
    executing the program, but just generating C code that does the
    interpretation more efficiently.

    AIUI, A68G doesn't "generate C code" at all, unless you go
    down the "--optimise" route [in which case you might do better with
    an actual compiler, such as "algol68toc" or Charles's "a68s"], but
    builds a tree and then "walks" it, doing a /lot/ of checking as it
    goes.

    Anyway, this got me looking at the supplied "fannkuch"
    program, which looks as though it's imported from some other
    language, as the Algol is very unidiomatic. Simply tidying
    the code speeds it up by around 10%. I saved a further ~15%
    by replacing "sign" by a Boolean, replacing explicit array
    elements by declaring and using "ref int"s to point to them
    and similar, copying arrays in toto rather than element by
    element, and so on. That got the time down from just under
    40s to just under 30s. I didn't touch the actual algorithm.
    Testing also uncovered a buglet; the [original] code fails
    if "n < 3"; easy enough to correct, but Algol makes it easier
    to spot in the first place.

    This got me testing different versions of A68G, as
    some of them have the garbage-collection bug that causes
    repeated array copying to fill up store, necessitating a
    "sweep heap" from time to time. The results weren't quite
    what I expected. Fastest was not the current A68G [2.8.4],
    but 1.16 [the most recent Mk 1 on my computer, April 2009],
    which clocked 21s. Other versions of Mk 1 are mostly a
    little slower than 2.8.4, though a couple are a little faster.
    Slowest of all was 2.0.3, which weighed in at 64s [Bart's
    original at 88s]. The timings seem to bear little relation
    to the changelog, so I have no explanation for either the
    factor 3 in speed between 1.16 and 2.0.3, nor the factor 2
    improvement since then. We'd have to ask Marcel!

    I haven't noticed such changes in my own programs,
    and suspect that the "instruction mix" in "fannkuch" is
    completely different from the norm [eg, only a couple of
    procedure calls, almost no actual calculation yet tens of
    millions of array accesses, ...]. If I can work up the
    enthusiasm [unlikely!], I might try running the Whetstone
    benchmark through the different versions.

    --
    Andy Walker, Nottingham.
    Andy's music pages: www.cuboid.me.uk/andy/Music
    Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Heller

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to Andy Walker on Sun Dec 5 12:22:06 2021
    On 04/12/2021 21:38, Andy Walker wrote:
    On 01/12/2021 00:07, Bart wrote:
    [various benchmark timings:]
    The A68G timing suggests it is not generating C code for directly
    executing the program, but just generating C code that does the
    interpretation more efficiently.

        AIUI, A68G doesn't "generate C code" at all, unless you go
    down the "--optimise" route

    That's what --optimise does, but I assumed it was generating direct C code.


    [in which case you might do better with
    an actual compiler, such as "algol68toc"

    I tried to run that but it's a 16-bit executable that no longer runs under
    Win64.

    or Charles's "a68s"], but
    builds a tree and then "walks" it, doing a /lot/ of checking as it
    goes.

        Anyway, this got me looking at the supplied "fannkuch"
    program, which looks as though it's imported from some other
    language, as the Algol is very unidiomatic.  Simply tidying
    the code speeds it up by around 10%.  I saved a further ~15%
    by replacing "sign" by a Boolean, replacing explicit array
    elements by declaring and using "ref int"s to point to them
    and similar, copying arrays in toto rather than element by
    element, and so on.  That got the time down from just under
    40s to just under 30s.  I didn't touch the actual algorithm.

    All my fannkuch versions are derived from an original in Lua.

    I'm using it as a benchmark for testing compilers (with up to 10,000
    copies in one file).

    Partly for that reason, they need to comprise largely the same code.

    The actual performance is less important, but it is useful for comparing
    the results of trade-offs between compilers and compile options. So
    again, they need to execute the same algorithm.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andy Walker@21:1/5 to All on Tue Dec 7 15:52:30 2021
    On 05/12/2021 12:22, Bart wrote:
    [I wrote:]
    ... [in which case you might do better with
    an actual compiler, such as "algol68toc"
    I tried to run that but it's a 16-bit executable that no long runs
    under Win64.

    Ah, unlucky. If you're that interested, perhaps you need
    to download the source and build a 64-bit executable.

    [...]
    All my fannkuch versions are derived from an original in Lua.
    I'm using it as a benchmark for testing compilers [...].
    Partly for that reason, they need to comprise largely the same code.

    Well, that's a rather grey area. Is "a := b" largely
    the same code as "for i to n do a[i] := b[i] od"? The former
    is simply not possible in many languages, but it [or a standard
    "copyarray" procedure] is likely to be implemented significantly
    more efficiently than the loop [without sufficiently aggressive
    optimisation]. Similarly, for the use of a built-in Boolean
    type vs an integer to be tested for sign. Similarly, when there
    are issues of "return;" vs "run off the end of the procedure"
    or of "loop and a half" constructs. As noted in my PP, simply
    tidying the "fannkuch" code into idiomatic Algol saved around
    25% of the time [without changing the underlying algorithm].
    No doubt if you started with some Algol and translated that
    directly into C or Lua, you would find that you could save a
    similar amount by switching some of the resulting ugly C/Lua
    into more idiomatic code.

    [Similarly, and relevant to a nearby thread, the
    difference between 0-based and 1-based arrays is more than
    is often supposed. It affects which comparison operator you
    use, and whether you pre- or post-increment or decrement, and
    then you find yourself pulling the code around to suit.]

    --
    Andy Walker, Nottingham.
    Andy's music pages: www.cuboid.me.uk/andy/Music
    Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Bull

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Andy Walker on Tue Dec 7 18:41:16 2021
    On 07/12/2021 16:52, Andy Walker wrote:
    On 05/12/2021 12:22, Bart wrote:
    [I wrote:]
    ... [in which case you might do better with
    an actual compiler, such as "algol68toc"
    I tried to run that but it's a 16-bit executable that no long runs
    under Win64.

        Ah, unlucky.  If you're that interested, perhaps you need
    to download the source and build a 64-bit executable.


    You could also try Wine on Linux if it is a Windows 3.x program, or
    DOSBox if it is a DOS program.

    You could even use VirtualBox and start a FreeDOS virtual machine, and
    enjoy the nostalgia :-)

    (I'm not guaranteeing these will work, but they might be worth a shot.)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dmitry A. Kazakov@21:1/5 to David Brown on Tue Dec 7 20:39:18 2021
    On 2021-12-07 18:41, David Brown wrote:

    You could even use VirtualBox and start a FreeDOS virtual machine, and
    enjoy the nostalgia :-)

    I managed to run Windows 98 in VirtualBox. It requires special drivers
    for the display, though.

    Firefox 2.0.0.20 is such fun. Compared to the latest versions it is blindingly fast. OK, it does not show most of the pages, but should I
    visit them anyway? (:-))

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to Andy Walker on Wed Dec 8 00:01:25 2021
    On 07/12/2021 15:52, Andy Walker wrote:
    On 05/12/2021 12:22, Bart wrote:
    [I wrote:]
    ... [in which case you might do better with
    an actual compiler, such as "algol68toc"
    I tried to run that but it's a 16-bit executable that no long runs
    under Win64.

        Ah, unlucky.  If you're that interested, perhaps you need
    to download the source and build a 64-bit executable.

    [...]
    All my fannkuch versions are derived from an original in Lua.
    I'm using it as a benchmark for testing compilers [...].
    Partly for that reason, they need to comprise largely the same code.

        Well, that's a rather grey area.  Is "a := b" largely
    the same code as "for i to n do a[i] := b[i] od"?

    I would allow that, if the copying is by value (making a distinct copy,
    not sharing), which is what the element-by-element loop does; except there is
    no array assignment in the program, only of a slice, which is a little
    more advanced.

    The original benchmark site allowed any mix of algorithms provided they
    still gave the same results, although blatant cheating (just printing a
    string with the expected results) would be forbidden.

    So if you wanted to compare implementations, you were out of luck.

    (A reminder that my own tests were to do with compilation, which needed
    a substantial source file.)

      The former
    is simply not possible in many languages, but it [or a standard
    "copyarray" procedure] is likely to be implemented significantly
    more efficiently than the loop [without sufficiently aggressive optimisation].  Similarly, for the use of a built-in Boolean
    type vs an integer to be tested for sign.  Similarly, when there
    are issues of "return;" vs "run off the end of the procedure"
    or of "loop and a half" constructs.  As noted in my PP, simply
    tidying the "fannkuch" code into idiomatic Algol saved around
    25% of the time [without changing the underlying algorithm].
    No doubt if you started with some Algol and translated that
    directly into C or Lua, you would find that you could save a
    similar amount by switching some of the resulting ugly C/Lua
    into more idiomatic code.

    Yes, these are useful tweaks, but there's such a huge gap between A68G
    and the next language, that the basic problem still exists: it is slow!

    The only comparable language I have installed is Seed7, also statically
    typed, also usually interpreted. That one takes 44 seconds for this
    task, compared with 52 seconds for A68G (on Windows; it's a bit faster
    on WSL).

    However Seed7 also has a transpiler to C, one that works and works transparently to produce a .exe file. Then the runtime reduces to 0.7
    seconds, which is more like it.

    (Note that the fastest implementation is still 3 times faster than this.)

    A68 really needs someone to produce an equivalent transpiler, an update
    of that algol68toc program.

    (If I liked it more, and understood it a LOT more, then I'd have a go
    myself.)

        [Similarly, and relevant to a nearby thread, the
    difference between 0-based and 1-based arrays is more than
    is often supposed.  It affects which comparison operator you
    use, and whether you pre- or post-increment or decrement, and
    then you find yourself pulling the code around to suit.]

    Yes, the original algorithm was 1-based, and all versions I made are
    1-based even when the language was 0-based (by allocating an extra
    element and ignoring element 0).
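
    (In a 0-based language the trick looks roughly like this C++ fragment: one
    spare element that is never used, so the loops can keep their 1-based
    bounds and <= comparisons.)

        #include <cstdio>

        int main() {
            const int n = 5;
            int p[n + 1] = {0, 1, 2, 3, 4, 5};   // element 0 allocated but ignored

            int sum = 0;
            for (int i = 1; i <= n; ++i)         // 1-based loop, compare with <=
                sum += p[i];

            std::printf("%d\n", sum);            // 15
        }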

    I would expect an optimising compiler to iron out such minor differences.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andy Walker@21:1/5 to Bart on Wed Dec 8 21:40:36 2021
    On 08/12/2021 00:01, Bart wrote:
    Yes, these are useful tweaks, but there's such a huge gap between
    A68G and the next language, that the basic problem still exists: it
    is slow!

    Yes, it's an interpreter. But not as slow as you claim,
    even for this very artificial test, as long as the code is written
    [or transcribed from Lua/C/whatever] by someone familiar with
    Algol [without changing the underlying algorithm, before you ask].
    In other tests, it benchmarks on my [ancient] PC at around 200x
    faster than the ICL 1906A [comparable with the fastest mainframes
    around in the mid-'70s] by the Whetstone benchmark -- a decent
    approximation to useful scientific computing -- even without the
    optimiser. That's plenty fast enough for all the real computing
    I do, where development time is much more important than the few
    extra seconds needed for live runs [esp interactive ones, where
    the real bottleneck is my typing]. I'm not [these days -- I used
    to do a lot of benchmarking in the last century!] as obsessed by
    speed as some people.

    Meanwhile, may I suggest a little test for you? Instead
    of "fannkuch(10)", please get your collection of programs and
    compilers to produce "fannkuch(2)". Any compilers that return
    "2, 2" [without comment and from code equivalent to what you
    supplied] should be kept well away from any serious computation.

    [...]
    A68 really needs someone to produce an equivalent transpiler, an
    update of that algol68toc program.
    (If I liked it more, and understood it a LOT more, then I'd have a go myself.)

    Both A68G and "algol68toc" are available as source, so the
    hard work is already done. But if you don't understand them, you're
    not well placed either to edit that source or to complain about the
    [perceived] deficiencies thereof.

    [...]
    Yes, the original algorithm was 1-based, and all versions I made are
    1-based even when the language was 0-based (by allocating an extra
    element and ignoring element 0).
    I would expect an optimising compiler to iron out such minor
    differences.

    I suspect you're being optimistic if you expect even a good
    optimiser to detect that [in a reasonably large and complicated
    program] element 0 is never accessed, or systemically to reduce all
    indexes by 1. But you never know. Small differences in code
    accumulate as dozens of programming decisions are taken on the
    basis of where arrays start and finish.

    --
    Andy Walker, Nottingham.
    Andy's music pages: www.cuboid.me.uk/andy/Music
    Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Handel

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to Bart on Wed Dec 8 21:29:50 2021
    On 24/11/2021 18:19, Bart wrote:
    On 24/11/2021 00:04, Andy Walker wrote:

         If you're going to have these things, you need them anyway.
    There is no clear-cut boundary between what compilers do and what
    goes into a standard library.  It doesn't matter whether it's the
    compiler that decides that "A + B" turns into "load A, load B, add"
    or the library that decides that.

    It matters very much. My static language is self-hosted, and can build
    itself entirely from source fast enough that I could do so ten times
    every second if I was so inclined.

    To me that is important; not that I need to do that 600 times a minute,
    but it means an effectively zero edit-build-run cycle for similar-sized applications.

    That would be harder to achieve using the wrong language choices; it
    would also take longer, and the result would build apps more slowly.

    A discussion on Reddit happened to mention that someone's 20Kloc C++
    language project shouldn't take more than '5 minutes' on a modern
    machine using -O3.

    Why so slow? Apparently it was all to do with templates.

    Someone else mentioned a 25Kloc compiler taking 5 minutes, with 8
    cores/16 threads.

    And now someone had a 100Kloc project using a Sun C++ compiler that took
    *16 hours* to build.

    Effective compile-speeds of 65 lps, 80 lps (with 8 cores!) and ... 1.7
    lines per second.

    When I said the wrong language design choices could slow things down, I
    didn't even know it could be by that much.

    I think I'm happy with my own choices at the minute (a 40Kloc project
    takes 0.1 seconds on one core).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to Andy Walker on Wed Dec 8 22:49:28 2021
    On 08/12/2021 21:40, Andy Walker wrote:
    On 08/12/2021 00:01, Bart wrote:
    Yes, these are useful tweaks, but there's such a huge gap between
    A68G and the next language, that the basic problem still exists: it
    is slow!

        Meanwhile, my I suggest a little test for you?  Instead
    of "fannkuch(10)", please get your collection of programs and
    compilers to produce "fannkuch(2)".  Any compilers that return
    "2, 2" [without comment and from code equivalent to what you
    supplied] should be kept well away from any serious computation.

    What should it produce for fannkuch(2)?

    I downloaded versions in 4 languages from the original benchmarks site.

    Three of them (Lua, Python, Julia) go wrong in giving a program error.

    The fourth, in C, completes, but fannkuch(2) returns -1. I don't know if
    that is correct or not, but it sounds like it might be undefined.

    The presence of these lines:

        swap(p[2],p[3])
        signx:=1
        for i:=3 to n do

    plus perm1[2] in the 0-based Python, and p[3] in Julia, suggests n needs
    to be at least 3.

    I would expect an optimising compiler to iron out such minor
    differences.

        I suspect you're being optimistic if you expect even a good optimiser to detect that [in a reasonably large and complicated
    program] element 0 is never accessed, or systemically to reduce all
    indexes by 1.  But you never know.

    Element 0 can stay there; it's not doing any harm with just 3 arrays.

    Being 1-based may simply mean an extra offset on some address modes; is
    that what you think is causing a slow-down? With local arrays, that
    offset should just be absorbed into the stack offset of the array.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dmitry A. Kazakov@21:1/5 to Bart on Thu Dec 9 08:46:48 2021
    On 2021-12-08 22:29, Bart wrote:

    Why so slow? Apparently it was all to do with templates.

    Likely, GCC is very poor with them. But your language has nothing to
    express polymorphism in either form. So you are not in a position to criticize.

    When I said the wrong language design choices could slow things down, I didn't even know it could be by that much.

    Yes, a modern medium-sized project is extremely large. In a language
    without polymorphism it would take years to design and the code would be tenfold bigger and complex.

    I think I'm happy with my own choices at the minute (a 40Kloc project
    takes 0.1 seconds on one core).

    It is a false equivalence. You simply cannot write equivalent code of
    same size.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Dmitry A. Kazakov on Thu Dec 9 09:28:18 2021
    On 09/12/2021 08:46, Dmitry A. Kazakov wrote:
    On 2021-12-08 22:29, Bart wrote:

    Why so slow? Apparently it was all to do with templates.

    Likely, GCC is very poor with them. But your language has nothing to
    express polymorphism in either form. So are not in position to criticize.


    In general, gcc is fast at compiling templates. But there is always
    scope for improvement, and gcc has definitely been getting better there
    in recent versions. (More importantly, it has also been getting better
    at error messages from heavily templated code.)

    But with templates, there is no limit to the complexity you can have.
    Indeed, templates can be seen as a Turing-complete programming language
    at compile-time. (Individual compilers, of course, have limits to
    template recursion depth and the like, usually as tunable parameters for
    people doing weird stuff.)

    Improvements and enhancements to C++ have been added to combat this,
    including constexpr and consteval functions to get compile-time
    evaluation without inefficient complicated recursive template
    meta-programming, and concepts to make requirements clearer, error
    messages more useful and template compilation faster. Then there are
    modules which can hugely improve compilation of large template libraries.
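
    (A toy illustration of both, not drawn from any of the projects mentioned:
    a compile-time factorial written as a recursive template next to the
    constexpr equivalent. The heavy metaprogramming that blows up build times
    is this kind of thing, scaled up.)

        #include <cstdio>

        // Compile-time factorial via recursive template instantiation: the kind
        // of "program running inside the compiler" that can blow up compile
        // times when used heavily.
        template <unsigned N>
        struct Fact { static constexpr unsigned long long value = N * Fact<N - 1>::value; };
        template <>
        struct Fact<0> { static constexpr unsigned long long value = 1; };

        // The modern alternative: an ordinary function evaluated at compile time.
        constexpr unsigned long long fact(unsigned n) {
            return n == 0 ? 1 : n * fact(n - 1);
        }

        int main() {
            static_assert(Fact<10>::value == 3628800, "template version");
            static_assert(fact(10) == 3628800, "constexpr version");
            std::printf("%llu %llu\n", Fact<10>::value, fact(10));
        }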

    None of the new features limits the use of templates, however - they
    just offer alternatives. So it is quite easy to make code that might
    /look/ like 20 Kloc, but is effectively compiling millions of lines.
    (To get the same results, the alternative in statically compiled
    languages that don't have templates would, of course, be macros, pre-processors, generators, or manual coding also resulting in millions
    of lines.)

    Then there is the issue of optimisation. Asking for -O3 is telling the compiler to try /really/ hard to look for optimisation opportunities,
    and not to worry about the time or ram usage in doing so.
    Inter-procedural optimisations scale approximately quadratically with
    the number of functions in a compilation unit. Some others scale even
    worse in the size of functions, and use of templates can allow a lot of inlining and therefore very big functions. (There are legitimate
    use-cases for huge functions, and work in progress in gcc for handling
    them better.)

    All these things can add up to faster results, but obviously there are diminishing returns for the effort. It is rare that it makes sense to
    use -O3 rather than -O2. Think of it like using liquid nitrogen to cool
    your cpu rather than water - is the extra 10% overclocking really worth
    it? Most people would usually only do a massively templated -O3 compile
    for testing, benchmarking or bragging rights, not because they tried -O2
    and measured that the resulting binaries are too slow for their needs.


    When I said the wrong language design choices could slow things down,
    I didn't even know it could be by that much.

    Yes, a modern medium-sized project is extremely large. In a language
    without polymorphism it would take years to design and the code would be tenfold bigger and complex.

    Exactly. Developer time is more important than computer time.


    I think I'm happy with my own choices at the minute (a 40Kloc project
    takes 0.1 seconds on one core).

    It is a false equivalence. You simply cannot write equivalent code of
    same size.


    Indeed.

    You could also look at these examples and be impressed with how much functionality you can get from a mere 20 Kloc of code with C++, where
    the same useful work in C or BOL (Bart's Own Language) might take 50
    times as much.

    Statistically speaking, based on real studies, error rates in software
    are proportional to the number of lines of code, and relatively
    independent of the language used. A language that lets you write fewer
    lines of code because you can make more general code and libraries or
    re-use more existing (tested) code from other sources, will let you write
    code with fewer errors - all other things being equal.

    (This is, of course, a big reason why languages like Python are popular.)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to David Brown on Thu Dec 9 11:01:17 2021
    On 09/12/2021 08:28, David Brown wrote:
    On 09/12/2021 08:46, Dmitry A. Kazakov wrote:

    Yes, a modern medium-sized project is extremely large. In a language
    without polymorphism it would take years to design and the code would be
    tenfold bigger and complex.

    Exactly. Developer time is more important than computer time.

    Exactly. That is why having developers twiddle their thumbs for five
    minutes every time they compile is a joke.

    It is a false equivalence. You simply cannot write equivalent code of
    same size.


    Indeed.

    You could also look at these example and be impressed with how much functionality you can get from a mere 20 Kloc of code with C++, where
    the same useful work in C or BOL (Bart's Own Language) might take 50
    times as much.

    (This is, of course, a big reason why languages like Python are popular.)

    Scripting languages have all the benefits of C++ and then some. Except
    for execution speed.

    But the C++ in these examples is so slow to build, that a scripting
    language is likely to be faster.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to Dmitry A. Kazakov on Thu Dec 9 10:50:10 2021
    On 09/12/2021 07:46, Dmitry A. Kazakov wrote:
    On 2021-12-08 22:29, Bart wrote:

    Why so slow? Apparently it was all to do with templates.

    Likely, GCC is very poor with them. But your language has nothing to
    express polymorphism in either form. So are not in position to criticize.

    I expect my dynamic language can do polymorphism. That would be
    between 1 and 2 magnitudes slower, but it would still be 1-2 magnitudes
    faster than that C++ compiler!

    But is it really that necessary inside a compiler?


    When I said the wrong language design choices could slow things down,
    I didn't even know it could be by that much.

    Yes, a modern medium-sized project is extremely large. In a language
    without polymorphism it would take years to design and the code would be tenfold bigger and complex.

    If I did such a project (say 100-1000Kloc) I would probably use dynamic scripting code for most of it.

    Since most code is not speed-critical. Especially when you just want a
    test run.

    I think I'm happy with my own choices at the minute (a 40Kloc project
    takes 0.1 seconds on one core).

    It is a false equivalence. You simply cannot write equivalent code of
    same size.

    We don't know the capabilities of the language in question for
    which that compiler, written in C++, took so long to build.

    You need ask a different question: do you really expect a compiler for
    any kind of language, which is only 20-25,000 lines of user-code, to
    take MINUTES to build on today's machines? (And on multiple cores!)
    Somebody is doing something wrong.

    Yes, the C++ compiler could also be more inefficient than need be, but
    people can only use the C++ compilers that exist.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dmitry A. Kazakov@21:1/5 to Bart on Thu Dec 9 13:00:04 2021
    On 2021-12-09 11:50, Bart wrote:
    On 09/12/2021 07:46, Dmitry A. Kazakov wrote:
    On 2021-12-08 22:29, Bart wrote:

    Why so slow? Apparently it was all to do with templates.

    Likely, GCC is very poor with them. But your language has nothing to
    express polymorphism in either form. So are not in position to criticize.

    I expect my dynamic language can do polymorphism.

    I expect it does not, because...

    But is it really that necessary inside a compiler?

    you obviously do not understand it.

    If I did such a project (say 100-1000Kloc) I would probably use dynamic scripting code for most of it.

    Again a failure to see actual challenges of medium to large software collaboration projects.

    Since most code is not speed-critical. Especially when you just want a
    test run.

    Yep, a test run of a production line automation system, a banking
    system, or a luggage delivery system written in BOL; I would love to see it,
    from a safe distance of course...

    You need ask a different question: do you really expect a compiler for
    any kind of language, which is only 20-25,000 lines of user-code, to
    take MINUTES to build on today's machines? (And on multiple cores!)
    Somebody is doing something wrong.

    Yes, lack of experience/education on modern language-assisted software
    design.

    Yes, the C++ compiler could also be more inefficient than need be, but
    people can only use the C++ compilers that exist.

    C++ has a lot of problems, but a lack of understanding and
    anticipation of software development issues is not one of them. At least
    they tried and have achieved far more than many.

    On your part we see an active denial of very existence of such problems.
    Things you see important are just silly from the point of view of a
    practicing software developer. Things you reject are key tools for
    managing complexity of modern software.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Bart on Thu Dec 9 14:16:08 2021
    On 09/12/2021 12:01, Bart wrote:
    On 09/12/2021 08:28, David Brown wrote:
    On 09/12/2021 08:46, Dmitry A. Kazakov wrote:

    Yes, a modern medium-sized project is extremely large. In a language
    without polymorphism it would take years to design and the code would be tenfold bigger and complex.

    Exactly.  Developer time is more important than computer time.

    Exactly. That is why having developers twiddle their thumbs for five
    minutes every time they compile is a joke.

    When the alternative is spending weeks or months writing the code by
    hand rather than using templates?


    It is a false equivalence. You simply cannot write equivalent code of
    same size.


    Indeed.

    You could also look at these example and be impressed with how much
    functionality you can get from a mere 20 Kloc of code with C++, where
    the same useful work in C or BOL (Bart's Own Language) might take 50
    times as much.

    (This is, of course, a big reason why languages like Python are popular.)

    Scripting languages have all the benefits of C++ and then some. Except
    for execution speed.


    Spoken like someone who does not know C++, or have any concept of
    scripting languages. Or someone who /does/ know, but is trolling.

    There are different languages with different balances of features and emphases, and that's good.  But you'd have to be completely ignorant
    to think that there are any scripting languages with /all/ the benefits
    of C++ (or to think that C++ has /all/ the benefits of all scripting languages).

    But the C++ in these examples is so slow to build, that a scripting
    language is likely to be faster.


    That would seem highly unlikely. Even if such a compile time was
    realistic (and it rarely is - these examples are rather special cases),
    a scripting language is only going to be faster for short and simple
    scripts, or scripts only used occasionally.

    But since we know nothing about the C++ examples other than that someone managed to make a compilation that took a long time, it is fairly
    pointless to speculate what alternatives could have been better.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to David Brown on Thu Dec 9 14:00:03 2021
    On 09/12/2021 13:16, David Brown wrote:
    On 09/12/2021 12:01, Bart wrote:
    On 09/12/2021 08:28, David Brown wrote:
    On 09/12/2021 08:46, Dmitry A. Kazakov wrote:

    Yes, a modern medium-sized project is extremely large. In a language
    without polymorphism it would take years to design and the code would be tenfold bigger and complex.

    Exactly.  Developer time is more important than computer time.

    Exactly. That is why having developers twiddle their thumbs for five
    minutes every time they compile is a joke.

    When the alternative is spending weeks or months writing the code by
    hand rather than using templates?

    I can think of half a dozen alternatives. Plenty of time to ponder
    during the 7 hours out of 8 each day that I would be spending waiting!

    One would be to develop first using a rapid-development environment (eg. scripting). Once it works, then you port it to C++, but now it's a
    working program; you will not need to build it so many times.

    (But maintenance will still be a pig.)

    Scripting languages have all the benefits of C++ and then some. Except
    for execution speed.


    Spoken like someone who does not know C++, or have any concept of
    scripting languages.

    I created my first scripting languages in the late 80s (so of course I
    have no concept of what they are).

    When I first found out about C++ a decade later, and its 'OOP', I wanted
    to find out what it could do.

    The first examples I saw concerned first-class string handling; stuff
    I'd already been doing for years in my script language!

    There are different languages with different balances of features and emphasises, and that's good. But you'd have to be completely ignorant
    to think that there are any scripting languages with /all/ the benefits
    of C++

    So, what does C++ specifically offer that would be impossible in
    scripting code? I don't mean things that only work with native code.


    But the C++ in these examples is so slow to build, that a scripting
    language is likely to be faster.


    That would seem highly unlikely.

    If the reason for slowness is some degenerate cases of template
    expansion, then it is a possibility.

    Then a script language that provides generics anyway won't have that
    problem. It will just be a fixed factor slower for the more conventional
    parts of a compiler, not exponentially slower.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to Dmitry A. Kazakov on Thu Dec 9 13:39:53 2021
    On 09/12/2021 12:00, Dmitry A. Kazakov wrote:
    On 2021-12-09 11:50, Bart wrote:
    On 09/12/2021 07:46, Dmitry A. Kazakov wrote:
    On 2021-12-08 22:29, Bart wrote:

    Why so slow? Apparently it was all to do with templates.

    Likely, GCC is very poor with them. But your language has nothing to
    express polymorphism in either form. So are not in position to
    criticize.

    I expect my dynamic language can do polymorphism.

    I expect it does not, because...

    Get me an example of polymorphism, as there seem to be various kinds,
    and I'll see if I can express that in my language.


    But is it really that necessary inside a compiler?

    you obviously do not understand it.

    I understand /my/ compilers and they're obviously not suffering from any
    lack of whatever you think they're lacking.

    If I did such a project (say 100-1000Kloc) I would probably use
    dynamic scripting code for most of it.

    Again a failure to see actual challenges of medium to large software collaboration projects.

    I haven't done commercial apps for a long time. But when I did:

    * I wrote the core application with 75% of modules written as script
    code, about 150Kloc total
    * OEM developers created add-on functionality via my scripting language
    * End users also had the ability to customise and automate via macros,
    command-line scripts and that same scripting language

    Yes, this is a fairly small project with few collaborators.

    But, on top of all the various issues involved in bigger projects, you
    don't want build times to be unnecessarily slow for no good reason.

    For that, you need someone whose job it is to oversee that part, to
    identify such problems, and to do something to fix it if possible.

    What you don't want is for someone like you to just shrug their
    shoulders and say, Well, what do you expect? It is what it is.


    Since most code is not speed-critical. Especially when you just want a
    test run.

    Yep, a test run of a production line automation system, a banking
    system,

    Python is used extensively in banking. Python is a scripting language.

    a luggage delivery system written in BOL, would love to see it,
    from a safe distance of course...

    Actually some airport baggage handling installations were designed with
    the help of my CAD application.

    (Whether they used the script language to help, I don't know.)

    You need ask a different question: do you really expect a compiler for
    any kind of language, which is only 20-25,000 lines of user-code, to
    take MINUTES to build on today's machines? (And on multiple cores!)
    Somebody is doing something wrong.

    Yes, lack of experience/education on modern language-assisted software design.

    My first two examples were from people /implementing their own languages/.

    On your part we see an active denial of very existence of such problems. Things you see important are just silly from the point of view of a practicing software developer. Things you reject are key tools for
    managing complexity of modern software.

    OK. Carry on waiting 5 minutes between each run. Hopefully it will be
    for a more worthwhile reason than over-eager, inappropriate use of
    templates.

    Or, for that 100Kloc Sun program, wait 16 hours to build it. Because it
    is perfectly reasonable to compile at 1.7 lines per second.

    It's really no skin off my nose.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dmitry A. Kazakov@21:1/5 to Bart on Thu Dec 9 15:49:25 2021
    On 2021-12-09 14:39, Bart wrote:

    Get me an example of polymorphism, as there seem to be various kinds,
    and I'll see if I can express that in my language.

    This again shows a lack of understanding. Types of polymorphism are
    irrelevant as they all serve the same goal. The differences between
    ad-hoc, parametric and dynamic polymorphism are in their
    capabilities/limitations to express certain types of classes.

    But, on top of all the various issues involved in bigger projects, you
    don't want build times to be unnecessarily slow for no good reason.

    Nobody cares about build times, because this is not how the software development process works. People even agree to suffer nightly builds,
    while calling it agile... (:-))

    For that, you need someone whose job it is to oversee that part, to
    identify such problems, and to do something to fix it if possible.

    Yes, you do not know how testing and software quality assurance work either.

    On your part we see an active denial of very existence of such
    problems. Things you see important are just silly from the point of
    view of a practicing software developer. Things you reject are key
    tools for managing complexity of modern software.

    OK. Carry on waiting 5 minutes between each run.

    Yep, like the James Webb telescope, 30 years before the first and only
    run...

    Again, properties of advanced programming languages such as static verification, modularization, separation of interface and
    implementation, reuse, generic programming are focused on risk analysis
    and understanding that 50% of cases are fundamentally untestable, 50% of
    the rest is economically, technically, timely infeasible to test, 50% of
    the rest of the rest require in-depth analysis of the requirements, the underlying physical processes and the implementation algorithms in order
    to test properly.

    Hopefully it will be
    for a more worthwhile reason than over-eager, inappropriate use of
    templates.

    Nobody is really fond of templates or generics. The problem is that no practical alternative to completely supersede parametric polymorphism
    was ever shown. Dynamic polymorphism is better than the parametric one
    in all aspects except that it cannot handle many cases of the former and is inconsistent with regard to multiple dispatch etc.
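
    (A minimal C++ illustration of the two kinds being contrasted, with
    invented types: dynamic polymorphism dispatches through a virtual function
    at run time; parametric polymorphism stamps out a separate body per type
    at compile time and needs no common base class.)

        #include <cstdio>

        // Dynamic polymorphism: one compiled body, dispatch through a vtable.
        struct Shape {
            virtual double area() const = 0;
            virtual ~Shape() = default;
        };
        struct Square : Shape {
            double s;
            explicit Square(double side) : s(side) {}
            double area() const override { return s * s; }
        };

        double total_area(const Shape* const* shapes, int n) {
            double t = 0;
            for (int i = 0; i < n; ++i) t += shapes[i]->area();
            return t;
        }

        // Parametric polymorphism: a separate body is instantiated per type at
        // compile time; T needs no common base class, only a suitable area().
        template <typename T>
        double area_of(const T& x) { return x.area(); }

        int main() {
            Square a(2), b(3);
            const Shape* shapes[] = { &a, &b };
            std::printf("%g %g\n", total_area(shapes, 2), area_of(a));
        }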

    My interest in new crazy languages is in whether they address these issues.
    Silly suggestions about non-existing unsigned arithmetic or summing
    bitfields are not.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andy Walker@21:1/5 to All on Thu Dec 9 15:44:02 2021
    On 08/12/2021 22:49, Bart wrote:
    [I wrote:]
         Meanwhile, my I suggest a little test for you?  Instead
    of "fannkuch(10)", please get your collection of programs and
    compilers to produce "fannkuch(2)".  Any compilers that return
    "2, 2" [without comment and from code equivalent to what you
    supplied] should be kept well away from any serious computation.
    What should it produce for fannkuch(2)?
    I downloaded versions in 4 languages from the original benchmarks
    site. Three of them (Lua, Python, Julia) go wrong in giving a program
    error.
    That's not "wrong"; with the code [as supplied to me in the
    Algol version], there /is/ a program error. I deduce that the authors
    either didn't test their code or didn't care that there was a bug.
    Neither case is commendable. Given a function of "int n", it would
    seem natural to me to test with at least something like 0, 1, 2, 3,
    something bigger, and something negative. It costs zilch to add a
    test "if n < 3 then ..." to the code.

    The fourth, in C, completes, but fannkuch(2) returns -1. I don't know
    if that is correct or not, but it sounds like it might be undefined.

    The problem is perfectly well defined in the cases 1 and 2;
    for 0 or -1, you get to decide what it means. But if the C is "the
    same as" the Algol you supplied, then it has failed to detect the
    errors that A68G detects. See below.
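
    (For reference, assuming the usual definition, i.e. the maximum number of
    prefix reversals ('flips') over all permutations of 1..n: a brute-force
    C++ sketch that handles the small cases naturally, useful for checking
    what fannkuch(1) and fannkuch(2) should give.)

        #include <algorithm>
        #include <cstdio>
        #include <numeric>
        #include <vector>

        // Brute-force reference, assuming n >= 1: for each permutation of 1..n,
        // repeatedly reverse the first p[0] elements until p[0] is 1, counting
        // the reversals; return the maximum count over all permutations.
        int fannkuch_ref(int n) {
            std::vector<int> perm(n);
            std::iota(perm.begin(), perm.end(), 1);          // 1, 2, ..., n
            int maxflips = 0;
            do {
                std::vector<int> p = perm;
                int flips = 0;
                while (p[0] != 1) {
                    std::reverse(p.begin(), p.begin() + p[0]);
                    ++flips;
                }
                maxflips = std::max(maxflips, flips);
            } while (std::next_permutation(perm.begin(), perm.end()));
            return maxflips;
        }

        int main() {
            for (int n = 1; n <= 7; ++n)
                std::printf("fannkuch(%d) = %d\n", n, fannkuch_ref(n));
        }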

    The presence of these lines:
        swap(p[2],p[3])
        signx:=1
        for i:=3 to n do
    plus perm1[2] in the 0-based Python, and p[3] in Julia, suggests n
    needs to be at least 3.

    In my universe, it "suggests" that the code ought to check
    that "n" is at least 3. If the arrays in fact have size 2, then
    using "p[3]" is already an error. If that error is disguised, in
    languages where "dynamic arrays" are impossible/difficult, by
    declaring the arrays to have size [say] 100 [prompting another
    sanity check in good code!], then there is a second error in that
    "p[3]" now exists but is uninitialised. A68G detects both of
    these; over to you to test other compilers.

    I'm not claiming that these are the worst and most serious
    bugs ever perpetrated. But they are examples of the sorts of bug
    that Black Hats exploit in creating malware. They look at source
    code with a keen eye on potential buffer over-runs, on dumping
    naughty instructions/data in places where they may be stumbled
    across by code using uninitialised storage, on using null pointers
    and the like [inc, of course, race conditions, not relevant here].
    Languages and compilers that don't help detect these, even in the
    presence of good testing, are a menace. [Note that C (eg) /could/
    detect these; but not if it's somehow essential to shave every
    last microsecond off the run time.]

    I would expect an optimising compiler to iron out such minor
    differences.
         I suspect you're being optimistic if you expect even a good
    optimiser to detect that [in a reasonably large and complicated
    program] element 0 is never accessed, or systemically to reduce all
    indexes by 1.  But you never know.
    Element 0 can stay there; it's not doing any harm with just 3 arrays.

    Yes, but it was your claim that an optimising compiler should
    "iron out such minor differences". My point was that what start out
    as minor differences soon turn into differences that permeate the
    entire program. For the sorts of toy program that we tend to give
    as examples here, that barely happens. But over something bigger,
    say a compiler, you make decisions early on about [eg] whether an
    index should be pre- or post-incremented, that influences whether
    you use the array element and /then/ change the index or vv, that
    changes the value of "i" /after/ the loop, that influences the
    next stage of the code and so on. Then you perhaps need to switch
    from a 0-based to a 1-based language, and whereas you can perhaps
    ignore an element 0 that you don't need, you can't ignore one that
    has been used, so you have to change every array access.

    Being 1-based may simply mean an extra offset on some address modes;
    is that what you think is causing a slow-down? With local arrays,
    that offset should just be absorbed into the stack offset of the
    array.

    I wasn't thinking about slow-downs at all; I just thought
    you were being optimistic in thinking that "gcc -O3" might magically
    turn some 1-based code into equivalent 0-based code or vv.

    --
    Andy Walker, Nottingham.
    Andy's music pages: www.cuboid.me.uk/andy/Music
    Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Widor

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to Dmitry A. Kazakov on Thu Dec 9 17:22:55 2021
    On 09/12/2021 14:49, Dmitry A. Kazakov wrote:
    On 2021-12-09 14:39, Bart wrote:

    Get me an example of polymorphism, as there seem to be various kinds,
    and I'll see if I can express that in my language.

    This again shows lack of understanding. Types of polymorphism are
    irrelevant as they all serve the same goal. The difference between
    ad-hoc, parametric, dynamic polymorphism are in their capabilities/limitations to express certain types of classes.

    But, on top of all the various issues involved in bigger projects, you
    don't want build times to be unnecessarily slow for no good reason.

    Nobody cares about build times,

    Yeah, like nobody cares about stopping at a red light!


    Yes, you do not know how testing and software quality assurance work
    either.

    I guess quality assurance doesn't extend to keeping a development
    environment productive.

    On your part we see an active denial of very existence of such
    problems. Things you see important are just silly from the point of
    view of a practicing software developer. Things you reject are key
    tools for managing complexity of modern software.

    OK. Carry on waiting 5 minutes between each run.

    Yep, like the James Webb telescope, 30 years before the first and only
    run...

    Again, properties of advanced programming languages such as static verification, modularization, separation of interface and
    implementation, reuse, generic programming are focused on risk analysis
    and understanding that 50% of cases are fundamentally untestable, 50% of
    the rest is economically, technically, timely infeasible to test, 50% of
    the rest of the rest require in-depth analysis of the requirements, the underlying physical processes and the implementation algorithms in order
    to test properly.

    Do you think it is reasonable to spend 5 minutes across 8 cores doing that
    on a 25,000-line program on every routine build?

    I don't. Most people would agree that that is over the top. Even with
    those 'polymorphic' features (which, if the cost is to slow things down
    by 100 times, are non-viable).

    Perhaps you genuinely think that's how long such things should take.

    Or you are just being scornful of products where it can be done instantly.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From antispam@math.uni.wroc.pl@21:1/5 to Bart on Fri Dec 10 23:46:34 2021
    Bart <bc@freeuk.com> wrote:
    On 09/12/2021 14:49, Dmitry A. Kazakov wrote:
    On 2021-12-09 14:39, Bart wrote:
    <snip>
    On your part we see an active denial of very existence of such
    problems. Things you see important are just silly from the point of
    view of a practicing software developer. Things you reject are key
    tools for managing complexity of modern software.

    OK. Carry on waiting 5 minutes between each run.

    Yep, like the James Webb telescope, 30 years before the first and only run...

    Again, properties of advanced programming languages such as static verification, modularization, separation of interface and
    implementation, reuse, generic programming are focused on risk analysis
    and understanding that 50% of cases are fundamentally untestable, 50% of the rest is economically, technically, timely infeasible to test, 50% of the rest of the rest require in-depth analysis of the requirements, the underlying physical processes and the implementation algorithms in order
    to test properly.
    Do you think is reasonable to spend 5 minutes across 8 cores doing that
    on a 25,000-line program on every routine build?

    I don't. Most people would agree that that is over the top. Even with
    those 'polymorphic' features (which, if the cost is to slow things down
    by 100 times, are non-viable).

    Well, in my main project I spend more than 5 minutes on each full build
    (it is not 25,000; wc says it is closer to 350,000). However, a full build
    is a relatively rare thing; normally I do hundreds of compile-edit
    cycles between full builds (recompiling a single file). Sure,
    a faster build time would be nice, and I have spent some time decreasing
    it (it went down from about 2 hours to a few minutes, partly due to a
    faster machine, partly due to efficiency improvements). But I am
    not going to sacrifice features for faster build time.

    Perhaps you genuinely think that's how long such things should take.

    Well, health experts say that one should take regular breaks from
    programming. 5 minutes is somewhat short, but better than nothing
    (10 minutes probably would be optimal).

    --
    Waldek Hebisch

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From antispam@math.uni.wroc.pl@21:1/5 to Bart on Fri Dec 10 23:28:52 2021
    Bart <bc@freeuk.com> wrote:
    On 04/12/2021 09:32, antispam@math.uni.wroc.pl wrote:
    Bart <bc@freeuk.com> wrote:

    * Inline assembly
    * Inlining
    * User-defined operator overloads
    * The equivalent of C++'s constexpr
    * A gcc-class optimiser
    ...

    I use a language like this. You do need inlining and it is useful
    only due to overloading (_not_ limited to operators). However, the
    other positions on your list above are not needed. More precisely,
    there is a small core language which implements basic types and
    corresponding operators. Basic library types simply offer a "view"
    of basic types and operators from the core language. Due to inlining
    there is no runtime loss of efficiency compared to using the core
    language directly. But there are also fancy library types
    which are infeasible to have as part of the core language.

    Too many advanced compiler features, just to keep the language 'small'.

    You did not get it: the purpose is to have a manageable implementation
    of a _very large_ language.

    Source code for gcc was 45,000 source files last I looked. I don't even
    know if that's just for C, or for C++ too.

    That's a thousand times a larger scale than what I do. So I guess I'm in
    less need of such methods.

    gcc is in a different game: "how many % of runtime efficiency can we
    gain if we throw 10 times more resources at compilation?". But
    you can have a much smaller implementation of a large language.

    Low-level error prone parts are
    small. Library types can be written by ordinary application
    programmers.

    Overloading can be implemented with quite small extra code.
    Inlining is also easy to implement if you have internal
    representation of larger code fragments (as opposed to
    generating machine code directly after recognizing given
    syntactic construct). Both put limits on maximal possible
    compiler speed: with overloading you have more complex
    symbol table and need more lookups. Intermediate representation
    takes time to build (on top of time needed to generate code).
    Also, inlining allows large object code from small sources,
    so if you measure time per source line compier speed will
    likely be worse than speed of compiler that forces you to
    write each source line separately. Still, compiler can
    be reasonably fast.

    Anyway, increase in compiler complexity is relatively
    small. Even for small languages in total balance
    approach using overloading may give you saving in
    effort spent on implementation.

    I could do overloading. I have an approach I've developed, I just
    haven't got round to it (there are a dozen more things more useful ahead
    of it, which I haven't done yet either!).

    But this is purely user-defined overloading of existing operators, with user-defined types, or applying extra operations to standard types.

    It is not for implementing the core language, which I like to handle by dedicated code in the compiler, keeping it tight, fast and allowing such enhancements as above to be optional (and on a waiting list).

    For example, there is already a backend IL instruction, MUL.I64, to do
    signed multiply. The main part, MUL, is attached to the binary * op
    during parsing. The .I64 part is assigned two passes further on.

    With only user-defined overload mechanisms, it's a lot harder to
    make those associations. For one thing, those IL instructions are not
    exposed in the language.

    I do not think exposing the IL is a good idea. However, logically all you
    have is a small number of "special forms", like 'if', loops, assignments
    and function calls. The compiler needs to understand that '+' for a
    machine integer type should be replaced by a single instruction.
    In other words, each operation in the IL represents an optimization for
    a specific type. This can be as simple as a table of tens or hundreds
    of entries like:

    ["+", I64, I64, I64, ADD.I64]

    where the first position is the name of the operation, the second is the
    result type, the third and fourth are the argument types, and the last
    one is the corresponding operation in the IL. This table is part of the
    compiler; users only know that "+" will add integers.

    Just as a simple
    example let me mention Pascal. In Pascal there is
    a bunch of "builtin" operations (mostly on files).
    Most of them are too complex to do by inline code,
    so practical compilers have a runtime library implementing
    them. But the compiler still needs to parse those
    "builtins" and convert them to runtime calls.
    In a sense this is a trivial task, but with a realistic
    compiler structure you need several lines of compiler
    code to generate the corresponding runtime call.

    I have a similar set of operations for I/O. They are implemented as a
    set of function calls, designed to be called implicitly from code, so
    they have funny names. So:

    print a,b

    is equivalent to:

    m$print_startcon()
    m$print_i64_nf(a) # when a,b are i64
    m$print_i64_nf(b)
    m$print_end

    (The bracketing between the start/end calls allows separator logic within
    the whole print statement, and allows calls to functions within the
    print list that might themselves use print, to files etc.)

    So, print is already implemented in user code. I just do not see the
    advantage of replacing the 110 or 50 lines of code within the compiler's
    code generator for print with a likely more complicated solution.

    (110 lines of code for the static language, which needs to generate
    different calls for various types; 50 lines for the dynamic language.)
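
    As a rough idea of what those lines have to do, the per-type dispatch
    could be sketched like this in Python-style pseudocode (illustrative
    only; the real code generator emits calls in the compiler's IL rather
    than text, and the m$ names are simply the ones shown above):

    def gen_print(items):
        # items: list of (expression text, type name) pairs from the typed AST.
        calls = ["m$print_startcon()"]
        for expr, ty in items:
            calls.append("m$print_%s_nf(%s)" % (ty.lower(), expr))
        calls.append("m$print_end()")
        return calls

    print("\n".join(gen_print([("a", "I64"), ("b", "I64")])))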

    OK, if you do not need more "builtin functions", then an ad-hoc approach
    is fine. GNU Pascal supports hundreds of "builtin functions" (when
    you count differences due to different types of arguments). This
    forces a more systematic approach. AFAIK for Cobol you will need
    more builtins...

    A bunch
    of older languages like Cobol, Fortran and PL/I had
    special "builtin" operations, some of them many
    more than Pascal.

    Let me add that there are more radical approaches:
    there are languages in which "users" can redefine
    the scanner and/or parser. So users can add new
    operators on the fly and define their meaning. As
    a possible example consider your 'stringimage'
    extension:

    Do you mean my 'strinclude', which incorporates a text file (or a
    text/binary file in dynamic code) as string data within the program?

    Yes.

    In an extensible language the user simply
    declares that 'stringimage' is a new keyword that
    should invoke its handler.

    That itself is not so easy. Keywords are built-in, scopeless, and known program-wide.

    no, no, no.

    With a user-defined keyword, could two modules define that
    keyword differently?

    Yes, keywords are scoped and recognized only within their scope.
    More precisely, each compilation stream has its own keywords, which
    can be added or retracted at essentially any time.

    Could another part of the program use that same
    name as an ordinary variable?

    Yes.

    If so, then it's not known to be a new keyword until the pass after
    parsing. So it will need to have a suitable 'shape' to be parsed as /something/. (My language also has out-of-order definitions.)


    The handler looks at the
    following program text, extracts the filename,
    reads the file and tells the compiler to use the resulting
    string as a string constant.

    How? String literals are detected fairly early, they have a certain AST
    node, length etc. At what pass will this be done?

    Apparently you have your compiler in mind and do not think about
    alternatives. String literals produce string tokens. There
    is no law of nature that forbids alternative ways of creating
    string tokens. Once the handler has created a string token and passed it
    to the normal compilation mechanism, there is no difference between
    such a string token and the string tokens coming from string literals
    in the source file.

    The whole thing is
    probably 10 lines for the handler and a single
    line to register the new keyword.
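
    For illustration, in a hypothetical extensible compiler with a
    Python-style API (all names below are invented, and this is neither
    Pop11 nor Lisp), the handler plus its registration could indeed be
    about that size:

    class Lexer:
        def __init__(self):
            self.keywords = {}       # keyword name -> handler function
            self.pending = []        # tokens injected by handlers

        def register_keyword(self, name, handler):
            self.keywords[name] = handler

        def push_token(self, tok):
            self.pending.append(tok)

    def strinclude_handler(lexer, filename):
        # Read the named file and hand its contents back to the compiler
        # as an ordinary string token; later passes cannot tell it apart
        # from a string literal written in the source.
        with open(filename) as f:
            lexer.push_token(("string", f.read()))

    lexer = Lexer()
    lexer.register_keyword("strinclude", strinclude_handler)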

    In a language that
    allows syntax extensions, instead of overloading
    you can just define new operators, so strictly
    speaking overloading is not necessary. Similarly,
    you have some way for syntax handlers to generate
    code, which is similar to inlining, but strictly
    speaking inlining is not needed.

    Do you have an example of such a language where implementing my
    'strimport' is 10 lines of code? For example, C++. Or the one you
    mentioned. If so, how does it look?

    Your 'stringimage' can be done as a Lisp macro:

    ;;; ---<cut here>-------------
    (defun slurp_file (fn)
      (with-open-file (f fn)
        (let ((seq (make-string (file-length f))))
          (read-sequence seq f)
          (coerce seq 'string))))

    (defmacro string_include (f)
      (slurp_file f))
    ;;; ---<cut here>-------------

    Note: in the source file, use of 'string_include' looks like a function
    call (almost any Lisp construct looks like a function call).
    However, since it is a Lisp macro, it is executed at compile
    time and the result is handled like part of the source.


    In my static compiler, 'strinclude' needs a handful of lines to define
    enums and names, plus these two lines in the parser:

    when kstrincludesym then
        lex()
        p:=createunit1(j_strinclude,readterm2())

    Followed by this code later on to create a suitable string literal:

    proc tx_strinclude(unit p,a)=
        int fileno

        tpass(a)
        if a.tag<>j_const or not a.isastring then
            txerror("strincl/not string")
        fi

        fileno := moduletable[p.moduleno].fileno
        fileno := getfile(a.svalue, path:sourcefilepaths[fileno], issupport:1)

        a.svalue := sourcefiletext[fileno]
        a.slength := sourcefilesizes[fileno]

        deleteunit(p,a)            # replace strincl node with new string
    end

    The fiddly bit is deciding where to look for that file (unless an
    absolute path is given, it will be relative to this module).

    The file involved counts as a support file, which means that when an
    application is combined into an amalgamated file, such files are
    included. (But they need to be marked because sometimes a support file
    will have the same name as an actual source module.)

    Also, when reading from an amalgamated file, it will look inside that
    for the support file.

    I think that 20 lines of code for 'strinclude' inside a compiler is reasonable.

    Sure. However note the difference: you could add 'strinclude' because
    you are the compiler author. With an extensible language any user can
    add the constructs that she/he needs.

    --
    Waldek Hebisch

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to antispam@math.uni.wroc.pl on Sat Dec 11 01:21:33 2021
    On 10/12/2021 23:28, antispam@math.uni.wroc.pl wrote:
    Bart <bc@freeuk.com> wrote:

    Source code for gcc was 45,000 source files last I looked. I don't even
    know if that's just for C, or for C++ too.

    That's a thousand times larger in scale than what I do. So I guess I'm in
    less need of such methods.

    gcc is in a different game: "how many % of runtime efficiency can we
    gain if we throw 10 times more resources at compilation?".

    gcc-O3 manages some 50% better performance on the typical programs /I
    write/. Example (the BB-opt column, from 2020, is roughly the same as my
    current compiler):

    https://github.com/sal55/langs/blob/master/benchsumm.md

    But in general, generating code that runs only 2-3 times slower than
    gcc-O3's is easy. Tcc manages that, and it is a 0.18MB compiler. Its
    output will still be faster than most interpreted languages!

    This is for fairly sensible input files. When the input is crazier,
    perhaps machine-generated with lots of redundant code and variables [or
    nested template expansions!], you will need better optimisation methods
    to clean up the mess.

    (110 lines of code for the static language, which needs to generate
    different calls for various types; 50 lines for the dynamic language.)

    OK, if you do not need more "builtin functions", then an ad-hoc approach
    is fine. GNU Pascal supports hundreds of "builtin functions" (when
    you count differences due to different types of arguments).

    If they have function syntax (note my Print statements don't) without
    too many special features, then a systematic approach is easy.

    For example, for most binary ops with symmetric operand types, there is
    just one AST node type, and translation to IL uses four lines of code:

    proc do_bin(unit p,a,b) =
        evalunit(a)
        evalunit(b)
        pcl_gen(p.pclop)
        setmode_u(p)
    end

    You tend to do this when you can find patterns, to avoid duplication.


    That itself is not so easy. Keywords are built-in, scopeless, and known
    program-wide.

    no, no, no.

    With a user-defined keyword, could two modules define that
    keyword differently?

    Yes, keywords are scoped and recognized only within their scope.
    More precisely, each compilation stream has its own keywords, which
    can be added or retracted at essentially any time.

    Those aren't what I call keywords! More like user-defined syntax. But
    with custom syntax, you don't want it changing in different parts of the program.

    Or are you thinking about Lisp?

    Do you have an example of such a language where implementing my
    'strimport' is 10 lines of code? For example, C++. Or the one you
    mentioned. If so, how does it look?

    Your 'stringimage' can be done as a Lisp macro:

    ;;; ---<cut here>-------------
    (defun slurp_file (fn)
      (with-open-file (f fn)
        (let ((seq (make-string (file-length f))))
          (read-sequence seq f)
          (coerce seq 'string))))

    (defmacro string_include (f)
      (slurp_file f))
    ;;; ---<cut here>-------------

    Note: in the source file, use of 'string_include' looks like a function
    call (almost any Lisp construct looks like a function call).
    However, since it is a Lisp macro, it is executed at compile
    time and the result is handled like part of the source.

    So this approach can also be used by C++'s constexpr feature? (Assuming
    there are ways to read files into strings.)

    I notice you say 'compile-time'; does Lisp have a compilation stage with
    some intermediate or binary output? I thought it was run from source.
    Actually I thought that that was the big deal about it, that you can
    manipulate Lisp code as though it was runtime data.

    The thing about a discrete compilation stage is that the resulting
    binary output contains embedded versions of those string includes.

    (My current dynamic language is run-from-source only. However there is a feature to package all source/support files into one text file; the
    string include file goes there.)


    I think that 20 lines of code for 'strinclude' inside a compiler is
    reasonable.

    Sure. However note the difference: you could add 'strinclude' because
    you are the compiler author. With an extensible language any user can
    add the constructs that she/he needs.

    No, my language isn't extendable that way. People can only add things in
    the usual way: user-defined functions, records, types, etc.

    I think it makes for a simpler, easier-to-implement and faster-to-run
    language.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to antispam@math.uni.wroc.pl on Sat Dec 11 01:23:53 2021
    On 10/12/2021 23:46, antispam@math.uni.wroc.pl wrote:
    Bart <bc@freeuk.com> wrote:



    Perhaps you genuinely think that's how long such things should take.

    Well, health experts say that one should take regular breaks from
    programming. 5 minutes is somewhat short, but better than nothing
    (10 minutes would probably be optimal).

    I take plenty of breaks, like going on forums like these!

    But when I am working, I don't want to hang about waiting until I
    see the results of any change.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From antispam@math.uni.wroc.pl@21:1/5 to Bart on Sat Dec 11 13:09:41 2021
    Bart <bc@freeuk.com> wrote:
    On 10/12/2021 23:28, antispam@math.uni.wroc.pl wrote:
    Bart <bc@freeuk.com> wrote:
    <snip>
    That itself is not so easy. Keywords are built-in, scopeless, and known
    program-wide.

    no, no, no.

    With a user-defined keyword, could two modules define that
    keyword differently?

    Yes, keywords are scoped and recognized only within their scope.
    More precisely, each compilation stream has its own keywords, which
    can be added or retracted at essentially any time.

    Those aren't what I call keywords! More like user-defined syntax. But
    with custom syntax, you don't want it changing in different parts of the program.

    Or are you thinking about Lisp?

    Not only Lisp. The following is a slightly redacted piece of Pop11 code:

    external declare XEvents in c;

    typedef struct {
        int type;                 ;;; of event */
        unsigned long serial;     ;;; # of last request processed by server */
        Bool send_event;          ;;; true if this came from a SendEvent request */
        Display *display;         ;;; Display the event was read from */
        Window window;            ;;; "event" window it is reported relative to */
        Window root;              ;;; root window that the event occured on */
        Window subwindow;         ;;; child window */
        Time time;                ;;; milliseconds */
        int x, y;                 ;;; pointer x, y coordinates in event window */
        int x_root, y_root;       ;;; coordinates relative to root */
        unsigned int state;       ;;; key or button mask */
        unsigned int button;      ;;; detail */
        Bool same_screen;         ;;; same screen flag */
    } XButtonEvent;

    ...
    ...
    ...

    int *XSynchronize(display, onoff)
    Display *display;
    int onoff;
    {}

    endexternal;

    define global QLength(dpy);
        lvars dpy;
        dpy #-> qlen
    enddefine;

    The part between 'external' and 'endexternal' essentially uses C syntax.
    It declares C routines callable from Pop11. To properly declare the C
    routines, the 'external' part also declares a bunch of C types (needed
    for proper handling of arguments). After 'endexternal' you have more
    normal Pop11 code. The 'external' part is quite different from the other
    parts, so it is natural to use a distinct syntax. Note that Pop11
    has a convention that each construct has a clearly visible terminator,
    so there is a well-defined moment to switch off the syntax extension.

    'external' is provided as part of Pop11, but could be defined
    by a user. ATM there are constructs that are considered part of
    "standard" Pop11, but which were originally created as user
    extensions.

    Concerning "you don't want it changing in different parts of the
    program", most extentions are intended to blend nicely into
    whole language. In particular you want ability to combine
    extentions. But some extentions naturally work on delimited
    fragments of code and withing fragment (like 'external' above)
    you may wich to use syntax which is quite different than rest
    of language. Another situation is "legacy code". There may be
    extention which is better than older way of doing things and
    you may intend for new way to replace the old one. In particular
    new way may be incompatible with old way. If you have old source
    files that were not updated to new way, it is natural to switch
    on old syntax in old files (or in old fragments of new files
    if you want to cram a lot of things into single file), while
    using new (incompatible) syntax in new code.

    Anyway, AFAICS the Lisp tradition is to stick to "standard" Lisp
    syntax and define extensions mostly as macros (in principle
    you could redefine Lisp syntax so that it accepts your
    language as input, but such things seem to be relatively
    infrequent). OTOH in Pop11 each extension usually has its
    own syntax. Note that the normal use case is to create extensions,
    that is, to leave the core language unchanged. But if you wish
    you can redefine or eliminate part of the core language
    (in Lisp, macros cannot change the core language; you need to
    redefine the syntax to effect such a change).

    Do you have an example of such a language where implementing my
    'strimport' is 10 lines of code? For example, C++. Or the one you
    mentioned. If so, how does it look?

    Your 'stringimage' can be done as a Lisp macro:

    ;;; ---<cut here>-------------
    (defun slurp_file (fn)
      (with-open-file (f fn)
        (let ((seq (make-string (file-length f))))
          (read-sequence seq f)
          (coerce seq 'string))))

    (defmacro string_include (f)
      (slurp_file f))
    ;;; ---<cut here>-------------

    Note: in the source file, use of 'string_include' looks like a function
    call (almost any Lisp construct looks like a function call).
    However, since it is a Lisp macro, it is executed at compile
    time and the result is handled like part of the source.

    So this approach can also be used by C++'s constexpr feature? (Assuming
    there are ways to read files into strings.)

    IIUC currently C++ constexpr has rather strong restrictions on what
    can be computed, and I think it cannot read from files. But
    assuming that such restrictions were lifted, the approach
    would work.

    Note that Lisp macros can do more: they are not limited to
    producing new constants; they can produce arbitrary code.

    Similarly, a syntax handler in Pop11 can produce arbitrary code.

    I notice you say 'compile-time'; does Lisp have a compilation stage with
    some intermediate or binary output? I thought it was run from source.

    I know one "run from source" Lisp implementation (it actually compiles
    to memory and you can dump memory and later restart program from
    dumped memory image) but this is non-standard. Most Lisp implmentation
    have some intermediate format for compiled files. Also, all
    Lisp implmentations that I know can run from dumped memory image.
    Most can create stanalone executables, but some insits that you
    run loader program which loads previously dumped memory image
    and runs from it.

    Actually I thought that that was the big deal about it, that you can manipulate Lisp code as though it was runtime data.

    You can, as long as it is source code. There are limitations on what
    you can do with compiled code. If you want, most (all???) Lisp
    implementations will keep the source together with the binary code, so
    you can examine, transform and maybe recompile any routine.
    Due to the binary format in use, normally at any time you can replace
    a given top-level routine by a different one and the rest of the code
    will use the new routine. I wrote "normally" because you can _not_
    replace inlined routines in this way (you need to replace all users).
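
    A rough analogy in Python (not Lisp, and only an analogy): calls that go
    through a name lookup pick up a replaced definition, while anything that
    captured the old code, the way an inlined call effectively does, keeps
    using it.

    def area(r):
        return 3.14 * r * r

    def report(r):
        return area(r)       # looked up at call time, so it follows replacements

    cached = area            # like an inlined copy: the old code is captured

    def area(r):             # replace the top-level routine
        return 3.14159 * r * r

    print(report(2.0))       # uses the new definition
    print(cached(2.0))       # still runs the old code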

    --
    Waldek Hebisch

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)