• It's all the fault of the ASCII committee

    From trijezdci@21:1/5 to Martin Brown on Tue Aug 30 06:07:10 2016
    On Tuesday, 30 August 2016 18:22:01 UTC+9, Martin Brown wrote:

    The "|" is a logical OR separating the various cases and as such should
    only appear between clauses and not precede the first one.

    PIM and ISO define the | within CASE statements as case label separators.

    M2 R10 defines them as case label prefixes, not as separators.

    I will explain the rationale for prefixes a little further below but let me first talk about how this all came about.

    Initially, we went along with the concept of using separators and we considered using double-semicolons as case label separators.

    CASE x OF
    a : ...; ... ;;
    b : ...; ... ;;
    c : ...; ...
    END;

    This would have freed up the | for use as a replacement for OR which in turn would have allowed consistent use of symbols as logical operators, removing reserved words AND, OR and NOT. Our rationale was: either all are symbols or all are reserved words, not both, not mixed.

    However, considering the principle of least surprise, we eventually decided against double-semicolons and removed the & and ~ as synonyms for AND and NOT instead.

    This turned out to be helpful later when we decided against the use of + as concatenation operator since we were then able to use the freed up &.

    Yet, while we had been looking for an alternative for | as case label separator, we felt that the ideal symbol would be a bullet and that this ought to be a prefix, not a separator:

    CASE x OF
    • a : ...
    • b : ...
    • c : ...
    END;

    I am pretty confident this is what Wirth would have designed had he had a bullet symbol at his disposal[1].


    Our primary driver to switch from case label separator to case label prefix was readability when the CASE statement spans multiple lines, which is the vast majority of use cases.

    CASE x OF
    | a : ...
    | b : ...
    | c : ...
    END;

    In fact this had already been a discussion point at meetings of the ISO M2 working group. Many of the delegates felt that a prefix was more readable and a compromise was reached to allow both variants.

    Since we don't like to have alternative variants of syntax, we decided in favour of prefixes and against separators.
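
    To make the separator/prefix distinction concrete, here is a minimal sketch in Python. It is not taken from any of the language reports: the two grammar rules are simplified assumptions, the function names are hypothetical, and the token "case" stands in for one complete "labels : statements" arm.

```python
# Simplified, assumed grammar shapes (not the exact PIM, ISO or M2 R10 rules):
#   separator form:  caseList = case { "|" case }
#   prefix form:     caseList = "|" case { "|" case }
# "case" is a placeholder token for one whole "labels : statements" arm.

def accepts_separator(tokens):
    """Accept a bar only BETWEEN arms, never before the first one."""
    if not tokens or tokens[0] != "case":
        return False
    i = 1
    while i < len(tokens):
        # every further arm must be introduced by exactly one bar
        if tokens[i] != "|" or i + 1 >= len(tokens) or tokens[i + 1] != "case":
            return False
        i += 2
    return True


def accepts_prefix(tokens):
    """Accept a bar BEFORE every arm, including the first one."""
    if len(tokens) < 2 or len(tokens) % 2 != 0:
        return False
    return all(
        tokens[i] == "|" and tokens[i + 1] == "case"
        for i in range(0, len(tokens), 2)
    )
```

    Under these assumptions the list "| case | case" is accepted only by the prefix form, and "case | case" only by the separator form, which is why a language that wants a single syntax has to pick one.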


    [1] Unfortunately, the good people on the ASCII committee decided to waste 33 code points on control codes most of which were only useful for teletype machines, the replacement of which with video terminals was already foreseeable and had in fact already started while the committee was active. Due to this shortsightedness, there wasn't any space left for a bullet in the ASCII set.


    It is inconsistent to put this "|" phantom in
    and then fault trailing ";".

    It would be so if | was a separator, but it isn't since it is a prefix. ;-)


    Hope this clarifies.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From rugxulo@gmail.com@21:1/5 to trijezdci on Thu Sep 1 15:49:48 2016
    Hi,

    You're clearly very knowledgeable, so I'm not sure why I'm commenting.
    You don't need my help. :-)

    On Tuesday, August 30, 2016 at 8:07:11 AM UTC-5, trijezdci wrote:
    On Tuesday, 30 August 2016 18:22:01 UTC+9, Martin Brown wrote:

    The "|" is a logical OR separating the various cases and as such should only appear between clauses and not precede the first one.

    PIM and ISO define the | within CASE statements as case label separators.

    IIRC, one or two compilers I tried (PIM, not ISO) didn't like '|' before
    the first choice. Those may have been old PIM2, though.

    Initially, we went along with the concept of using separators and
    we considered using double-semicolons as case label separators.

    This would have freed up the | for use as a replacement for OR which
    in turn would have allowed consistent use of symbols as logical operators, removing reserved words AND, OR and NOT. Our rationale was: either all are symbols or all are reserved words, not both, not mixed.

    Presumably "OR" isn't considered too much harder to type than '|'.
    IIRC, the aliases '~', '&' were only optional and introduced in PIM3
    (although Wirth made them mandatory in Oberon).

    However, considering the principle of least surprise, we eventually
    decided against double-semicolons and removed the & and ~ as synonyms
    for AND and NOT instead.

    So did Modula-3.

    Yet, while we had been looking for an alternative for | as case label separator, we felt that the ideal symbol would be a bullet and that
    this ought to be a prefix, not a separator:

    Wasn't ":=" supposed to represent the left arrow? So why not go all out? (Personally, I abhor that, I prefer old-fashioned ASCII.)

    I am pretty confident this is what Wirth would have designed had he
    had a bullet symbol at his disposal[1].

    Maybe so, but he changes his mind a lot.

    Our primary driver to switch from case label separator to case label
    prefix was readability when the CASE statement spans multiple lines
    which is the vast majority of use cases.

    CASE x OF
    | a : ...
    | b : ...
    | c : ...
    END;

    In fact this had already been a discussion point at meetings of the
    ISO M2 working group. Many of the delegates felt that a prefix was
    more readable and a compromise was reached to allow both variants.

    They also allowed '!' instead, right? (Unlike PIM.)

    Since we don't like to have alternative variants of syntax,
    we decided in favour of prefixes and against separators.

    [1] Unfortunately, the good people on the ASCII committee decided
    to waste 33 code points on control codes .... Due to this
    shortsightedness, there wasn't any space left for a bullet in the
    ASCII set.

    So use '.' or '*' instead. Don't a lot of markup languages use the
    asterisk for that?

    Hope this clarifies.

    I wish you luck, but (to be honest) this all sounds way over my head.

  • From trijezdci@21:1/5 to rug...@gmail.com on Thu Sep 1 19:06:50 2016
    On Friday, 2 September 2016 07:49:48 UTC+9, rug...@gmail.com wrote:

    IIRC, one or two compilers I tried (PIM, not ISO) didn't like '|' before
    the first choice. Those may have been old PIM2, though.

    I don't think there would be any PIM compilers that permit this, but I do recall vividly that WG13 discussed and decided to permit this in ISO, although I don't quite remember which meeting it was.


    Wasn't ":=" supposed to represent the left arrow? So why not go all out? (Personally, I abhor that, I prefer old-fashioned ASCII.)

    I think this is entirely unrelated but if the ASCII committee had included a left arrow in the set and it had taken hold for use as an assignment operator, we would by now find it natural. You would likely have found the idea of using ":=" for assignment abhorrent then. ;-)


    I am pretty confident this is what Wirth would have designed had he
    had a bullet symbol at his disposal[1].

    Maybe so, but he changes his mind a lot.

    Correct. Although I am near certain that had there been a bullet available and had he used it as a prefix for case labels he wouldn't have changed his mind on that because it is the most natural looking syntax possible. We'd all find it so obvious, nobody would question it.


    They also allowed '!' instead [of '|'], right? (Unlike PIM.)

    Indeed, thanks for reminding me of that, I forgot to mention it in the comparison chart.


    So use '.' or '*' instead. Don't a lot of markup languages use the
    asterisk for that?

    That would make the syntax ambiguous.

    You could have

    CASE x OF
    . foo : y := 5
    .

    where . follows 5 and could just as well be read as the real literal 5. (although we forbid real number literals starting or ending in a decimal point, so this could be resolved in the lexer).

    However, the dot has a very small visual footprint which makes it unsuitable for this purpose since the whole rationale for the case label prefix is to stand out.


    The asterisk is better in this regard, but it causes ambiguity, too. You could have

    CASE x OF
    * foo : y := 5
    * bar :

    which could mean y := 5*bar

    The grammar would no longer be LL(1).

    In fact you need an LL(*) grammar, that is, LL(k) for unbounded k, because the case label could be a constant expression of the form:

    CASE x OF
    * foo : y := 5
    * bar*baz+bam/boo-blob : ...

    You'd need to keep parsing the expression without knowing whether it belongs to the right-hand side of the assignment to y or to the following case label until you eventually find the : and since the expression has arbitrary length from the grammar point of view, you would need unbounded lookahead.
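
    The lookahead argument can be illustrated with a small Python sketch. It is a deliberate simplification and not a real parser: the function name is hypothetical, the token list starts at the candidate prefix symbol that follows a complete expression, and it assumes (as in the discussion above) that | has no operator meaning while * does.

```python
def tokens_to_decide(tokens, prefix):
    """Count the tokens, starting at the candidate prefix symbol, that must
    be inspected before we know whether a new case label starts here.

    Assumptions of this sketch: ":" proves a case label; ";", "END" or the
    end of input prove that the symbol was an ordinary operator.
    """
    if prefix == "|":
        # "|" is never an operator here, so the symbol itself decides.
        return 1
    # The prefix doubles as an operator: scan ahead until something
    # disambiguates. The distance grows with the length of the label.
    for count, tok in enumerate(tokens, start=1):
        if tok in (":", ";", "END"):
            return count
    return len(tokens)


short_label = ["*", "bar", ":"]
long_label = ["*", "bar", "*", "baz", "+", "bam", "/", "boo", "-", "blob", ":"]

print(tokens_to_decide(short_label, "*"))   # 3
print(tokens_to_decide(long_label, "*"))    # 11
print(tokens_to_decide(["|", "bar", ":"], "|"))  # 1
```

    With * the count grows with the length of the label expression, so no fixed k suffices; with | it stays at one token regardless of how long the label is.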


    In any event, we settled for | as a prefix and this is not going to change now. It's not as nice as a true bullet would have been, but it is a good compromise. It stands out nicely ...

    CASE x OF
    | foo : ...
    | bar : ...
    | baz : ...
    END

    It stands out a little less when each case spans multiple lines ...

    CASE x OF
    | foo : ...
    ...; ...;
    ...; ...; ...
    | bar : ...
    ...; ...;
    ...; ...; ...
    | baz : ...
    ...; ...;
    ...; ...; ...
    END;

    but still enough to give a reader's eyes visual cues.
