• String Literals

    From Stefan Ram@21:1/5 to All on Wed Sep 29 19:46:07 2021
    I have the following ideas for string literals in a new language
    (first the string, then the string literal is given):

    String literals start with an opening bracket and end with
    a closing bracket.

    abc
    [abc]

    Brackets within the string literal are allowed when properly
    nested.

    abc[def]ghi
    [abc[def]ghi]

    A single opening or closing bracket is written as "[`]" or
    "[]`", respectively. This rule has higher precedence than the
    preceding rule: whenever there is a "[`]" or "[]`" within
    a string literal, it means "[" and "]", with no exceptions.

    abc[def
    [abc[`]def]

    abc]def
    [abc[]`def]

    abc[`]def
    [abc[`]`[]`def]

    abc[]`def
    [abc[`][]``def]

    The notation for "[`]" and "[]`" within a string is awkward,
    but is anticipated to be required only rarely. Most texts will
    contain brackets that are properly nested, and this was made
    to be easy.

    So, are there any problems with this specification I have missed?
    Strings that are impossible to encode or string literals whose
    interpretation is ambiguous? Cases where frequent strings are
    cumbersome to encode? TIA!
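
    As a way to probe these questions, here is a minimal Python sketch of
    the rules as stated above. It is an illustration under assumptions, not
    a reference implementation: the names decode and encode are made up,
    and the opening and closing delimiters are assumed never to take part
    in an escape sequence. The encoder escapes every bracket, so its output
    is valid but not the shortest possible literal.

```python
ESC_OPEN = "[`]"   # stands for a lone "[" inside a literal
ESC_CLOSE = "[]`"  # stands for a lone "]" inside a literal


def decode(literal: str) -> str:
    """Return the string denoted by a bracket-delimited literal."""
    if len(literal) < 2 or literal[0] != "[" or literal[-1] != "]":
        raise ValueError("literal must start with '[' and end with ']'")
    body = literal[1:-1]  # assumption: delimiters are never part of an escape
    out = []
    depth = 0  # nesting depth of unescaped brackets inside the body
    i = 0
    while i < len(body):
        # The two escapes take precedence over ordinary nesting.
        if body.startswith(ESC_OPEN, i):
            out.append("[")
            i += 3
        elif body.startswith(ESC_CLOSE, i):
            out.append("]")
            i += 3
        else:
            ch = body[i]
            if ch == "[":
                depth += 1
            elif ch == "]":
                depth -= 1
                if depth < 0:
                    raise ValueError("unbalanced ']' in literal")
            out.append(ch)
            i += 1
    if depth != 0:
        raise ValueError("unbalanced '[' in literal")
    return "".join(out)


def encode(text: str) -> str:
    """Encode any string by escaping every bracket (valid, never minimal)."""
    body = "".join(
        ESC_OPEN if ch == "[" else ESC_CLOSE if ch == "]" else ch
        for ch in text
    )
    return "[" + body + "]"
```

    Under these assumptions, decode reproduces the six examples above, and
    decode(encode(s)) returns s for the strings tried, which suggests that
    no string is impossible to encode.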

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bart@21:1/5 to Stefan Ram on Wed Sep 29 21:31:30 2021
    On 29/09/2021 20:46, Stefan Ram wrote:
    I have the following ideas for string literals in a new language
    (first the string, then the string literal is given):

    String literals start with an opening bracket and end with
    a closing bracket.

    abc
    [abc]

    Brackets within the string literal are allowed when properly
    nested.

    abc[def]ghi
    [abc[def]ghi]

    A single opening or closing bracket is written as "[`]" or
    "[]`", respectively. This rule has higher precedence than the
    preceding rule: whenever there is a "[`]" or "[]`" within
    a string literal, it means "[" and "]", with no exceptions.

    abc[def
    [abc[`]def]

    abc]def
    [abc[]`def]

    abc[`]def
    [abc[`]`[]`def]

    abc[]`def
    [abc[`][]``def]

    The notation for "[`]" and "[]`" within a string is awkward,
    but is anticipated to be required only rarely. Most texts will
    contain brackets that are properly nested, and this was made
    to be easy.

    So, are there any problems with this specification I have missed?
    Strings that are impossible to encode or string literals whose
    interpretation is ambiguous? Cases where frequent strings are
    cumbersome to encode? TIA!


    I don't know if some strings are impossible to code. But it looks
    near-impossible to write or read any strings that contain square
    brackets or single quotes.

    How do you deal with the usual non-printable characters that need
    escape sequences such as CR, LF, TAB, BELL, etc?

    With the usual "..." delimiters, your examples reduce to:

    abc
    "abc"

    abc"def"ghi
    "abc""def""ghi" # or the more common:
    "abc\"def\"ghi" # (I allow both)

    I'm not sure how you came up with these puzzling 3-character sequences:

    [ [`]
    ] []`

    If introducing ` as some sort of escape symbol, why not have it precede
    the escaped character:

    [ `[
    ] `]

    Your examples, if still using [..] to delimit strings, and allowing
    embedded ...[...]... without needing escapes, become:

    abc[def]ghi
    [abc[def]ghi]

    abc[def
    [abc`[def]

    abc]def
    [abc`]def]

    abc[`]def
    [abc[``]def]

    abc[]`def
    [abc[]``def]

    Here it needs `` to represent one `.
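
    The prefix-escape variant described above can be sketched in Python as
    follows. This is only an illustration under assumptions: the function
    name decode_prefix is made up, and a backtick is assumed to be legal
    only in front of "[", "]" or another backtick.

```python
def decode_prefix(literal: str) -> str:
    """Decode a [..] literal where ` escapes the character after it."""
    if len(literal) < 2 or literal[0] != "[" or literal[-1] != "]":
        raise ValueError("literal must start with '[' and end with ']'")
    body = literal[1:-1]
    out = []
    depth = 0  # nesting depth of unescaped brackets inside the body
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "`":
            # Assumption: only `[ , `] and `` are valid escapes.
            if i + 1 >= len(body) or body[i + 1] not in "[]`":
                raise ValueError("dangling or unknown escape")
            out.append(body[i + 1])
            i += 2
            continue
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ']' in literal")
        out.append(ch)
        i += 1
    if depth != 0:
        raise ValueError("unbalanced '[' in literal")
    return "".join(out)
```

    With this sketch, decode_prefix("[abc[``]def]") yields abc[`]def, in
    line with the examples above.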

  • From James Harris@21:1/5 to Bart on Fri Oct 1 14:37:50 2021
    On 29/09/2021 21:31, Bart wrote:
    On 29/09/2021 20:46, Stefan Ram wrote:
       I have the following ideas for string literals in a new language
       (first the string, then the string literal is given):

    Stefan, I'll respond to your idea here as Bart has already made some of
    the points I would have made.


       String literals start with an opening bracket and end with
       a closing bracket.

    abc
    [abc]

    I'd be interested to see where you get to with this as I experimented
    with braces (rather than brackets) which have the same feature of the
    closing delimiter (and, hence, the string terminator) being different
    from the opening delimiter.


       Brackets within the string literal are allowed when properly
       nested.

    abc[def]ghi
    [abc[def]ghi]

       A single opening or closing bracket is written as "[`]" or
       "[]`", respectively. This rule has higher precedence than the
       preceding rule: whenever there is a "[`]" or "[]`" within
       a string literal, it means "[" and "]", with no exceptions.

    abc[def
    [abc[`]def]

    abc]def
    [abc[]`def]

    abc[`]def
    [abc[`]`[]`def]

    abc[]`def
    [abc[`][]``def]

       The notation for "[`]" and "[]`" within a string is awkward,

    Yes, it's very awkward.


       but is anticipated to be required only rarely. Most texts will
       contain brackets that are properly nested, and this was made
       to be easy.

       So, are there any problems with this specification I have missed?
       Strings that are impossible to encode or string literals whose
       interpretation is ambiguous? Cases where frequent strings are
       cumbersome to encode? TIA!


    I don't know if some strings are impossible to code. But it looks
    near-impossible to write or read any strings that contain square
    brackets or single quotes.

    How do you deal with the usual non-printable characters that need
    escape sequences such as CR, LF, TAB, BELL, etc?

    That was my main question. AISI, if Stefan uses an escape sequence for
    LF etc then a string's opening and closing delimiters could be escaped
    in order to embed them.


    With the usual "..." delimiters, your examples reduce to:

     abc
     "abc"

     abc"def"ghi
     "abc""def""ghi"        # or the more common:
     "abc\"def\"ghi"        # (I allow both)

    I chose to match an opening \ with a closing / so that string would be
    one of these

    "abc\Q/def\Q/ghi"
    "abc\q/def\q/ghi"
    "abc\"/def\"/ghi"

    Not sure which, yet, but because what comes after \ is not limited to
    one character other quote marks could be specified by name, e.g.

    \q66/ opening slanted speech mark
    \q99/ closing slanted speech mark
    \q9/ normal slanted apostrophe
    \q<</ France etc opening speech mark
    etc

    https://en.wikipedia.org/wiki/Guillemet



    I'm not sure how you came up with these puzzling 3-character sequences:

     [    [`]
     ]    []`

    If introducing ` as some sort of escape symbol, why not have it precede
    the escaped character:

     [    `[
     ]    `]

    Your examples, if still using [..] to delimit strings, and allowing
    embedded ...[...]... without needing escapes, become:

     abc[def]ghi
     [abc[def]ghi]

     abc[def
     [abc`[def]

     abc]def
     [abc`]def]

     abc[`]def
     [abc[``]def]

     abc[]`def
     [abc[]``def]

    Here it needs `` to represent one `.

    However and whenever I try to encode such strings they end up being
    similarly difficult to read.

    One option, perhaps, is to allow greater spacing. Considering the last one,

    abc[]`def

    if the punctuation characters need special treatment how about spacing
    them out. For example,

    "abc" + LBRACKET + RBRACKET + BACKAPOSTROPHE + "def"

    or

    "abc\ [ /\ ] /\ ` /def"

    or

    "abc" + "\[/" + "\]/" + "\`/" + "def"

    or

    "abc\ [ ] ` /def"

    That last one's arguably not too bad a way to embed three consecutive
    special characters.
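
    A rough Python sketch of how such \ ... / groups might be expanded is
    given below. Everything in it is an assumption made for illustration:
    the function name expand, the NAMES table and its code points, the rule
    that items inside a group are separated by whitespace, and the rule
    that an unknown single character stands for itself. It operates on the
    text between the double quotes and does not address how a literal "/"
    or "\" would be written.

```python
# Hypothetical name table; the mappings are guesses for illustration only.
NAMES = {
    "q": '"',
    "Q": '"',
    "q66": "\u201c",  # opening slanted speech mark
    "q99": "\u201d",  # closing slanted speech mark
    "q9": "\u2019",   # slanted apostrophe
    "q<<": "\u00ab",  # opening guillemet
}


def expand(body: str) -> str:
    """Expand \\ ... / escape groups in the text between the quotes."""
    out = []
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out.append(body[i])
            i += 1
            continue
        end = body.find("/", i + 1)  # the group runs up to the next "/"
        if end == -1:
            raise ValueError("escape opened with \\ but never closed with /")
        for item in body[i + 1:end].split():
            if item in NAMES:
                out.append(NAMES[item])
            elif len(item) == 1:
                out.append(item)  # assumed: a lone character means itself
            else:
                raise ValueError(f"unknown escape name: {item!r}")
        i = end + 1
    return "".join(out)
```

    Under these assumptions, the spaced-out form "abc\ [ ] ` /def" and the
    three-group form "abc\ [ /\ ] /\ ` /def" both expand to abc[]`def.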


    --
    James Harris

  • From David Brown@21:1/5 to Stefan Ram on Fri Oct 1 15:43:29 2021
    On 29/09/2021 21:46, Stefan Ram wrote:
    I have the following ideas for string literals in a new language
    (first the string, then the string literal is given):

    String literals start with an opening bracket and end with
    a closing bracket.


    Others have answered here, but have missed the elephant in the room -
    /why/? What possible advantages would this brackets mess have over
    quotation marks that are used by almost every programming language (and
    many human languages)?

  • From Stefan Ram@21:1/5 to James Harris on Fri Oct 1 14:12:42 2021
    James Harris <james.harris.1@gmail.com> writes:
    On 29/09/2021 21:31, Bart wrote:
    On 29/09/2021 20:46, Stefan Ram wrote:
    How do you deal with the usual non-printable characters that need escape
    sequences such as CR, LF, TAB, BELL, etc?
    That was my main question. AISI, if Stefan uses an escape sequence for
    LF etc then a string's opening and closing delimiters could be escaped
    in order to embed them.

    CR, LF, TAB, and BELL do not need escape sequences in my
    notation as they can be included either literally or via
    the embedding language if need be.

    [ a bracketed string
    can span several lines,
    and it may
    contain literal tab
    characters if need be.
    BELL signs are anticipated to be rarely needed.]

    "abc" + LBRACKET + RBRACKET + BACKAPOSTROPHE + "def"

    If these strings are part of a language with string
    concatenation operators (which is intended indeed) this
    would be possible. I plan to realize concatenation of
    strings by mere concatenation of expressions, so
    "abc\adef" could be written [abc]*BELL[def], that is
    a sequence of a string literal, a name, and another
    string literal (names would have to be marked in this
    language, I used an asterisk in this post as an example
    for a marker for a reference by name).
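
    Concatenation by juxtaposition, as described above, might be evaluated
    along the following lines. This Python sketch is hypothetical: the
    function name evaluate and the NAMES table are invented for the
    example, whitespace between expressions is assumed to be ignored, and
    the asterisk is only the placeholder marker used in this post.

```python
# Hypothetical table of named strings; the real language may differ.
NAMES = {"BELL": "\a", "n": "\n"}


def evaluate(expr: str) -> str:
    """Concatenate adjacent bracket literals and *NAME references."""
    out = []
    i = 0
    while i < len(expr):
        ch = expr[i]
        if ch.isspace():
            i += 1  # assumption: whitespace between expressions is ignored
        elif ch == "[":
            depth = 1
            j = i + 1
            piece = []
            while depth:
                if j >= len(expr):
                    raise ValueError("unterminated string literal")
                if expr.startswith("[`]", j):    # escaped "["
                    piece.append("[")
                    j += 3
                elif expr.startswith("[]`", j):  # escaped "]"
                    piece.append("]")
                    j += 3
                else:
                    c = expr[j]
                    if c == "[":
                        depth += 1
                    elif c == "]":
                        depth -= 1
                    if depth:  # the final closing bracket is not content
                        piece.append(c)
                    j += 1
            out.append("".join(piece))
            i = j
        elif ch == "*":
            j = i + 1
            while j < len(expr) and expr[j].isalnum():
                j += 1
            out.append(NAMES[expr[i + 1:j]])  # KeyError for unknown names
            i = j
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return "".join(out)
```

    In this sketch, evaluate("[abc]*BELL[def]") gives the same string as
    "abc\adef" in C-style notation.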

    I decided to use []` for the closing bracket as part of the
    text, as I wrote. If I had decided to use `] for the closing
    bracket as part of the text, this would mean that a backtick
    cannot be the last character in a string. So, I could have
    used ]` instead, but using []` instead means that my strings
    always have properly nested brackets, which helps when using
    editors with functions to find matching brackets.
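
    The bracket-matching property mentioned above can be checked with a
    few lines of Python. The helper name brackets_balanced is made up for
    this sketch; it simply counts brackets the way a simple editor might,
    without knowing anything about escapes.

```python
def brackets_balanced(source: str) -> bool:
    """Return True if every '[' has a matching ']' in nesting order."""
    depth = 0
    for ch in source:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:
                return False  # a ']' appeared before its '['
    return depth == 0


print(brackets_balanced("[abc[]`def]"))  # True: escape written as []`
print(brackets_balanced("[abc`]def]"))   # False: escape written as `]
print(brackets_balanced("[abc]`def]"))   # False: escape written as ]`
```

    All three literals are meant to denote abc]def, but only the []` form
    keeps the source text balanced for a bracket-matching editor.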

  • From James Harris@21:1/5 to Stefan Ram on Fri Oct 1 16:08:04 2021
    On 01/10/2021 15:12, Stefan Ram wrote:
    James Harris <james.harris.1@gmail.com> writes:
    On 29/09/2021 21:31, Bart wrote:
    On 29/09/2021 20:46, Stefan Ram wrote:
    How do you deal with the usual non-printable characters that need escape
    sequences such as CR, LF, TAB, BELL, etc?
    That was my main question. AISI, if Stefan uses an escape sequence for
    LF etc then a string's opening and closing delimiters could be escaped
    in order to embed them.

    CR, LF, TAB, and BELL do not need escape sequences in my
    notation as they can be included either literally or via
    the embedding language if need be.

    [ a bracketed string
    can span several lines,
    and it may
    contain literal tab
    characters if need be.
    BELL signs are anticipated to be rarely needed.]

    Those four may be covered but do you not need to handle any other
    nonprinting characters such as backspace or del?

    You may also want to have a plan for ending lines with something other
    than the line endings which happen to be present in the particular
    editor you are using (which is what the above text would naturally
    include).

    What if someone writing one of your strings wanted to include a trailing
    space on one line but not another? In the above, trailing blanks would
    not be evident in the source.

    An escape arrangement would allow such issues to be addressed as well as
    providing a way of embedding (or de-signifying) string delimiters.

    Something else to consider is where text has to be entered in lines but
    the encoded text should omit the line breaks.


    "abc" + LBRACKET + RBRACKET + BACKAPOSTROPHE + "def"

    If these strings are part of a language with string
    concatenation operators (which is intended indeed) this
    would be possible. I plan to realize concatenation of
    strings by mere concatenation of expressions, so
    "abc\adef" could be written [abc]*BELL[def], that is
    a sequence of a string literal, a name, and another
    string literal (names would have to be marked in this
    language, I used an asterisk in this post as an example
    for a marker for a reference by name).

    That's interesting. I tried the same. I found it would work especially
    well and usefully for a trailing newline. In your syntax:

    [abc] ;Just the three letters abc
    [abc]*n ;abc and newline


    I decided to use []` for the closing bracket as part of the
    text, as I wrote. If I had decided to use `] for the closing
    bracket as part of the text, this would mean that a backtick
    cannot be the last character in a string. So, I could have
    used ]` instead, but using []` instead means that my strings
    always have properly nested brackets, which helps when using
    editors with functions to find matching brackets.

    Understood, but AIUI your idea of having

    [`

    for a de-signified opening bracket would also make it hard to put such a
    backtick at the /beginning/ of a string.

    All told, escapes are not the worst idea in the world.


    --
    James Harris

  • From Bart@21:1/5 to Stefan Ram on Fri Oct 1 16:01:16 2021
    On 01/10/2021 15:12, Stefan Ram wrote:
    James Harris <james.harris.1@gmail.com> writes:
    On 29/09/2021 21:31, Bart wrote:
    On 29/09/2021 20:46, Stefan Ram wrote:
    How do you deal with the usual non-printable characters that need escape
    sequences such as CR, LF, TAB, BELL, etc?
    That was my main question. AISI, if Stefan uses an escape sequence for
    LF etc then a string's opening and closing delimiters could be escaped
    in order to embed them.

    CR, LF, TAB, and BELL do not need escape sequences in my
    notation as they can be included either literally or via
    the embedding language if need be.

    [ a bracketed string
    can span several lines,
    and it may
    contain literal tab
    characters if need be.
    BELL signs are anticipated to be rarely needed.]

    That won't work well in general because newline sequences depend on both
    the OS and the editor, or even on the source of the text if it was
    pasted elsewhere.

    Newlines may be CR, CRLF, LF, something else entirely, or may not even
    exist. (In my editor, newlines do not exist while editing and displaying
    text, which is a list of strings. They are discarded when reading from
    disk, and added back again when writing to a file.)

    It means that that string can contain unknown sequences, and what
    are superficially the same strings in two source files may not compare
    equal.
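
    The comparison problem can be shown with a short Python example. The
    sample strings and the helper normalize_newlines are invented for
    illustration; mapping every line ending to LF is one possible policy a
    compiler could adopt, not something proposed in this thread.

```python
def normalize_newlines(text: str) -> str:
    """Map CRLF and lone CR line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


# The same two-line literal as saved by editors with different conventions.
unix_text = "[a bracketed string\ncan span several lines]"
dos_text = "[a bracketed string\r\ncan span several lines]"

print(unix_text == dos_text)  # False
print(normalize_newlines(unix_text) == normalize_newlines(dos_text))  # True
```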

    Literal tabs are another problem, as they are so often expanded. Then
    they turn into spaces, but now a fixed number of spaces.

    Yet another is that, without delimiters before the editor's natural
    end-of-line, there can be trailing spaces (and tabs) that are now
    invisible.
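
    Both whitespace effects are easy to reproduce in Python. The sample
    strings are invented for this sketch; str.expandtabs stands in for
    whatever tab expansion an editor performs, and repr is used only to
    make the otherwise invisible characters show up.

```python
with_tab = "key\tvalue"

# The result of expanding a tab depends on the tab-stop width in use.
expanded_4 = with_tab.expandtabs(4)
expanded_8 = with_tab.expandtabs(8)
print(repr(expanded_4))  # 'key value'
print(repr(expanded_8))  # 'key     value'

# Trailing blanks look the same on screen but change the string.
trailing = "abc  "
plain = "abc"
print(trailing == plain)           # False
print(trailing.rstrip() == plain)  # True
```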

    Two bonus problems: this makes it impossible to have those intermediate
    lines ending with a comment, and you can't indent this text to bring it
    (literally) into line with the surrounding code.


    "abc" + LBRACKET + RBRACKET + BACKAPOSTROPHE + "def"

    If these strings are part of a language with string
    concatenation operators (which is intended indeed) this
    would be possible.

    In this case why bother with trying to represent embedded [ and ] at all?

  • From Rod Pemberton@21:1/5 to Stefan Ram on Sun Oct 10 03:38:04 2021
    On 29 Sep 2021 19:46:07 GMT
    ram@zedat.fu-berlin.de (Stefan Ram) wrote:

    I have the following ideas for string literals in a new language
    (first the string, then the string literal is given):

    String literals start with an opening bracket and end with
    a closing bracket.

    abc
    [abc]

    Having different initial and terminal delimiters makes it slightly
    easier to parse the string than using quotes, but this typically
    requires escapes too.

    My advice would be to pick delimiters that would not normally be needed
    within typical typed text e.g., for ASCII, possibly a backquote `,
    backslash \, caret ^, tilde ~, or quote ". I would avoid brackets [],
    braces {}, parens (), angle brackets <>, as string delimiters due to
    their usefulness in pairing items within the language. The other ASCII
    symbols are used for punctuation, mathematics, or accounting.

    Brackets within the string literal are allowed when properly
    nested.

    abc[def]ghi
    [abc[def]ghi]


    Why would you need to nest string delimiters? ...

    In other words, why are you nesting a string within a string?
    (IMO, that's the biggest elephant in the room ...)

    So, I'm beginning to think that you may mean something different by the
    term "string literal" than what I understand a "string literal" to be:
    https://en.wikipedia.org/wiki/String_literal

    Or, is the usage of nesting just a way to embed non-delimiter brackets
    within the string without using escapes? ... If so, your choice of
    brackets as delimiters is probably non-optimal. Pick something else.

    A single opening or closing bracket is written as "[`]" or
    "[]`", respectively. This rule has higher precedence than the
    preceding rule: whenever there is a "[`]" or "[]`" within
    a string literal, it means "[" and "]", with no exceptions.

    The backquote ` is acting as an escape, but since it comes after the
    character being escaped, your lexer would need look-back. AIUI, the
    majority of lexers use look-ahead. What does yours do? Is this a
    concern?
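
    One way to explore the look-ahead question is a small tokenizer. In the
    Python sketch below, both quoted sequences begin with "[", so the
    scanner decides what to do when it reaches a "[" by peeking at the two
    characters that follow; it never revisits earlier input. The function
    name tokens and the token labels are made up for this illustration.

```python
from typing import Iterator, Tuple


def tokens(body: str) -> Iterator[Tuple[str, str]]:
    """Yield (kind, text) pairs for the inside of a bracket literal."""
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "[":
            ahead = body[i + 1:i + 3]  # at most two characters of look-ahead
            if ahead == "`]":
                yield ("ESCAPED", "[")
                i += 3
                continue
            if ahead == "]`":
                yield ("ESCAPED", "]")
                i += 3
                continue
            yield ("OPEN", "[")
        elif ch == "]":
            yield ("CLOSE", "]")
        else:
            yield ("CHAR", ch)
        i += 1
```

    For the body a[`]b this produces a CHAR, an ESCAPED "[" and another
    CHAR, using only forward peeks.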

    abc[def
    [abc[`]def]

    abc]def
    [abc[]`def]

    abc[`]def
    [abc[`]`[]`def]

    abc[]`def
    [abc[`][]``def]

    The notation for "[`]" and "[]`" within a string is awkward,
    but is anticipated to be required only rarely. Most texts will
    contain brackets that are properly nested, and this was made
    to be easy.

    So, are there any problems with this specification I have missed?
    Strings that are impossible to encode or string literals whose
    interpretation is ambiguous? Cases where frequent strings are
    cumbersome to encode? TIA!


    I'm really not sure why the nesting of strings is needed, assuming
    (probably incorrectly) that's what is being done here, so I'd personally
    eliminate the nesting, or change the delimiters, if not. That would
    eliminate some or all of the need for escapes (like C) or string
    concatenation (like BASIC). If you need an escape, use an escape, or
    select different terminators to reduce/eliminate the need for escapes.

