• Code gen - calling sequences

    From Dmitry A. Kazakov@21:1/5 to Bart on Sun Aug 29 15:57:38 2021
    On 2021-08-29 15:38, Bart wrote:
    On 29/08/2021 14:12, Dmitry A. Kazakov wrote:
    On 2021-08-29 14:49, Bart wrote:
    On 29/08/2021 12:34, Dmitry A. Kazakov wrote:
    On 2021-08-29 13:04, Bart wrote:

    BTW what peripheral device needs 200MB of code?

    Modern protocols are extremely complicated, as are the end
    devices. Consider a radiator thermostat. It is a very simple device.
    Yet it has a hundred parameters, a dozen modes, and a weekly schedule
    you must be able to query and program. So you can imagine the
    complexity of its protocol. If you are very lucky, that would be a
    vendor-specific protocol. If it is a "standard" protocol, you are in
    deep trouble. The standard protocols are gigantic piles of cra*p.
    You can take a look at AMQP or any of the ASN.1-based protocols to get
    an impression. The ASN.1 description of certificate files is almost
    comical, if you do not need to implement it.

    Worse, you could not throw the useless stuff out, because you must
    certify your implementation of the protocol.

    On top of that comes the configuration stuff you must address in the GUI
    and in the persistent storage, the on-line data you have to handle and
    log, and so on, plus procedures to replace a defective device or flash
    the device's firmware.

    Then you have not just one device, you have an array of, e.g.
    several radiator thermostats and a dozen of other device types, e.g.
    shutter contacts, wall panels, sensors etc.


    By my measure, 200MB would equate to (very roughly) 20M lines of code

    You must count the language run-time and other system libraries. E.g.
    libc is 1.6MB, SQLite3 is 1.3MB, GTK is about 25MB and so on.

    GTK would be statically linked into an application (which I thought you
    said was to do with peripherals)?

    GTK cannot be linked statically.

    That doesn't make any sense. So if 50 apps all needed GTK, each would
    carry their own copies. And if several are running at the same time,
    there will be multiple copies of the code in memory.

    You run 50 GUIs at a time? But no, GTK is linked dynamically due to some licensing decisions, I believe. I do not remember.

    However, suppose 50MB of that 200MB /was/ GTK.

    No it is not only GTK. It was an example that 200MB is very modest
    assuming the number of protocols a typical application uses. Each
    protocol comes with several libraries each of them might be 1MB or so.
    And as I said on top of that there are layers of application code
    necessary to run the protocol stack, to configure, to store/restore configurations, to visualize etc.

    It seems that you think that a typical application reads from the
    keyboard and prints on a printer. That has not been so for many decades, actually.

    It seems GTK itself
    already is logically divided into dozens of separate libraries.

    Yes, it is.

    This is the point I made some posts ago.

    Maybe. My comment was that 200MB of code is not that much.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Dmitry A. Kazakov on Sun Aug 29 16:54:32 2021
    On 29/08/2021 12:50, Dmitry A. Kazakov wrote:
    On 2021-08-29 11:36, David Brown wrote:
    On 29/08/2021 02:01, Bart wrote:

    So 10 million lines of code represents a single 100MB program,
    approximately.

    The biggest single executable I see on my machine (without digging too
    hard) is 25 MB.  I have also found a shared library at 125 MB.

    If you use GCC and generic instances put in a shared library, you easily
    come to such numbers. GCC generates lots of stuff.

    Funny thing, you cannot even build some of such shared libraries under Windows because the number of exported symbols easily exceeds 2**16-1 (Windows limit). You must split the library into parts...


    I didn't know of that limit. I did know that Windows was still limited
    by its 16-bit ancestry, but not that specific one.

    I also did not mean to imply that these big builds result in a single
    binary - they are often split into multiple "shared" libraries.  (I put
    "shared" in quotations, because the libraries are typically dedicated to
    the program rather than shared by other applications.)  This can be
    convenient during development, building and testing.

    100-200MB is a medium-sized production application: peripheral devices,
    HTTP server, database, cloud connectivity, user management, things start
    to explode quickly.


  • From antispam@math.uni.wroc.pl@21:1/5 to Bart on Sun Aug 29 16:19:15 2021
    Bart <bc@freeuk.com> wrote:
    On 29/08/2021 13:24, antispam@math.uni.wroc.pl wrote:
    Bart <bc@freeuk.com> wrote:

    A rule of thumb I've sometimes observed is that, for x64 anyway, 1 line
    of source code maps to about 10 bytes of binary machine code.

    Depends on the language. For C it may be lower, for some other
    languages much higher.

    So 10 million lines of code represents a single 100MB program,
    approximately.

    I work on a program where the executable is 64 M. However, a significant
    part of the executable code is in loadable modules that take another
    64 M. Guess how big the source is?

    By my metric it would be about 6M lines of source code, if most of the
    64MB was executable x64 code (rather than initialised data, embedded
    data files, or other exe overheads).

    That assumes a certain proportion of declaration lines to lines of
    executable code.

    Now you're going to tell me it's either a lot fewer or a lot more.

    40 M in the executable is "statically" linked code from outside, probably
    corresponding to 0.5M lines of source. 24 M corresponds to about 80 K
    lines. 64 M in loadable modules corresponds to 210 K lines (the actual
    count of code lines is closer to 120 K; the rest is comments and empty lines).

    It is hard to distinguish between executable code and data. Due
    to the semantics, initialized data needs executable code to perform
    initialization. There are dispatch tables, all data and code is
    tagged (has identifying headers). There is runtime type info.
    OTOH, there is a lot of code due to the compiler aggressively optimizing
    for speed at the cost of code size. There is exception handling code
    inserted by the compiler.

    If the language is C, then I guess that could be anything: you can have
    macros that expand to many times their size, and are instantiated at
    multiple sites; include files that can do the same trick. Or a lot of
    boilerplate code that reduces to nothing.

    Or there is lots of inlining that pushes the size the other way again.

    Compiler may compile the same code multiple times, each time with
    different assumptions about type (effectively producing several
    specialized variants from the same code).

    Well, there is also the issue of memory size. SmartEiffel used (uses???)
    whole-program optimization and compiled very fast. But for a really
    large program it used to run out of memory. I am not sure if this is
    still a problem on modern machines, but a reasonable estimate is that to keep
    all needed info in memory you may need 1000 times as much memory as the
    source takes. So you need to carefully optimize space use...

    3 compilers of mine I've just tested use memory equivalent to 15x (C compiler), 20x (Interpreter), and 80x (my systems language) the source size.

    But they all use persistent data structures, especially the last which creates arrays of tokens, a bad idea I've since dropped. All those
    include the source itself.

    ATM I have to keep parse tree of large part of program in memory.
    The parse tree is about 8 times larger than corresponding source. Representation of parse tree is unoptimized and in principle
    packed representation could be smaller. OTOH this is just parse
    tree, without any extra data like types or source locations.
    Once compiler collects enough data to do interesting optimizations,
    data structures may be much larger...

    All the memory is recovered on program termination. If it becomes an
    issue, then unneeded data structures can be destroyed earlier.

    But if we say 40x source size, then capacity of 8GB means /currently/
    being able to deal with source code of something over 10M lines,
    depending on code density.
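
    The estimate above can be checked with a few lines of arithmetic. This is
    only a sketch: the 40x factor and the 8GB capacity come from the paragraph
    above, while the average of 20 bytes per source line and the reading of
    "8GB" as 8 GiB are assumptions made for illustration.

```python
# Rough capacity estimate: how many source lines fit if the compiler needs
# `memory_factor` times the source size in RAM.

GIB = 2 ** 30  # assumption: "8GB" is read as 8 GiB


def max_source_lines(ram_bytes: float, memory_factor: float,
                     bytes_per_source_line: float) -> float:
    """Return the number of source lines whose compilation fits in RAM."""
    source_bytes = ram_bytes / memory_factor
    return source_bytes / bytes_per_source_line


# 40x is the figure from the text; 20 bytes per line is a hypothetical average.
lines = max_source_lines(8 * GIB, memory_factor=40, bytes_per_source_line=20)
print(f"about {lines / 1e6:.1f}M lines")
```

    With denser or sparser source lines the result moves proportionally, which
    is the "depending on code density" caveat.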

    It just means being more resourceful, and reintroducing long-forgotten techniques of working with memory-limited hardware.

    ATM, 10M lines is 200 times the size of my typical projects.

    I deal with code written by other folks. And I like generating
    code. You may easily end up with quite large amount of code
    to compile.

    --
    Waldek Hebisch

  • From Bart@21:1/5 to antispam@math.uni.wroc.pl on Sun Aug 29 19:12:16 2021
    On 29/08/2021 17:19, antispam@math.uni.wroc.pl wrote:
    Bart <bc@freeuk.com> wrote:

    I work on a program where the executable is 64 M. However, a significant
    part of the executable code is in loadable modules that take another
    64 M. Guess how big the source is?

    By my metric it would be about 6M lines of source code, if most of the
    64MB was executable x64 code (rather than initialised data, embedded
    data files, or other exe overheads).

    That assumes a certain proportion of declaration lines to lines of
    executable code.

    Now you're going to tell me it's either a lot fewer or a lot more.

    40 M in the executable is "statically" linked code from outside, probably
    corresponding to 0.5M lines of source. 24 M corresponds to about 80 K
    lines. 64 M in loadable modules corresponds to 210 K lines (the actual
    count of code lines is closer to 120 K; the rest is comments and empty lines).

    Those are some very large ratios between code lines and bytes of output,
    some 80:1, 300:1 and (assuming 150K for /some/ blank lines and
    comments), about 400:1.

    The largest I've come across is 2500:1, for a program (not mine) with
    some very deeply nested macros.

    It makes it harder to get an idea of the true complexity of a 1MB
    program for example; would it be 100K lines (my 10:1 code), or 2.5K
    lines (your 400:1 code), or something between the two?

    But I think that even C code is typically more like mine than yours. If
    I take the 230Kloc file sqlite3.c, which is very comment-heavy, and
    strip the comments but leaving blank lines, then I get 170Kloc.

    I compile that to a 1.1MB object file, which is between 6:1 and 7:1
    bytes per line of source.

    If I take one of my 740KLoc benchmark programs (fannkuch() repeated
    10,000 times), I get executables of 6MB to 8MB, so bytes:lines ratios of
    8:1 to 11:1 (optimising on/off).

    If you applied that 400:1 ratio to the 10Mloc programs David was talking
    about, then you'd end up with 4GB of code per 10Mloc. My 40Kloc compiler
    would be 16MB in size instead of 0.4MB!

    So I'd say that your programs are rather atypical.
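
    The ratios quoted in this post can be reproduced with a short script. It is
    a sketch only: the sizes and line counts are the ones given in the thread,
    and treating 1 MB as 1,000,000 bytes is an assumption.

```python
# Bytes of binary output per line of source, from the figures in the thread.
MB = 1_000_000  # assumption: decimal megabytes


def bytes_per_line(binary_bytes: float, source_lines: float) -> float:
    """Return the bytes-of-output : lines-of-source ratio."""
    return binary_bytes / source_lines


samples = {
    "statically linked outside code": (40 * MB, 500_000),
    "main program": (24 * MB, 80_000),
    "loadable modules (150K lines assumed)": (64 * MB, 150_000),
    "sqlite3.c object file": (1.1 * MB, 170_000),
}

ratios = {name: bytes_per_line(size, lines)
          for name, (size, lines) in samples.items()}

for name, ratio in ratios.items():
    print(f"{name}: about {ratio:.0f}:1")
```

    The first three entries land near 80:1, 300:1 and 400:1, while the
    sqlite3.c figure stays between 6:1 and 7:1.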


    It is hard to distinguish between executable code and data. Due
    to semantics initialized data needs executable code to perform initialization. There are dispatch tables, all data and code is
    tagged (has identifying headers).

    That sounds more like my interpreted languages. If I take that same
    740Kloc benchmark, which is 670Kloc in this language, it uses 30MB of
    64-bit bytecode, so 45:1 here, ignoring all other requirements.

    ATM I have to keep parse tree of large part of program in memory.
    The parse tree is about 8 times larger than corresponding source.

    I think only 8 times larger is pretty good. Although it does depend on
    whether you like long or short identifiers...

  • From David Brown@21:1/5 to James Harris on Mon Aug 30 10:13:32 2021
    On 28/08/2021 16:35, James Harris wrote:
    On 24/08/2021 21:56, David Brown wrote:
    On 24/08/2021 21:06, Bart wrote:
    On 24/08/2021 18:25, James Harris wrote:

    These days why use calling conventions at all? Perhaps they are only
    needed for when there's complete ignorance of the callee. The
    traditional concept of calling conventions may be pass\acute/e. ;-)

    James, aren't you using Linux?  The compose key makes it easy to write
    letters like é - it's just compose, ´, e - "passé".  (It's even easier
    if you have a non-English keyboard layout, in Windows or Linux, as these
    usually have "dead keys" for accents.)

    Thanks, I've now enabled the compose key though I wrote passé in the way
    I did as it's the way I am thinking of for my language - which, as it
    was unfamiliar to others was why I added the smiley.


    I don't imagine anyone is going to want to write "pass\acute/e" as an identifier in any language. And the last thing anyone needs is another
    way to write that kind of thing.

    There are, I think, only two sensible options here:

    1. Disallow any identifier letters outside of ASCII.
    2. Make everything UTF-8.


    If you desperately want to allow some way to write non-ASCII characters
    without UTF-8, then please do not invent your own new way to do it.
    There are more than enough standards here already - use HTML/XML names,
    or Unicode descriptions.

  • From Dmitry A. Kazakov@21:1/5 to David Brown on Mon Aug 30 10:38:18 2021
    On 2021-08-30 10:13, David Brown wrote:

    There are, I think, only two sensible options here:

    1. Disallow any identifier letters outside of ASCII.
    2. Make everything UTF-8.

    Yes. Though people preferring #2 are usually English speakers who are
    not really aware of the consequences. Like having E, Ε, Е three
    different identifiers. One could try to maintain language-defined
    homographs in order to prevent mess, introducing even bigger mess...
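
    To make the E, Ε, Е example concrete, here is a small sketch using Python's
    standard unicodedata module. The choice of Python is an assumption; the
    same check can be done in any language with access to Unicode character
    data.

```python
import unicodedata

# Three letters that render almost identically in most fonts.
latin_e = "\u0045"      # Latin capital E
greek_e = "\u0395"      # Greek capital Epsilon
cyrillic_e = "\u0415"   # Cyrillic capital Ie

letters = [latin_e, greek_e, cyrillic_e]
names = [unicodedata.name(ch) for ch in letters]

for ch, name in zip(letters, names):
    print(f"U+{ord(ch):04X} {name} -> UTF-8 bytes {ch.encode('utf-8').hex()}")

# Each one is accepted as an identifier on its own, and NFKC normalisation
# (which Python applies to identifiers) does not merge them.
all_identifiers = all(ch.isidentifier() for ch in letters)
distinct = len({unicodedata.normalize("NFKC", ch) for ch in letters})
print(all_identifiers, distinct)
```

    So a compiler that simply accepts Unicode letters sees three different
    names here, even though a reader sees one.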

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

  • From David Brown@21:1/5 to Dmitry A. Kazakov on Mon Aug 30 11:50:48 2021
    On 30/08/2021 10:38, Dmitry A. Kazakov wrote:
    On 2021-08-30 10:13, David Brown wrote:

    There are, I think, only two sensible options here:

    1. Disallow any identifier letters outside of ASCII.
    2. Make everything UTF-8.

    Yes. Though people preferring #2 are usually English speakers who are
    not really aware of the consequences. Like having E, Ε, Е three
    different identifiers. One could try to maintain language-defined
    homographs in order to prevent mess, introducing even bigger mess...


    I'm an English speaker, and a Norwegian speaker (we have three extra
    letters, åøæ). And I am well aware of the potential complication of different Unicode code points with very similar (or even identical) glyphs.

    It can also be difficult for people to type, which can quickly be a pain
    for collaboration. How would you type "bøk", for example? That's
    "book" in Norwegian, and I have a key labelled "ø". James, on Linux,
    can use compose + / + o to get the letter. But for you on Windows, with
    a German keyboard layout (I'm guessing from your email address), I
    expect you are stuck with copy-and-paste from my post, or using the
    "character map" utility, or typing "alt+0248".

    Then there is the question of displaying the characters. I have a font
    that includes vast numbers of obscure symbols, so I could use ↀ for the
    Roman numeral for 1000 (using the traditional symbol, rather than the
    modern replacement of M). Other people reading this might not see it.

    All in all, non-ASCII letters in identifiers can pose a lot of
    challenges. But they are nonetheless important for people around the
    world, and despite the disadvantages, UTF-8 is far and away the best
    choice. You simply have to trust programmers to be sensible in their
    usage. (You need to do that anyway, even with ASCII - in many fonts, l,
    1 and I can be hard to distinguish, as can O and 0.)

  • From Dmitry A. Kazakov@21:1/5 to David Brown on Mon Aug 30 13:37:05 2021
    On 2021-08-30 11:50, David Brown wrote:
    On 30/08/2021 10:38, Dmitry A. Kazakov wrote:
    On 2021-08-30 10:13, David Brown wrote:

    There are, I think, only two sensible options here:

    1. Disallow any identifier letters outside of ASCII.
    2. Make everything UTF-8.

    Yes. Though people preferring #2 are usually English speakers who are
    not really aware of the consequences. Like having E, Ε, Е three
    different identifiers. One could try to maintain language-defined
    homographs in order to prevent mess, introducing even bigger mess...

    I'm an English speaker, and a Norwegian speaker (we have three extra
    letters, åøæ). And I am well aware of the potential complication of different Unicode code points with very similar (or even identical) glyphs.

    It can also be difficult for people to type, which can quickly be a pain
    for collaboration. How would you type "bøk", for example? That's
    "book" in Norwegian, and I have a key labelled "ø". James, on Linux,
    can use compose + / + o to get the letter. But for you on Windows, with
    a German keyboard layout (I'm guessing from your email address), I
    expect you are stuck with copy-and-paste from my post, or using the "character map" utility, or typing "alt+0248".

    Right, character map is what I use.

    Germans have it the easy way: you can drop the diacritical marks (ä=ae, ö=oe, ü=ue)
    and the ligature SZ (ß=ss).
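
    As an illustration of that substitution, a minimal sketch follows. The
    function name to_ascii_german is hypothetical, and the upper-case mappings
    are an assumption added for completeness.

```python
# Map German umlauts and sharp s to their conventional ASCII spellings.
GERMAN_ASCII = str.maketrans({
    "\u00e4": "ae",  # ä
    "\u00f6": "oe",  # ö
    "\u00fc": "ue",  # ü
    "\u00df": "ss",  # ß
    "\u00c4": "Ae",  # Ä (assumed upper-case form)
    "\u00d6": "Oe",  # Ö (assumed upper-case form)
    "\u00dc": "Ue",  # Ü (assumed upper-case form)
})


def to_ascii_german(text: str) -> str:
    """Return text with German special letters replaced by ASCII digraphs."""
    return text.translate(GERMAN_ASCII)


print(to_ascii_german("Gr\u00f6\u00dfe"))   # Größe
print(to_ascii_german("\u00dcbung"))        # Übung
```

    The mapping is one-way: "ss" or "ue" in the output cannot be turned back
    into the original letters without a dictionary.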

    Then there is the question of displaying the characters. I have a font
    that includes vast numbers of obscure symbols, so I could use ↀ for the Roman numeral for 1000 (using the traditional symbol, rather than the
    modern replacement of M). Other people reading this might not see it.

    It is a lesser problem now than it was before. I remember the time
    Windows was unable to display most of special symbols.

    All in all, non-ASCII letters in identifiers can pose a lot of
    challenges. But they are nonetheless important for people around the
    world, and despite the disadvantages, UTF-8 is far and away the best
    choice. You simply have to trust programmers to be sensible in their
    usage. (You need to do that anyway, even with ASCII - in many fonts, l,
    1 and I can be hard to distinguish, as can O and 0.)

    Actually, this is again sort of Europocentric POV. In reality, if you
    have a truly international team with speakers outside Western Europe,
    you must agree on some strict rules regarding comments and identifiers.

    You might be able to remember a German or even a Czech word. Cyrillic
    would be rather more challenging. But what would you do with Armenian or Chinese?

    And the least common denominator is English.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

  • From James Harris@21:1/5 to David Brown on Mon Aug 30 13:04:28 2021
    On 30/08/2021 09:13, David Brown wrote:
    On 28/08/2021 16:35, James Harris wrote:
    On 24/08/2021 21:56, David Brown wrote:
    On 24/08/2021 21:06, Bart wrote:
    On 24/08/2021 18:25, James Harris wrote:

    These days why use calling conventions at all? Perhaps they are only
    needed for when there's complete ignorance of the callee. The
    traditional concept of calling conventions may be pass\acute/e. ;-)

    James, aren't you using Linux?  The compose key makes it easy to write
    letters like é - it's just compose, ´, e - "passé".  (It's even easier
    if you have a non-English keyboard layout, in Windows or Linux, as these
    usually have "dead keys" for accents.)

    Thanks, I've now enabled the compose key though I wrote passé in the way
    I did as it's the way I am thinking of for my language - which, as it
    was unfamiliar to others was why I added the smiley.


    I don't imagine anyone is going to want to write "pass\acute/e" as an identifier in any language.

    It's for string literals!

    IMO programs and identifiers should use ascii, even in non-English
    languages.


    --
    James Harris

  • From David Brown@21:1/5 to Dmitry A. Kazakov on Mon Aug 30 20:13:10 2021
    On 30/08/2021 13:37, Dmitry A. Kazakov wrote:
    On 2021-08-30 11:50, David Brown wrote:
    On 30/08/2021 10:38, Dmitry A. Kazakov wrote:
    On 2021-08-30 10:13, David Brown wrote:

    There are, I think, only two sensible options here:

    1. Disallow any identifier letters outside of ASCII.
    2. Make everything UTF-8.

    Yes. Though people preferring #2 are usually English speakers who are
    not really aware of the consequences. Like having E, Ε, Е three
    different identifiers. One could try to maintain language-defined
    homographs in order to prevent mess, introducing even bigger mess...

    I'm an English speaker, and a Norwegian speaker (we have three extra
    letters, åøæ).  And I am well aware of the potential complication of
    different Unicode code points with very similar (or even identical)
    glyphs.

    It can also be difficult for people to type, which can quickly be a pain
    for collaboration.  How would you type "bøk", for example?  That's
    "book" in Norwegian, and I have a key labelled "ø".  James, on Linux,
    can use compose + / + o to get the letter.  But for you on Windows, with
    a German keyboard layout (I'm guessing from your email address), I
    expect you are stuck with copy-and-paste from my post, or using the
    "character map" utility, or typing "alt+0248".

    Right, character map is what I use.

    Germans have it the easy way: you can drop the diacritical marks (ä=ae, ö=oe, ü=ue)
    and the ligature SZ (ß=ss).


    You can do that too in Norwegian (though people are not always
    consistent about their choices of transliteration), if you can't use the
    proper letters (you can also substitute the Swedish versions). But the preference is to use the correct letters.

    Then there is the question of displaying the characters.  I have a font
    that includes vast numbers of obscure symbols, so I could use ↀ for the
    Roman numeral for 1000 (using the traditional symbol, rather than the
    modern replacement of M).  Other people reading this might not see it.

    It is a lesser problem now than it was before. I remember the time
    Windows was unable to display most of special symbols.

    Slowly, in some ways, Windows has been catching up with the *nix world.


    All in all, non-ASCII letters in identifiers can pose a lot of
    challenges.  But they are nonetheless important for people around the
    world, and despite the disadvantages, UTF-8 is far and away the best
    choice.  You simply have to trust programmers to be sensible in their
    usage.  (You need to do that anyway, even with ASCII - in many fonts, l,
    1 and I can be hard to distinguish, as can O and 0.)

    Actually, this is again sort of Europocentric POV. In reality, if you
    have a truly international team with speakers outside Western Europe,
    you must agree on some strict rules regarding comments and identifiers.


    If you have an international team, then it is standard practice to keep everything in English. But most teams are not international. Why
    should a group of Greek or Japanese programmers be forced to write
    everything in a foreign language? You can view the keywords as fixed -
    almost like symbols, rather than words - but they may prefer to have
    other parts written in their own language.

    You might be able to remember a German or even a Czech word. Cyrillic
    would be rather more challenging. But what would you do with Armenian or Chinese?

    And the least common denominator is English.


    It is the least common denominator for most international groups, but
    not for most national teams.

  • From David Brown@21:1/5 to James Harris on Mon Aug 30 20:16:19 2021
    On 30/08/2021 14:04, James Harris wrote:
    On 30/08/2021 09:13, David Brown wrote:
    On 28/08/2021 16:35, James Harris wrote:
    On 24/08/2021 21:56, David Brown wrote:
    On 24/08/2021 21:06, Bart wrote:
    On 24/08/2021 18:25, James Harris wrote:

    These days why use calling conventions at all? Perhaps they are only
    needed for when there's complete ignorance of the callee. The
    traditional concept of calling conventions may be pass\acute/e. ;-)

    James, aren't you using Linux?  The compose key makes it easy to write
    letters like é - it's just compose, ´, e - "passé".  (It's even easier
    if you have a non-English keyboard layout, in Windows or Linux, as
    these usually have "dead keys" for accents.)

    Thanks, I've now enabled the compose key though I wrote passé in the way
    I did as it's the way I am thinking of for my language - which, as it
    was unfamiliar to others was why I added the smiley.


    I don't imagine anyone is going to want to write "pass\acute/e" as an
    identifier in any language.

    It's for string literals!

    IMO programs and identifiers should use ascii, even in non-English
    languages.


    See the rest of the thread for a discussion on non-ASCII identifiers.
    (I am not suggesting that you implement them, or don't implement them -
    that's your choice. Some languages go one way, others go the other way.)

    But don't make up your own language for special characters in strings or comments. Again, UTF-8 is far and away the best option. If you feel
    that is a problem, then at least stick to an existing standard -
    HTML/XML character entities would almost certainly be the most
    convenient choice: "pass&eacute;".
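
    For reference, HTML/XML character entities already have off-the-shelf
    decoders. The sketch below uses Python's standard html module as one
    example; picking Python here is an assumption, not something the thread
    settles on.

```python
import html
from html.entities import name2codepoint

# Decode an HTML character entity inside a string literal.
escaped = "pass&eacute;"
decoded = html.unescape(escaped)
print(decoded)

# The entity name maps to a single Unicode code point.
code_point = name2codepoint["eacute"]
print(f"U+{code_point:04X}")
```

    A language that adopts these names gets the whole published entity table
    instead of a home-grown escape scheme.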

  • From Dmitry A. Kazakov@21:1/5 to David Brown on Mon Aug 30 21:10:52 2021
    On 2021-08-30 20:13, David Brown wrote:
    On 30/08/2021 13:37, Dmitry A. Kazakov wrote:

    It is a lesser problem now than it was before. I remember the time
    Windows was unable to display most of special symbols.

    Slowly, in some ways, Windows has been catching up with the *nix world.

    I must defend Windows. Linux adopted UTF-8 very late. I well remember
    the mess it had with 8-bit code pages.

    BTW, there still exist file utilities to check filenames in Linux. I had
    an old filesystem with some file names in German encoded in Latin-1. It
    was connected to a FreeNAS (BSD-based). These files caused mysterious
    FreeNAS crashes when a remote host tried to browse files over a network
    share. Once I fixed the names it almost stopped crashing. I ditched
    FreeNAS anyway in favor of Ubuntu.
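
    The kind of problem described above can be reproduced in a few lines. This
    is a sketch under assumptions: the file name is hypothetical, and it only
    shows why Latin-1 bytes are rejected by a strict UTF-8 decoder, not what
    actually went wrong inside FreeNAS.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical German file name ("Größe.txt") stored by an old Latin-1 system.
name = "Gr\u00f6\u00dfe.txt"
latin1_bytes = name.encode("latin-1")
utf8_bytes = name.encode("utf-8")

print(is_valid_utf8(latin1_bytes))  # the legacy bytes
print(is_valid_utf8(utf8_bytes))    # the same name stored as UTF-8

# "Fixing the names" amounts to re-encoding the legacy bytes.
repaired = latin1_bytes.decode("latin-1").encode("utf-8")
```

    Software that assumes every file name is valid UTF-8 has to handle the
    first case explicitly, or it will fail on such legacy names.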

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

  • From Bart@21:1/5 to David Brown on Mon Aug 30 20:36:27 2021
    On 30/08/2021 19:13, David Brown wrote:
    On 30/08/2021 13:37, Dmitry A. Kazakov wrote:

    Actually, this is again sort of Europocentric POV. In reality, if you
    have a truly international team with speakers outside Western Europe,
    you must agree on some strict rules regarding comments and identifiers.


    If you have an international team, then it is standard practice to keep everything in English. But most teams are not international. Why
    should a group of Greek or Japanese programmers be forced to write
    everything in a foreign language? You can view the keywords as fixed - almost like symbols, rather than words - but they may prefer to have
    other parts written in their own language.

    You might be able to remember a German or even a Czech word. Cyrillic
    would be rather more challenging. But what would you do with Armenian or
    Chinese?

    And the least common denominator is English.


    It is the least common denominator for most international groups, but
    not for most national teams.

    If they are using a mainstream language, then it's about more than using Unicode in identifiers:

    * Keywords are likely to be in English still

    * Standard type names will be English-based (and, in C, codes like %ll
    and -LL and INT_MAX)

    * The function names in the standard library will probably be English-based

    * Compiler option names may be English based (eg. --version)

    * Error messages from the compiler may be in English (I don't know how internationalised such programs are)

    * Most of the exported functions and enums of general-purpose libraries
    are likely to be in English (eg. SDL_BUTTON_LEFT)

    So I'd say it's hard to get away from English even if they wanted.

    But string literals and comments in source code: they can be anything;
    the language just needs to allow UTF8.

  • From David Brown@21:1/5 to Dmitry A. Kazakov on Mon Aug 30 21:18:13 2021
    On 30/08/2021 21:10, Dmitry A. Kazakov wrote:
    On 2021-08-30 20:13, David Brown wrote:
    On 30/08/2021 13:37, Dmitry A. Kazakov wrote:

    It is a lesser problem now than it was before. I remember the time
    Windows was unable to display most of special symbols.

    Slowly, in some ways, Windows has been catching up with the *nix world.

    I must defend Windows. Linux adopted UTF-8 very late. I well remember
    the mess it had with 8-bit code pages.

    Windows also had a mess with 8-bit code pages.

    Windows /was/ earlier with Unicode, that's true - unfortunately, they
    picked UCS-2 and then got stuck with that instead of UTF-8. Linux
    picked UTF-8 by laziness, as pretty much everything involving strings
    (except displaying them) just works as before. There is no need to
    re-invent everything in a 16-bit manner, as Windows did, and there are
    no problems when it turns out 16 bits are not enough.
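
    Both points (ASCII strings working unchanged, and 16 bits not being enough)
    can be seen by counting code units. The sketch below uses Python's built-in
    codecs; the sample characters are chosen for illustration.

```python
# Compare how many code units each character needs in UTF-8 and UTF-16.
samples = {
    "e": "\u0065",           # plain ASCII letter
    "e-acute": "\u00e9",     # é
    "roman 1000": "\u2180",  # ↀ, still inside the 16-bit range
    "emoji": "\U0001F600",   # outside the 16-bit range
}


def utf8_bytes(ch: str) -> int:
    """Number of bytes the character takes in UTF-8."""
    return len(ch.encode("utf-8"))


def utf16_units(ch: str) -> int:
    """Number of 16-bit code units in UTF-16 (utf-16-le adds no BOM)."""
    return len(ch.encode("utf-16-le")) // 2


for label, ch in samples.items():
    print(f"{label}: {utf8_bytes(ch)} UTF-8 byte(s), "
          f"{utf16_units(ch)} UTF-16 unit(s)")

# ASCII text is byte-for-byte identical in UTF-8, which is why byte-oriented
# string code keeps working.
ascii_unchanged = "calling sequences".encode("utf-8") == b"calling sequences"
```

    The last sample needs two UTF-16 code units (a surrogate pair), which is
    exactly the case that fixed-width UCS-2 cannot represent.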


    BTW, there still exist file utilities to check filenames in Linux. I had
    an old filesystem with some file names in German encoded in Latin-1. It
    was connected to a FreeNAS (BSD-based). These files caused mysterious
    FreeNAS crashes when a remote host tried to browse files over a network share. Once I fixed the names it almost stopped crashing. I ditched
    FreeNAS anyway in favor of Ubuntu.


    FreeNAS is BSD, which is not Linux. Not that BSD has any problems with non-ASCII filenames either. An application might be made ASCII only,
    however, regardless of the system.

  • From Dmitry A. Kazakov@21:1/5 to David Brown on Mon Aug 30 21:37:29 2021
    On 2021-08-30 21:18, David Brown wrote:
    On 30/08/2021 21:10, Dmitry A. Kazakov wrote:
    On 2021-08-30 20:13, David Brown wrote:
    On 30/08/2021 13:37, Dmitry A. Kazakov wrote:

    It is a lesser problem now than it was before. I remember the time
    Windows was unable to display most of special symbols.

    Slowly, in some ways, Windows has been catching up with the *nix world.

    I must defend Windows. Linux adopted UTF-8 very late. I well remember
    the mess it had with 8-bit code pages.

    Windows also had a mess with 8-bit code pages.

    Oh, yes.

    If I correctly remember, you needed "professional" rather than "home" in
    order to switch the system default.

    Windows /was/ earlier with Unicode, that's true - unfortunately, they
    picked UCS-2 and then got stuck with that instead of UTF-8.

    Worse, later they changed UCS-2 to UTF-16 under the rug. All system
    calls are duplicated, one ASCII A-call, another UTF-16 W-call.

    Linux
    picked UTF-8 by laziness, as pretty much everything involving strings
    (except displaying them) just works as before. There is no need to
    re-invent everything in a 16-bit manner, as Windows did, and there are
    no problems when it turns out 16 bits are not enough.

    It is UTF-16 now. But of course, UTF-16 is a monstrosity compared with
    UTF-8. Fortunately third party libraries ignore the mess. E.g. GTK port
    for Windows converts all filenames to UTF-8.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

  • From James Harris@21:1/5 to David Brown on Mon Aug 30 20:52:59 2021
    On 30/08/2021 19:16, David Brown wrote:
    On 30/08/2021 14:04, James Harris wrote:
    On 30/08/2021 09:13, David Brown wrote:

    ...

    I don't imagine anyone is going to want to write "pass\acute/e" as an
    identifier in any language.

    It's for string literals!

    IMO programs and identifiers should use ascii, even in non-English
    languages.


    See the rest of the thread for a discussion on non-ASCII identifiers.
    (I am not suggesting that you implement them, or don't implement them - that's your choice. Some languages go one way, others go the other way.)

    But don't make up your own language for special characters in strings or comments. Again, UTF-8 is far and away the best option. If you feel
    that is a problem, then at least stick to an existing standard -
    HTML/XML character entities would almost certainly be the most
    convenient choice: "pass&eacute;".
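
    For illustration, Python's standard html module already understands
    these entities, so a tool chain that stored ASCII-only literals could
    expand them with existing code. This is a sketch of the idea, not a
    proposal for how any particular compiler should do it.

```python
import html

# ASCII-only spelling of a literal containing a non-ASCII letter.
literal = "pass&eacute;"
decoded = html.unescape(literal)

# Named and numeric references both expand to the same character.
same_by_number = html.unescape("pass&#233;")

# Going the other way: any non-ASCII character can be written back out as a
# numeric character reference, keeping the source file pure ASCII.
ascii_again = decoded.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(decoded, same_by_number, ascii_again)
```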

    Any UTF is no good for source code - e.g. for reasons Dmitry mentioned.
    In addition, characters which people cannot identify or recognise should
    not be part of source code because they make it unreadable.

    I am considering allowing external identifier names to include unusual characters so as to link with routines which use such characters - but
    the programmer would have to write the identifiers in ascii characters.

    I doubt I'd use HTML entities as they are a mess (e.g. having multiple
    names for the same character) but I would need the names to come from an
    online database.
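
    The "multiple names" complaint can be checked against the HTML5 entity
    table that ships with Python. The second half of the sketch shows one
    possible source of unambiguous names, the Unicode Character Database;
    that choice is an assumption made for the example, not something the
    post commits to.

```python
import unicodedata
from collections import defaultdict
from html.entities import html5

# Group HTML5 entity names by the text they expand to. Only the forms that
# end in ";" are counted, so legacy spellings without the semicolon do not
# inflate the result.
names_by_text = defaultdict(list)
for name, text in html5.items():
    if name.endswith(";"):
        names_by_text[text].append(name[:-1])

# Characters (or short sequences) reachable through more than one name.
aliased = {text: sorted(names) for text, names in names_by_text.items()
           if len(names) > 1}

# By contrast, the Unicode Character Database gives each code point exactly
# one formal name, and the lookup works in both directions.
formal_name = unicodedata.name("ø")
round_trip = unicodedata.lookup(formal_name)

print(len(aliased), "expansions have more than one HTML5 entity name")
print(formal_name)
```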


    --
    James Harris

  • From James Harris@21:1/5 to David Brown on Mon Aug 30 20:59:05 2021
    On 29/08/2021 10:36, David Brown wrote:
    On 29/08/2021 02:01, Bart wrote:
    On 28/08/2021 17:21, David Brown wrote:
    On 27/08/2021 22:07, Bart wrote:

    As James suggested, the object files are basically just the internal
    representation of the compilation before code generation.

    Then 'object file' is a complete misnomer.

    Yes, that's a fair comment. "Linking" is also a misnomer in link-time optimisation. The names are historical, rather than technically accurate.

    This is a first: three of us in agreement!

    In my outline design the IR does a lot of the heavy lifting, including
    being the preferred form for distributing software.


    --
    James Harris

  • From David Brown@21:1/5 to Dmitry A. Kazakov on Tue Aug 31 09:36:51 2021
    On 30/08/2021 21:37, Dmitry A. Kazakov wrote:
    On 2021-08-30 21:18, David Brown wrote:

    Windows /was/ earlier with Unicode, that's true - unfortunately, they
    picked UCS-2 and then got stuck with that instead of UTF-8.

    Worse, later they changed UCS-2 to UTF-16 under the rug. All system
    calls are duplicated, one ASCII A-call, another UTF-16 W-call.

    Linux
    picked UTF-8 by laziness, as pretty much everything involving strings
    (except displaying them) just works as before.  There is no need to
    re-invent everything in a 16-bit manner, as Windows did, and there are
    no problems when it turns out 16 bits are not enough.

    It is UTF-16 now. But of course, UTF-16 is a monstrosity compared with
    UTF-8. Fortunately third party libraries ignore the mess. E.g. GTK port
    for Windows converts all filenames to UTF-8.


    My understanding (which may be wrong, as I don't do much Windows
    programming) is that there is a gradual move to UTF-8 support in
    Windows. These things take time of course, and while there is no doubt
    that Microsoft backed the wrong horse here with 16-bit encodings, they
    made the right choice at the time. I blame MS for a lot of bad things,
    but not this one! And they are not alone - Java, QT and Python are
    other big players that picked UCS-2, leading to much regret and slow
    progress towards a changeover to UTF-8.

  • From Dmitry A. Kazakov@21:1/5 to David Brown on Tue Aug 31 11:33:55 2021
    On 2021-08-31 09:36, David Brown wrote:

    My understanding (which may be wrong, as I don't do much Windows
    programming) is that there is a gradual move to UTF-8 support in
    Windows.

    I think you are right. Actually they could proclaim A-calls UTF-8 as
    they did with W-calls. That would break some legacy code, only French
    will be annoyed. Germans will be apathetic, small European countries
    resigned, I guess...

    These things take time of course, and while there is no doubt
    that Microsoft backed the wrong horse here with 16-bit encodings, they
    made the right choice at the time.

    I blame MS for a lot of bad things,
    but not this one! And they are not alone - Java, QT and Python are
    other big players that picked UCS-2, leading to much regret and slow
    progress towards a changeover to UTF-8.

    I believe that UTF-8 was introduced later. It is impossible that
    everybody was wrong. E.g. Ada also adopted UCS-2 in 1995. Later on Ada
    added UCS-4. Just same mess as with Windows, alas. But most Ada
    programmers ignore UCS-2/4 and use UTF-8 where the standard mandates
    Latin-1.

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

  • From David Brown@21:1/5 to Dmitry A. Kazakov on Tue Aug 31 13:06:18 2021
    On 31/08/2021 11:33, Dmitry A. Kazakov wrote:
    On 2021-08-31 09:36, David Brown wrote:

    My understanding (which may be wrong, as I don't do much Windows
    programming) is that there is a gradual move to UTF-8 support in
    Windows.

    I think you are right. Actually they could proclaim A-calls UTF-8 as
    they did with W-calls. That would break some legacy code, only French
    will be annoyed. Germans will be apathetic, small European countries
    resigned, I guess...

    You are just listing the advantages :-)


    These things take time of course, and while there is no doubt
    that Microsoft backed the wrong horse here with 16-bit encodings, they
    made the right choice at the time.

    I blame MS for a lot of bad things,
    but not this one!  And they are not alone - Java, QT and Python are
    other big players that picked UCS-2, leading to much regret and slow
    progress towards a changeover to UTF-8.

    I believe that UTF-8 was introduced later.

    Yes. Unicode was first conceived as 16-bit, with UCS-2. Then they
    started extending it beyond 16-bit, and had to make UCS-4. UTF-16 was
    developed as a way to access the rest of the characters with 16-bit code
    units, and then I think UTF-8 came after that. (UTF-32 is the same as
    UCS-4.)
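
    How UTF-16 reaches the characters beyond 16 bits can be shown with a
    short worked example. The function name to_surrogates is made up for
    this sketch; the arithmetic is the standard surrogate-pair rule, and
    the result is cross-checked against Python's own UTF-16 encoder.

```python
def to_surrogates(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP need a surrogate pair")
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low

code_point = 0x1F600                       # a character outside the BMP
high, low = to_surrogates(code_point)

# The same character in the three encoding forms discussed in the thread.
utf16 = chr(code_point).encode("utf-16-be")
utf8 = chr(code_point).encode("utf-8")
utf32 = chr(code_point).encode("utf-32-be")

print(f"U+{code_point:X} -> {high:04X} {low:04X}")
print(len(utf8), len(utf16), len(utf32), "bytes in UTF-8, UTF-16, UTF-32")
```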

    It is impossible that
    everybody was wrong.

    They were not wrong at the time - it was later changes that made them
    wrong. It is a sometimes unfortunate fact of life that backwards
    compatibility is king, and it's hard to undo decisions even when we know
    things could have been better. (That's why x86 is popular, despite
    being an appallingly bad architecture, it's why we have Windows, it's
    why we have qwerty keyboards, it's why we all use English with its silly inconsistent spelling.)

    E.g. Ada also adopted UCS-2 in 1995. Later on Ada
    added UCS-4. Just same mess as with Windows, alas. But most Ada
    programmers ignore UCS-2/4 and use UTF-8 where the standard mandates
    Latin-1.


  • From James Harris@21:1/5 to David Brown on Tue Aug 31 19:37:35 2021
    On 30/08/2021 19:13, David Brown wrote:
    On 30/08/2021 13:37, Dmitry A. Kazakov wrote:
    On 2021-08-30 11:50, David Brown wrote:

    ...

    All in all, non-ASCII letters in identifiers can pose a lot of
    challenges.  But they are nonetheless important for people around the
    world, and despite the disadvantages, UTF-8 is far and away the best
    choice.  You simply have to trust programmers to be sensible in their
    usage.  (You need to do that anyway, even with ASCII - in many fonts, l,
    1 and I can be hard to distinguish, as can O and 0.)

    Actually, this is again sort of Europocentric POV. In reality, if you
    have a truly international team with speakers outside Western Europe,
    you must agree on some strict rules regarding comments and identifiers.


    If you have an international team, then it is standard practice to keep everything in English. But most teams are not international. Why
    should a group of Greek or Japanese programmers be forced to write
    everything in a foreign language? You can view the keywords as fixed - almost like symbols, rather than words - but they may prefer to have
    other parts written in their own language.

    AISI: Have the master copy of /all/ programs in American English, and
    support translation of identifier names, comments, string literals etc
    to other languages.


    --
    James Harris

  • From David Brown@21:1/5 to James Harris on Wed Sep 1 10:16:13 2021
    On 31/08/2021 20:37, James Harris wrote:
    On 30/08/2021 19:13, David Brown wrote:
    On 30/08/2021 13:37, Dmitry A. Kazakov wrote:
    On 2021-08-30 11:50, David Brown wrote:

    ...

    All in all, non-ASCII letters in identifiers can pose a lot of
    challenges.  But they are nonetheless important for people around the
    world, and despite the disadvantages, UTF-8 is far and away the best
    choice.  You simply have to trust programmers to be sensible in their
    usage.  (You need to do that anyway, even with ASCII - in many fonts, l,
    1 and I can be hard to distinguish, as can O and 0.)

    Actually, this is again sort of Europocentric POV. In reality, if you
    have a truly international team with speakers outside Western Europe,
    you must agree on some strict rules regarding comments and identifiers.


    If you have an international team, then it is standard practice to keep
    everything in English.  But most teams are not international.  Why
    should a group of Greek or Japanese programmers be forced to write
    everything in a foreign language?  You can view the keywords as fixed -
    almost like symbols, rather than words - but they may prefer to have
    other parts written in their own language.

    AISI: Have the master copy of /all/ programs in American English, and
    support translation of identifier names, comments, string literals etc
    to other languages.


    Why would anyone choose the dialect of one particular ex colony, rather
    than using /real/ English?

    I know that in the USA it is common to think that America is the only
    country, or at least the only one worth considering, but the rest of the
    world begs to differ.

  • From antispam@math.uni.wroc.pl@21:1/5 to David Brown on Wed Sep 1 15:18:25 2021
    David Brown <david.brown@hesbynett.no> wrote:
    On 31/08/2021 11:33, Dmitry A. Kazakov wrote:
    On 2021-08-31 09:36, David Brown wrote:

    My understanding (which may be wrong, as I don't do much Windows
    programming) is that there is a gradual move to UTF-8 support in
    Windows.

    I think you are right. Actually they could proclaim A-calls UTF-8 as
    they did with W-calls. That would break some legacy code, only French
    will be annoyed. Germans will be apathetic, small European countries
    resigned, I guess...

    You are just listing the advantages :-)


    These things take time of course, and while there is no doubt
    that Microsoft backed the wrong horse here with 16-bit encodings, they
    made the right choice at the time.

    I blame MS for a lot of bad things,
    but not this one!  And they are not alone - Java, QT and Python are
    other big players that picked UCS-2, leading to much regret and slow
    progress towards a changeover to UTF-8.

    I believe that UTF-8 was introduced later.

    Yes. Unicode was first conceived as 16-bit, with UCS-2. Then they
    started extending it beyond 16-bit, and had to make UCS-4. UTF-16 was
    developed as a way to access the rest of the characters with 16-bit code
    units, and then I think UTF-8 came after that. (UTF-32 is the same as
    UCS-4.)

    Well, there was insane ISO proposal, which was then partially
    unified with Unicode: ISO had 31-bit characters, with first
    2^16 codes (BMP) identical to Unicode. At that time ISO proposed
    their 8-bit transportation format. Around this time UTF-8 was
    born, as simpler alternative to ISO format. Later, ISO
    agreed to limit characters to about 20 bits, Unicode agreed to expand
    to match and UTF-16 was born. So, in fact UTF-8 came first
    and UTF-16 later. Of course, 16-bit Unicode was before UTF-8.

    --
    Waldek Hebisch

  • From James Harris@21:1/5 to David Brown on Mon Sep 6 19:30:12 2021
    On 30/08/2021 10:50, David Brown wrote:
    On 30/08/2021 10:38, Dmitry A. Kazakov wrote:
    On 2021-08-30 10:13, David Brown wrote:

    There are, I think, only two sensible options here:

    1. Disallow any identifier letters outside of ASCII.
    2. Make everything UTF-8.

    IMO there's a better option:

    3. Use purely ASCII but allow escape sequences to be coded in ASCII.

    ...

    It can also be difficult for people to type, which can quickly be a pain
    for collaboration. How would you type "bøk", for example?

    I could allow that to be used in string literals with something like

    "b\slash:o/k"

    As well as string literals it is unlikely but possible that a program
    written in my language would have to call a function from another
    language which has been written in Norwegian where the function name
    included a non-ASCII character. For that, I am considering allowing

    \slash:o/

    and similar to appear in the name of external functions. It would be
    ugly but clear. And programmers could limit the ugliness to one place by defining an alias as in

    namedef book = b\slash:o/k

    book()

    Wouldn't that be better than either pure ASCII or allowing Unicode?

    Have I got all bases covered? I hope so!
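
    To make the proposal concrete, here is a rough sketch of how such
    escapes might be expanded. The escape syntax is taken from the examples
    above; everything else (the table contents, the names ESCAPES and
    expand, and the choice to reject unknown escapes) is assumed for
    illustration only.

```python
import re

# Hypothetical table: (modifier name, base letter) -> resulting character.
ESCAPES = {
    ("slash", "o"): "ø",
    ("slash", "O"): "Ø",
    ("acute", "e"): "é",
    ("ring", "a"): "å",
}

# Matches the form \name:letter/ used in the examples, e.g. \slash:o/
ESCAPE_RE = re.compile(r"\\([a-z]+):(.)/")

def expand(text: str) -> str:
    """Replace every \\name:letter/ escape with the character it names."""
    def replace(match):
        key = (match.group(1), match.group(2))
        if key not in ESCAPES:
            raise ValueError(f"unknown escape: {match.group(0)}")
        return ESCAPES[key]
    return ESCAPE_RE.sub(replace, text)

print(expand(r"b\slash:o/k"))
```

    A linker-facing name such as b\slash:o/k would go through the same
    expansion before being encoded in whatever form the foreign object
    file expects.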


    --
    James Harris

  • From James Harris@21:1/5 to David Brown on Mon Sep 6 19:34:09 2021
    On 30/08/2021 19:13, David Brown wrote:
    On 30/08/2021 13:37, Dmitry A. Kazakov wrote:

    ...

    Actually, this is again sort of Europocentric POV. In reality, if you
    have a truly international team with speakers outside Western Europe,
    you must agree on some strict rules regarding comments and identifiers.

    ...

    And the least common denominator is English.


    It is the least common denominator for most international groups, but
    not for most national teams.

    A program whose master copy was in a well-known language - such as
    American English - would be a lot easier to translate to other languages
    than normal prose.


    --
    James Harris

  • From David Brown@21:1/5 to James Harris on Mon Sep 6 23:12:32 2021
    On 06/09/2021 20:30, James Harris wrote:
    On 30/08/2021 10:50, David Brown wrote:
    On 30/08/2021 10:38, Dmitry A. Kazakov wrote:
    On 2021-08-30 10:13, David Brown wrote:

    There are, I think, only two sensible options here:

    1. Disallow any identifier letters outside of ASCII.
    2. Make everything UTF-8.

    IMO there's a better option:

    3. Use purely ASCII but allow escape sequences to be coded in ASCII.

    ...

    It can also be difficult for people to type, which can quickly be a pain
    for collaboration.  How would you type "bøk", for example?

    I could allow that to be used in string literals with something like

      "b\slash:o/k"

    As well as string literals it is unlikely but possible that a program
    written in my language would have to call a function from another
    language which has been written in Norwegian where the function name
    included a non-ASCII character. For that, I am considering allowing

      \slash:o/

    and similar to appear in the name of external functions. It would be
    ugly but clear. And programmers could limit the ugliness to one place by defining an alias as in

      namedef book = b\slash:o/k

      book()

    Wouldn't that be better than either pure ASCII or allowing Unicode?

    Have I got all bases covered? I hope so!


    All the bases except for the ones concerning what people writing other languages would actually see as usable. If this is your "solution", you
    are better off saying "pure 7-bit ASCII only" and be done with it,
    because no one would /ever/ want to use that.

  • From James Harris@21:1/5 to David Brown on Tue Sep 7 08:13:04 2021
    On 06/09/2021 22:12, David Brown wrote:
    On 06/09/2021 20:30, James Harris wrote:
    On 30/08/2021 10:50, David Brown wrote:
    On 30/08/2021 10:38, Dmitry A. Kazakov wrote:
    On 2021-08-30 10:13, David Brown wrote:

    There are, I think, only two sensible options here:

    1. Disallow any identifier letters outside of ASCII.
    2. Make everything UTF-8.

    IMO there's a better option:

    3. Use purely ASCII but allow escape sequences to be coded in ASCII.

    ...

    it is unlikely but possible that a program
    written in my language would have to call a function from another
    language which has been written in Norwegian where the function name
    included a non-ASCII character. For that, I am considering allowing

      \slash:o/

    and similar to appear in the name of external functions. It would be
    ugly but clear. And programmers could limit the ugliness to one place by
    defining an alias as in

      namedef book = b\slash:o/k

      book()

    Wouldn't that be better than either pure ASCII or allowing Unicode?

    Have I got all bases covered? I hope so!


    All the bases except for the ones concerning what people writing other languages would actually see as usable.

    This is not for writing in other languages. Under the scheme I have in
    mind Norwegians could write and edit a program in Norwegian! They could
    use Norwegian identifiers and Norwegian comments; they could type
    Norwegian characters in string literals and even in external
    identifiers, if they felt it necessary. But there would be a table-based
    (i.e. programmer-configured) translation between the Norwegian version
    of a program and the American English master version. (E.g. a table
    would be used to allow bidirectional translation of identifier name
    "bøk" to or from "book".) So Norwegian speakers and English speakers
    should be able to work on the same program.
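
    The table-based translation might be sketched as follows. Only the
    "bøk"/"book" pair comes from the text above; the other table entries,
    the names EN_TO_NO and translate, and the sample line of code are
    invented for the example. The sketch deliberately ignores comments and
    string literals, which a real tool would have to treat separately.

```python
import re

# Programmer-maintained table: master (American English) -> Norwegian view.
# Only "book"/"bøk" is from the discussion; the rest is illustrative.
EN_TO_NO = {"book": "bøk", "read": "les", "page": "side"}

# The reverse table is derived, which only works if the mapping is one-to-one.
NO_TO_EN = {no: en for en, no in EN_TO_NO.items()}
if len(NO_TO_EN) != len(EN_TO_NO):
    raise ValueError("translation table is not one-to-one")

# An identifier: a letter or underscore followed by word characters.
# In Python 3, \w also matches non-ASCII letters such as "ø".
IDENT_RE = re.compile(r"[^\W\d]\w*")

def translate(source: str, table: dict[str, str]) -> str:
    """Rewrite every identifier found in the table; leave the rest alone."""
    return IDENT_RE.sub(lambda m: table.get(m.group(0), m.group(0)), source)

master = "page = read(book)"
norwegian = translate(master, EN_TO_NO)
back = translate(norwegian, NO_TO_EN)

print(norwegian)
print(back)
```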

    The issue I was talking about was just about linking.

    Linking to other modules written in my language should be easy as they,
    also, would have to have American English master copies with matching
    external identifier names.

    Linking to programs written in other languages should also be doable
    (with a bit of control of calling sequences) if those programs use Ascii identifiers.

    It's only if someone wanted to link with a program written in another programming language which used a non-Ascii identifier that the
    aforementioned escape sequence would be needed to refer to that
    identifier. So a rare case, indeed, I believe, but one which is possible.


    --
    James Harris

  • From James Harris@21:1/5 to David Brown on Fri Oct 22 16:09:40 2021
    On 01/09/2021 09:16, David Brown wrote:
    On 31/08/2021 20:37, James Harris wrote:
    On 30/08/2021 19:13, David Brown wrote:
    On 30/08/2021 13:37, Dmitry A. Kazakov wrote:
    On 2021-08-30 11:50, David Brown wrote:

    ...

    All in all, non-ASCII letters in identifiers can pose a lot of
    challenges.  But they are nonetheless important for people around the
    world, and despite the disadvantages, UTF-8 is far and away the best
    choice.  You simply have to trust programmers to be sensible in their
    usage.  (You need to do that anyway, even with ASCII - in many fonts, l,
    1 and I can be hard to distinguish, as can O and 0.)

    Actually, this is again sort of Europocentric POV. In reality, if you
    have a truly international team with speakers outside Western Europe,
    you must agree on some strict rules regarding comments and identifiers.

    If you have an international team, then it is standard practice to keep
    everything in English.  But most teams are not international.  Why
    should a group of Greek or Japanese programmers be forced to write
    everything in a foreign language?  You can view the keywords as fixed -
    almost like symbols, rather than words - but they may prefer to have
    other parts written in their own language.

    AISI: Have the master copy of /all/ programs in American English, and
    support translation of identifier names, comments, string literals etc
    to other languages.


    Why would anyone choose the dialect of one particular ex colony, rather
    than using /real/ English?

    I know that in the USA it is common to think that America is the only country, or at least the only one worth considering, but the rest of the world begs to differ.


    AmE is more heavily used - especially in IT - so it makes more sense to
    use it as a lingua franca. For example, an identifier might be called
    TextColor rather than TextColour.

    There is precedent for that kind of choice. Music terms such as andante
    and pianissimo are in Italian. Speakers of other languages still work
    with them.


    --
    James Harris

  • From Bart@21:1/5 to James Harris on Fri Oct 22 20:20:38 2021
    On 22/10/2021 16:09, James Harris wrote:
    On 01/09/2021 09:16, David Brown wrote:

    Why would anyone choose the dialect of one particular ex colony, rather
    than using /real/ English?

    I know that in the USA it is common to think that America is the only
    country, or at least the only one worth considering, but the rest of the
    world begs to differ.


    AmE is more heavily used - especially in IT - so it makes more sense to
    use it as a lingua franca. For example, an identifier might be called TextColor rather than TextColour.

    There is precedent for that kind of choice. Music terms such as andante
    and pianissimo are in Italian. Speakers of other languages still work
    with them.

    Those are pure Italian.

    Not Anglicised-Italian that is going to annoy natives of that country if
    they were obliged to use them.

    For example, 'panini' used to refer to a single bun, when 'panini' is
    actually plural.

    So I like to write Colour not Color. We invented the language after all!

    (By 'we' I mean the British, though my passport says otherwise.)

    However, I do use 'disk' and 'program' rather than 'disc' and
    'programme', as the former are now firmly associated with computing.

  • From James Harris@21:1/5 to Bart on Sat Oct 23 09:59:02 2021
    On 22/10/2021 20:20, Bart wrote:
    On 22/10/2021 16:09, James Harris wrote:
    On 01/09/2021 09:16, David Brown wrote:

    Why would anyone choose the dialect of one particular ex colony, rather
    than using /real/ English?

    I know that in the USA it is common to think that America is the only
    country, or at least the only one worth considering, but the rest of the
    world begs to differ.


    AmE is more heavily used - especially in IT - so it makes more sense
    to use it as a lingua franca. For example, an identifier might be
    called TextColor rather than TextColour.

    There is precedent for that kind of choice. Music terms such as
    andante and pianissimo are in Italian. Speakers of other languages
    still work with them.

    Those are pure Italian.

    Not Anglicised-Italian that is going to annoy natives of that country if
    they were obliged to use them.

    For example, 'panini' used to refer to a single bun, when 'panini' is actually plural.

    Similar with graffiti. Or paparazzi.


    So I like to write Colour not Color. We invented the language after all!

    That's all very well but with most IT standards coming out of America
    the spelling 'Color' is used - and inbuilt - more frequently. Surely
    it's better to have one spelling than to continually ask which spelling
    is used in a certain case.

    One could still present to users the spelling which suits them while
    program object names would be in American English. That applies in the filesystem, too. For example, yesterday I got a message about not being
    able to move files to my 'rubbish bin'. American users get told about
    the 'trash can'. French users possibly get told about the 'poubelle'.
    But the folder in the file system still has the American name
    '.Trash-UID'. That's easier to work with than renaming the folder for
    each locale, isn't it?!!!


    (By 'we' I mean the British, though my passport says otherwise.)

    Curious. If you can reply without giving too much away what does it say?


    However, I do use 'disk' and 'program' rather than 'disc' and
    'programme', as the former are now firmly associated with computing.

    As long as you don't try to catch fishes. ;-)

    BTW, for computer programs, at school I was taught that 'program' was
    correct.


    --
    James Harris

  • From Bart@21:1/5 to James Harris on Sat Oct 23 11:09:06 2021
    On 23/10/2021 09:59, James Harris wrote:
    On 22/10/2021 20:20, Bart wrote:
    On 22/10/2021 16:09, James Harris wrote:
    On 01/09/2021 09:16, David Brown wrote:

    Why would anyone choose the dialect of one particular ex colony, rather
    than using /real/ English?

    I know that in the USA it is common to think that America is the only
    country, or at least the only one worth considering, but the rest of the
    world begs to differ.


    AmE is more heavily used - especially in IT - so it makes more sense
    to use it as a lingua franca. For example, an identifier might be
    called TextColor rather than TextColour.

    There is precedent for that kind of choice. Music terms such as
    andante and pianissimo are in Italian. Speakers of other languages
    still work with them.

    Those are pure Italian.

    Not Anglicised-Italian that is going to annoy natives of that country
    if they were obliged to use them.

    For example, 'panini' used to refer to a single bun, when 'panini' is
    actually plural.

    Similar with graffiti. Or paparazzi.


    So I like to write Colour not Color. We invented the language after all!

    That's all very well but with most IT standards coming out of America
    the spelling 'Color' is used - and inbuilt - more frequently. Surely
    it's better to have one spelling than to continually ask which spelling
    is used in a certain case.


    This is a line from one of my interface files for WinAPI functions:

    windows function "GetSysColor" as getsyscolour (wt_int)wt_dword

    So I can refer to it as getsyscolor (proper case can be dropped), or getsyscolour; I use the latter. Typing 'color' makes it look like I
    can't spell.

    One could still present to users the spelling which suits them while
    program object names would be in American English. That applies in the filesystem, too. For example, yesterday I got a message about not being
    able to move files to my 'rubbish bin'. American users get told about
    the 'trash can'. French users possibly get told about the 'poubelle'.
    But the folder in the file system still has the American name
    '.Trash-UID'. That's easier to work with than renaming the folder for
    each locale, isn't it?!!!

    The APIs I've made available use British English spellings of words like 'colour'. (I can't think of any other examples; terms like 'windscreen'
    or 'boot' don't really come up in my libraries.)

    I don't think there's any need to pander to American spellings and make
    that form of English even more dominant. It is annoying that
    spell-checkers on various sites default to US dictionaries so don't like
    my -ise endings or 'll's in certain words.



    (By 'we' I mean the British, though my passport says otherwise.)

    Curious. If you can reply without giving too much away what does it say?

    I gave a hint in my post...

    However, I do use 'disk' and 'program' rather than 'disc' and
    'programme', as the former are now firmly associated with computing.

    As long as you don't try to catch fishes. ;-)

    BTW, for computer programs, at school I was taught that 'program' was correct.

    'Programme' is more what the BBC produces; it doesn't sound right for
    computer code. I'm still trying to figure out the fishes..

  • From Rod Pemberton@21:1/5 to James Harris on Sun Nov 7 20:48:50 2021
    On Fri, 27 Aug 2021 16:04:03 +0100
    James Harris <james.harris.1@gmail.com> wrote:

    I've been taking the top off a chimney stack

    Why?

    It's letting water in.

    I was thinking to remove the whole chimney stack as it is no longer
    used but then I wondered whether the local authority might tell me to reinstate it. That would not be good, especially as the bricks I've
    removed so far are soft and are breaking. So current idea is to
    remove the dodgy top layer or two and cap it while we have some dry
    weather. (Though I am very wary about being able to lift a 2' square
    concrete cap up the ladder without putting so much sideways pressure
    on the stack such that it falls over!)


    Well, if you haven't already, you might just check your home owner's
    insurance. The insurance company might prohibit "major" construction
    jobs being done by home owner.

    <end O/T>

    --
    Is this the year that Oregon ceases to exist?

  • From Rod Pemberton@21:1/5 to David Brown on Sun Nov 7 21:21:40 2021
    On Wed, 1 Sep 2021 10:16:13 +0200
    David Brown <david.brown@hesbynett.no> wrote:

    On 31/08/2021 20:37, James Harris wrote:
    On 30/08/2021 19:13, David Brown wrote:
    On 30/08/2021 13:37, Dmitry A. Kazakov wrote:
    On 2021-08-30 11:50, David Brown wrote:

    ...

    All in all, non-ASCII letters in identifiers can pose a lot of
    challenges.  But they are nonetheless important for people
    around the world, and despite the disadvantages, UTF-8 is far
    and away the best choice.  You simply have to trust programmers
    to be sensible in their usage.  (You need to do that anyway,
    even with ASCII - in many fonts, l, 1 and I can be hard to
    distinguish, as can O and 0.)
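
    As a rough illustration of the look-alike problem mentioned above, here
    is a minimal sketch of a check a tool might run over identifiers. The
    names `LOOKALIKES`, `skeleton` and `find_collisions` are hypothetical,
    and the mapping covers only the five ASCII characters named in the
    post; it is not how any particular compiler or linter does it.

```python
# Assumption: only the ASCII look-alikes from the post are handled
# (l, 1, I and O, 0). Each group is mapped to one representative.
LOOKALIKES = str.maketrans({"1": "l", "I": "l", "0": "O"})


def skeleton(identifier: str) -> str:
    """Collapse look-alike characters so similar names compare equal."""
    return identifier.translate(LOOKALIKES)


def find_collisions(identifiers):
    """Return groups of identifiers that differ only by look-alikes."""
    groups = {}
    for name in identifiers:
        groups.setdefault(skeleton(name), []).append(name)
    return [names for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    names = ["total", "tota1", "O0ps", "OOps", "index"]
    for group in find_collisions(names):
        print("easily confused:", ", ".join(group))
```

    The same idea would extend to non-ASCII identifiers with a larger
    mapping table, though the table itself is the hard part.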

    Actually, this is again sort of Europocentric POV. In reality, if
    you have a truly international team with speakers outside Western
    Europe, you must agree on some strict rules regarding comments
    and identifiers.


    If you have an international team, then it is standard practice to
    keep everything in English.  But most teams are not international.
    Why should a group of Greek or Japanese programmers be forced to
    write everything in a foreign language?  You can view the keywords
    as fixed - almost like symbols, rather than words - but they may
    prefer to have other parts written in their own language.

    AISI: Have the master copy of /all/ programs in American English,
    and support translation of identifier names, comments, string
    literals etc to other languages.


    Why would anyone choose the dialect of one particular ex colony,
    rather than using /real/ English?

    I always wondered why people in France or Spain or Italy would choose
    to program in a programming language where they had to first learn a
    foreign language, such as English, to do so. Ditto China, Japan,
    ...

    It's enough of a task to learn to program, but to then need to learn a
    foreign language too? In C, they could place a bunch of #define's to
    convert English to French or Spanish or Italian, but that wouldn't work
    so well with Mandarin or Japanese, etc.

    Now, if the code were saved in tokenized form, then it could rather
    easily be displayed in multiple languages.
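
    A minimal sketch of that tokenized idea, under stated assumptions: the
    stored program is a list of keyword tokens plus plain strings, and each
    display language supplies its own keyword table. The `Tok` enum, the
    `KEYWORDS` tables and the `render` function are all hypothetical, and
    the French and Japanese spellings are only illustrative choices, not
    taken from any real language implementation.

```python
from enum import Enum, auto


class Tok(Enum):
    """Language-neutral keyword tokens as they might be stored on disk."""
    IF = auto()
    WHILE = auto()
    RETURN = auto()


# Assumption: one display table per natural language (spellings made up).
KEYWORDS = {
    "en": {Tok.IF: "if", Tok.WHILE: "while", Tok.RETURN: "return"},
    "fr": {Tok.IF: "si", Tok.WHILE: "tantque", Tok.RETURN: "retourner"},
    "ja": {Tok.IF: "もし", Tok.WHILE: "間", Tok.RETURN: "返す"},
}


def render(tokens, lang: str) -> str:
    """Display a token stream using the keyword spellings of `lang`."""
    table = KEYWORDS[lang]
    return " ".join(table[t] if isinstance(t, Tok) else t for t in tokens)


if __name__ == "__main__":
    # Identifiers and punctuation are kept as plain strings in this sketch.
    program = [Tok.IF, "(", "x", ">", "0", ")", Tok.RETURN, "x", ";"]
    for lang in ("en", "fr", "ja"):
        print(lang, ":", render(program, lang))
```

    Only the keywords are translated here; identifiers, comments and string
    literals would each need their own per-language tables.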

    I know that in the USA it is common to think that America is the only
    country, or at least the only one worth considering, but the rest of
    the world begs to differ.

    If you meant to insult James here, I think you failed. IIRC, he's in
    the U.K. So, he's probably British. On the other hand, I'm in the
    U.S., descended from British ancestors (AFAIK), and, yes, the U.S.
    population and its news media aren't generally globally oriented. We
    often don't even know what is going on in Mexico or Canada. That is
    often, unfortunately, portrayed as self-centered or narcissistic. In
    reality, it's more of a too much to do, too much to enjoy, too little
    time for it all, kind of issue, combined with a non-global mentality.

    --
    Is this the year that Oregon ceases to exist?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Harris@21:1/5 to Rod Pemberton on Mon Nov 8 08:54:26 2021
    On 08/11/2021 02:21, Rod Pemberton wrote:
    On Wed, 1 Sep 2021 10:16:13 +0200
    David Brown <david.brown@hesbynett.no> wrote:

    ...

    [OT comments below]

    I know that in the USA it is common to think that America is the only
    country, or at least the only one worth considering, but the rest of
    the world begs to differ.

    If you meant to insult James here, I think you failed. IIRC, he's in
    the U.K. So, he's probably British.

    Yes, I am British. English, in fact.

    On the other hand, I'm in the
    U.S., descended from British ancestors (AFAIK), and, yes, the U.S.
    population and its news media aren't generally globally oriented. We
    often don't even know what is going on in Mexico or Canada. That is
    often, unfortunately, portrayed as self-centered or narcissistic. In
    reality, it's more of a too much to do, too much to enjoy, too little
    time for it all, kind of issue, combined with a non-global mentality.


    You may be surprised to find how similar it is in the UK. The TV media
    here have narrow viewpoints, often focussing on UK and US events (or
    causes celebres that they have, themselves, created) and telling us
    little about what's happening in mainland Europe, Asia or Africa etc.
    It's exasperating and leaves viewers ignorant of important issues in the
    wider world.


    --
    James Harris

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dmitry A. Kazakov@21:1/5 to James Harris on Mon Nov 8 10:01:46 2021
    On 2021-11-08 09:54, James Harris wrote:
    On 08/11/2021 02:21, Rod Pemberton wrote:

    On the other hand, I'm in the
    U.S., descended from British ancestors (AFAIK), and, yes, the U.S.
    population and its news media aren't generally globally oriented.  We
    often don't even know what is going on in Mexico or Canada.  That is
    often, unfortunately, portrayed as self-centered or narcissistic.  In
    reality, it's more of a too much to do, too much to enjoy, too little
    time for it all, kind of issue, combined with a non-global mentality.

    You may be surprised to find how similar it is in the UK. The TV media
    here have narrow viewpoints, often focussing on UK and US events (or
    causes celebres that they have, themselves, created) and telling us
    little about what's happening in mainland Europe, Asia or Africa etc.

    And why do you think that mainland Europe is any different in that respect?

    It's exasperating and leaves viewers ignorant of important issues in the
    wider world.

    Do not tell anybody, it is a state secret, there are no important issues
    in the wider world!

    --
    Regards,
    Dmitry A. Kazakov
    http://www.dmitry-kazakov.de

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to James Harris on Mon Nov 8 16:18:48 2021
    On 08/11/2021 09:54, James Harris wrote:
    On 08/11/2021 02:21, Rod Pemberton wrote:
    On Wed, 1 Sep 2021 10:16:13 +0200
    David Brown <david.brown@hesbynett.no> wrote:

    ...

    [OT comments below]

    I know that in the USA it is common to think that America is the only
    country, or at least the only one worth considering, but the rest of
    the world begs to differ.

    If you meant to insult James here, I think you failed.  IIRC, he's in
    the U.K.  So, he's probably British.

    Yes, I am British. English, in fact.

    I did not intend to insult anyone - I intended to provoke US-centered
    thinkers to take a wider view through a tongue-in-cheek comment. But
    while I am used to Americans (US Americans, to be precise) being unaware
    of other countries or that their language is just one variation of
    English, and I am used to British people (especially English)
    considering the UK to be naturally superior to the rest of the world, it
    is not often one comes across a Brit who is convinced the American
    version of English should be considered the "main" language.


    On the other hand, I'm in the
    U.S., descended from British ancestors (AFAIK), and, yes, the U.S.
    population and its news media aren't generally globally oriented.  We
    often don't even know what is going on in Mexico or Canada.  That is
    often, unfortunately, portrayed as self-centered or narcissistic.  In
    reality, it's more of a too much to do, too much to enjoy, too little
    time for it all, kind of issue, combined with a non-global mentality.


    You may be surprised to find how similar it is in the UK. The TV media
    here have narrow viewpoints, often focussing on UK and US events (or
    causes celebres that they have, themselves, created) and telling us
    little about what's happening in mainland Europe, Asia or Africa etc.
    It's exasperating and leaves viewers ignorant of important issues in the
    wider world.



    I live in Norway, and see things from a little bit more outside. There
    is a clear tendency that the bigger a country is, the less it bothers
    about what is happening in smaller countries. However, the USA is an
    outlier here amongst Western democracies (as it is in many aspects) -
    not only do most Americans care little about what happens outside their
    borders, but a substantial number /know/ very little about the world
    outside of the USA. Many have trouble even naming or placing other
    countries on a world map. I think a lot of it is the public school
    system, where it seems to be more about surviving to adulthood than
    getting an education. (Of course this does not apply to everyone in the
    USA.) The UK is not as bad - though it is trying to follow the USA
    here. At least in the UK there is always the BBC for news - it's not
    perfect, but it is pretty good.

    Social media is, unfortunately, just making things worse - many people
    are getting more and more disconnected from reality as their main source
    of information has topics based on what they and their "friends" have
    looked at before, with a bias towards controversial topics since those
    incite more time spent on the platform.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)