On 29/08/2021 14:12, Dmitry A. Kazakov wrote:
On 2021-08-29 14:49, Bart wrote:
On 29/08/2021 12:34, Dmitry A. Kazakov wrote:
On 2021-08-29 13:04, Bart wrote:
BTW what peripheral device needs 200MB of code?
Modern protocols are extremely complicated, as are the end
devices. Consider a radiator thermostat. It is a very simple device,
yet it has a hundred parameters, a dozen modes, and a weekly schedule
you must be able to query and program. So you can imagine the
complexity of its protocol. If you are very lucky it will be a
vendor-specific protocol. If it is a "standard" protocol you are in
deep trouble. The standard protocols are gigantic piles of cra*p.
You can take a look at AMQP or any of the ASN.1-based protocols to get
an impression. The ASN.1 description of certificate files is almost
comical, provided you do not have to implement it.
Worse, you cannot throw the useless stuff out, because you must
certify your implementation of the protocol.
On top of that comes the configuration stuff you must address in the GUI
and in persistent storage, the on-line data you have to handle and log,
and so on, plus procedures for replacing a defective device and flashing
the device's firmware.
Then you have not just one device but an array of them, e.g.
several radiator thermostats, plus a dozen other device types:
shutter contacts, wall panels, sensors etc.
By my measure, 200MB would equate to (very roughly) 20M lines of code
You must count the language run-time and other system libraries. E.g.
libc is 1.6MB, SQLite3 is 1.3MB, GTK is about 25MB and so on.
GTK would be statically linked into an application (which I thought you
said was to do with peripherals)?
That doesn't make any sense. So if 50 apps all needed GTK, each would
carry their own copies. And if several are running at the same time,
there will be multiple copies of the code in memory.
However, suppose 50MB of that 200MB /was/ GTK.
It seems GTK itself
already is logically divided into dozens of separate libraries.
This is the point I made some posts ago.
On 2021-08-29 11:36, David Brown wrote:
On 29/08/2021 02:01, Bart wrote:
So 10 million lines of code represents a single 100MB program,
approximately.
The biggest single executable I see on my machine (without digging too
hard) is 25 MB. I have also found a shared library at 125 MB.
If you use GCC and generic instances are put in a shared library, you easily
come to such numbers. GCC generates lots of stuff.
Funny thing: you cannot even build some such shared libraries under Windows,
because the number of exported symbols easily exceeds 2**16-1 (a Windows
limit). You must split the library into parts...
I also did not mean to imply that these big builds result in a single
binary - they are often split into multiple "shared" libraries. (I put
"shared" in quotations, because the libraries are typically dedicated to
the program rather than shared by other applications.) This can be
convenient during development, building and testing.
100-200MB is a medium-sized production application: peripheral devices,
HTTP server, database, cloud connectivity, user management - things start
to explode quickly.
On 29/08/2021 13:24, antispam@math.uni.wroc.pl wrote:
Bart <bc@freeuk.com> wrote:
A rule of thumb I've sometimes observed is that, for x64 anyway, 1 line
of source code maps to about 10 bytes of binary machine code.
Depends on the language. For C it may be lower, for some other
languages much higher.
So 10 million lines of code represents a single 100MB program,
approximately.
I work on a program whose executable is 64 M. However, a significant
part of the executable code is in loadable modules that take another
64 M. Guess how big the source is?
By my metric it would be about 6M lines of source code, if most of the
64MB was executable x64 code (rather than initialised data, embedded
data files, or other exe overheads).
That assumes a certain proportion of declaration lines to lines of
executable code.
Now you're going to tell me it's either a lot fewer or a lot more.
If the language is C, then I guess that could be anything: you can have
macros that expand to many times their size, and are instantiated at
multiple sites; include files that can do the same trick. Or lots of
boilerplate code that reduces to nothing.
Or there is lots of inlining that pushes the size the other way again.
Well, there is also the issue of memory size. SmartEiffel used (uses???)
whole-program optimization and compiled very fast. But for really
large programs it used to run out of memory. I am not sure if this is
still a problem on modern machines, but a reasonable estimate is that
keeping all the needed info in memory takes about 1000 times as much
memory as the source. So you need to carefully optimize space use...
Three compilers of mine that I've just tested use memory equivalent to
15x (C compiler), 20x (interpreter), and 80x (my systems language) the
source size. But they all use persistent data structures, especially the
last, which creates arrays of tokens, a bad idea I've since dropped. All
those figures include the source itself.
All the memory is recovered on program termination. If it becomes an
issue, then unneeded data structures can be destroyed earlier.
But if we say 40x the source size, then a capacity of 8GB means /currently/
being able to deal with source code of something over 10M lines,
depending on code density.
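(Spelling that out, assuming an average of roughly 20 bytes per source
line - a ballpark figure only:

    8 GB / 40x overhead        ~ 200 MB of source text
    200 MB / ~20 bytes a line  ~ 10M lines

so the 10M-line figure follows directly from the 40x ratio and the
assumed average line length.)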
It just means being more resourceful, and reintroducing long-forgotten techniques of working with memory-limited hardware.
ATM, 10M lines is 200 times the size of my typical projects.
Bart <bc@freeuk.com> wrote:
I work on a program whose executable is 64 M. However, a significant
part of the executable code is in loadable modules that take another
64 M. Guess how big the source is?
By my metric it would be about 6M lines of source code, if most of the
64MB was executable x64 code (rather than initialised data, embedded
data files, or other exe overheads).
That assumes a certain proportion of declaration lines to lines of
executable code.
Now you're going to tell me it's either a lot fewer or a lot more.
40 M in the executable is "statically" linked code from outside, probably
corresponding to 0.5M lines of source. 24 M corresponds to about 80 K
lines. 64 M in loadable modules corresponds to 210 K lines (actual
code lines are closer to 120 K, the rest is comments and empty lines).
It is hard to distinguish between executable code and data. Due
to the language semantics, initialized data needs executable code to
perform the initialization. There are dispatch tables, and all data and
code is tagged (has identifying headers).
ATM I have to keep the parse tree of a large part of the program in
memory. The parse tree is about 8 times larger than the corresponding
source.
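(Rough division of the figures above, for comparison with the
10-bytes-per-line rule of thumb mentioned earlier - treating the sizes
as mostly code, which, as noted, they are not entirely:

    24 M / 80 K lines    ~ 300 bytes per line
    64 M / 210 K lines   ~ 320 bytes per line
    40 M / 0.5 M lines   ~ 80 bytes per line

so for this language the factor is indeed much higher than 10.)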
On 24/08/2021 21:56, David Brown wrote:
On 24/08/2021 21:06, Bart wrote:
On 24/08/2021 18:25, James Harris wrote:
These days why use calling conventions at all? Perhaps they are only
needed for when there's complete ignorance of the callee. The
traditional concept of calling conventions may be pass\acute/e. ;-)
James, aren't you using Linux? The compose key makes it easy to write
letters like é - it's just compose, ´, e - "passé". (It's even easier
if you have a non-English keyboard layout, in Windows or Linux, as these
usually have "dead keys" for accents.)
Thanks, I've now enabled the compose key though I wrote passé in the way
I did as it's the way I am thinking of for my language - which, as it
was unfamiliar to others was why I added the smiley.
There are, I think, only two sensible options here:
1. Disallow any identifier letters outside of ASCII.
2. Make everything UTF-8.
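(As a rough illustration of what the two options mean for a scanner -
just a sketch in C, with hypothetical helper names, not anybody's
actual lexer:

    #include <stdbool.h>
    #include <stddef.h>

    /* Option 1: identifiers are ASCII letters, digits and '_' only. */
    static bool ident_char_ascii(unsigned char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
               (c >= '0' && c <= '9') || c == '_';
    }

    /* Option 2: additionally accept any byte >= 0x80, i.e. pass UTF-8
       sequences through untouched and leave validation/normalisation
       to a later stage. */
    static bool ident_char_utf8(unsigned char c) {
        return ident_char_ascii(c) || c >= 0x80;
    }

    /* Length in bytes of the identifier starting at s, under a policy. */
    static size_t scan_ident(const char *s, bool (*is_ident)(unsigned char)) {
        size_t n = 0;
        while (is_ident((unsigned char)s[n]))
            n++;
        return n;
    }

With the second policy "bøk" scans as one 4-byte identifier; with the
first, scanning stops after the "b".)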
On 2021-08-30 10:13, David Brown wrote:
There are, I think, only two sensible options here:
1. Disallow any identifier letters outside of ASCII.
2. Make everything UTF-8.
Yes. Though people preferring #2 are usually English speakers who are
not really aware of the consequences. Like having E, Ε, Е three
different identifiers. One could try to maintain language-defined
homographs in order to prevent mess, introducing even bigger mess...
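(To make the homograph point concrete: the three look-alikes are distinct
code points and distinct UTF-8 byte sequences, so a UTF-8-based compiler
will happily treat them as three identifiers. A small check, assuming a
UTF-8 source encoding:

    #include <stdio.h>
    #include <string.h>

    int main(void) {
        /* Latin E (U+0045), Greek Epsilon (U+0395), Cyrillic Ie (U+0415),
           written out as UTF-8 byte sequences. */
        const char *latin = "\x45";
        const char *greek = "\xCE\x95";
        const char *cyr   = "\xD0\x95";

        printf("byte lengths: %zu %zu %zu\n",
               strlen(latin), strlen(greek), strlen(cyr));          /* 1 2 2 */
        printf("any equal? %d %d\n",
               strcmp(latin, greek) == 0, strcmp(greek, cyr) == 0); /* 0 0 */
        return 0;
    }
)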
On 30/08/2021 10:38, Dmitry A. Kazakov wrote:
On 2021-08-30 10:13, David Brown wrote:
There are, I think, only two sensible options here:
1. Disallow any identifier letters outside of ASCII.
2. Make everything UTF-8.
Yes. Though people preferring #2 are usually English speakers who are
not really aware of the consequences. Like having E, Ε, Е three
different identifiers. One could try to maintain language-defined
homographs in order to prevent mess, introducing even bigger mess...
I'm an English speaker, and a Norwegian speaker (we have three extra
letters, åøæ). And I am well aware of the potential complication of different Unicode code points with very similar (or even identical) glyphs.
It can also be difficult for people to type, which can quickly be a pain
for collaboration. How would you type "bøk", for example? That's
"book" in Norwegian, and I have a key labelled "ø". James, on Linux,
can use compose + / + o to get the letter. But for you on Windows, with
a German keyboard layout (I'm guessing from your email address), I
expect you are stuck with copy-and-paste from my post, or using the "character map" utility, or typing "alt+0248".
Then there is the question of displaying the characters. I have a font
that includes vast numbers of obscure symbols, so I could use ↀ for the Roman numeral for 1000 (using the traditional symbol, rather than the
modern replacement of M). Other people reading this might not see it.
All in all, non-ASCII letters in identifiers can pose a lot of
challenges. But they are nonetheless important for people around the
world, and despite the disadvantages, UTF-8 is far and away the best
choice. You simply have to trust programmers to be sensible in their
usage. (You need to do that anyway, even with ASCII - in many fonts, l,
1 and I can be hard to distinguish, as can O and 0.)
On 28/08/2021 16:35, James Harris wrote:
On 24/08/2021 21:56, David Brown wrote:
On 24/08/2021 21:06, Bart wrote:
On 24/08/2021 18:25, James Harris wrote:
These days why use calling conventions at all? Perhaps they are only
needed for when there's complete ignorance of the callee. The
traditional concept of calling conventions may be pass\acute/e. ;-)
James, aren't you using Linux? The compose key makes it easy to write
letters like é - it's just compose, ´, e - "passé". (It's even easier
if you have a non-English keyboard layout, in Windows or Linux, as these
usually have "dead keys" for accents.)
Thanks, I've now enabled the compose key though I wrote passé in the way
I did as it's the way I am thinking of for my language - which, as it
was unfamiliar to others was why I added the smiley.
I don't imagine anyone is going to want to write "pass\acute/e" as an identifier in any language.
On 2021-08-30 11:50, David Brown wrote:
On 30/08/2021 10:38, Dmitry A. Kazakov wrote:
On 2021-08-30 10:13, David Brown wrote:
There are, I think, only two sensible options here:
1. Disallow any identifier letters outside of ASCII.
2. Make everything UTF-8.
Yes. Though people preferring #2 are usually English speakers who are
not really aware of the consequences. Like having E, Ε, Е three
different identifiers. One could try to maintain language-defined
homographs in order to prevent mess, introducing even bigger mess...
I'm an English speaker, and a Norwegian speaker (we have three extra
letters, åøæ). And I am well aware of the potential complication of
different Unicode code points with very similar (or even identical)
glyphs.
It can also be difficult for people to type, which can quickly be a pain
for collaboration. How would you type "bøk", for example? That's
"book" in Norwegian, and I have a key labelled "ø". James, on Linux,
can use compose + / + o to get the letter. But for you on Windows, with
a German keyboard layout (I'm guessing from your email address), I
expect you are stuck with copy-and-paste from my post, or using the
"character map" utility, or typing "alt+0248".
Right, character map is what I use.
Germans have it easy: you can drop the diacritical marks, ä=ae ö=oe ü=ue,
and the sz ligature, ß=ss.
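(A sketch of that fallback in C for UTF-8 input - de_asciify is just a
made-up helper name, and only the lower-case forms are handled here:

    #include <stdio.h>
    #include <string.h>

    /* Replace ä/ö/ü/ß (UTF-8, lower case only) by ae/oe/ue/ss.
       Each 2-byte sequence becomes 2 ASCII bytes, so 'out' can be
       the same size as the input. */
    static void de_asciify(const char *in, char *out) {
        static const struct { const char *from, *to; } map[] = {
            { "\xC3\xA4", "ae" },  /* ä */
            { "\xC3\xB6", "oe" },  /* ö */
            { "\xC3\xBC", "ue" },  /* ü */
            { "\xC3\x9F", "ss" },  /* ß */
        };
        while (*in) {
            int matched = 0;
            for (size_t i = 0; i < sizeof map / sizeof map[0]; i++) {
                size_t n = strlen(map[i].from);
                if (strncmp(in, map[i].from, n) == 0) {
                    out += sprintf(out, "%s", map[i].to);
                    in += n;
                    matched = 1;
                    break;
                }
            }
            if (!matched)
                *out++ = *in++;
        }
        *out = '\0';
    }

    int main(void) {
        char buf[64];
        de_asciify("Stra\xC3\x9F" "e, gr\xC3\xB6\xC3\x9F" "er", buf);
        puts(buf);   /* prints "Strasse, groesser" */
        return 0;
    }
)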
Then there is the question of displaying the characters. I have a font
that includes vast numbers of obscure symbols, so I could use ↀ for the
Roman numeral for 1000 (using the traditional symbol, rather than the
modern replacement of M). Other people reading this might not see it.
It is a lesser problem now than it was before. I remember the time
Windows was unable to display most special symbols.
All in all, non-ASCII letters in identifiers can pose a lot of
challenges. But they are nonetheless important for people around the
world, and despite the disadvantages, UTF-8 is far and away the best
choice. You simply have to trust programmers to be sensible in their
usage. (You need to do that anyway, even with ASCII - in many fonts, l,
1 and I can be hard to distinguish, as can O and 0.)
Actually, this is again sort of Europocentric POV. In reality, if you
have a truly international team with speakers outside Western Europe,
you must agree on some strict rules regarding comments and identifiers.
You might be able to remember a German or even a Czech word. Cyrillic
would be rather more challenging. But what would you do with Armenian or Chinese?
And the least common denominator is English.
On 30/08/2021 09:13, David Brown wrote:
On 28/08/2021 16:35, James Harris wrote:
On 24/08/2021 21:56, David Brown wrote:
On 24/08/2021 21:06, Bart wrote:
On 24/08/2021 18:25, James Harris wrote:
These days why use calling conventions at all? Perhaps they are only
needed for when there's complete ignorance of the callee. The
traditional concept of calling conventions may be pass\acute/e. ;-)
James, aren't you using Linux? The compose key makes it easy to write
letters like é - it's just compose, ´, e - "passé". (It's even easier
if you have a non-English keyboard layout, in Windows or Linux, as
these
usually have "dead keys" for accents.)
Thanks, I've now enabled the compose key though I wrote passé in the way
I did as it's the way I am thinking of for my language - which, as it
was unfamiliar to others was why I added the smiley.
I don't imagine anyone is going to want to write "pass\acute/e" as an
identifier in any language.
It's for string literals!
IMO programs and identifiers should use ascii, even in non-English
languages.
On 30/08/2021 13:37, Dmitry A. Kazakov wrote:
It is a lesser problem now than it was before. I remember the time
Windows was unable to display most special symbols.
Slowly, in some ways, Windows has been catching up with the *nix world.
On 30/08/2021 13:37, Dmitry A. Kazakov wrote:
Actually, this is again sort of Europocentric POV. In reality, if you
have a truly international team with speakers outside Western Europe,
you must agree on some strict rules regarding comments and identifiers.
If you have an international team, then it is standard practice to keep everything in English. But most teams are not international. Why
should a group of Greek or Japanese programmers be forced to write
everything in a foreign language? You can view the keywords as fixed - almost like symbols, rather than words - but they may prefer to have
other parts written in their own language.
You might be able to remember a German or even a Czech word. Cyrillic
would be rather more challenging. But what would you do with Armenian or
Chinese?
And the least common denominator is English.
It is the least common denominator for most international groups, but
not for most national teams.
On 2021-08-30 20:13, David Brown wrote:
On 30/08/2021 13:37, Dmitry A. Kazakov wrote:
It is a lesser problem now than it was before. I remember the time
Windows was unable to display most special symbols.
Slowly, in some ways, Windows has been catching up with the *nix world.
I must defend Windows. Linux adopted UTF-8 very late. I well remember
the mess it had with 8-bit code pages.
BTW, there still exist file utilities to check filenames in Linux. I had
an old filesystem with some file names in German encoded in Latin-1. It
was connected to a FreeNAS (BSD-based). These files caused mysterious
FreeNAS crashes when a remote host tried to browse files over a network share. Once I fixed the names it almost stopped crashing. I ditched
FreeNAS anyway in favor of Ubuntu.
On 30/08/2021 21:10, Dmitry A. Kazakov wrote:
On 2021-08-30 20:13, David Brown wrote:
On 30/08/2021 13:37, Dmitry A. Kazakov wrote:
It is a lesser problem now than it was before. I remember the time
Windows was unable to display most special symbols.
Slowly, in some ways, Windows has been catching up with the *nix world.
I must defend Windows. Linux adopted UTF-8 very late. I well remember
the mess it had with 8-bit code pages.
Windows also had a mess with 8-bit code pages.
Windows /was/ earlier with Unicode, that's true - unfortunately, they
picked UCS-2 and then got stuck with that instead of UTF-8.
Linux
picked UTF-8 out of laziness, as pretty much everything involving strings
(except displaying them) just works as before. There is no need to
re-invent everything in a 16-bit manner, as Windows did, and there are
no problems when it turns out 16 bits are not enough.
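(To illustrate the "16 bits are not enough" point: any code point above
U+FFFF needs a surrogate pair in UTF-16, while UTF-8 simply grows to four
bytes. A small sketch, using one arbitrary example code point:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t cp = 0x1D11E;   /* MUSICAL SYMBOL G CLEF, just an example */

        /* UTF-16: subtract 0x10000, split the remaining 20 bits. */
        uint32_t v = cp - 0x10000;
        uint16_t hi = 0xD800 + (v >> 10);
        uint16_t lo = 0xDC00 + (v & 0x3FF);
        printf("UTF-16: %04X %04X\n", hi, lo);      /* D834 DD1E */

        /* UTF-8: four bytes for code points in U+10000..U+10FFFF. */
        unsigned char u8[4] = {
            0xF0 | (cp >> 18),
            0x80 | ((cp >> 12) & 0x3F),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        };
        printf("UTF-8 : %02X %02X %02X %02X\n",
               u8[0], u8[1], u8[2], u8[3]);         /* F0 9D 84 9E */
        return 0;
    }
)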
On 30/08/2021 14:04, James Harris wrote:
On 30/08/2021 09:13, David Brown wrote:
I don't imagine anyone is going to want to write "pass\acute/e" as an
identifier in any language.
It's for string literals!
IMO programs and identifiers should use ascii, even in non-English
languages.
See the rest of the thread for a discussion on non-ASCII identifiers.
(I am not suggesting that you implement them, or don't implement them - that's your choice. Some languages go one way, others go the other way.)
But don't make up your own language for special characters in strings or comments. Again, UTF-8 is far and away the best option. If you feel
that is a problem, then at least stick to an existing standard -
HTML/XML character entities would almost certainly be the most
convenient choice: "pass&eacute;".
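(If you went the character-entity route, the numeric form is trivial to
support. A sketch - expand_entities and put_utf8 are made-up helper names,
and only "&#NNN;" references are handled; named ones like &eacute; would
need a lookup table:

    #include <stdio.h>
    #include <stdlib.h>

    /* Append code point cp to out as UTF-8 (code points up to U+FFFF
       are enough for this sketch). Returns the number of bytes written. */
    static int put_utf8(unsigned cp, char *out) {
        if (cp < 0x80)  { out[0] = cp; return 1; }
        if (cp < 0x800) { out[0] = 0xC0 | (cp >> 6);
                          out[1] = 0x80 | (cp & 0x3F); return 2; }
        out[0] = 0xE0 | (cp >> 12);
        out[1] = 0x80 | ((cp >> 6) & 0x3F);
        out[2] = 0x80 | (cp & 0x3F);
        return 3;
    }

    /* Expand "&#NNN;" references in a string literal; everything else
       is copied through unchanged. */
    static void expand_entities(const char *in, char *out) {
        while (*in) {
            if (in[0] == '&' && in[1] == '#') {
                char *end;
                unsigned cp = (unsigned)strtoul(in + 2, &end, 10);
                if (*end == ';') { out += put_utf8(cp, out); in = end + 1; continue; }
            }
            *out++ = *in++;
        }
        *out = '\0';
    }

    int main(void) {
        char buf[32];
        expand_entities("pass&#233;", buf);
        puts(buf);               /* prints "passé" in a UTF-8 terminal */
        return 0;
    }
)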
On 29/08/2021 02:01, Bart wrote:
On 28/08/2021 17:21, David Brown wrote:
On 27/08/2021 22:07, Bart wrote:
As James suggested, the object files are basically just the internal
representation of the compilation before code generation.
Then 'object file' is a complete misnomer.
Yes, that's a fair comment. "Linking" is also a misnomer in link-time optimisation. The names are historical, rather than technically accurate.
On 2021-08-30 21:18, David Brown wrote:
Windows /was/ earlier with Unicode, that's true - unfortunately, they
picked UCS-2 and then got stuck with that instead of UTF-8.
Worse, later they quietly changed UCS-2 to UTF-16. All system
calls are duplicated: one 8-bit A-call and another UTF-16 W-call.
Linux
picked UTF-8 out of laziness, as pretty much everything involving strings
(except displaying them) just works as before. There is no need to
re-invent everything in a 16-bit manner, as Windows did, and there are
no problems when it turns out 16 bits are not enough.
It is UTF-16 now. But of course, UTF-16 is a monstrosity compared with
UTF-8. Fortunately third party libraries ignore the mess. E.g. GTK port
for Windows converts all filenames to UTF-8.
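(This is roughly what any program that keeps UTF-8 internally ends up doing
at the Windows API boundary - a sketch only, assuming a Windows build;
open_utf8 is just a made-up helper, not GTK's actual code:

    #include <windows.h>
    #include <stdio.h>

    /* Open a file whose name is held internally as UTF-8, by converting
       to UTF-16 and calling the W-variant. The A-variant would interpret
       the bytes in the current ANSI code page and mangle anything
       outside it. */
    static HANDLE open_utf8(const char *utf8_name) {
        wchar_t wname[MAX_PATH];
        if (MultiByteToWideChar(CP_UTF8, 0, utf8_name, -1, wname, MAX_PATH) == 0)
            return INVALID_HANDLE_VALUE;
        return CreateFileW(wname, GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    }

    int main(void) {
        HANDLE h = open_utf8("b\xC3\xB8k.txt");   /* "bøk.txt" in UTF-8 */
        if (h == INVALID_HANDLE_VALUE)
            printf("could not open file (error %lu)\n", GetLastError());
        else
            CloseHandle(h);
        return 0;
    }
)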
My understanding (which may be wrong, as I don't do much Windows
programming) is that there is a gradual move to UTF-8 support in
Windows.
These things take time of course, and while there is no doubt
that Microsoft backed the wrong horse here with 16-bit encodings, they
made the right choice at the time.
I blame MS for a lot of bad things,
but not this one! And they are not alone - Java, QT and Python are
other big players that picked UCS-2, leading to much regret and slow
progress towards a changeover to UTF-8.
On 2021-08-31 09:36, David Brown wrote:
My understanding (which may be wrong, as I don't do much Windows
programming) is that there is a gradual move to UTF-8 support in
Windows.
I think you are right. Actually they could declare the A-calls to be UTF-8,
as they did with the W-calls. That would break some legacy code; only the
French will be annoyed. The Germans will be apathetic, the small European
countries resigned, I guess...
These things take time of course, and while there is no doubt
that Microsoft backed the wrong horse here with 16-bit encodings, they
made the right choice at the time.
I blame MS for a lot of bad things,
but not this one! And they are not alone - Java, QT and Python are
other big players that picked UCS-2, leading to much regret and slow
progress towards a changeover to UTF-8.
I believe that UTF-8 was introduced later.
It is impossible that
everybody was wrong.
E.g. Ada also adopted UCS-2 in 1995. Later on Ada
added UCS-4. Just the same mess as with Windows, alas. But most Ada
programmers ignore UCS-2/4 and use UTF-8 where the standard mandates
Latin-1.
On 30/08/2021 13:37, Dmitry A. Kazakov wrote:
On 2021-08-30 11:50, David Brown wrote:
All in all, non-ASCII letters in identifiers can pose a lot of
challenges. But they are nonetheless important for people around the
world, and despite the disadvantages, UTF-8 is far and away the best
choice. You simply have to trust programmers to be sensible in their
usage. (You need to do that anyway, even with ASCII - in many fonts, l,
1 and I can be hard to distinguish, as can O and 0.)
Actually, this is again sort of Europocentric POV. In reality, if you
have a truly international team with speakers outside Western Europe,
you must agree on some strict rules regarding comments and identifiers.
If you have an international team, then it is standard practice to keep everything in English. But most teams are not international. Why
should a group of Greek or Japanese programmers be forced to write
everything in a foreign language? You can view the keywords as fixed - almost like symbols, rather than words - but they may prefer to have
other parts written in their own language.
On 30/08/2021 19:13, David Brown wrote:
On 30/08/2021 13:37, Dmitry A. Kazakov wrote:
On 2021-08-30 11:50, David Brown wrote:
...
All in all, non-ASCII letters in identifiers can pose a lot of
challenges. But they are nonetheless important for people around the
world, and despite the disadvantages, UTF-8 is far and away the best
choice. You simply have to trust programmers to be sensible in their
usage. (You need to do that anyway, even with ASCII - in many
fonts, l,
1 and I can be hard to distinguish, as can O and 0.)
Actually, this is again sort of Europocentric POV. In reality, if you
have a truly international team with speakers outside Western Europe,
you must agree on some strict rules regarding comments and identifiers.
If you have an international team, then it is standard practice to keep
everything in English. But most teams are not international. Why
should a group of Greek or Japanese programmers be forced to write
everything in a foreign language? You can view the keywords as fixed -
almost like symbols, rather than words - but they may prefer to have
other parts written in their own language.
AISI: Have the master copy of /all/ programs in American English, and
support translation of identifier names, comments, string literals etc
to other languages.
On 31/08/2021 11:33, Dmitry A. Kazakov wrote:
On 2021-08-31 09:36, David Brown wrote:
My understanding (which may be wrong, as I don't do much Windows
programming) is that there is a gradual move to UTF-8 support in
Windows.
I think you are right. Actually they could declare the A-calls to be UTF-8,
as they did with the W-calls. That would break some legacy code; only the
French will be annoyed. The Germans will be apathetic, the small European
countries resigned, I guess...
You are just listing the advantages :-)
These things take time of course, and while there is no doubt
that Microsoft backed the wrong horse here with 16-bit encodings, they
made the right choice at the time.
I blame MS for a lot of bad things,
but not this one! And they are not alone - Java, QT and Python are
other big players that picked UCS-2, leading to much regret and slow
progress towards a changeover to UTF-8.
I believe that UTF-8 was introduced later.
Yes. Unicode was first conceived as 16-bit, with UCS-2. Then they
started extending it beyond 16 bits, and had to make UCS-4. UTF-16 was
developed as a way to access the rest of the characters with 16-bit code
units, and then I think UTF-8 came after that. (UTF-32 is the same as
UCS-4.)
On 30/08/2021 10:38, Dmitry A. Kazakov wrote:
On 2021-08-30 10:13, David Brown wrote:
There are, I think, only two sensible options here:
1. Disallow any identifier letters outside of ASCII.
2. Make everything UTF-8.
It can also be difficult for people to type, which can quickly be a pain
for collaboration. How would you type "bøk", for example?
On 30/08/2021 13:37, Dmitry A. Kazakov wrote:
Actually, this is again sort of Europocentric POV. In reality, if you
have a truly international team with speakers outside Western Europe,
you must agree on some strict rules regarding comments and identifiers.
And the least common denominator is English.
It is the least common denominator for most international groups, but
not for most national teams.
On 30/08/2021 10:50, David Brown wrote:
On 30/08/2021 10:38, Dmitry A. Kazakov wrote:
On 2021-08-30 10:13, David Brown wrote:
There are, I think, only two sensible options here:
1. Disallow any identifier letters outside of ASCII.
2. Make everything UTF-8.
IMO there's a better option:
3. Use purely ASCII but allow non-ASCII characters to be written as ASCII
escape sequences.
...
It can also be difficult for people to type, which can quickly be a pain
for collaboration. How would you type "bøk", for example?
I could allow that to be used in string literals with something like
 "b\slash:o/k"
As well as in string literals, it is unlikely but possible that a program
written in my language would have to call a function from another
language which has been written in Norwegian where the function name
included a non-ASCII character. For that, I am considering allowing
 \slash:o/
and similar to appear in the name of external functions. It would be
ugly but clear. And programmers could limit the ugliness to one place by defining an alias as in
 namedef book = b\slash:o/k
 book()
Wouldn't that be better than either pure ASCII or allowing Unicode?
Have I got all bases covered? I hope so!
On 06/09/2021 20:30, James Harris wrote:
On 30/08/2021 10:50, David Brown wrote:
On 30/08/2021 10:38, Dmitry A. Kazakov wrote:
On 2021-08-30 10:13, David Brown wrote:
There are, I think, only two sensible options here:
1. Disallow any identifier letters outside of ASCII.
2. Make everything UTF-8.
IMO there's a better option:
3. Use purely ASCII but allow non-ASCII characters to be written as ASCII
escape sequences.
it is unlikely but possible that a program
written in my language would have to call a function from another
language which has been written in Norwegian where the function name
included a non-ASCII character. For that, I am considering allowing
 \slash:o/
and similar to appear in the name of external functions. It would be
ugly but clear. And programmers could limit the ugliness to one place by
defining an alias as in
 namedef book = b\slash:o/k
 book()
Wouldn't that be better than either pure ASCII or allowing Unicode?
Have I got all bases covered? I hope so!
All the bases except for the ones concerning what people writing in other
languages would actually see as usable.
On 31/08/2021 20:37, James Harris wrote:
On 30/08/2021 19:13, David Brown wrote:
On 30/08/2021 13:37, Dmitry A. Kazakov wrote:
On 2021-08-30 11:50, David Brown wrote:
...
All in all, non-ASCII letters in identifiers can pose a lot of
challenges. But they are nonetheless important for people around the
world, and despite the disadvantages, UTF-8 is far and away the best
choice. You simply have to trust programmers to be sensible in their
usage. (You need to do that anyway, even with ASCII - in many
fonts, l,
1 and I can be hard to distinguish, as can O and 0.)
Actually, this is again sort of Europocentric POV. In reality, if you
have a truly international team with speakers outside Western Europe,
you must agree on some strict rules regarding comments and identifiers.
If you have an international team, then it is standard practice to keep
everything in English. But most teams are not international. Why
should a group of Greek or Japanese programmers be forced to write
everything in a foreign language? You can view the keywords as fixed -
almost like symbols, rather than words - but they may prefer to have
other parts written in their own language.
AISI: Have the master copy of /all/ programs in American English, and
support translation of identifier names, comments, string literals etc
to other languages.
Why would anyone choose the dialect of one particular ex colony, rather
than using /real/ English?
I know that in the USA it is common to think that America is the only country, or at least the only one worth considering, but the rest of the world begs to differ.
On 01/09/2021 09:16, David Brown wrote:
Why would anyone choose the dialect of one particular ex colony, rather
than using /real/ English?
I know that in the USA it is common to think that America is the only
country, or at least the only one worth considering, but the rest of the
world begs to differ.
AmE is more heavily used - especially in IT - so it makes more sense to
use it as a lingua franca. For example, an identifier might be called TextColor rather than TextColour.
There is precedent for that kind of choice. Music terms such as andante
and pianissimo are in Italian. Speakers of other languages still work
with them.
On 22/10/2021 16:09, James Harris wrote:
On 01/09/2021 09:16, David Brown wrote:
Why would anyone choose the dialect of one particular ex colony, rather
than using /real/ English?
I know that in the USA it is common to think that America is the only
country, or at least the only one worth considering, but the rest of the
world begs to differ.
AmE is more heavily used - especially in IT - so it makes more sense
to use it as a lingua franca. For example, an identifier might be
called TextColor rather than TextColour.
There is precedent for that kind of choice. Music terms such as
andante and pianissimo are in Italian. Speakers of other languages
still work with them.
Those are pure Italian.
Not Anglicised-Italian that is going to annoy natives of that country if
they were obliged to use them.
For example, 'panini' is used to refer to a single bun, when 'panini' is
actually plural.
So I like to write Colour not Color. We invented the language after all!
(By 'we' I mean the British, though my passport says otherwise.)
However, I do use 'disk' and 'program' rather than 'disc' and
'programme', as the former are now firmly associated with computing.
On 22/10/2021 20:20, Bart wrote:
On 22/10/2021 16:09, James Harris wrote:
On 01/09/2021 09:16, David Brown wrote:
Why would anyone choose the dialect of one particular ex colony, rather
than using /real/ English?
I know that in the USA it is common to think that America is the only
country, or at least the only one worth considering, but the rest of
the
world begs to differ.
AmE is more heavily used - especially in IT - so it makes more sense
to use it as a lingua franca. For example, an identifier might be
called TextColor rather than TextColour.
There is precedent for that kind of choice. Music terms such as
andante and pianissimo are in Italian. Speakers of other languages
still work with them.
Those are pure Italian.
Not Anglicised-Italian that is going to annoy natives of that country
if they were obliged to use them.
For example, 'panini' is used to refer to a single bun, when 'panini' is
actually plural.
Similar with graffiti. Or paparazzi.
So I like to write Colour not Color. We invented the language after all!
That's all very well but with most IT standards coming out of America
the spelling 'Color' is used - and inbuilt - more frequently. Surely
it's better to have one spelling than to continually ask which spelling
is used in a certain case.
One could still present to users the spelling which suits them while
program object names would be in American English. That applies in the filesystem, too. For example, yesterday I got a message about not being
able to move files to my 'rubbish bin'. American users get told about
the 'trash can'. French users possibly get told about the 'poubelle'.
But the folder in the file system still has the American name
'.Trash-UID'. That's easier to work with than renaming the folder for
each locale, isn't it?!!!
(By 'we' I mean the British, though my passport says otherwise.)
Curious. If you can reply without giving too much away what does it say?
However, I do use 'disk' and 'program' rather than 'disc' and
'programme', as the former are now firmly associated with computing.
As long as you don't try to catch fishes. ;-)
BTW, for computer programs, at school I was taught that 'program' was correct.
I've been taking the top off a chimney stack
Why?
It's letting water in.
I was thinking of removing the whole chimney stack as it is no longer
used, but then I wondered whether the local authority might tell me to
reinstate it. That would not be good, especially as the bricks I've
removed so far are soft and are breaking. So the current idea is to
remove the dodgy top layer or two and cap it while we have some dry
weather. (Though I am very wary about being able to lift a 2' square
concrete cap up the ladder without putting so much sideways pressure
on the stack that it falls over!)
On 31/08/2021 20:37, James Harris wrote:
On 30/08/2021 19:13, David Brown wrote:
On 30/08/2021 13:37, Dmitry A. Kazakov wrote:
On 2021-08-30 11:50, David Brown wrote:
...
All in all, non-ASCII letters in identifiers can pose a lot of
challenges. But they are nonetheless important for people
around the world, and despite the disadvantages, UTF-8 is far
and away the best choice. You simply have to trust programmers
to be sensible in their usage. (You need to do that anyway,
even with ASCII - in many fonts, l,
1 and I can be hard to distinguish, as can O and 0.)
Actually, this is again sort of Europocentric POV. In reality, if
you have a truly international team with speakers outside Western
Europe, you must agree on some strict rules regarding comments
and identifiers.
If you have an international team, then it is standard practice to
keep everything in English. But most teams are not international.
Why should a group of Greek or Japanese programmers be forced to
write everything in a foreign language? You can view the keywords
as fixed - almost like symbols, rather than words - but they may
prefer to have other parts written in their own language.
AISI: Have the master copy of /all/ programs in American English,
and support translation of identifier names, comments, string
literals etc to other languages.
Why would anyone choose the dialect of one particular ex colony,
rather than using /real/ English?
I know that in the USA it is common to think that America is the only country, or at least the only one worth considering, but the rest of
the world begs to differ.
On Wed, 1 Sep 2021 10:16:13 +0200
David Brown <david.brown@hesbynett.no> wrote:
I know that in the USA it is common to think that America is the only
country, or at least the only one worth considering, but the rest of
the world begs to differ.
If you meant to insult James here, I think you failed. IIRC, he's in
the U.K. So, he's probably British.
On the other hand, I'm in the
U.S., descended from British ancestors (AFAIK), and, yes, the U.S.
population and its news media aren't generally globally oriented. We
often don't even know what is going on in Mexico or Canada. That is
often, unfortunately, portrayed as self-centered or narcissistic. In
reality, it's more of a case of too much to do, too much to enjoy, and
too little time for it all, combined with a non-global mentality.
On 08/11/2021 02:21, Rod Pemberton wrote:
On Wed, 1 Sep 2021 10:16:13 +0200
David Brown <david.brown@hesbynett.no> wrote:
...
[OT comments below]
I know that in the USA it is common to think that America is the only
country, or at least the only one worth considering, but the rest of
the world begs to differ.
If you meant to insult James here, I think you failed. IIRC, he's in
the U.K. So, he's probably British.
Yes, I am British. English, in fact.
On the other hand, I'm in the
U.S., descended from British ancestors (AFAIK), and, yes, the U.S.
population and its news media aren't generally globally oriented. We
often don't even know what is going on in Mexico or Canada. That is
often, unfortunately, portrayed as self-centered or narcissistic. In
reality, it's more of a case of too much to do, too much to enjoy, and
too little time for it all, combined with a non-global mentality.
You may be surprised to find how similar it is in the UK. The TV media
here have narrow viewpoints, often focussing on UK and US events (or
causes celebres that they have, themselves, created) and telling us
little about what's happening in mainland Europe, Asia or Africa etc.
It's exasperating and leaves viewers ignorant of important issues in the wider world.