• Hardening Defined Words

    From Krishna Myneni@21:1/5 to All on Fri Aug 5 22:06:11 2022
    Summary: for some non-native Forth systems, it should be possible to
    relocate the compiled code of a colon definition into memory which can
    be marked read-only, to protect against corruption. For this to be
    feasible the run-time xt for a word should have at least one level of indirection to the code being executed by the virtual machine.

    Consider the following ordinary colon definition in kForth, an indirect threaded code Forth interpreter/compiler:

    : foo 0 ;

    ' foo execute . \ same as typing FOO
    0 ok

    see foo
    565403F101F0 #0
    565403F101F9 RET
    ok

    Now, let's do some bad things to FOO.

    0 ' foo ! \ store a zero at the execution address for FOO
    foo
    Segmentation fault (core dumped)
    $

    Start kForth again and define FOO as above.

    ' foo a@ execute-bc . \ execute the byte code for FOO

    One may infer that the xt for FOO is an address at which the compiled
    byte code for FOO resides. The byte code is the code executed by
    kForth's virtual machine.

    Now, let's define a word BAR and demonstrate that we can modify the byte
    code for BAR directly from the Forth interpreter.

    : bar 10 0 do i . loop ;

    see bar
    560DD5DABDA0 #10
    560DD5DABDA9 #0
    560DD5DABDB2 >R
    560DD5DABDB3 >R
    560DD5DABDB4 IP>R
    560DD5DABDB5 I
    560DD5DABDB6 .
    560DD5DABDB7 LOOP
    560DD5DABDB8 RET
    ok

    To see the actual byte code of BAR,

    ' bar a@ 32 dump

    560DD5DABDA0 : 49 0A 00 00 00 00 00 00 00 49 00 00 00 00 00 00  I........I......
    560DD5DABDB0 : 00 00 DC DC DE 69 2E E9 EE 00 00 00 00 00 00 00  .....i..........

    ( the RET instruction for the virtual machine is byte EE ).

    Now, we may corrupt the byte code, for example, by changing the loop
    count to 5, instead of 10:

    5 ' bar a@ 1+ !

    Now, when BAR is executed, it will output "0 1 2 3 4 ok"

    It is possible to use mmap and mprotect system calls (or equivalents
    under Windows) to relocate the byte code to a new memory region and mark
    that memory region as read-only, thereby avoiding this type of
    corruption. It is relatively simple to do this from Forth itself,
    although the details are obviously system-dependent. In this way, we
    can, in principle, protect the executed code for a colon definition.
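The relocation step described above can be sketched in C on a POSIX system. This is only an illustration of the mmap/mprotect sequence, not the code kForth uses; the function name `relocate_readonly` and its parameters are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Copy len bytes of compiled code into freshly mapped pages, then drop
   write permission on them. Returns the new address, or NULL on failure.
   *maplen receives the size of the mapping (a whole number of pages). */
static void *relocate_readonly(const void *code, size_t len, size_t *maplen)
{
    assert(code != NULL && len > 0 && maplen != NULL);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t size = (len + page - 1) / page * page;   /* round up to pages */

    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    memcpy(p, code, len);                 /* copy while still writable */
    if (mprotect(p, size, PROT_READ) != 0) {
        munmap(p, size);
        return NULL;
    }
    *maplen = size;
    return p;
}
```

After the call, the xt (or whatever cell holds the code address) would have to be updated to point at the returned address, which is why the level of indirection matters. Any later store into the returned region raises SIGSEGV.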

    It's important to note that the dictionary structure for the word itself
    is not able to be protected from being overwritten in this scheme.
    Protecting the dictionary headers for colon definitions would require a significant change in architecture, but it's not out of the question.

    Although I used kForth as the example system since I'm familiar with its internals, other systems may be able to do the same. I don't know the
    internals of Gforth, but one can see that at least one level of
    indirection appears to be involved in going from the xt to the executed
    code, e.g., in Gforth,

    see execute
    Code execute
    404AB9: mov $50[r13],r15
    404ABD: mov rdx,[r14]
    404AC0: add r14,$08
    404AC4: mov rcx,-$10[rdx]
    404AC8: jmp ecx
    end-code

    Here, the assembly code gives us the hint that r14 is the TOS (top of
    stack) and there seems to be one level of indirection from the xt on top
    of the stack to the code which is subsequently executed. The code
    pointed to by xt can be overwritten, e.g., in Gforth,

    : bar 10 0 do i . loop ; ok
    bar 0 1 2 3 4 5 6 7 8 9 ok

    0 ' bar @ ! ok
    bar
    *the terminal*:3:1: error: Stack underflow

    I don't know enough about Gforth internals to be able to say that a
    relocation of the code for BAR to a region which can be protected as
    read only is possible. Perhaps one of the Gforth developers can say definitively whether or not this is possible.

    --
    Krishna Myneni

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From dxforth@21:1/5 to All on Sat Aug 6 14:09:13 2022
    The only protected system I've used is FlashForth. It attempts to protect
    the kernel on the basis that a user should be able to restart Forth after a
    crash without having to re-flash the system. It's hard for me to evaluate
    the benefits of such a system without disabling the protection (not easy).
    The costs are known but the gain remains nebulous. Is there a developer who
    wouldn't have access to a programmer should re-flashing become necessary?
    And what failure rate are we talking about - once a day, once a month?

  • From Anton Ertl@21:1/5 to Krishna Myneni on Sat Aug 6 05:48:23 2022
    Krishna Myneni <krishna.myneni@ccreweb.org> writes:
    > It is possible to use mmap and mprotect system calls (or equivalents
    > under Windows) to relocate the byte code to a new memory region and mark
    > that memory region as read-only, thereby avoiding this type of
    > corruption. It is relatively simple to do this from Forth itself,
    > although the details are obviously system-dependent. In this way, we
    > can, in principle, protect the executed code for a colon definition.
    >
    > It's important to note that the dictionary structure for the word itself
    > is not able to be protected from being overwritten in this scheme.

    I don't see a reason why not. Compile-to-flash systems do it. If you
    don't want to change protection on every IMMEDIATE, DOES> etc., keep
    the most recent header in writeable memory, and only move it to
    read-only memory when the next header is created.

    > I don't know the
    > internals of Gforth, but one can see that at least one level of
    > indirection appears to be involved in going from the xt to the executed
    > code, e.g., in Gforth,
    >
    > see execute
    > Code execute
    > 404AB9: mov $50[r13],r15
    > 404ABD: mov rdx,[r14]
    > 404AC0: add r14,$08
    > 404AC4: mov rcx,-$10[rdx]
    > 404AC8: jmp ecx
    > end-code
    >
    > Here, the assembly code gives us the hint that r14 is the TOS (top of
    > stack) and there seems to be one level of indirection from the xt on top
    > of the stack to the code which is subsequently executed.

    As far as EXECUTE is concerned, Gforth uses indirect-threaded code.
    That's the indirection you are seeing.

    It would require substantial changes to make the threaded code and/or
    the headers read-only; for the native code it would be relatively
    straightforward to make all but the most recent native-code page
    unwriteable.

    Bugs where code or headers were overwritten have not been problematic
    enough in our experience to take any such action. I have not had such
    a request by users, either.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2022: https://euro.theforth.net

  • From minforth@arcor.de@21:1/5 to Krishna Myneni on Fri Aug 5 23:19:32 2022
    Krishna Myneni wrote on Saturday, 6 August 2022 at 05:06:15 UTC+2:
    > Summary: for some non-native Forth systems, it should be possible to
    > relocate the compiled code of a colon definition into memory which can
    > be marked read-only, to protect against corruption. For this to be
    > feasible the run-time xt for a word should have at least one level of
    > indirection to the code being executed by the virtual machine.


    The easiest way in a VM-based Forth would be to just add
    address-checking to all words that write to memory.
    E.g.
    ! (store sanitized, safe but slow)
    _! (store naked, fast and inaccessible to the user)
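As a rough illustration of the checked/unchecked pair, here is a minimal C sketch of how a VM written in C might implement the two stores. It assumes a single contiguous data space; the names `data_space`, `store_raw` and `store_checked` are hypothetical and not taken from any particular Forth system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef intptr_t cell;

#define DATA_CELLS 1024
static cell data_space[DATA_CELLS];   /* hypothetical user data space */

/* "_!": naked store, no checks; only the VM itself should call this. */
static void store_raw(cell x, cell *addr)
{
    assert(addr != NULL);
    *addr = x;
}

/* "!": sanitized store. Accepts only cell-aligned addresses that lie
   wholly inside data_space; anything else is refused. */
static bool store_checked(cell x, cell *addr)
{
    /* Unsigned subtraction wraps to a huge value when addr is below the
       data space, so one comparison covers both bounds. */
    uintptr_t off = (uintptr_t)addr - (uintptr_t)data_space;
    if (off > sizeof data_space - sizeof(cell))
        return false;
    if (off % sizeof(cell) != 0)
        return false;
    store_raw(x, addr);
    return true;
}
```

A real system would throw an exception rather than return a flag; the flag just keeps the sketch short.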

  • From Marcel Hendrix@21:1/5 to Anton Ertl on Sat Aug 6 00:49:31 2022
    On Saturday, August 6, 2022 at 8:02:26 AM UTC+2, Anton Ertl wrote:
    > [..]
    > Bugs where code or headers were overwritten have not been problematic
    > enough in our experience to take any such action. I have not had such
    > a request by users, either.

    I have had a problem with this a few times. Actually, I'm extremely glad
    that code is *not* read protected: how would I have noticed that something
    was wrong? A slightly off final result in a big program is not
    straightforward.

    An overwrite of native code causes an almost immediate crash
    that leads to a useful exception report. It can be problematic
    to instrument the calling code to catch the reason (i.e. if the overwrite
    happens very infrequently under special conditions). I have had one or two
    cases (in 40 years) where I had to use an external debugger which supports
    break on memory access. The steps are: start Forth first, then attach
    the debugger to the image. Run Forth until you get the exception address,
    switch to the debugger and set up the breakpoint there, then go back to
    Forth and restart or halt the program in the vicinity of the problem.
    Inspecting memory and data is much more convenient at the Forth end.

    -marcel

  • From dxforth@21:1/5 to minf...@arcor.de on Sat Aug 6 20:14:24 2022
    On 6/08/2022 16:19, minf...@arcor.de wrote:
    > Krishna Myneni wrote on Saturday, 6 August 2022 at 05:06:15 UTC+2:
    >> Summary: for some non-native Forth systems, it should be possible to
    >> relocate the compiled code of a colon definition into memory which can
    >> be marked read-only, to protect against corruption. For this to be
    >> feasible the run-time xt for a word should have at least one level of
    >> indirection to the code being executed by the virtual machine.
    >
    > The easiest way in a VM-based Forth would be to just add
    > address-checking to all words that write to memory.
    > Eg
    > ! (store sanitized, safe but slow)
    > _! (store naked, fast and unaccessible to the user)

    ! need not be slow - at least not for RAM - where it matters.
    If application RAM in a system is segregated then it's a simple
    test for ! to determine. Storing to CODE/FLASH/EEPROM can afford to
    be slower as such operations are either atypical or inherently slow.

  • From Anton Ertl@21:1/5 to Marcel Hendrix on Sat Aug 6 09:21:07 2022
    Marcel Hendrix <mhx@iae.nl> writes:
    > Actually, I'm extremely glad that
    > code is *not* read protected: how would I have noticed that something was
    > wrong?

    If the code was write-protected and you tried to write to it, you
    would get a SIGSEGV on Unix. E.g., when you do SEE FSIN in Gforth,
    you see the code for FSIN coming from gcc, which is
    write-protected. Now let's see what happens when I try to write
    there:

    see fsin
    Code fsin
    5586FF58AFFB: movapd xmm0,xmm15
    5586FF58B000: mov $20[rsp],r8
    5586FF58B005: add r15,$08
    5586FF58B009: call $5586FF5876E0
    5586FF58B00E: mov r8,$20[rsp]
    5586FF58B013: movapd xmm15,xmm0
    5586FF58B018: mov rcx,-$08[r15]
    5586FF58B01C: jmp ecx
    end-code
    ok
    1 $5586FF58AFFB c!
    *the terminal*:3:17: error: Invalid memory address
    1 $5586FF58AFFB >>>c!<<<

    > An overwrite of native code causes an almost immediate crash
    > that leads to a useful exception report.

    If you are unlucky, the code is executed long after the write, and you
    have to puzzle out what went wrong. With write-protected code you see
    the write that would otherwise cause the problem, as shown above.

    So write-protecting the code can have an advantage. The question is
    if the advantage is big enough to justify the effort.

    - anton
  • From Marcel Hendrix@21:1/5 to Anton Ertl on Sat Aug 6 04:12:14 2022
    On Saturday, August 6, 2022 at 11:34:00 AM UTC+2, Anton Ertl wrote:
    > [..]
    > If the code was write-protected and you tried to write to it, you
    > would get a SIGSEGV on Unix. E.g., when you do SEE FSIN in Gforth,
    > you see the code for FSIN coming from the gcc, which is
    > write-protected. Now let's see what happens when I try to write
    > there:
    > [..]
    > 1 $5586FF58AFFB c!
    > *the terminal*:3:17: error: Invalid memory address
    > 1 $5586FF58AFFB >>>c!<<<

    Indeed useful: the exception is generated immediately
    when the overwrite happens. The stack trace shows
    who tried to do that.

    FORTH> ' fsin idis
    $01250FE0 : FSIN
    $01250FEA call REDUCE.2PI ( $0124AD00 ) offset NEAR
    $01250FEF fsin
    $01250FF1 ;
    FORTH> 1 $01250FEA c! ok
    FORTH> ' fsin idis
    $01250FE0 : FSIN
    $01250FEA add [ecx] dword, rdx
    $01250FEC popfq
    $01250FED ??? rdi
    $01250FEF fsin
    $01250FF1 ;
    FORTH> 0e fsin
    Caught exception 0xc0000005
    ACCESS VIOLATION
    instruction pointer = $0000000001250FEA
    RAX = $01253425 RBX = $01250FE0
    RCX = $00000000 RDX = $0000028E
    RSI = $01155C00 RDI = $2C9EF798
    RBP = $01125F88 RSP = $2C9EF7D8
    R8 = $01099A20 R9 = $00000020
    R10 = $01046F50 R11 = $011128A5
    R12 = $01099AC0 R13 = $01156FF0
    R14 = $01136000 R15 = $01110000
    Hardware exception in ``FSIN''+$0000000A
    **** RETURN STACK DUMP **** for MAIN-THREAD

    Only shows the problem after the fact, needing an external
    debugger to set up a break.
    Knowing how much effort/nuisance it is to make words
    r/o during compilation and debugging would make it possible
    to weigh advantages and disadvantages. It seems that
    code and data can't be interleaved at all.

    -marcel

  • From Krishna Myneni@21:1/5 to Anton Ertl on Sat Aug 6 10:56:42 2022
    On 8/6/22 04:21, Anton Ertl wrote:
    > Marcel Hendrix <mhx@iae.nl> writes:
    >> Actually, I'm extremely glad that
    >> code is *not* read protected: how would I have noticed that something was
    >> wrong?
    >
    > If the code was write-protected and you tried to write to it, you
    > would get a SIGSEGV on Unix. ...
    > So write-protecting the code can have an advantage. The question is
    > if the advantage is big enough to justify the effort.


    Yes, even when the code to be executed is in the form of tokenized byte
    code and executed by a virtual machine, protecting the memory containing
    the byte code by making it read only will cause an overwrite to fail immediately and unambiguously.

    I've run into this type of memory corruption problem a few times and I
    remember they were extremely difficult to debug. It seems more common to
    see memory corruption of the dictionary headers via a stray address, so
    that problem may be more pressing than overwriting the byte code. In
    kForth, corruption of the dictionary headers is often indicated by a
    core dump upon performing bye, when the dynamically allocated dictionary
    space is freed.

    The machine code parts of all code words in kForth are stored in memory
    which is read-execute except when new code is added (see mc.4th). It is
    not terribly difficult to protect ordinary colon definitions in kForth,
    although it is a bit of a hack right now. The dictionary header needs
    another field or two, one of which indicates whether or not the word's
    executable code is read-only, and another to store the executable code
    size.

    I wrote a variant of mc.4th, called protect.4th, which allows a colon definition to be protected. Going back to our BAR example,

    : bar 10 0 do i . loop ;
    ok
    see bar
    55C34CBC89C0 #10
    55C34CBC89C9 #0
    55C34CBC89D2 >R
    55C34CBC89D3 >R
    55C34CBC89D4 IP>R
    55C34CBC89D5 I
    55C34CBC89D6 .
    55C34CBC89D7 LOOP
    55C34CBC89D8 RET
    ok
    bar
    0 1 2 3 4 5 6 7 8 9 ok

    ' bar 32 Protect-Def \ relocate BAR's byte code to protected memory
    ok

    \ Protect-Def also updates the dictionary header for BAR

    see bar
    7F09C30E3000 #10
    7F09C30E3009 #0
    7F09C30E3012 >R
    7F09C30E3013 >R
    7F09C30E3014 IP>R
    7F09C30E3015 I
    7F09C30E3016 .
    7F09C30E3017 LOOP
    7F09C30E3018 RET
    ok
    \ Note the new address space of the relocated byte code.
    bar
    0 1 2 3 4 5 6 7 8 9 ok

    Now, unlike before, when we try to overwrite the byte code memory there
    is an immediate and hard failure.

    5 ' bar a@ 1+ !
    Segmentation fault (core dumped)
    $

    The details of PROTECT-DEF are, of course, Forth system and OS system-dependent. The source code for protect.4th is posted at

    https://github.com/mynenik/kForth-64/blob/master/forth-src/protect.4th

    One problem with the current approach is a segmentation fault on
    executing BYE , because the cleanup code executed upon BYE tries to free
    the new byte code memory. This is why a protection flag is needed in the dictionary header, which involves changes to the source code for the
    Forth system. However, these are relatively simple changes to kForth.
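To make the role of the protection flag concrete, here is a minimal C sketch of a header carrying the two extra fields and a cleanup routine that consults them. The struct layout and the names `word_header` and `release_code` are hypothetical; kForth's actual dictionary entry looks different.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical subset of a dictionary header. */
struct word_header {
    void  *code;          /* start of the word's byte code            */
    size_t code_size;     /* size of the region that holds it         */
    bool   is_protected;  /* true: code lives in an mmap'ed r/o region */
};

/* Release a word's code at shutdown. Memory obtained from mmap must go
   back through munmap; handing it to free() is what crashes on BYE.
   Returns 0 on success. */
static int release_code(struct word_header *w)
{
    int rc = 0;

    assert(w != NULL);
    if (w->code != NULL) {
        if (w->is_protected)
            rc = munmap(w->code, w->code_size);
        else
            free(w->code);
    }
    w->code = NULL;
    return rc;
}
```

The size field is needed because munmap, unlike free, has to be told how much to release.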

    --
    Krishna

  • From Anton Ertl@21:1/5 to Marcel Hendrix on Sat Aug 6 16:21:16 2022
    Marcel Hendrix <mhx@iae.nl> writes:
    > Knowing how much effort/nuisance it is to make words
    > r/o during compilation and debugging would make it possible
    > to weigh advantages and disadvantages. It seems that
    > code and data can't be interleaved at all.

    If you want to write-protect code, you cannot have writable data in
    the same page. You can have read-only data (e.g., settled headers,
    constant values) in the same page (and that's good enough to avoid the
    cache consistency performance problem on the Pentium Pro, Athlon, and
    later CPUs), but cache utilization is better if you separate code and
    data.
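The page-granularity constraint comes down to simple address arithmetic, sketched below in C. Protection can only be applied to whole pages, so only the pages lying entirely inside a code region can be made read-only without also affecting whatever shares the partial pages at either end. The helper names and the explicit `page` parameter are assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round an address up or down to a multiple of page (a power of two). */
static uintptr_t page_up(uintptr_t a, uintptr_t page)
{
    return (a + page - 1) & ~(page - 1);
}

static uintptr_t page_down(uintptr_t a, uintptr_t page)
{
    return a & ~(page - 1);
}

/* For a code region [start, start+len), compute the span of whole pages
   lying entirely inside it. Stores the first such page address in *first
   and returns the span length in bytes (0 if no whole page fits). */
static size_t protectable_span(uintptr_t start, size_t len,
                               uintptr_t page, uintptr_t *first)
{
    assert(page != 0 && (page & (page - 1)) == 0);
    assert(first != NULL);
    uintptr_t lo = page_up(start, page);
    uintptr_t hi = page_down(start + len, page);
    *first = lo;
    return hi > lo ? (size_t)(hi - lo) : 0;
}
```

With 4 KiB pages, a region of 0x2000 bytes starting at 0x1010 contains only one whole page (0x2000 to 0x2FFF); the bytes before and after it share pages with their neighbours.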

    - anton
  • From Krishna Myneni@21:1/5 to Anton Ertl on Sat Aug 6 14:39:34 2022
    On 8/6/22 00:48, Anton Ertl wrote:
    > Krishna Myneni <krishna.myneni@ccreweb.org> writes:
    > ...
    >> It's important to note that the dictionary structure for the word itself
    >> is not able to be protected from being overwritten in this scheme.
    >
    > I don't see a reason why not. Compile-to-flash systems do it. If you
    > don't want to change protection on every IMMEDIATE, DOES> etc., keep
    > the most recent header in writeable memory, and only move it to
    > read-only memory when the next header is created.


    All dictionary headers don't correspond to ordinary colon definitions.
    If one were to protect all headers, there may be issues with relocation affecting previously compiled code, such as with DEFERred words. I
    haven't thought through this problem enough yet to say with certainty
    that all headers can be protected as read-only. It may be highly system-dependent.

    ...

    > It would require substantial changes to make the threaded code and/or
    > the headers read-only; for the native code it would be relatively
    > straight-forward to make all but the most recent native-code page
    > unwriteable.


    I expect that your use of memory segments in Gforth should simplify the
    problem of placing the threaded code in write-protected segments.

    > Bugs where code or headers were overwritten have not been problematic
    > enough in our experience to take any such action. I have not had such
    > a request by users, either.


    Well, such bugs may be occurring more often than you realize. Such bugs
    often don't have immediate consequences. I can run Forth code which
    works perfectly fine because it hasn't made use of corrupt parts of the
    system, but if such corruption has occurred I often find out when I type
    BYE and then see a Seg Fault as memory is freed while the application is terminating. This tells me to go back and find the bugs in my
    application Forth code.

    --
    Krishna

  • From Krishna Myneni@21:1/5 to dxforth on Sat Aug 6 14:45:20 2022
    On 8/5/22 23:09, dxforth wrote:
    > The only protected system I've used is FlashForth. It attempts to protect
    > the kernel on the basis that a user should be able to restart Forth after
    > a crash without having to re-flash the system. It's hard for me to
    > evaluate the benefits of such a system without disabling the protection
    > (not easy). The costs are known but the gain remains nebulous. Is there a
    > developer who wouldn't have access to a programmer should re-flashing
    > become necessary?
    > And what failure rate are we talking about - once a day, once a month?

    I expect the failure rate is highly application dependent. An
    alternative to write-protecting the executable code and constant data
    is to store a hash/checksum of the data within a read-only region (even
    an EEPROM for a release application). Then, the user can check for
    corruption on demand by recomputing the hashes/checksums and comparing
    with the read-only data.
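A minimal C sketch of the check-on-demand idea follows. It uses the 64-bit FNV-1a hash purely because it is short; that choice is an assumption of the sketch (it is not a cryptographic hash, and a real system might prefer a CRC or SHA variant). The name `region_intact` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a hash over len bytes. */
static uint64_t fnv1a64(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint64_t h = 14695981039346656037ULL;      /* FNV offset basis */

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;                 /* FNV prime */
    }
    return h;
}

/* Recompute the hash of a code or data region and compare it with the
   value recorded earlier (which would be kept in read-only storage). */
static bool region_intact(const void *data, size_t len, uint64_t recorded)
{
    assert(data != NULL);
    return fnv1a64(data, len) == recorded;
}
```

Unlike page protection, this detects corruption only when the check is run, so it says that something was overwritten but not which store did it.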

    I expect the probability of a failure to be highly dependent on the
    complexity of the application, and possibly on the hardware operating
    environment, e.g. if the power cycles on and off frequently while writing
    to the storage medium.

    --
    Krishna

  • From Krishna Myneni@21:1/5 to minf...@arcor.de on Sat Aug 6 14:24:15 2022
    On 8/6/22 01:19, minf...@arcor.de wrote:
    > Krishna Myneni wrote on Saturday, 6 August 2022 at 05:06:15 UTC+2:
    >> Summary: for some non-native Forth systems, it should be possible to
    >> relocate the compiled code of a colon definition into memory which can
    >> be marked read-only, to protect against corruption. For this to be
    >> feasible the run-time xt for a word should have at least one level of
    >> indirection to the code being executed by the virtual machine.
    >
    > The easiest way in a VM-based Forth would be to just add
    > address-checking to all words that write to memory.
    > Eg
    > ! (store sanitized, safe but slow)
    > _! (store naked, fast and unaccessible to the user)

    Not sure what you mean. How does a VM-based Forth distinguish between
    addresses which are data space vs addresses which contain the virtual
    machine code?

    kForth uses a separate type stack to distinguish between ordinary
    numbers and addresses. This has proven useful in flagging common
    mistakes caused by incorrect stack manipulation. It is quite useful, I
    think, in aiding beginning Forth programmers to reveal the source of the problem. The added complexity of the type stack gives a performance hit
    of about 15% in kForth.
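The type-stack idea can be illustrated with a toy C model: every data-stack slot has a parallel tag, and the store primitive refuses to run unless the top item is tagged as an address. This is a sketch of the general technique only; the names and layout are hypothetical and kForth's real implementation is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum tag { TAG_NUM, TAG_ADDR };

#define DEPTH 64
static intptr_t      stack[DEPTH];   /* data stack                 */
static unsigned char tags[DEPTH];    /* parallel type stack        */
static int sp = 0;                   /* number of items on stack   */

static void push(intptr_t v, enum tag t)
{
    assert(sp < DEPTH);
    stack[sp] = v;
    tags[sp] = (unsigned char)t;
    sp++;
}

/* "!" ( x addr -- ): store x at addr, but only if the top item really
   is tagged as an address. On a type error the stack is left untouched
   and false is returned. */
static bool vm_store(void)
{
    if (sp < 2 || tags[sp - 1] != TAG_ADDR)
        return false;
    intptr_t *addr = (intptr_t *)stack[sp - 1];
    *addr = stack[sp - 2];
    sp -= 2;
    return true;
}
```

The tag catches mistakes such as swapped operands, but it does not by itself say which region an address belongs to, which is the question raised above.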

    --
    Krishna

  • From minforth@arcor.de@21:1/5 to Krishna Myneni on Sat Aug 6 22:33:35 2022
    Krishna Myneni wrote on Saturday, 6 August 2022 at 21:24:20 UTC+2:
    > On 8/6/22 01:19, minf...@arcor.de wrote:
    >> Krishna Myneni wrote on Saturday, 6 August 2022 at 05:06:15 UTC+2:
    >>> Summary: for some non-native Forth systems, it should be possible to
    >>> relocate the compiled code of a colon definition into memory which can
    >>> be marked read-only, to protect against corruption. For this to be
    >>> feasible the run-time xt for a word should have at least one level of
    >>> indirection to the code being executed by the virtual machine.
    >>
    >> The easiest way in a VM-based Forth would be to just add
    >> address-checking to all words that write to memory.
    >> Eg
    >> ! (store sanitized, safe but slow)
    >> _! (store naked, fast and unaccessible to the user)
    > Not sure what you mean. How does a VM-based Forth distinguish between
    > addresses which are data space vs addresses which contain the virtual
    > machine code?

    When the dataspace is allocated at program start, it will contain no VM code.

    > kForth uses a separate type stack to distinguish between ordinary
    > numbers and addresses. This has proven useful in flagging common
    > mistakes caused by incorrect stack manipulation. It is quite useful, I
    > think, in aiding beginning Forth programmers to reveal the source of the
    > problem. The added complexity of the type stack gives a performance hit
    > of about 15% in kForth.

    IIRC StrongForth used a type stack to type-tag _all_ numeric types. But I
    don't know how much performance it cost.

    For pure stack and address checking (hard-coded in the words) I measured
    a runtime penalty of about 5%-7% in my system. But it is C-based and
    therefore slower anyhow, so those different performance hits cannot be
    compared.

  • From Anton Ertl@21:1/5 to Krishna Myneni on Sun Aug 7 14:11:58 2022
    Krishna Myneni <krishna.myneni@ccreweb.org> writes:
    > On 8/6/22 00:48, Anton Ertl wrote:
    >> Krishna Myneni <krishna.myneni@ccreweb.org> writes:
    >>> ...
    >>> It's important to note that the dictionary structure for the word itself
    >>> is not able to be protected from being overwritten in this scheme.
    >>
    >> I don't see a reason why not. Compile-to-flash systems do it. If you
    >> don't want to change protection on every IMMEDIATE, DOES> etc., keep
    >> the most recent header in writeable memory, and only move it to
    >> read-only memory when the next header is created.
    >
    > All dictionary headers don't correspond to ordinary colon definitions.

    ?

    > If one were to protect all headers, there may be issues with relocation
    > affecting previously compiled code, such as with DEFERred words.

    It's unclear to me how relocation comes in here, but for DEFERed words
    the thing that IS changes is not part of the header, just like for a
    VALUE the thing the TO changes is not part of the header. And of
    course you would need one indirection to get from the header to the
    data in these cases.
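The indirection described here can be modelled in a few lines of C: the header is constant and holds only a pointer to a separate writable cell, and changing the deferred word's action touches that cell alone. The struct layout and the names `defer_is` and `defer_execute` are hypothetical, chosen for the sketch.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*xt_fn)(void);

/* Hypothetical header: never modified after creation, so it could live
   in read-only memory. It points at the cell that IS updates. */
struct header {
    const char *name;
    xt_fn      *cell;
};

static int one(void) { return 1; }
static int two(void) { return 2; }

static xt_fn greet_cell = one;                                 /* writable */
static const struct header greet = { "greet", &greet_cell };  /* constant */

/* IS: change the action of a deferred word without touching its header. */
static void defer_is(const struct header *h, xt_fn action)
{
    assert(h != NULL && action != NULL);
    *h->cell = action;
}

/* Executing the deferred word goes header -> cell -> current action. */
static int defer_execute(const struct header *h)
{
    return (*h->cell)();
}
```

The same shape would apply to a VALUE: the header stays fixed and TO writes through the pointer.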

    > I
    > haven't thought through this problem enough yet to say with certainty
    > that all headers can be protected as read-only. It may be highly
    > system-dependent.

    If you put in enough effort, they can.

    > I expect that your use of memory segments in Gforth should simplify the
    > problem of placing the threaded code in write-protected segments.

    Yes, one can use sections for that purpose, but there is still a
    substantial amount of changes to make.

    > Well, such bugs may be occurring more often than you realize. Such bugs
    > often don't have immediate consequences. I can run Forth code which
    > works perfectly fine because it hasn't made use of corrupt parts of the
    > system, but if such corruption has occurred I often find out when I type
    > BYE and then see a Seg Fault as memory is freed while the application is
    > terminating. This tells me to go back and find the bugs in my
    > application Forth code.

    Apart from your use as a debugging tool, freeing before exit()ing the
    process is a waste of time.

    One extreme case was Mozilla, which
    apparently leaked memory, and that memory was paged out over time
    (that was at a time when we still used swap space). When exiting
    Mozilla, it took several minutes to page the leaked memory back in in
    order to free() it. Only then did it perform the exit(). It would have
    exited much faster if it had not freed first.

    My guess is that a manager saw the memory leaks, and demanded that the programmers fix them; so they dutifully recorded all memory that they allocated, and freed it when the user wanted to quit Mozilla. As a
    result, the memory leak tool they used for finding leaks reported that
    no memory was leaked, and the manager was satisfied. In reality the
    leaks were still there, and Mozilla was now sluggish on termination.

    - anton
  • From Krishna Myneni@21:1/5 to Anton Ertl on Sun Aug 7 11:03:22 2022
    On 8/7/22 09:11, Anton Ertl wrote:
    > Krishna Myneni <krishna.myneni@ccreweb.org> writes:
    >> On 8/6/22 00:48, Anton Ertl wrote:
    >>> Krishna Myneni <krishna.myneni@ccreweb.org> writes:
    >>>> ...
    >>>> It's important to note that the dictionary structure for the word itself
    >>>> is not able to be protected from being overwritten in this scheme.
    >>>
    >>> I don't see a reason why not. Compile-to-flash systems do it. If you
    >>> don't want to change protection on every IMMEDIATE, DOES> etc., keep
    >>> the most recent header in writeable memory, and only move it to
    >>> read-only memory when the next header is created.
    >>
    >> All dictionary headers don't correspond to ordinary colon definitions.
    >
    > ?


    I was thinking of CREATEd words and whether or not protecting the
    dictionary headers for such words could cause problems for subsequently
    defined words which call the earlier words. It seems that as long as
    each dictionary header is protected after the corresponding word is
    protected (relocated) there shouldn't arise a problem with incorrect
    references to header entry fields, referenced by subsequent code.

    >> If one were to protect all headers, there may be issues with relocation
    >> affecting previously compiled code, such as with DEFERred words.
    >
    > It's unclear to me how relocation comes in here, but for DEFERed words
    > the thing that IS changes is not part of the header, just like for a
    > VALUE the thing the TO changes is not part of the header. And of
    > course you would need one indirection to get from the header to the
    > data in these cases.


    As long as the indirection is not bypassed by a compiler, there
    shouldn't be a problem.

    >> I
    >> haven't thought through this problem enough yet to say with certainty
    >> that all headers can be protected as read-only. It may be highly
    >> system-dependent.
    >
    > If you put in enough effort, they can.


    I agree that a Forth system architecture which provides memory
    protection for dictionary headers, non-native executable code of colon definitions, and for native code of CODE words/ordinary definitions is possible.

    >> I expect that your use of memory segments in Gforth should simplify the
    >> problem of placing the threaded code in write-protected segments.
    >
    > Yes, one can use sections for that purpose, but there is still a
    > substantial amount of changes to make.


    I'm taking a cautious approach, focusing on protecting the non-native (tokenized) executable code for colon definitions for now, and seeing if
    there are any issues which come up. If there aren't any significant
    issues, then I can tackle dictionary header protection.

    >> Well, such bugs may be occurring more often than you realize. Such bugs
    >> often don't have immediate consequences. I can run Forth code which
    >> works perfectly fine because it hasn't made use of corrupt parts of the
    >> system, but if such corruption has occurred I often find out when I type
    >> BYE and then see a Seg Fault as memory is freed while the application is
    >> terminating. This tells me to go back and find the bugs in my
    >> application Forth code.
    >
    > Apart from your use as a debugging tool, freeing before exit()ing the
    > process is a waste of time.


    My recollection is that, in Linux, it wasn't always the case that the OS cleaned up dynamically allocated memory for an application after
    termination -- that was the original reason for freeing memory prior to
    exit(). In almost all of my use cases, the exit() time is ignorable/not noticeable and, thus, there was no need to remove the memory freeing
    step. But it has the highly useful benefit of warning me via a seg fault
    that my session was corrupt and that any results from the session should
    be checked after fixing the bug(s) causing the corruption.

    --
    Krishna

  • From Krishna Myneni@21:1/5 to Krishna Myneni on Sun Aug 7 17:41:24 2022
    On 8/6/22 10:56, Krishna Myneni wrote:
    ...
    The details of PROTECT-DEF are, of course, Forth system and OS system-dependent. The source code for protect.4th is posted at

    https://github.com/mynenik/kForth-64/blob/master/forth-src/protect.4th


    I revised protect.4th to provide general purpose data buffer protection,
    e.g. create write-protected data tables. In addition to PROTECT-DEF for
    write protecting the executable data of colon definitions, PROTECT-DATA
    may be used to protect a data buffer of constant values,

    PROTECT-DATA ( aold u -- anew u )

    aold is the address of the existing read/write-able data buffer and u is
    its size in bytes. PROTECT-DATA copies the constant data into a
    read-only buffer of the same size. Currently there is a size limit of 1K
    bytes for the constant data buffer -- the memory management code in
    protect.4th may be revised to overcome this limit. PROTECT-DATA does not
    tamper with the original data buffer.
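
    As an illustration of what a word like PROTECT-DATA has to do
    underneath, here is a minimal C sketch, assuming a POSIX system that
    provides mmap and mprotect. The name protect_data_sketch is hypothetical
    and this is not the code in protect.4th; it only shows the general idea
    of copying a buffer into fresh pages and then removing write permission.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper (not taken from protect.4th): copy u bytes from aold
   into newly mapped pages, then make those pages read-only.
   Returns the new address, or NULL on failure. */
const void *protect_data_sketch(const void *aold, size_t u)
{
    assert(aold != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || u == 0)
        return NULL;

    /* mprotect operates on whole pages, so round the length up. */
    size_t len = ((u + (size_t)page - 1) / (size_t)page) * (size_t)page;

    void *anew = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (anew == MAP_FAILED)
        return NULL;

    memcpy(anew, aold, u);   /* the original buffer is left untouched */

    /* From here on, any store into the copy raises a fault. */
    if (mprotect(anew, len, PROT_READ) != 0) {
        munmap(anew, len);
        return NULL;
    }
    return anew;
}
```

    Reads through the returned pointer behave as before, while a store
    through it is expected to terminate the process with SIGSEGV (SIGBUS on
    some systems), which matches the behaviour of the read-only table in the
    session shown later in this post. The original buffer stays writable.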

    An example (in kForth) where this is useful is in the checksum program,
    sha512.4th, which uses a table of constants for computing the 512-bit
    checksum of a data buffer.

    -----
    include ans-words
    include strings
    include modules
    include utils
    include ssd
    include protect

    \ Hash constant words K for SHA-512
    HEX
    428A2F98D728AE22 7137449123EF65CD B5C0FBCFEC4D3B2F E9B5DBA58189DBBC
    3956C25BF348B538 59F111F1B605D019 923F82A4AF194F9B AB1C5ED5DA6D8118
    D807AA98A3030242 12835B0145706FBE 243185BE4EE4B28C 550C7DC3D5FFB4E2
    72BE5D74F27B896F 80DEB1FE3B1696B1 9BDC06A725C71235 C19BF174CF692694
    E49B69C19EF14AD2 EFBE4786384F25E3 0FC19DC68B8CD5B5 240CA1CC77AC9C65
    2DE92C6F592B0275 4A7484AA6EA6E483 5CB0A9DCBD41FBD4 76F988DA831153B5
    983E5152EE66DFAB A831C66D2DB43210 B00327C898FB213F BF597FC7BEEF0EE4
    C6E00BF33DA88FC2 D5A79147930AA725 06CA6351E003826F 142929670A0E6E70
    27B70A8546D22FFC 2E1B21385C26C926 4D2C6DFC5AC42AED 53380D139D95B3DF
    650A73548BAF63DE 766A0ABB3C77B2A8 81C2C92E47EDAEE6 92722C851482353B
    A2BFE8A14CF10364 A81A664BBC423001 C24B8B70D0F89791 C76C51A30654BE30
    D192E819D6EF5218 D69906245565A910 F40E35855771202A 106AA07032BBD1B8
    19A4C116B8D2D0C8 1E376C085141AB53 2748774CDF8EEB99 34B0BCB5E19B48A8
    391C0CB3C5C95A63 4ED8AA4AE3418ACB 5B9CCA4F7763E373 682E6FF3D6B2B8A3
    748F82EE5DEFB2FC 78A5636F43172F60 84C87814A1F0AB72 8CC702081A6439EC
    90BEFFFA23631E28 A4506CEBDE82BDE9 BEF9A3F7B2C67915 C67178F2E372532B
    CA273ECEEA26619C D186B8C721C0C207 EADA7DD6CDE0EB1E F57D4F7FEE6ED178
    06F067AA72176FBA 0A637DC5A2C898A6 113F9804BEF90DAE 1B710B35131C471B
    28DB77F523047D84 32CAAB7B40C72493 3C9EBE0A15C9BEBC 431D67C49C100D4C
    4CC5D4BECB3E42B6 597F299CFC657E2A 5FCB6FAB3AD6FAEC 6C44198C4A475817
    50 table K512[]

    \ Read-only version of K512[]
    K512[] 50 cells Protect-Data drop constant K512P[]

    cr .( Base is HEX ) cr
    -----

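    The word TABLE is defined in utils.4th -- it creates the ordinary data
    buffer, the address of which is returned by the word K512[]. To make a
    read-only version of this data buffer, the listing above passes K512[] and
    its size to PROTECT-DATA, drops the returned size, and binds the new
    address to the constant K512P[].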

    Example of using protect.4th (in kForth-64)
    -----
    include protect-ex1

    Base is HEX
    ok
    K512P[] @ .
    428A2F98D728AE22 ok

    K512P[] 8 cells + @ u.
    D807AA98A3030242 ok
    ok
    0 K512[] ! \ the original buffer still exists and we can modify it.
    ok
    0 K512P[] ! \ attempt to modify the read-only version
    Segmentation fault (core dumped)
    $
    ------

    One problem with the current approach is a segmentation fault on
    executing BYE , because the cleanup code executed upon BYE tries to free
    the new byte code memory. This is why a protection flag is needed in the dictionary header, which involves changes to the source code for the
    Forth system. However, these are relatively simple changes to kForth.
    ...

    Unlike protecting the executable code of colon definitions, PROTECT-DATA
    does not require any changes to the internals of the Forth system, so it
    is immediately useful for creating read-only data buffers. There is no segmentation fault upon exiting the Forth system when using
    PROTECT-DATA, because K512P[] is simply a constant, and there is no
    attempt to free the read-only buffer.

    --
    Krishna

  • From antispam@math.uni.wroc.pl@21:1/5 to Krishna Myneni on Mon Aug 8 01:07:29 2022
    Krishna Myneni <krishna.myneni@ccreweb.org> wrote:

    kForth uses a separate type stack to distinguish between ordinary
    numbers and addresses. This has proven useful in flagging common
    mistakes caused by incorrect stack manipulation. It is quite useful, I
    think, in aiding beginning Forth programmers to reveal the source of the problem. The added complexity of the type stack gives a performance hit
    of about 15% in kForth.

    If you limit this to stack, then it may help beginners, but
    will miss more "interesting" errors, like having a record
    with numbers in some fields and xt's in other. To detect
    access to wrong field you would need tags on _all_ data.

    --
    Waldek Hebisch

  • From none) (albert@21:1/5 to krishna.myneni@ccreweb.org on Tue Aug 16 22:05:54 2022
    In article <tconob$klc8$1@dont-email.me>,
    Krishna Myneni <krishna.myneni@ccreweb.org> wrote:
    <SNIP>
    I agree that a Forth system architecture which provides memory
    protection for dictionary headers, non-native executable code of colon
    definitions, and for native code of CODE words/ordinary definitions is
    possible.

    Note that all this effort expended is for the case of defects in the
    program. It is much more useful to prevent defects.
    Making the architecture more complicated doesn't help for preventing defects.

    --
    Krishna


    Groetjes Albert
    --
    "in our communism country Viet Nam, people are forced to be
    alive and in the western country like US, people are free to
    die from Covid 19 lol" duc ha
    albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

  • From S Jack@21:1/5 to none albert on Wed Aug 17 05:42:27 2022
    On Tuesday, August 16, 2022 at 3:05:59 PM UTC-5, none albert wrote:
    In article <tconob$klc8$1...@dont-email.me>,
    Krishna Myneni <krishna...@ccreweb.org> wrote:

    Note that all this effort expended is for the case of defects in the
    program. It is much more useful to prevent defects.

    I don't discount hardening of "perfect" code not because the code may
    not be perfect but because hardware in the field under stress doesn't
    always follow the code.
    --
    me

  • From Hans Bezemer@21:1/5 to Krishna Myneni on Wed Aug 17 08:27:14 2022
    On Saturday, August 6, 2022 at 5:06:15 AM UTC+2, Krishna Myneni wrote:
    Well, 4tH doesn't have that issue.

    : foo 0 ;
    ' foo execute . \ same as typing FOO
    0 ' foo ! \ store a zero at the execution address for FOO
    foo

    Compiles to:

    4tH message: No errors at word 10
    Object size: 10 words
    String size: 0 chars
    Variables : 0 cells
    Strings : 0 chars
    Symbols : 1 names
    Reliable : Yes

    Addr| Opcode   Operand  Argument

       0| branch         2  foo
       1| literal        0
       2| exit           0
       3| literal        0
       4| execute        0
       5| .              0
       6| literal        0
       7| literal        0
       8| !              0
       9| call           0  foo

    And executes: "0 Executing; Word 8: Bad variable"
    Where of course, the leading zero is generated by the program.

    What ' returns is the address of "foo" in the Code Segment and "!" treats this as an address in the Integer Segment.
    Unfortunately, address 0 of the Code Segment points to an address in the Integer Segment that is protected and
    cannot be overwritten. Hence, it is a "bad variable".

    4tH segmentation has been there since its very inception - and some segments are r/o (like the code- or the string
    segment), some are partially r/o (system vars in the integer segment) and some are completely r/w (like the
    character segment). Every access to these segments is either closely guarded or just impossible, since there
    are no words to write anything there.

    I don't have to go into "bar", because that one is equally impossible.

    Maybe one could design an equally segmented Forth compiler, idunno. Never tried. But it *has* worked for me the last
    30 odd years.

    Hans Bezemer

  • From Krishna Myneni@21:1/5 to albert on Thu Aug 18 08:30:42 2022
    On 8/16/22 15:05, albert wrote:
    In article <tconob$klc8$1@dont-email.me>,
    Krishna Myneni <krishna.myneni@ccreweb.org> wrote:
    <SNIP>
    I agree that a Forth system architecture which provides memory
    protection for dictionary headers, non-native executable code of colon
    definitions, and for native code of CODE words/ordinary definitions is
    possible.

    Note that all this effort expended is for the case of defects in the
    program. It is much more useful to prevent defects.

    Write protecting the virtual threaded code using low-level OS methods is
    a means of *detecting* program defects which corrupt the Forth system's
    code. Otherwise, a defective word may corrupt a part of the Forth system
    for which the consequences may not be readily apparent when executing
    words. With low level memory protection of the virtual threaded code,
    such corruption becomes immediately obvious.
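
    To make the "detection" point concrete, here is a small C sketch,
    assuming a POSIX system with fork, mmap and mprotect. The function names
    are hypothetical and the page merely stands in for a word's threaded
    code; it is not kForth's implementation. A stray store is performed in a
    child process so that the caller can observe whether it was trapped.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map one page standing in for threaded code (hypothetical helper).
   If write_protect is nonzero the page is made read-only after filling. */
unsigned char *map_code_page(int write_protect)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    unsigned char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    p[0] = 0xEE;   /* the RET byte seen in the dumps earlier in the thread */

    if (write_protect && mprotect(p, (size_t)page, PROT_READ) != 0) {
        munmap(p, (size_t)page);
        return NULL;
    }
    return p;
}

/* Do a stray one-byte store in a child process.
   Returns 1 if the store was trapped (child killed by SIGSEGV or SIGBUS),
   0 if it went through silently, -1 on fork/wait failure. */
int stray_store_is_trapped(volatile unsigned char *addr)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        *addr = 0;     /* plays the role of the defective word */
        _exit(0);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFSIGNALED(status))
        return 0;

    int sig = WTERMSIG(status);
    return sig == SIGSEGV || sig == SIGBUS;
}
```

    With an unprotected page the stray store completes and nothing reports
    it; with the protected page the same store is stopped at the faulting
    instruction, which is the immediate feedback argued for above.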

    Making the architecture more complicated doesn't help for preventing defects.


    The people who write link loaders may disagree with you -- such
    protection usually exists for native code on desktop systems -- the
    suggestion here is to extend memory protection to virtual threaded code
    on the same type of systems.

    --
    Krishna

  • From Krishna Myneni@21:1/5 to antispam@math.uni.wroc.pl on Thu Aug 18 08:43:55 2022
    On 8/7/22 20:07, antispam@math.uni.wroc.pl wrote:
    Krishna Myneni <krishna.myneni@ccreweb.org> wrote:

    kForth uses a separate type stack to distinguish between ordinary
    numbers and addresses. This has proven useful in flagging common
    mistakes caused by incorrect stack manipulation. It is quite useful, I
    think, in aiding beginning Forth programmers to reveal the source of the
    problem. The added complexity of the type stack gives a performance hit
    of about 15% in kForth.

    If you limit this to stack, then it may help beginners, but
    will miss more "interesting" errors, like having a record
    with numbers in some fields and xt's in other. To detect
    access to wrong field you would need tags on _all_ data.


    The programmer must keep track of which member fields of a structure are addresses (pointers) and which are not when accessing them through fetch operators. kForth provides distinct fetch operators for single cell
    non-address values and for addresses: @ for non-address values, and A@
    for address values. This allows the value to be type-tagged when it is
    fetched onto the data stack and for subsequent errors to be caught. For example, the sequences "@ @" and "@ A@" will result in a virtual machine
    error, while "A@ @" or "A@ A@" are legal. It is up to the programmer to
    use @ and A@ appropriately when accessing member fields of a structure.
    kForth does not catch usage errors when a field is accessed, but such
    errors are likely to result in the VM reporting a type mismatch error
    further down the execution chain.
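
    The rule described above can be modelled in a few lines of C. This is a
    toy sketch under stated assumptions, not kForth's virtual machine: the
    names typed_stack, ts_push, ts_fetch and ts_afetch are hypothetical, and
    the only rule modelled is that both fetch operators require an
    address-tagged cell on top of the stack and differ in the tag they give
    the fetched value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum cell_type { T_IVAL, T_ADDR };   /* plain number vs. address */

#define TS_DEPTH 32

struct typed_stack {
    intptr_t       cell[TS_DEPTH];   /* data stack */
    enum cell_type type[TS_DEPTH];   /* parallel type stack */
    int            depth;
};

/* Push a value together with its type tag. Returns 0, or -1 if full. */
int ts_push(struct typed_stack *s, intptr_t v, enum cell_type t)
{
    assert(s != NULL);
    if (s->depth >= TS_DEPTH)
        return -1;
    s->cell[s->depth] = v;
    s->type[s->depth] = t;
    s->depth++;
    return 0;
}

/* Common part of both fetches: the top cell must be tagged as an address.
   The fetched value replaces it and receives the requested tag. */
static int fetch_as(struct typed_stack *s, enum cell_type result)
{
    assert(s != NULL);
    if (s->depth < 1 || s->type[s->depth - 1] != T_ADDR)
        return -1;                   /* type error: not an address */
    const intptr_t *p = (const intptr_t *)s->cell[s->depth - 1];
    s->cell[s->depth - 1] = *p;
    s->type[s->depth - 1] = result;
    return 0;
}

int ts_fetch(struct typed_stack *s)  { return fetch_as(s, T_IVAL); } /* "@"  */
int ts_afetch(struct typed_stack *s) { return fetch_as(s, T_ADDR); } /* "A@" */
```

    In this model "A@ @" succeeds, while the second fetch in "@ @" or
    "@ A@" returns -1 because the first "@" tagged its result as a plain
    number. A field fetched with the wrong operator is therefore not caught
    at the access itself, only when the mis-tagged value is used later.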

    --
    Krishna

  • From dxforth@21:1/5 to Krishna Myneni on Fri Aug 19 10:19:35 2022
    On 18/08/2022 23:30, Krishna Myneni wrote:
    On 8/16/22 15:05, albert wrote:
    In article <tconob$klc8$1@dont-email.me>,
    Krishna Myneni  <krishna.myneni@ccreweb.org> wrote:
    <SNIP>
    I agree that a Forth system architecture which provides memory
    protection for dictionary headers, non-native executable code of colon
    definitions, and for native code of CODE words/ordinary definitions is
    possible.

    Note that all this effort expended is for the case of defects in the
    program. It is much more useful to prevent defects.

    Write protecting the virtual threaded code using low-level OS methods
    is a means of *detecting* program defects which corrupt the Forth
    system's code. Otherwise, a defective word may corrupt a part of the
    Forth system for which the consequences may not be readily apparent
    when executing words. With low level memory protection of the virtual
    threaded code, such corruption becomes immediately obvious.

    Lack of checking in general should mean Forth applications are the most
    unreliable there are. Yet reports I've seen suggest the opposite is true.
    Working 'closer to the metal', I believe Forth programmers are in a better
    position to know what can go wrong. In contrast, programmers in other
    languages rely on the compiler to tell them what they're doing is wrong.

  • From Andy Valencia@21:1/5 to dxforth on Thu Aug 18 18:01:18 2022
    dxforth <dxforth@gmail.com> writes:
    Lack of checking in general should mean Forth applications are the most
    unreliable there are. Yet reports I've seen suggest the opposite is true.
    Working 'closer to the metal', I believe Forth programmers are in a better
    position to know what can go wrong. In contrast, programmers in other
    languages rely on the compiler to tell them what they're doing is wrong.

    I'll go ahead and admit it: the hardest bugs to find I've ever written
    were in ForthOS. Next was VSTa (in C), and then downward from there.
    I think Golang let me write the hairiest performance intensive code
    while still hitting reliability with little effort.

    But, admittedly, it wasn't OS kernel code. Nor was the Python code
    which was not far away from Golang in ease, though its performance
    and scalability are a pathetic shadow of Golang.

    Andy Valencia
    Home page: https://www.vsta.org/andy/
    To contact me: https://www.vsta.org/contact/andy.html

  • From none) (albert@21:1/5 to vandys@vsta.org on Fri Aug 19 12:06:36 2022
    In article <166087087894.31034.14766942655302290779@media.vsta.org>,
    Andy Valencia <vandys@vsta.org> wrote:
    dxforth <dxforth@gmail.com> writes:
    Lack of checking in general should mean Forth applications are the most
    unreliable there are. Yet reports I've seen suggest the opposite is true.
    Working 'closer to the metal', I believe Forth programmers are in a better
    position to know what can go wrong. In contrast, programmers in other
    languages rely on the compiler to tell them what they're doing is wrong.

    I'll go ahead and admit it: the hardest bugs to find I've ever written
    were in ForthOS. Next was VSTa (in C), and then downward from there.
    I think Golang let me write the hairiest performance intensive code
    while still hitting reliability with little effort.

    But, admittedly, it wasn't OS kernel code. Nor was the Python code
    which was not far away from Golang in ease, though its performance
    and scalability are a pathetic shadow of Golang.

    Note how a defect (bug?) in ForthOS doesn't profit from an elaborate
    protection scheme, because for this type of software this is not
    in place yet.


    Andy Valencia

    Groetjes Albert
    --
    "in our communism country Viet Nam, people are forced to be
    alive and in the western country like US, people are free to
    die from Covid 19 lol" duc ha
    albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

  • From none) (albert@21:1/5 to sdwjack69@gmail.com on Fri Aug 19 12:50:08 2022
    In article <cf38725e-5d33-4792-a9ed-2280e9b573ean@googlegroups.com>,
    S Jack <sdwjack69@gmail.com> wrote:
    On Tuesday, August 16, 2022 at 3:05:59 PM UTC-5, none albert wrote:
    In article <tconob$klc8$1...@dont-email.me>,
    Krishna Myneni <krishna...@ccreweb.org> wrote:

    Note that all this effort expended is for the case of defects in the
    program. It is much more useful to prevent defects.

    I don't discount hardening of "perfect" code not because the code may
    not be perfect but because hardware in the field under stress doesn't
    always follow the code.

    I agree. However retrofitting these methods on Forth I consider
    doubtful.
    I would rather use strict languages that have safety built in,
    possibly augmented with a periodic CRC check of code, and of course
    parity memory and watchdog timers.
    Modern languages in this vein are Ada and Go.
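
    A periodic CRC check of code can be sketched in C as follows. This is
    only an illustration under stated assumptions: it uses the common
    reflected CRC-32 polynomial 0xEDB88320, the names crc32_bytes and
    code_is_intact are hypothetical, and how often the check runs and what
    happens on a mismatch are left to the application.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, initial value and
   final XOR of 0xFFFFFFFF). Slow but table-free. */
uint32_t crc32_bytes(const void *data, size_t len)
{
    assert(data != NULL || len == 0);

    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Compare the current CRC of a code region with the value recorded when
   the code was known to be good. Returns 1 if they match, 0 otherwise. */
int code_is_intact(const void *code, size_t len, uint32_t expected)
{
    return crc32_bytes(code, len) == expected;
}
```

    The reference value would be computed once after the code is compiled
    or loaded, and code_is_intact called from a timer or idle loop. Unlike
    page protection, this notices corruption only at the next check, but it
    also covers changes that did not come from a store instruction, such as
    a memory fault.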

    My first Algol60 experience had two errors:
    array index out of bounds
    memory exhausted
    In each case you were given chapter and verse where the error occurred.
    ('Memory exhausted' was more often than not caused by infinite recursion.
    You had to know that.)

    --
    me

    Groetjes Albert
    --
    "in our communism country Viet Nam, people are forced to be
    alive and in the western country like US, people are free to
    die from Covid 19 lol" duc ha
    albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

  • From Krishna Myneni@21:1/5 to dxforth on Fri Aug 19 08:11:30 2022
    On 8/18/22 19:19, dxforth wrote:
    On 18/08/2022 23:30, Krishna Myneni wrote:
    On 8/16/22 15:05, albert wrote:
    In article <tconob$klc8$1@dont-email.me>,
    Krishna Myneni  <krishna.myneni@ccreweb.org> wrote:
    <SNIP>
    I agree that a Forth system architecture which provides memory
    protection for dictionary headers, non-native executable code of colon
    definitions, and for native code of CODE words/ordinary definitions is
    possible.

    Note that all this effort expended is for the case of defects in the
    program. It is much more useful to prevent defects.

    Write protecting the virtual threaded code using low-level OS methods
    is a means of *detecting* program defects which corrupt the Forth
    system's code. Otherwise, a defective word may corrupt a part of the
    Forth system for which the consequences may not be readily apparent
    when executing words. With low level memory protection of the virtual
    threaded code, such corruption becomes immediately obvious.

    Lack of checking in general should mean Forth applications are the most
    unreliable there are. Yet reports I've seen suggest the opposite is true.
    Working 'closer to the metal', I believe Forth programmers are in a better
    position to know what can go wrong. In contrast, programmers in other
    languages rely on the compiler to tell them what they're doing is wrong.



    The discussion up to now is unrelated to compiler features -- it's about
    the Forth system design enabling detection of coding errors. In the case
    of the compiler, the Forth language does not provide strict syntax rules
    and strong typing to allow for compiler checking, to the extent of other languages. Perhaps this does make for better programmers in the long run through a trial by fire experience -- such claims made here on c.l.f
    appear to be purely anecdotal and if there's hard evidence for working
    Forth programmers producing more robust code it would certainly be
    interesting to see. However, to the extent that a Forth system or
    compiler can aid in detection and reporting of program errors, I fail to
    see how that's a bad thing.

    --
    Krishna
