It is possible to use mmap and mprotect system calls (or equivalents
under Windows) to relocate the byte code to a new memory region and mark
that memory region as read-only, thereby avoiding this type of
corruption. It is relatively simple to do this from Forth itself,
although the details are obviously system-dependent. In this way, we
can, in principle, protect the executed code for a colon definition.
It's important to note that the dictionary structure for the word itself
is not able to be protected from being overwritten in this scheme.
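A minimal C sketch of the relocation step described above, assuming a POSIX system that supports anonymous mappings (`MAP_ANONYMOUS`). The function name `relocate_read_only` and its interface are hypothetical and are not taken from kForth or any other Forth system; a Forth system would reach the same calls through whatever foreign-function interface it provides.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Copy len bytes of byte code into a fresh page-aligned mapping and
   make that mapping read-only. Returns the new address, or NULL on
   failure. *map_len receives the mapped size, needed later for munmap(). */
void *relocate_read_only(const void *src, size_t len, size_t *map_len)
{
    assert(src != NULL && len > 0 && map_len != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return NULL;

    /* mprotect() works on whole pages, so round the size up. */
    size_t size = (len + (size_t)page - 1) / (size_t)page * (size_t)page;

    void *dst = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (dst == MAP_FAILED)
        return NULL;

    memcpy(dst, src, len);

    /* From here on, any store into the copy raises a fault. */
    if (mprotect(dst, size, PROT_READ) != 0) {
        munmap(dst, size);
        return NULL;
    }
    *map_len = size;
    return dst;
}
```

`PROT_READ` alone is enough for byte code that a virtual machine reads as data; native code would also need `PROT_EXEC`.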
I don't know the internals of Gforth, but one can see that at least one
level of indirection appears to be involved in going from the xt to the
executed code, e.g., in Gforth,
see execute
Code execute
404AB9: mov $50[r13],r15
404ABD: mov rdx,[r14]
404AC0: add r14,$08
404AC4: mov rcx,-$10[rdx]
404AC8: jmp ecx
end-code
Here, the assembly code gives us the hint that r14 points to the TOS (top
of stack): the xt is loaded from [r14] into rdx, and the code address is
then fetched from -$10[rdx]. So there seems to be one level of indirection
from the xt on top of the stack to the code which is subsequently executed.
Summary: for some non-native Forth systems, it should be possible to
relocate the compiled code of a colon definition into memory which can
be marked read-only, to protect against corruption. For this to be
feasible the run-time xt for a word should have at least one level of indirection to the code being executed by the virtual machine.
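To illustrate why the indirection matters, here is a small C sketch under an assumed header layout: the xt is the address of a header, and the header holds a pointer to the byte code. The names `word_header`, `code_of` and `relocate_code` are hypothetical. Because callers only ever hold the xt, the code can be moved without touching previously compiled references.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical header layout: the xt is the address of the header, and
   the header holds a pointer to the word's byte code. */
typedef struct word_header {
    const unsigned char *code;   /* where the byte code currently lives */
    size_t               length; /* size of the byte code in bytes */
} word_header;

typedef word_header *xt_t;

/* The virtual machine never caches the code address; it goes through
   the header each time, which is the one level of indirection. */
const unsigned char *code_of(xt_t xt)
{
    assert(xt != NULL);
    return xt->code;
}

/* Point an existing xt at a relocated copy of its code. Compiled
   references to the xt stay valid because the xt itself is unchanged. */
void relocate_code(xt_t xt, const unsigned char *new_code)
{
    assert(xt != NULL && new_code != NULL);
    xt->code = new_code;
}
```

In this sketch the header itself has to stay writable so that the pointer can be updated, which matches the limitation noted earlier for the dictionary structure.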
Krishna Myneni wrote on Saturday, 6 August 2022 at 05:06:15 UTC+2:
Summary: for some non-native Forth systems, it should be possible to
relocate the compiled code of a colon definition into memory which can
be marked read-only, to protect against corruption. For this to be
feasible the run-time xt for a word should have at least one level of
indirection to the code being executed by the virtual machine.
The easiest way in a VM-based Forth would be to just add
address-checking to all words that write to memory.
E.g.
! (store sanitized, safe but slow)
_! (store naked, fast and inaccessible to the user)
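A minimal C sketch of the two stores, assuming the VM keeps all user-writable memory in one contiguous data space. The `vm` structure and the names `store_raw` and `store_checked` are hypothetical; a real system with several writable regions would need a table of ranges instead of a single pair of bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef intptr_t cell;

/* Hypothetical VM state: the bounds of the writable data space. */
typedef struct vm {
    unsigned char *data_start;
    unsigned char *data_end;   /* one past the last writable byte */
} vm;

/* "_!": naked store, no checking. Kept internal to the system. */
static void store_raw(unsigned char *addr, cell value)
{
    memcpy(addr, &value, sizeof value);
}

/* "!": sanitized store. Returns 0 on success, -1 if any byte of the
   cell would fall outside the data space. */
int store_checked(const vm *m, unsigned char *addr, cell value)
{
    assert(m != NULL);
    uintptr_t a  = (uintptr_t)addr;
    uintptr_t lo = (uintptr_t)m->data_start;
    uintptr_t hi = (uintptr_t)m->data_end;

    if (a < lo || a > hi || hi - a < sizeof(cell))
        return -1;
    store_raw(addr, value);
    return 0;
}
```

The check costs a few comparisons on every store, which is the "safe but slow" trade-off mentioned above.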
Actually, I'm extremely glad that
code is *not* read protected: how would I have noticed that something was
wrong?
An overwrite of native code causes an almost immediate crash
that leads to a useful exception report.
If the code was write-protected and you tried to write to it, you
would get a SIGSEGV on Unix. E.g., when you do SEE FSIN in Gforth,
you see the code for FSIN coming from the gcc, which is
write-protected. Now let's see what happens when I try to write
there:
1 $5586FF58AFFB c!
*the terminal*:3:17: error: Invalid memory address
1 $5586FF58AFFB >>>c!<<<
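The session above shows the fault being reported as an ordinary error rather than killing the process. One way to get that behaviour on a POSIX system is to catch the signal and unwind with `siglongjmp`. The following C code is a sketch of that general technique only, not a description of how Gforth implements it; `try_store` and `read_only_page` are hypothetical names.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;

static void on_fault(int sig)
{
    (void)sig;
    siglongjmp(fault_env, 1);   /* unwind back into try_store() */
}

/* Attempt a one-byte store. Returns 0 if it succeeded and -1 if the
   hardware rejected it (SIGSEGV, or SIGBUS on some systems). */
int try_store(volatile unsigned char *addr, unsigned char value)
{
    struct sigaction sa, old_segv, old_bus;
    volatile int result = 0;

    assert(addr != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);

    if (sigsetjmp(fault_env, 1) == 0)
        *addr = value;          /* faults if the page is read-only */
    else
        result = -1;            /* reached via siglongjmp() */

    /* Put the previous handlers back. */
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return result;
}

/* Helper for the demonstration: one zero-filled, read-only page. */
void *read_only_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return NULL;
    void *p = mmap(NULL, (size_t)page, PROT_READ,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Calling `try_store` on the address returned by `read_only_page` yields -1 and leaves the page unchanged, while a store into ordinary writable memory yields 0.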
Marcel Hendrix <mhx@iae.nl> writes:
Actually, I'm extremely glad that
code is *not* read protected: how would I have noticed that something was
wrong?
If the code was write-protected and you tried to write to it, you
would get a SIGSEGV on Unix. ...
So write-protecting the code can have an advantage. The question is
if the advantage is big enough to justify the effort.
Knowing how much effort/nuisance it is to make words
r/o during compilation and debugging would make it possible
to weigh advantages and disadvantages. It seems that
code and data can't be interleaved at all.
Krishna Myneni <krishna.myneni@ccreweb.org> writes:...
It's important to note that the dictionary structure for the word itself
is not able to be protected from being overwritten in this scheme.
I don't see a reason why not. Compile-to-flash systems do it. If you
don't want to change protection on every IMMEDIATE, DOES> etc., keep
the most recent header in writeable memory, and only move it to
read-only memory when the next header is created.
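The staging idea can be sketched in C as follows, assuming a POSIX system. All names (`header`, `dictionary`, `create_header`, `make_immediate`) and the header layout are hypothetical. The newest header stays in ordinary writable memory, so IMMEDIATE and similar words never need a protection change; the header is copied into the read-only mapping only when the next one is created.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#define HDR_NAME_LEN   31
#define FLAG_IMMEDIATE 0x01u

typedef struct header {
    char     name[HDR_NAME_LEN + 1];
    unsigned flags;
} header;

/* Hypothetical dictionary: sealed headers live in a read-only mapping,
   the most recent header stays in ordinary writable memory. */
typedef struct dictionary {
    header *sealed;      /* read-only array of finished headers */
    size_t  map_len;     /* size of the mapping in bytes */
    size_t  capacity;    /* number of headers the mapping can hold */
    size_t  count;       /* number of sealed headers */
    header  latest;      /* writable staging copy of the newest header */
    int     have_latest;
} dictionary;

int dict_init(dictionary *d, size_t map_len)
{
    assert(d != NULL && map_len >= sizeof(header));
    void *p = mmap(NULL, map_len, PROT_READ,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    d->sealed = p;
    d->map_len = map_len;
    d->capacity = map_len / sizeof(header);
    d->count = 0;
    d->have_latest = 0;
    return 0;
}

/* Move the staged header into read-only memory. */
static int seal_latest(dictionary *d)
{
    if (!d->have_latest)
        return 0;
    if (d->count == d->capacity)
        return -1;
    if (mprotect(d->sealed, d->map_len, PROT_READ | PROT_WRITE) != 0)
        return -1;
    d->sealed[d->count] = d->latest;
    if (mprotect(d->sealed, d->map_len, PROT_READ) != 0)
        return -1;
    d->count++;
    d->have_latest = 0;
    return 0;
}

/* Start a new definition: seal the previous header, stage the new one. */
int create_header(dictionary *d, const char *name)
{
    assert(d != NULL && name != NULL);
    if (seal_latest(d) != 0)
        return -1;
    memset(&d->latest, 0, sizeof d->latest);
    strncpy(d->latest.name, name, HDR_NAME_LEN);
    d->have_latest = 1;
    return 0;
}

/* IMMEDIATE only touches the writable staging copy. */
void make_immediate(dictionary *d)
{
    assert(d != NULL && d->have_latest);
    d->latest.flags |= FLAG_IMMEDIATE;
}
```

The sketch changes protection twice per new header; a real system could instead fill a page and protect it once it is full.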
It would require substantial changes to make the threaded code and/or
the headers read-only; for the native code it would be relatively
straightforward to make all but the most recent native-code page
unwriteable.
Bugs where code or headers were overwritten have not been problematic
enough in our experience to take any such action. I have not had such
a request by users, either.
The only protected system I've used is FlashForth. It attempts to protect
the kernel on the basis that a user should be able to restart Forth after
a crash without having to re-flash the system. It's hard for me to
evaluate the benefits of such a system without disabling the protection
(not easy). The costs are known but the gain remains nebulous. Is there a
developer who wouldn't have access to a programmer should re-flashing
become necessary?
And what failure rate are we talking about - once a day, once a month?
On 8/6/22 01:19, minf...@arcor.de wrote:
Krishna Myneni wrote on Saturday, 6 August 2022 at 05:06:15 UTC+2:
Summary: for some non-native Forth systems, it should be possible to
relocate the compiled code of a colon definition into memory which can
be marked read-only, to protect against corruption. For this to be
feasible the run-time xt for a word should have at least one level of
indirection to the code being executed by the virtual machine.
The easiest way in a VM-based Forth would be to just add
address-checking to all words that write to memory.
E.g.
! (store sanitized, safe but slow)
_! (store naked, fast and inaccessible to the user)
Not sure what you mean. How does a VM-based Forth distinguish between
addresses which are data space vs addresses which contain the virtual
machine code?
kForth uses a separate type stack to distinguish between ordinary
numbers and addresses. This has proven useful in flagging common
mistakes caused by incorrect stack manipulation. It is quite useful, I
think, in aiding beginning Forth programmers to reveal the source of the problem. The added complexity of the type stack gives a performance hit
of about 15% in kForth.
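The general idea of a parallel type stack can be sketched in C as below. This is an illustration only and makes no claim about how kForth implements it; the names `stacks`, `push` and `fetch`, the fixed depth, and the two-valued tag are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_DEPTH 64

typedef enum { TYPE_NUMBER, TYPE_ADDRESS } cell_type;

/* Hypothetical VM stacks: every data cell has a parallel type tag. */
typedef struct stacks {
    intptr_t  data[STACK_DEPTH];
    cell_type type[STACK_DEPTH];
    size_t    depth;
} stacks;

int push(stacks *s, intptr_t value, cell_type t)
{
    assert(s != NULL);
    if (s->depth == STACK_DEPTH)
        return -1;
    s->data[s->depth] = value;
    s->type[s->depth] = t;
    s->depth++;
    return 0;
}

/* "@" with a type check: refuses to dereference a plain number.
   Returns 0 and replaces the address with the fetched value, or -1. */
int fetch(stacks *s)
{
    assert(s != NULL);
    if (s->depth == 0 || s->type[s->depth - 1] != TYPE_ADDRESS)
        return -1;
    intptr_t *addr = (intptr_t *)s->data[s->depth - 1];
    s->data[s->depth - 1] = *addr;
    s->type[s->depth - 1] = TYPE_NUMBER;  /* memory itself carries no tag */
    return 0;
}
```

Every stack operation has to maintain the second array as well, which is where the extra run-time cost comes from.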
On 8/6/22 00:48, Anton Ertl wrote:
Krishna Myneni <krishna.myneni@ccreweb.org> writes:...
It's important to note that the dictionary structure for the word itself
is not able to be protected from being overwritten in this scheme.
I don't see a reason why not. Compile-to-flash systems do it. If you
don't want to change protection on every IMMEDIATE, DOES> etc., keep
the most recent header in writeable memory, and only move it to
read-only memory when the next header is created.
All dictionary headers don't correspond to ordinary colon definitions.
If one were to protect all headers, there may be issues with relocation
affecting previously compiled code, such as with DEFERred words.
I haven't thought through this problem enough yet to say with certainty
that all headers can be protected as read-only. It may be highly
system-dependent.
I expect that your use of memory segments in Gforth should simplify the
problem of placing the threaded code in write-protected segments.
Well, such bugs may be occurring more often than you realize. Such bugs
often don't have immediate consequences. I can run Forth code which
works perfectly fine because it hasn't made use of corrupt parts of the
system, but if such corruption has occurred I often find out when I type
BYE and then see a Seg Fault as memory is freed while the application is
terminating. This tells me to go back and find the bugs in my
application Forth code.
Krishna Myneni <krishna.myneni@ccreweb.org> writes:
On 8/6/22 00:48, Anton Ertl wrote:
Krishna Myneni <krishna.myneni@ccreweb.org> writes:...
It's important to note that the dictionary structure for the word itself
is not able to be protected from being overwritten in this scheme.
I don't see a reason why not. Compile-to-flash systems do it. If you
don't want to change protection on every IMMEDIATE, DOES> etc., keep
the most recent header in writeable memory, and only move it to
read-only memory when the next header is created.
All dictionary headers don't correspond to ordinary colon definitions.
?
If one were to protect all headers, there may be issues with relocation
affecting previously compiled code, such as with DEFERred words.
It's unclear to me how relocation comes in here, but for DEFERed words
the thing that IS changes is not part of the header, just like for a
VALUE the thing the TO changes is not part of the header. And of
course you would need one indirection to get from the header to the
data in these cases.
I
haven't thought through this problem enough yet to say with certainty
that all headers can be protected as read-only. It may be highly
system-dependent.
If you put in enough effort, they can.
I expect that your use of memory segments in Gforth should simplify the
problem of placing the threaded code in write-protected segments.
Yes, one can use sections for that purpose, but there is still a
substantial amount of changes to make.
Well, such bugs may be occurring more often than you realize. Such bugs
often don't have immediate consequences. I can run Forth code which
works perfectly fine because it hasn't made use of corrupt parts of the
system, but if such corruption has occurred I often find out when I type
BYE and then see a Seg Fault as memory is freed while the application is
terminating. This tells me to go back and find the bugs in my
application Forth code.
Apart from your use as a debugging tool, freeing before exit()ing the
process is a waste of time.
The details of PROTECT-DEF are, of course, dependent on both the Forth
system and the OS. The source code for protect.4th is posted at
https://github.com/mynenik/kForth-64/blob/master/forth-src/protect.4th
One problem with the current approach is a segmentation fault on
executing BYE, because the cleanup code executed upon BYE tries to free
the new byte code memory. This is why a protection flag is needed in the
dictionary header, which involves changes to the source code for the
Forth system. However, these are relatively simple changes to kForth.
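A C sketch of how such a flag could steer the cleanup, assuming the unprotected byte code comes from `malloc` and the protected copy from `mmap`. The `word_entry` structure and `release_code` function are hypothetical and do not reflect the actual kForth sources. Handing an `mmap`ed address to `free()` is undefined behaviour, which is consistent with the crash described above.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical dictionary entry with a protection flag. */
typedef struct word_entry {
    void  *code;         /* byte code of the definition */
    size_t code_len;     /* for a protected word: the mapped size */
    int    is_protected; /* set when the code was moved to an mmap()ed region */
} word_entry;

/* Release the byte code of one word at shutdown. Returns 0 on success. */
int release_code(word_entry *w)
{
    assert(w != NULL);
    int rc = 0;
    if (w->code != NULL) {
        if (w->is_protected)
            rc = munmap(w->code, w->code_len);  /* mapped memory */
        else
            free(w->code);                      /* heap memory */
    }
    w->code = NULL;
    return rc;
}
```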
...
kForth uses a separate type stack to distinguish between ordinary
numbers and addresses. This has proven useful in flagging common
mistakes caused by incorrect stack manipulation. It is quite useful, I
think, in aiding beginning Forth programmers to reveal the source of the problem. The added complexity of the type stack gives a performance hit
of about 15% in kForth.
I agree that a Forth system architecture which provides memory
protection for dictionary headers, non-native executable code of colon >definitions, and for native code of CODE words/ordinary definitions is >possible.
--
Krishna
In article <tconob$klc8$1@dont-email.me>,
Krishna Myneni <krishna.myneni@ccreweb.org> wrote:
<SNIP>
I agree that a Forth system architecture which provides memory
protection for dictionary headers, non-native executable code of colon
definitions, and for native code of CODE words/ordinary definitions is
possible.
Note that all this effort expended is for the case of defects in the
program. It is much more useful to prevent defects.
Making the architecture more complicated doesn't help for preventing
defects.
Krishna Myneni <krishna.myneni@ccreweb.org> wrote:
kForth uses a separate type stack to distinguish between ordinary
numbers and addresses. This has proven useful in flagging common
mistakes caused by incorrect stack manipulation. It is quite useful, I
think, in aiding beginning Forth programmers to reveal the source of the
problem. The added complexity of the type stack gives a performance hit
of about 15% in kForth.
If you limit this to stack, then it may help beginners, but
will miss more "interesting" errors, like having a record
with numbers in some fields and xt's in other. To detect
access to wrong field you would need tags on _all_ data.
On 8/16/22 15:05, albert wrote:
In article <tconob$klc8$1@dont-email.me>,
Krishna Myneni <krishna.myneni@ccreweb.org> wrote:
<SNIP>
I agree that a Forth system architecture which provides memory
protection for dictionary headers, non-native executable code of colon
definitions, and for native code of CODE words/ordinary definitions is
possible.
Note that all this effort expended is for the case of defects in the
program. It is much more useful to prevent defects.
Write protecting the virtual threaded code using low-level OS methods
is a means of *detecting* program defects which corrupt the Forth
system's code. Otherwise, a defective word may corrupt a part of the
Forth system for which the consequences may not be readily apparent
when executing words. With low level memory protection of the virtual
threaded code, such corruption becomes immediately obvious.
dxforth <dxforth@gmail.com> writes:
Lack of checking in general should mean Forth applications are the most
unreliable there are. Yet reports I've seen suggest opposite is true.
Working 'closer to the metal' I believe forth programmers are in a better
position to know what can go wrong. In contrast, programmers in other
languages rely on the compiler to tell them what they're doing is wrong.
I'll go ahead and admit it: the hardest bugs to find I've ever written
were in ForthOS. Next was VSTa (in C), and then downward from there.
I think Golang let me write the hairiest performance intensive code
while still hitting reliability with little effort.
But, admittedly, it wasn't OS kernel code. Nor was the Python code
which was not far away from Golang in ease, though its performance
and scalability are a pathetic shadow of Golang.
Andy Valencia
On Tuesday, August 16, 2022 at 3:05:59 PM UTC-5, none albert wrote:
In article <tconob$klc8$1...@dont-email.me>,
Krishna Myneni <krishna...@ccreweb.org> wrote:
Note that all this effort expended is for the case of defects in the
program. It is much more useful to prevent defects.
I don't discount hardening of "perfect" code not because the code may
not be perfect but because hardware in the field under stress doesn't
always follow the code.
--
me
On 18/08/2022 23:30, Krishna Myneni wrote:
On 8/16/22 15:05, albert wrote:
In article <tconob$klc8$1@dont-email.me>,
Krishna Myneni <krishna.myneni@ccreweb.org> wrote:
<SNIP>
I agree that a Forth system architecture which provides memory
protection for dictionary headers, non-native executable code of colon >>>> definitions, and for native code of CODE words/ordinary definitions is >>>> possible.
Note that all this effort expended is for the case of defects in the
program. It is much more useful to prevent defects.
Write protecting the virtual threaded code using low-level OS methods
is a means of *detecting* program defects which corrupt the Forth
system's code. Otherwise, a defective word may corrupt a part of the
Forth system for which the consequences may not be readily apparent
when executing words. With low level memory protection of the virtual
threaded code, such corruption becomes immediately obvious.
Lack of checking in general should mean Forth applications are the most
unreliable there are. Yet reports I've seen suggest the opposite is true.
Working 'closer to the metal', I believe Forth programmers are in a better
position to know what can go wrong. In contrast, programmers in other
languages rely on the compiler to tell them what they're doing is wrong.