Forum: >>> Magnum BBS <<<

Separating .text and .data segment in an assembler Forth

From none) (albert@21:1/5 to All on Wed Oct 18 16:34:47 2023

ciforth is very much a classical Forth. The headers are followed by
high level code, machine code or data.

Is there any experience in separating code and data using
the text segment?
[The text segment in Unix parlance is data that cannot be modified
from within the program.]
Nowadays Apple apparently requires that all executable code resides in
the text segment on its modern systems.
This flies in the face of Forth facilities and
artificial intelligence, e.g. it hinders just in time optimisation.
(High level code is data, merely interpreted.)

I'm interested in the problems encountered, and also if there
is any benefit in speed. For example the code snippets of
ciforth easily fit in the L1 cache and are also not near
any modifiable data.

ciforth can do this relatively easily, because it is indirect
threaded. I can imagine that direct threaded or subroutine
threaded code encounters even more difficulties.

Groetjes Albert
--
Don't praise the day before the evening. One swallow doesn't make spring.
You must not say "hey" before you have crossed the bridge. Don't sell the
hide of the bear until you shot it. Better one bird in the hand than ten in
the air. First gain is a cat spinning. - the Wise from Antrim -

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Anton Ertl@21:1/5 to albert@cherry. on Wed Oct 18 15:41:33 2023

albert@cherry.(none) (albert) writes:

> ciforth is very much a classical Forth. The headers are followed by
> high level code, machine code or data.
>
> Is there any experience in separating code and data using
> the text segment?

In Gforth without dynamic code generation the native code is certainly
in the text segment.

> Nowadays apparently Apple requires that all executable code resides in
> her text segment for the modern systems.

I think what you mean is that MacOS on Apple Silicon puts additional
restrictions on executable segments. Neither Linux on Apple Silicon
nor MacOS on Intel has this "feature".

For a traditional-style threaded-code system written in an assembler
that supports text and data sections (i.e., every normal assembler,
but usually not a Forth assembler), it's easy to satisfy this
restriction. Just do something like

        .data
        ...             # header for +
        .quad plus
        .text
        .balign 16
plus:
        ...             # code for +

> I'm interested in the problems encountered, and also if there
> is any benefit in speed.

There has been a benefit in speed in separating native code from data
for three decades on Intel CPUs (since the original Pentium), and I
have first written about that here in 1995.

E.g., on slide 12 of
https://www.complang.tuwien.ac.at/anton/euroforth/ef09/papers/ertl-slides.pdf
you see that bigForth performs worse than Gforth by more than a factor
of 2 on brew and by a factor of 5 on cd16sim; Gforth also performs
slightly better than spf4 on fcp and slightly better than bigForth on
lexex. I investigated this in the case of cd16sim, and found that it
is due to bigForth mixing code and data; I suspect that the other
cases are also due to mixing code and data, because those are all
native-code systems that should normally outperform Gforth.

Many years after 1995, native-code systems like iForth, SwiftForth, and
VFX still mix code and data, but they have worked around the
performance problems by putting padding between code and data. My
impression is that SwiftForth and VFX put only as much padding there
as was needed to paper over some particular performance problem, and
I have seen performance problems with these systems several times. By
contrast, I never saw it with iForth (but then I use it less
frequently), and doing a test on iForth reveals that it uses $400
bytes of padding:

FORTH> create foo ok
FORTH> : bar + ; ok
FORTH> ' bar idis
$10226840  : bar                                488BC04883ED088F4500  H.@H.m..E.
$1022684A  pop   rbx                            5B                    [
$1022684B  pop   rdi                            5F                    _
$1022684C  lea   rbx, [rdi rbx*1] qword         488D1C1F              H...
$10226850  push  rbx                            53                    S
$10226851  ;                                    488B45004883C508FFE0  H.E.H.E..` ok
FORTH> foo hex . 10226410 ok
FORTH> here hex . 102268B0 ok
FORTH> bla hex . 10226CC0 ok

However, looking at the second-to-last line, I expect that we can
still see a performance problem from code where the data does not
start with a defining word, like (proof-of-concept):

: foo 100000000 0 do 0 over ! loop drop ;
here 0 ,
foo

> ciforth can do this relatively easy, because it is indirect
> threaded. I can imagine that directly threaded, subroutine
> threaded code encounters even more difficulties.

Certainly the way that direct threading was implemented in Gforth in
the early days (up to and including 0.5 in 2000) was slow on the
Pentium and later CPUs (but there was the option of indirect threaded
code), and AFAIK is not supported on MacOS on Apple Silicon. Gforth
then switched to hybrid direct/indirect threaded code:

@InProceedings{ertl02,
author = {M. Anton Ertl},
title = {Threaded Code Variations and Optimizations (Extended
Version)},
booktitle = {Forth-Tagung 2002},
year = {2002},
address = {Garmisch-Partenkirchen},
url = {http://www.complang.tuwien.ac.at/papers/ertl02.ps.gz},
abstract = {Forth has been traditionally implemented as indirect
threaded code, where the code for non-primitives is
the code-field address of the word. To get the
maximum benefit from combining sequences of
primitives into superinstructions, the code produced
for a non-primitive should be a primitive followed
by a parameter (e.g., \code{lit} \emph{addr} for
variables). This paper takes a look at the steps
from a traditional threaded-code implementation to
superinstructions, and at the size and speed effects
of the various steps.\comment{It also compares these
variants of Gforth to various other Forth
implementations on contemporary machines.} The use
of superinstructions gives speedups of up to a
factor of 2 on large benchmarks on processors with
branch target buffers, but requires more space for
the primitives and the optimization tables, and also
a little more space for the threaded code.}
}

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023: https://euro.theforth.net/2023


From Marcel Hendrix@21:1/5 to Anton Ertl on Wed Oct 18 10:34:17 2023

On Wednesday, October 18, 2023 at 7:06:16 PM UTC+2, Anton Ertl wrote:
[..]

> However, looking at the second-to-last line, I expect that we can
> still see a performance problem from code where the data does not
> start with a defining word, like (proof-of-concept):

FORTH> : foo 100000000 0 do 0 over ! loop drop ; ok
FORTH> here 0 , ok
[1]FORTH> ' foo idis
$013CEAC0 : foo
$013CEACA mov rcx, $05F5E100 d#
$013CEAD1 xor rbx, rbx
$013CEAD4 call (DO) offset NEAR
$013CEADE nop
$013CEADF nop
$013CEAE0 mov [rbx] qword, 0 d#
$013CEAE7 add [rbp 0 +] qword, 1 b#
$013CEAEC add [rbp 8 +] qword, 1 b#
$013CEAF1 jno $013CEAE0 offset NEAR
$013CEAF7 add rbp, #24 b#
$013CEAFB ;
$013CEB05 nop
$013CEB06 nop
[1]FORTH> dup h. $013CEB70 ok
[1]FORTH> foo ok
FORTH> $013CEB70 ? 0 ok
FORTH> see foo
Flags: TOKENIZE, ANSI
: foo 100000000 0 DO 0 OVER ! LOOP DROP ; ok
ok
FORTH> : test ( addr -- ) cr dup h. space timer-reset foo .elapsed ; ok
FORTH> $013CEB70 test
$013CEB70 0.037 seconds elapsed. ok
FORTH> PAD test
$013CF1B8 0.036 seconds elapsed. ok
FORTH> PAD 4000 + aligned test
$013D0158 0.038 seconds elapsed. ok

Not in this case, at least. However, with a bit more cleverness it is
possible to write data in a cached line of preceding code that really
needs to execute (CREATE ... DOES> or ... [ 0 , ] ... ). ISTR that in
the past I have used ALIGN once or twice to get rid of a real (or
imagined) problem.

iForth has long been prepared for separated data and code, but I never
enabled it because it would mean introducing new/non-standard words for
CREATE .. DOES> and , C, etc. Maybe in next year's release.

-marcel


From none) (albert@21:1/5 to mhx@iae.nl on Wed Oct 18 20:36:58 2023

In article <53600953-77c8-466c-a1b2-3044388359e9n@googlegroups.com>,
Marcel Hendrix <mhx@iae.nl> wrote:

> iForth is since long prepared for separated data and code, but I never enabled
> it because I would mean introducing new/non-standard words for CREATE ..
> DOES> and , C, etc.. Maybe in next year's release.

I can't see that one has to introduce non-standard words.
Also the changes to CODE ENDCODE ;CODE don't seem to be
a big deal either. But you are right: only do it if it has
benefits.


From Anton Ertl@21:1/5 to Marcel Hendrix on Wed Oct 18 19:03:01 2023

Marcel Hendrix <mhx@iae.nl> writes:

> On Wednesday, October 18, 2023 at 7:06:16 PM UTC+2, Anton Ertl wrote:
> [..]
>> However, looking at the second-to-last line, I expect that we can
>> still see a performance problem from code where the data does not
>> start with a defining word, like (proof-of-concept):
>
> FORTH> : foo 100000000 0 do 0 over ! loop drop ; ok
> FORTH> here 0 , ok
> [..]
> FORTH> : test ( addr -- ) cr dup h. space timer-reset foo .elapsed ; ok
> FORTH> $013CEB70 test
> $013CEB70 0.037 seconds elapsed. ok
> FORTH> PAD test
> $013CF1B8 0.036 seconds elapsed. ok
> FORTH> PAD 4000 + aligned test
> $013D0158 0.038 seconds elapsed. ok

I measure the following on a 4GHz Core i5-6600K:

: foo 100000000 0 do 0 over ! loop drop ; ok
here 0 , ok
dup h. timer-reset foo .elapsed

foo start   loop end    cell address  time
$10226000   $10226037   $102260B0     0.140s
$102268C0   $102268FF   $10226970     5.711s

Why is the code longer in the second case? For some reason, it used a
10-byte instruction to put $00000001:00000000 into rcx, while the
first variant used a 7-byte instruction to put $05F5E100 into rcx.
The rest seems to be due to alignment.

Anyway, for the discussion at hand, if a loop ends close to the end of
a cache line, prefetching is apparently aggressive enough to prefetch
the next two cache lines into the I-cache (although the branch should
be predicted to be taken in the normal case), and this causes the
I/D-cache ping-pong that causes the slowdowns.
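
To see where the addresses in the table fall relative to cache lines, they can be taken apart as in the following sketch. It assumes 64-byte cache lines (the post does not state the line size), and CACHE-LINE, LINE-OFFSET and LINES-APART are made-up helper names.

```forth
decimal
\ Assumption: 64-byte cache lines, so the low 6 address bits are the offset.
: cache-line  ( addr -- n )  6 rshift ;      \ index of the line holding addr
: line-offset ( addr -- u )  63 and ;        \ byte position within that line
: lines-apart ( data-addr code-addr -- n )   \ data line index minus code line index
  cache-line swap cache-line swap - ;

$10226037 line-offset .             \ 55: first loop ends inside its line
$102268FF line-offset .             \ 63: second loop ends on the last byte
$102260B0 $10226037 lines-apart .   \ 2
$10226970 $102268FF lines-apart .   \ 2
```

Under that assumption the stored cell sits two lines after the loop end in both runs; only the position of the loop end within its own line differs.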

I played around with variations on the following:

: foo ( xt addr -- ) 100000000 0 do over execute loop 2drop ;
: bar
$123456789abcdef over ! $12345678 over ! $1234567 over !
0 over ! 1 over ! ;
here 0 , constant addr
addr h. cr
' bar addr timer-reset foo .elapsed
' bar idis
bye

The idea here is that the end of BAR is executed at every iteration,
bringing the code and the data even closer than in the example above.
But in this case I did not see the slowdown, even with BAR ending 1
byte before the end of the cache line, and even if it ends at the end
of a cache line.

So, the padding you put after code is usually enough, but I found one
case where it was not.

> iForth is since long prepared for separated data and code, but I never enabled
> it because I would mean introducing new/non-standard words for CREATE ..
> DOES> and , C, etc.. Maybe in next year's release.

I don't see why it should. Gforth keeps the native code elsewhere
without such words.

- anton

From Marcel Hendrix@21:1/5 to Anton Ertl on Wed Oct 18 14:18:51 2023

On Wednesday, October 18, 2023 at 10:00:25 PM UTC+2, Anton Ertl wrote:
> Marcel Hendrix <m...@iae.nl> writes:
>> On Wednesday, October 18, 2023 at 7:06:16 PM UTC+2, Anton Ertl wrote:
> [..]
> foo start   loop end    cell address  time
> $10226000   $10226037   $102260B0     0.140s
> $102268C0   $102268FF   $10226970     5.711s
>
> Why is the code longer in the second case? For some reason, it used a
> 10-byte instruction to put $00000001:00000000 into rcx, while the
> first variant used a 7-byte instruction to put $05F5E100 into rcx.

It seems you were in HEX, which means your second loop was ...
  decimal $0000000100000000 100000000 / .
... 42 times longer than the first loop. Therefore the ratio of timings was
  5711 140 / .
... 42 which is no surprise.

When generating code, iForth tries to use 32bit constants when possible,
which explains the 4 byte size difference.
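
The arithmetic can be replayed with explicit number prefixes, so that it does not depend on BASE. This is a sketch for a 64-bit Forth ($100000000 does not fit in a 32-bit cell); the two times are taken in milliseconds from the table quoted above.

```forth
decimal
$100000000 #100000000 / .   \ 42: loop count parsed in HEX vs. in DECIMAL
#5711 #140 / .              \ 40: measured times, 5711 ms vs. 140 ms
```

Integer division gives 42 for the loop counts and 40 for the timings, close enough that the longer loop accounts for the whole slowdown.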
[..]

> But in this case I did not see the slowdown, even with BAR ending 1
> byte before the end of the cache line, and even if it ends at the end
> of a cache line.
>
> So, the padding you put after code is usually enough, but I found one
> case where it was not.

If you were in HEX, then you didn't :-)

>> iForth is since long prepared for separated data and code, but I never enabled
>> it because I would mean introducing new/non-standard words for CREATE ..
>> DOES> and , C, etc.. Maybe in next year's release.
>
> I don't see why it should. Gforth keeps the native code elsewhere
> without such words.

How do you generate native code with separate code (protected for write)
and data segments, given the assembler is written in Forth? I can't use
the standard !, C!, @, C@, C, and , to access the code segment.

-marcel


From Anton Ertl@21:1/5 to Marcel Hendrix on Thu Oct 19 10:04:42 2023

Marcel Hendrix <mhx@iae.nl> writes:

> On Wednesday, October 18, 2023 at 10:00:25 PM UTC+2, Anton Ertl wrote:
> [..]
>> Why is the code longer in the second case? For some reason, it used a
>> 10-byte instruction to put $00000001:00000000 into rcx, while the
>> first variant used a 7-byte instruction to put $05F5E100 into rcx.
>
> It seems you were in HEX, which means your second loop was ...
>   decimal $0000000100000000 100000000 / .
> ... 42 times longer than the first loop. Therefore the ratio of timings was
>   5711 140 / .
> ... 42 which is no surprise.

Ah, yes, thank you. I remember falling into this trap with a
benchmark three decades ago. In the meantime, we have added number
prefixes, and I should make it my custom to write decimal numbers with
a "#" prefix. Still, I would also like it if we standardized words like
HEX. (Gforth) or H. (iForth, SwiftForth). I guess I'll add H. to
Gforth.

For DEC. (Gforth, iForth) there is only one name in the systems I
tested.
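
A small sketch of that habit, using the Forth-2012 number prefixes # (decimal) and $ (hex). The word names are only illustrative.

```forth
hex
: count-prefixed ( -- n ) #100 ;   \ always one hundred, whatever BASE is
: count-bare     ( -- n ) 100 ;    \ parsed under HEX, so this is $100
decimal
count-prefixed .   \ 100
count-bare .       \ 256
```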

>>> iForth is since long prepared for separated data and code, but I never enabled
>>> it because I would mean introducing new/non-standard words for CREATE ..
>>> DOES> and , C, etc.. Maybe in next year's release.
>>
>> I don't see why it should. Gforth keeps the native code elsewhere
>> without such words.
>
> How do you generate native code with separate code (protected for write)
> and data segments, given the assembler is written in Forth. I can't use
> the standard !, C!, @, C@, C, and , to access the code segment.

Why can you not use @ and C@?

If the code area is write-protected, you cannot write any code there,
neither with ! C!, nor with "new/non-standard words". So you have to make
at least the page(s) in the code area where your code is going to land
writable, maybe only during code generation (but I make the code RWX
all the time; everything else is security theatre in a Forth system
that allows the user to do anything the process can do anyway),
and then ! C! work.

Concerning C, ,: The way I would do it in development Gforth is to
have a separate section, say, NATIVE-CODE (the name may be a little
bit too verbose for constant usage, but good enough for the example
below), and then do something like:

\ now generate some native code
[: ... c, ... c, ... , ... ;] native-code section-execute

Each section has its own dictionary pointer, and SECTION-EXECUTE in
the example above switches to the dictionary pointer of NATIVE-CODE,
then executes the quotation (i.e., the C,s and , in the quotation
append to the NATIVE-CODE section), and then switches back to the
dictionary pointer of the section in use before. For implementing
quotations a stack of native-code sections would be useful (avoids the
need to branch around the code of the quotation).
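
The pointer switching can be modelled in a few lines of standard Forth. This is only a toy: the two areas are ordinary buffers, MY-C, stands in for C, (which in a real system would itself consult the switched pointer), a named word replaces the quotation, and all names are invented here. It is not how Gforth implements sections.

```forth
decimal
\ Toy model; all names are hypothetical, and there is no bounds checking.
create data-area 64 allot   variable data-dp   data-area data-dp !
create code-area 64 allot   variable code-dp   code-area code-dp !
variable current-dp         data-dp current-dp !   \ pointer currently in use

: my-c, ( c -- )                  \ append a byte via the pointer in use
  current-dp @ dup >r @ c!  1 chars r> +! ;
: with-section ( xt dp-var -- )   \ run xt with dp-var in use, then restore
  current-dp @ >r  current-dp !  catch  r> current-dp !  throw ;

: emit-pops ( -- ) $5B my-c, $5F my-c, ;   \ opcodes of pop rbx, pop rdi
' emit-pops code-dp with-section           \ two bytes go to code-area
$2A my-c,                                  \ back in data-area
code-dp @ code-area - .                    \ 2
data-dp @ data-area - .                    \ 1
```

CATCH is used so that the previous pointer is restored even if the executed word throws.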

These features are not yet documented; my EuroForth 2016 paper
<http://www.euroforth.org/ef16/papers/ertl-sections.pdf> explains the
basic idea (including a section on separating code and data), but the
current implementation differs from what is proposed in the paper. In
particular, named sections do not form a section stack (we had no need
yet), only the main section "Forth" has a stack (for quotations,
strings and the like). OTOH, we have added words like SECTION-EXECUTE
which are not described in the paper. You can find the current
implementation in:

http://git.savannah.gnu.org/cgit/gforth.git/tree/sections.fs
http://git.savannah.gnu.org/cgit/gforth.git/tree/sections2.fs

- anton

From none) (albert@21:1/5 to mhx@iae.nl on Fri Oct 20 00:58:48 2023

In article <53600953-77c8-466c-a1b2-3044388359e9n@googlegroups.com>,
Marcel Hendrix <mhx@iae.nl> wrote:

<SNIP>
> iForth is since long prepared for separated data and code, but I never enabled
> it because I would mean introducing new/non-standard words for CREATE ..
> DOES> and , C, etc.. Maybe in next year's release.

I have done an experiment with ciforth for AMD64. I placed cold machine
code in a text segment; that works. I moved the low level code of DROP
into that segment; that works too.
The linker (as it should) takes care of filling in the code field of DROP.

The code of DROP could be dumped by low-level tools.
I was surprised that I could patch the code of DROP, which was supposedly
in a read-only segment. I expected a violation.

The technique sketched by Anton Ertl ought to work.

Groetjes Albert

From none) (albert@21:1/5 to mhx@iae.nl on Mon Oct 23 16:22:49 2023

In article <b739e7b2-56ce-4020-ab84-e05735241725n@googlegroups.com>,
Marcel Hendrix <mhx@iae.nl> wrote:
<SNIP>

> How do you generate native code with separate code (protected for write)
> and data segments, given the assembler is written in Forth. I can't use
> the standard !, C!, @, C@, C, and , to access the code segment.

Time to proceed to 64 bits, with its flat memory space.
In fact, already in the 32-bit era it was behind the times to
have separate data, code, stack, and extra segments.
Linus Torvalds could not be bothered. He wouldn't have started
Linux if he had been obliged to use them.

Groetjes Albert

From Marcel Hendrix@21:1/5 to none albert on Mon Oct 23 07:51:50 2023

On Monday, October 23, 2023 at 4:23:22 PM UTC+2, none albert wrote:

> In article <b739e7b2-56ce-4020...@googlegroups.com>,
> Marcel Hendrix <m...@iae.nl> wrote:
> <SNIP>
>> How do you generate native code with separate code (protected for write)
>> and data segments, given the assembler is written in Forth. I can't use
>> the standard !, C!, @, C@, C, and , to access the code segment.
>
> Time to proceed to 64 bits, with its flat memory space.
> In fact in the 32 bits era, it already was behind the times to
> have separate data, code, stack, and extra segments.
> Linus Torvalds could not be bothered. He wouldn't have started
> Linux if he was obliged to.

iForth has a 64-bit flat model and has no demonstrable slow-down problems
when data is close to code. The question above is for the case where
segments are write-protected.

-marcel


From none) (albert@21:1/5 to albert on Mon Oct 23 16:16:07 2023

In article <nnd$23e6e7e7$28df6727@b48c89f815d28223>,
none) (albert <albert@cherry.> wrote:

> In article <53600953-77c8-466c-a1b2-3044388359e9n@googlegroups.com>,
> Marcel Hendrix <mhx@iae.nl> wrote:
>
>> iForth is since long prepared for separated data and code, but I never enabled
>> it because I would mean introducing new/non-standard words for CREATE ..
>> DOES> and , C, etc.. Maybe in next year's release.
>
> I can't see that one has to introduce non-standard words.
> Also the changes to CODE ENDCODE ;CODE doesn't seem to be
> a bug deal either. But you are right, only do it, if it has
> benefits.

I have done it. The benefits are generally cleaner code and
a preparation in case we are in fact forced to separate for
the newest ARM-based Apple computers.

As you know, the ciforths are generated from one source file
regulated by macros using m4. This is i86 and AMD only.
An addition for separating the code and data sections must
not break the main builds for Windows and Linux, i.e. the following
tests must pass:
make testlina64
make testlina32
make testwina64
make testwina32

These are built with fasm, one of the four assemblers foreseen.
The additions are made to the GNU assembler version.
That is regulated by the same lina64.cfg control file, but
the target in the Makefile is .s.

The first step is the addition of a rule to prelude.m4
define( {_SEPARATED_}, _no)dnl
If you want a feature only with separated code and data you
can do
_SEPARATED({ POP BX _C{ GET INCREMENT} })
and it makes a difference only with
define( {_SEPARATED_}, _yes)dnl

It is important to note that ciforth is a compiler factory.
It generates an assembler file that is passed to the engineer,
comparable to the infamous FIGFORTH listings.
He need not even be aware that a facility to separate existed in the
first place. For now the tests are bound to pass,
because the assembler file manufactured has not even changed.

Take DROP as an example of the modifications to a code definition
(the fields are cdflnsx: code, data, flag, link, name, source, extra).
Previously the assumption was made that the code (first field) is
directly after the header.

In the generic file there is only:
CODE_HEADER({DROP},{DROP})
POP AX
_NEXT

DROP:
---- DQ DROP+HEADSIZE -
| DQ DROP+HEADSIZE
| DQ 0x0
| DQ OVER
| DQ N_DROP
| DQ 0
| DQ 0
|
------> POP RAX
LODSQ ; NEXT
JMP QWORD[RAX]

This must be changed to a proper label:

DROP:
DQ X_DROP
DQ DROP+HEADSIZE
DQ 0x0
DQ OVER
DQ N_DROP
DQ 0
DQ 0

X_DROP:
POP RAX
LODSQ ; NEXT
JMP QWORD[RAX]

Or, if _SEPARATED_ is set, the last part becomes:

.section .forthx
POP RAX
LODSQ ; NEXT
JMP QWORD[RAX]
.section .forthd

In the file header.m4 we changed the macros
CODE_HEADER and _NEXT. That is all.
At the end of the _NEXT macro we add _DATA_ to switch to the
data segment.
At the end of the CODE_HEADER macro we add _TEXT_ to switch to the
code segment.

The generic file does not know what the _TEXT_ switching code looks like.
That is assembler dependent. Luckily that is separated
out in the gas.m4 file.

At the expense of a few lines changed in the m4 files,
the bulk of the work is done.
There remain situations like the following, where
an extra _TEXT_ has to be inserted because _NEXT switches to
_DATA_ mode:

XCHG RPO,SPO _C{ GET PARAMETER STACK}
_NEXT
******* _TEXT_
QXDO1: MOV HIP,AX
_NEXT

This _TEXT_ can be added to the generic file. It does no
harm for the fasm compiler, as long as an appropriate _TEXT_
is defined in the fasm.m4 macro file. It does nothing.
At this stage all tests must pass.

In the next stage we handle the defining words.
They use R> to fill in the code field,
because that points to the low level code following,
but that is no longer true in general.
The changes are not large. Replace R> by a literal such
as DOCOL. Don't forget to put a _TEXT_ in front of the
DOCOL: label.

I have bored you enough. Bottom line there is now a gnu assembler version
of lina64 that has its code and data separate.
Unsurprisingly the difference in speed is unnoticeable. All code
in ciforth is in the innermost cache anyway.

And a last remark. Is the generic file now approaching its collapse
with all the edits?
Far from it. The changes made to the generic file add to the quality.
The R> trick for defining words saves a NEXT and a cell in one definition.
That could be worth it ... in the early 80's.
The (;CODE) word saves 2 cells in 5 cases and eats up a name, 2 cells,
and a header, 7 cells.
You can't use it without ;CODE and the documentation is awkward.
And of course it can't be used with separated sections.
;CODE and (;CODE) are eliminated, at the cost of 6 cells
in file size.

Now look what the definition of CONSTANT has become after
eliminating ;CODE
: CONSTANT NAME (CREATE) LATEST >DFA ! DOCON LATEST CFA ! ;
Get a name, use it to create a header, get a stack item to store
in the data field of the latest definition, store DOCON as the
code for the latest definition.
The decompilation tools have become simpler because (;CODE) was
a weird exception.

Groetjes Albert

From S Jack@21:1/5 to Marcel Hendrix on Wed Oct 25 05:18:43 2023

On Monday, October 23, 2023 at 9:51:53 AM UTC-5, Marcel Hendrix wrote:

> when data is close to code. The question above is in case segments are write-protected.
>
> -marcel

With an assembler, both a name and attributes can be assigned to a section
(I recall having named a section "dictionary" long ago). Text and data
sections have default attributes, but these could be changed. Nowadays?
Of late I have used an assembler option to override the read-only
attribute of text. Possibly ELF has a bit that could be patched if that
option is no longer provided.
--
me


From none) (albert@21:1/5 to albert on Wed Oct 25 15:11:05 2023

In article <nnd$2c16c062$51f4c8be@ef6686bea0ed6640>,
none) (albert <albert@cherry.> wrote:

<SNIP>
> These are build with fasm, one of the four assemblers foreseen.
> The additions are added to the gnu assembler version.
> That is regulated by the same lina64.cfg control file, but
> the target in the Makefile is .s.

define( {_SEPARATED_}, _yes)dnl

ci86.lina64.s --> glina64

The new executables have 3 sections (.forthx, .forthd and .dict)
and I expected that the compilation option would no longer work,
because it patches the ELF header.
To my surprise
~/PROJECT/ciforths/ciforth: glina64 -c hellow.frt
~/PROJECT/ciforths/ciforth: hellow
Hello world!
~/PROJECT/ciforths/ciforth:

It turns out I had removed the table with sections. It probably has to
be reinstated if the operating system requires non-writable executable
sections.

Groetjes Albert