• Separating .text and .data segments in an assembler Forth

    From albert@21:1/5 to All on Wed Oct 18 16:34:47 2023
    ciforth is very much a classical Forth. The headers are followed by
    high-level code, machine code or data.

    Is there any experience in separating code and data using
    the text segment?
    [In Unix parlance, the text segment holds data that cannot be modified
    from within the program.]
    Apparently, Apple now requires that on its modern systems all executable
    code reside in the text segment.
    This flies in the face of Forth facilities and
    artificial intelligence, e.g. it hinders just-in-time optimisation.
    (High-level code is data, merely interpreted.)

    I'm interested in the problems encountered, and also in whether there
    is any benefit in speed. For example, the code snippets of
    ciforth easily fit in the L1 cache and are also not near
    any modifiable data.

    ciforth can do this relatively easily, because it is indirect
    threaded. I can imagine that direct-threaded and subroutine-threaded
    code encounter even more difficulties.

    Groetjes Albert
    --
    Don't praise the day before the evening. One swallow doesn't make spring.
    You must not say "hey" before you have crossed the bridge. Don't sell the
    hide of the bear until you shot it. Better one bird in the hand than ten in
    the air. First gain is a cat spinning. - the Wise from Antrim -

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to albert@cherry. on Wed Oct 18 15:41:33 2023
    albert@cherry.(none) (albert) writes:
    ciforth is very much a classical Forth. The headers are followed by
    high-level code, machine code or data.

    Is there any experience in separating code and data using
    the text segment?

    In Gforth without dynamic code generation the native code is certainly
    in the text segment.

    Apparently, Apple now requires that on its modern systems all executable
    code reside in the text segment.

    I think what you mean is that MacOS on Apple Silicon puts additional
    restrictions on executable segments. Neither Linux on Apple Silicon
    nor MacOS on Intel has this "feature".

    For a traditional-style threaded-code system written in an assembler
    that supports text and data sections (i.e., every normal assembler,
    but usually not a Forth assembler), it's easy to satisfy this
    restriction. Just do something like

    .data
    ... # header for +
    .quad plus
    .text
    .balign 16
    plus:
    ... #code for +
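    Why a code-field pointer makes this split cheap can be modelled in a few
    lines. The Python sketch below is only an analogy (Python has no .text or
    .data segments, and Header, VM, docol and the other names are made up for
    illustration): dictionary headers sit on the "data" side and reach the
    native routines only through their code field, and an indirect-threaded
    NEXT fetches a header from the thread and calls through that field.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass(eq=False)
class Header:
    """Dictionary entry (the 'data' side): name, code field, parameter field."""
    name: str
    code: Callable[["VM", "Header"], None]      # code field: refers to a routine
    params: list = field(default_factory=list)  # threaded code or data


class VM:
    def __init__(self) -> None:
        self.stack: List[int] = []
        self.rstack: List[Tuple[Optional[list], int]] = []
        self.thread: Optional[list] = None  # thread being interpreted
        self.ip = 0                         # index of the next cell in it

    def run(self, word: Header) -> None:
        """Execute WORD, then run NEXT until the outermost thread exits."""
        word.code(self, word)
        while self.thread is not None:
            w = self.thread[self.ip]  # NEXT: fetch the next header...
            self.ip += 1
            w.code(self, w)           # ...and jump through its code field


# Native routines (the 'text' side): never modified while running.
def code_dup(vm: VM, hdr: Header) -> None:
    vm.stack.append(vm.stack[-1])


def code_plus(vm: VM, hdr: Header) -> None:
    b = vm.stack.pop()
    vm.stack.append(vm.stack.pop() + b)


def code_lit(vm: VM, hdr: Header) -> None:
    vm.stack.append(vm.thread[vm.ip])  # inline literal follows LIT in the thread
    vm.ip += 1


def docol(vm: VM, hdr: Header) -> None:
    vm.rstack.append((vm.thread, vm.ip))  # save the return position
    vm.thread, vm.ip = hdr.params, 0


def code_exit(vm: VM, hdr: Header) -> None:
    vm.thread, vm.ip = vm.rstack.pop()


DUP = Header("DUP", code_dup)
PLUS = Header("+", code_plus)
LIT = Header("LIT", code_lit)
EXIT = Header("EXIT", code_exit)
DOUBLE = Header("DOUBLE", docol, [DUP, PLUS, EXIT])
QUAD = Header("QUAD", docol, [DOUBLE, DOUBLE, EXIT])
FIVE_QUAD = Header("FIVE-QUAD", docol, [LIT, 5, QUAD, EXIT])

vm = VM()
vm.run(FIVE_QUAD)
print(vm.stack)  # [20]
```

    Moving the routines elsewhere changes nothing for the interpreter, because
    it never looks at what follows a header; it only follows the code field.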

    I'm interested in the problems encountered, and also if there
    is any benefit in speed.

    There has been a benefit in speed from separating native code from data
    for three decades on Intel CPUs (since the original Pentium), and I
    first wrote about that here in 1995.

    E.g., on slide 12 of https://www.complang.tuwien.ac.at/anton/euroforth/ef09/papers/ertl-slides.pdf you can see that bigForth performs worse than Gforth by more than a factor
    of 2 on brew and by a factor of 5 on cd16sim; Gforth also performs
    slightly better than spf4 on fcp and slightly better than bigForth on
    lexex. I investigated this in the case of cd16sim, and found that it
    is due to bigForth mixing code and data; I suspect that the other
    cases are also due to mixing code and data, because those are all
    native-code systems that should normally outperform Gforth.

    Many years after 1995, native-code systems like iForth, SwiftForth, and
    VFX still mix code and data, but they have worked around the
    performance problems by putting padding between code and data. My
    impression is that SwiftForth and VFX put only as much padding there as
    was needed to paper over some particular performance problem, and I
    have seen performance problems with these systems several times. By
    contrast, I have never seen them with iForth (but then I use it less
    frequently), and a test on iForth reveals that it uses $400
    bytes of padding:

    FORTH> create foo  ok
    FORTH> : bar + ;  ok
    FORTH> ' bar idis
    $10226840  : bar                          488BC04883ED088F4500  H.@H.m..E.
    $1022684A  pop rbx                        5B                    [
    $1022684B  pop rdi                        5F                    _
    $1022684C  lea rbx, [rdi rbx*1] qword     488D1C1F              H...
    $10226850  push rbx                       53                    S
    $10226851  ;                              488B45004883C508FFE0  H.E.H.E..`  ok
    FORTH> foo hex .  10226410  ok
    FORTH> here hex .  102268B0  ok
    FORTH> bla hex .  10226CC0  ok

    However, looking at the second-to-last line, I expect that we can
    still see a performance problem from code where the data does not
    start with a defining word, like (proof-of-concept):

    : foo ( addr -- ) 100000000 0 do 0 over ! loop drop ;
    here 0 ,  \ data cell not preceded by a defining word; its address stays on the stack
    foo

    ciforth can do this relatively easily, because it is indirect
    threaded. I can imagine that direct-threaded and subroutine-threaded
    code encounter even more difficulties.

    Certainly the way that direct threading was implemented in Gforth in
    the early days (up to and including 0.5 in 2000) was slow on the
    Pentium and later CPUs (but there was the option of indirect threaded
    code), and AFAIK is not supported on MacOS on Apple Silicon. Gforth
    then switched to hybrid direct/indirect threaded code:

    @InProceedings{ertl02,
    author = {M. Anton Ertl},
    title = {Threaded Code Variations and Optimizations (Extended
    Version)},
    booktitle = {Forth-Tagung 2002},
    year = {2002},
    address = {Garmisch-Partenkirchen},
    url = {http://www.complang.tuwien.ac.at/papers/ertl02.ps.gz},
    abstract = {Forth has been traditionally implemented as indirect
    threaded code, where the code for non-primitives is
    the code-field address of the word. To get the
    maximum benefit from combining sequences of
    primitives into superinstructions, the code produced
    for a non-primitive should be a primitive followed
    by a parameter (e.g., \code{lit} \emph{addr} for
    variables). This paper takes a look at the steps
    from a traditional threaded-code implementation to
    superinstructions, and at the size and speed effects
    of the various steps.\comment{It also compares these
    variants of Gforth to various other Forth
    implementations on contemporary machines.} The use
    of superinstructions gives speedups of up to a
    factor of 2 on large benchmarks on processors with
    branch target buffers, but requires more space for
    the primitives and the optimization tables, and also
    a little more space for the threaded code.}
    }

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2023: https://euro.theforth.net/2023

  • From Marcel Hendrix@21:1/5 to Anton Ertl on Wed Oct 18 10:34:17 2023
    On Wednesday, October 18, 2023 at 7:06:16 PM UTC+2, Anton Ertl wrote:
    [..]
    However, looking at the second-to-last line, I expect that we can
    still see a performance problem from code where the data does not
    start with a defining word, like (proof-of-concept):

    FORTH> : foo 100000000 0 do 0 over ! loop drop ; ok
    FORTH> here 0 , ok
    [1]FORTH> ' foo idis
    $013CEAC0 : foo
    $013CEACA mov rcx, $05F5E100 d#
    $013CEAD1 xor rbx, rbx
    $013CEAD4 call (DO) offset NEAR
    $013CEADE nop
    $013CEADF nop
    $013CEAE0 mov [rbx] qword, 0 d#
    $013CEAE7 add [rbp 0 +] qword, 1 b#
    $013CEAEC add [rbp 8 +] qword, 1 b#
    $013CEAF1 jno $013CEAE0 offset NEAR
    $013CEAF7 add rbp, #24 b#
    $013CEAFB ;
    $013CEB05 nop
    $013CEB06 nop
    [1]FORTH> dup h. $013CEB70 ok
    [1]FORTH> foo ok
    FORTH> $013CEB70 ? 0 ok
    FORTH> see foo
    Flags: TOKENIZE, ANSI
    : foo 100000000 0 DO 0 OVER ! LOOP DROP ; ok
    ok
    FORTH> : test ( addr -- ) cr dup h. space timer-reset foo .elapsed ; ok
    FORTH> $013CEB70 test
    $013CEB70 0.037 seconds elapsed. ok
    FORTH> PAD test
    $013CF1B8 0.036 seconds elapsed. ok
    FORTH> PAD 4000 + aligned test
    $013D0158 0.038 seconds elapsed. ok

    Not in this case, at least. However, with a bit more cleverness it is possible
    to write data into a cache line of preceding code that really needs to execute
    (CREATE ... DOES> or ... [ 0 , ] ... ). ISTR that in the past I have used ALIGN
    once or twice to get rid of a real (or imagined) problem.

    iForth has long been prepared for separating data and code, but I never enabled
    it because it would mean introducing new/non-standard words for CREATE ...
    DOES> and for , C, etc. Maybe in next year's release.

    -marcel

  • From albert@21:1/5 to mhx@iae.nl on Wed Oct 18 20:36:58 2023
    In article <53600953-77c8-466c-a1b2-3044388359e9n@googlegroups.com>,
    Marcel Hendrix <mhx@iae.nl> wrote:

    iForth has long been prepared for separating data and code, but I never enabled
    it because it would mean introducing new/non-standard words for CREATE ...
    DOES> and for , C, etc. Maybe in next year's release.

    I can't see that one has to introduce non-standard words.
    Also, the changes to CODE, ENDCODE and ;CODE don't seem to be
    a big deal either. But you are right: only do it if it has
    benefits.


  • From Anton Ertl@21:1/5 to Marcel Hendrix on Wed Oct 18 19:03:01 2023
    Marcel Hendrix <mhx@iae.nl> writes:
    On Wednesday, October 18, 2023 at 7:06:16 PM UTC+2, Anton Ertl wrote:
    [..]
    However, looking at the second-to-last line, I expect that we can
    still see a performance problem from code where the data does not
    start with a defining word, like (proof-of-concept):

    [..]

    I measure the following on a 4GHz Core i5-6600K:

    : foo 100000000 0 do 0 over ! loop drop ; ok
    here 0 , ok
    dup h. timer-reset foo .elapsed

    foo start   loop end    cell address   time
    $10226000   $10226037   $102260B0      0.140s
    $102268C0   $102268FF   $10226970      5.711s

    Why is the code longer in the second case? For some reason, it used a
    10-byte instruction to put $00000001:00000000 into rcx, while the
    first variant used a 7-byte instruction to put $05F5E100 into rcx.
    The rest seems to be due to alignment.

    Anyway, for the discussion at hand: if a loop ends close to the end of
    a cache line, prefetching is apparently aggressive enough to fetch
    the next two cache lines into the I-cache (although the branch should
    be predicted taken in the normal case), and this causes the I/D-cache
    ping-pong that produces the slowdown.
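    The cache-line arithmetic behind this can be checked quickly. A minimal
    Python sketch, assuming 64-byte cache lines (the usual size on current
    x86-64 cores; line_of and lines_after are hypothetical helpers), applied
    to the loop-end and cell addresses from the table above:

```python
LINE = 64  # assumed cache-line size in bytes (must be a power of two)


def line_of(addr: int, line: int = LINE) -> int:
    """Start address of the cache line containing ADDR."""
    return addr & ~(line - 1)


def lines_after(code_end: int, data_addr: int, line: int = LINE) -> int:
    """How many lines past the line holding CODE_END the line holding DATA_ADDR is."""
    return (line_of(data_addr, line) - line_of(code_end, line)) // line


rows = [("0.140s", 0x10226037, 0x102260B0),  # (time, loop end, cell address)
        ("5.711s", 0x102268FF, 0x10226970)]
for time, end, cell in rows:
    print(time, hex(line_of(end)), hex(line_of(cell)), lines_after(end, cell))
```

    In both rows the cell lies two lines past the line holding the loop end,
    so the distance alone does not distinguish the fast run from the slow one.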

    I played around with variations on the following:

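    : foo ( xt addr -- ) 100000000 0 do over execute loop 2drop ;
    : bar ( addr -- addr ) \ several stores to addr; BAR's code ends just before the data
      $123456789abcdef over ! $12345678 over ! $1234567 over !
      0 over ! 1 over ! ;
    here 0 , constant addr  \ data cell directly after BAR's code
    addr h. cr
    ' bar addr timer-reset foo .elapsed
    ' bar idis
    bye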

    The idea here is that the end of BAR is executed at every iteration,
    bringing the code and the data even closer than in the example above.
    But in this case I did not see the slowdown, even with BAR ending 1
    byte before the end of the cache line, and even if it ends at the end
    of a cache line.

    So, the padding you put after code is usually enough, but I found one
    case where it was not.

    iForth has long been prepared for separating data and code, but I never enabled
    it because it would mean introducing new/non-standard words for CREATE ...
    DOES> and for , C, etc. Maybe in next year's release.

    I don't see why it should. Gforth keeps the native code elsewhere
    without such words.

    - anton
  • From Marcel Hendrix@21:1/5 to Anton Ertl on Wed Oct 18 14:18:51 2023
    On Wednesday, October 18, 2023 at 10:00:25 PM UTC+2, Anton Ertl wrote:
    Marcel Hendrix <m...@iae.nl> writes:
    On Wednesday, October 18, 2023 at 7:06:16 PM UTC+2, Anton Ertl wrote:
    [..]
    foo start   loop end    cell address   time
    $10226000   $10226037   $102260B0      0.140s
    $102268C0   $102268FF   $10226970      5.711s

    Why is the code longer in the second case? For some reason, it used a
    10-byte instruction to put $00000001:00000000 into rcx, while the
    first variant used a 7-byte instruction to put $05F5E100 into rcx.

    It seems you were in HEX, which means your second loop was
    decimal $0000000100000000 100000000 / .  -> 42
    times longer than the first loop. The ratio of the timings,
    5711 140 / .  -> 40 (about 40.8), is therefore no surprise.
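    The arithmetic can be replayed outside Forth. In this Python sketch,
    int(s, 16) stands in for reading the literal with BASE set to HEX; the
    0.140 s and 5.711 s timings are the ones quoted above.

```python
# The literal "100000000" read with BASE = 10 and with BASE = 16.
decimal_count = int("100000000", 10)  # 100,000,000 = $05F5E100
hex_count = int("100000000", 16)      # $100000000 = 2**32 = 4,294,967,296

loop_ratio = hex_count // decimal_count  # like: decimal $100000000 100000000 / .
time_ratio = 5.711 / 0.140               # measured times from the table

print(loop_ratio, 5711 // 140, round(time_ratio, 1))  # 42 40 40.8
```

    The timing ratio comes out slightly below the ratio of iteration counts,
    which fits the explanation that the longer run simply did 42 times the work.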

    When generating code, iForth tries to use 32-bit constants when possible,
    which explains the size difference (a 4-byte instead of an 8-byte immediate).
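    The two instruction sizes can be reproduced from the standard x86-64
    encodings: REX.W C7 /0 with a sign-extended 32-bit immediate (7 bytes)
    versus REX.W B8+r with a full 64-bit immediate (10 bytes). The helper
    below is a hypothetical illustration of "use the short form when the
    constant fits", not iForth's code generator; it ignores other options such
    as the 5-byte, zero-extending mov ecx, imm32.

```python
import struct

RCX = 1  # register number of rcx in x86-64 ModRM/opcode encodings


def mov_rcx_imm(value: int) -> bytes:
    """Encode 'mov rcx, value', preferring the shorter sign-extended imm32 form."""
    if -2**31 <= value < 2**31:
        # REX.W C7 /0 id: ModRM 11 000 001 selects rcx; 7 bytes in total
        return bytes([0x48, 0xC7, 0xC0 | RCX]) + struct.pack("<i", value)
    if not -2**63 <= value < 2**64:
        raise ValueError("does not fit in 64 bits")
    # REX.W B8+rd io: register in the opcode byte, 64-bit immediate; 10 bytes
    return bytes([0x48, 0xB8 | RCX]) + struct.pack("<Q", value & (2**64 - 1))


short = mov_rcx_imm(100000000)    # $05F5E100 fits in 32 bits
long_ = mov_rcx_imm(0x100000000)  # $00000001:00000000 does not
print(len(short), short.hex())    # 7 48c7c100e1f505
print(len(long_), long_.hex())    # 10 48b90000000001000000
```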
    [..]
    But in this case I did not see the slowdown, even with BAR ending 1
    byte before the end of the cache line, and even if it ends at the end
    of a cache line.

    So, the padding you put after code is usually enough, but I found one
    case where it was not.

    If you were in HEX, then you didn't :-)

    iForth has long been prepared for separating data and code, but I never enabled
    it because it would mean introducing new/non-standard words for CREATE ...
    DOES> and for , C, etc. Maybe in next year's release.

    I don't see why it should. Gforth keeps the native code elsewhere
    without such words.

    How do you generate native code with separate code (write-protected)
    and data segments, given that the assembler is written in Forth? I can't use
    the standard !, C!, @, C@, C, and , to access the code segment.

    -marcel

  • From Anton Ertl@21:1/5 to Marcel Hendrix on Thu Oct 19 10:04:42 2023
    Marcel Hendrix <mhx@iae.nl> writes:
    On Wednesday, October 18, 2023 at 10:00:25 PM UTC+2, Anton Ertl wrote:
    Marcel Hendrix <m...@iae.nl> writes:
    On Wednesday, October 18, 2023 at 7:06:16 PM UTC+2, Anton Ertl wrote:
    [..]
    foo start   loop end    cell address   time
    $10226000   $10226037   $102260B0      0.140s
    $102268C0   $102268FF   $10226970      5.711s

    Why is the code longer in the second case? For some reason, it used a
    10-byte instruction to put $00000001:00000000 into rcx, while the
    first variant used a 7-byte instruction to put $05F5E100 into rcx.

    It seems you were in HEX, which means your second loop was
    decimal $0000000100000000 100000000 / .  -> 42
    times longer than the first loop. The ratio of the timings,
    5711 140 / .  -> 40 (about 40.8), is therefore no surprise.

    Ah, yes, thank you. I remember falling into this trap with a
    benchmark three decades ago. In the meantime, we have added number
    prefixes, and I should make it a habit to write decimal numbers with
    a "#" prefix. Still, I would also like it if we standardized words like
    HEX. (Gforth) or H. (iForth, SwiftForth). I guess I'll add H. to
    Gforth.
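    The point of the "#" prefix is that it overrides BASE for a single
    literal. A minimal Python sketch of that conversion rule, following the
    Forth-2012 prefixes # (decimal), $ (hex) and % (binary), with an optional
    '-' after the prefix; forth_number is a hypothetical helper, and
    double-cell numbers and 'c' character literals are left out.

```python
PREFIXES = {"#": 10, "$": 16, "%": 2}  # Forth-2012 number prefixes
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def forth_number(token: str, base: int) -> int:
    """Convert TOKEN roughly the way a Forth text interpreter would.

    A leading #, $ or % overrides the current BASE for this token only;
    an optional '-' may follow the prefix (or start an unprefixed number).
    """
    if token[:1] in PREFIXES:
        base, token = PREFIXES[token[0]], token[1:]
    negative = token.startswith("-")
    if negative:
        token = token[1:]
    if not token:
        raise ValueError("not a number")
    value = 0
    for ch in token.lower():
        digit = DIGITS.find(ch)  # -1 if ch is not a digit character at all
        if not 0 <= digit < base:
            raise ValueError(f"{ch!r} is not a digit in base {base}")
        value = value * base + digit
    return -value if negative else value


BASE = 16                                # as after HEX
print(forth_number("100000000", BASE))   # 4294967296: the trap
print(forth_number("#100000000", BASE))  # 100000000, whatever BASE is
```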

    For DEC. (Gforth, iForth) there is only one name in the systems I
    tested.

    iForth has long been prepared for separating data and code, but I never enabled
    it because it would mean introducing new/non-standard words for CREATE ...
    DOES> and for , C, etc. Maybe in next year's release.

    I don't see why it should. Gforth keeps the native code elsewhere
    without such words.

    How do you generate native code with separate code (write-protected)
    and data segments, given that the assembler is written in Forth? I can't use
    the standard !, C!, @, C@, C, and , to access the code segment.

    Why can you not use @ and C@?

    If the code area is write-protected, you cannot write any code there,
    neither with ! and C! nor with "new/non-standard words". So you have to make
    at least the page(s) in the code area where your code is going to land
    writable, maybe only during code generation (but I make the code RWX
    all the time; everything else is security theatre in a Forth system
    that allows the user to do anything the process can do anyway),
    and then ! and C! work.
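    The "make it writable while generating, then protect it" sequence can be
    sketched with mmap and mprotect. This Python version calls libc through
    ctypes and assumes Linux or a similar POSIX system; it does not handle the
    extra rules MacOS on Apple Silicon imposes on executable memory, and
    emit_sealed is a hypothetical helper. The bytes are only copied and read
    back, never executed: after mprotect, reads (the @ and C@ case) still
    work, while a store into the page would fault. Passing
    PROT_READ|PROT_WRITE|PROT_EXEC to mmap instead gives the always-RWX
    arrangement described above.

```python
import ctypes
import mmap
import os

# Assumption: Linux or a similar POSIX system; libc symbols are looked up in
# the running process. Apple Silicon's extra rules for executable memory are ignored.
libc = ctypes.CDLL(None, use_errno=True)
libc.mmap.restype = ctypes.c_void_p
libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                      ctypes.c_int, ctypes.c_int, ctypes.c_long]
libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

PAGE = mmap.PAGESIZE
MAP_FAILED = ctypes.c_void_p(-1).value


def _check(ok: bool, what: str) -> None:
    if not ok:
        err = ctypes.get_errno()
        raise OSError(err, f"{what}: {os.strerror(err)}")


def emit_sealed(code: bytes) -> int:
    """Copy CODE into a fresh page while it is RW, then make the page R-X."""
    if len(code) > PAGE:
        raise ValueError("this sketch handles a single page only")
    addr = libc.mmap(None, PAGE, mmap.PROT_READ | mmap.PROT_WRITE,
                     mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS, -1, 0)
    _check(addr not in (None, MAP_FAILED), "mmap")
    ctypes.memmove(addr, code, len(code))  # code generation: the ! / C! / C, phase
    _check(libc.mprotect(addr, PAGE, mmap.PROT_READ | mmap.PROT_EXEC) == 0,
           "mprotect")                     # no more writes from here on
    return addr


code = bytes.fromhex("48c7c100e1f505")          # mov rcx, 100000000 (copied, never run)
addr = emit_sealed(code)
print(ctypes.string_at(addr, len(code)).hex())  # reading (@ / C@) still works
libc.munmap(addr, PAGE)
```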

    Concerning C, and ,: the way I would do it in development Gforth is to
    have a separate section, say, NATIVE-CODE (the name may be a little
    too verbose for constant use, but good enough for the example
    below), and then do something like:

    \ now generate some native code
    [: ... c, ... c, ... , ... ;] native-code section-execute

    Each section has its own dictionary pointer, and SECTION-EXECUTE in
    the example above switches to the dictionary pointer of NATIVE-CODE,
    then executes the quotation (i.e., the C,s and , in the quotation
    append to the NATIVE-CODE section), and then switches back to the
    dictionary pointer of the section in use before. For implementing
    quotations, a stack of native-code sections would be useful (it avoids
    the need to branch around the code of the quotation).
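    A rough model of per-section dictionary pointers, in Python. Section,
    Dictionary and section_execute are hypothetical stand-ins (including the
    argument order); Gforth's actual words live in its sections.fs and
    sections2.fs. The sketch shows the two properties described above: each
    section has its own HERE, and switching is undone after the quotation
    runs, even if it throws.

```python
from typing import Callable, Dict


class Section:
    """A memory area with its own dictionary pointer (hypothetical model)."""

    def __init__(self, name: str, size: int) -> None:
        self.name = name
        self.mem = bytearray(size)
        self.dp = 0  # this section's HERE


class Dictionary:
    def __init__(self, *sections: Section) -> None:
        self.sections: Dict[str, Section] = {s.name: s for s in sections}
        self.current = sections[0]  # the section that , and C, append to

    def _allot(self, n: int) -> int:
        sec = self.current
        if sec.dp + n > len(sec.mem):
            raise MemoryError(f"section {sec.name} is full")
        start, sec.dp = sec.dp, sec.dp + n
        return start

    def c_comma(self, byte: int) -> None:  # C,
        self.current.mem[self._allot(1)] = byte & 0xFF

    def comma(self, cell: int) -> None:  # ,  (8-byte little-endian cells)
        start = self._allot(8)
        self.current.mem[start:start + 8] = (cell % 2**64).to_bytes(8, "little")

    def section_execute(self, xt: Callable[[], None], name: str) -> None:
        """Run XT with section NAME current, then restore the previous one."""
        saved, self.current = self.current, self.sections[name]
        try:
            xt()
        finally:
            self.current = saved


forth = Section("forth", 256)
native = Section("native-code", 256)
d = Dictionary(forth, native)

d.comma(0x1234)  # goes to the forth section
d.section_execute(lambda: (d.c_comma(0x5B), d.c_comma(0xC3)), "native-code")
d.c_comma(0x01)  # back in the forth section
print(forth.dp, native.dp, native.mem[:2].hex())  # 9 2 5bc3
```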

    These features are not yet documented; my EuroForth 2016 paper <http://www.euroforth.org/ef16/papers/ertl-sections.pdf> explains the
    basic idea (including a section on separating code and data), but the
    current implementation differs from what is proposed in the paper. In
    particular, named sections do not form a section stack (we have not needed
    that yet); only the main section "Forth" has a stack (for quotations,
    strings and the like). OTOH, we have added words like SECTION-EXECUTE
    which are not described in the paper. You can find the current
    implementation in:

    http://git.savannah.gnu.org/cgit/gforth.git/tree/sections.fs
    http://git.savannah.gnu.org/cgit/gforth.git/tree/sections2.fs

    - anton
  • From albert@21:1/5 to mhx@iae.nl on Fri Oct 20 00:58:48 2023
    In article <53600953-77c8-466c-a1b2-3044388359e9n@googlegroups.com>,
    Marcel Hendrix <mhx@iae.nl> wrote:
    On Wednesday, October 18, 2023 at 7:06:16 PM UTC+2, Anton Ertl wrote:
    [..]

    Not in this case, at least. However, with a bit more cleverness it is possible
    to write data into a cache line of preceding code that really needs to execute
    (CREATE ... DOES> or ... [ 0 , ] ... ). ISTR that in the past I have used ALIGN
    once or twice to get rid of a real (or imagined) problem.

    iForth has long been prepared for separating data and code, but I never enabled
    it because it would mean introducing new/non-standard words for CREATE ...
    DOES> and for , C, etc. Maybe in next year's release.

    Experimenting with ciforth for AMD64, I have done an experiment.
    I placed cold machine code in a text segment; that works. I moved the
    low-level code of DROP into that segment; that works too.
    The linker (as it should) takes care of filling in the code field of DROP.

    The code of DROP could be dumped by low-level tools.
    I was surprised that I could patch the code of DROP, which was supposedly
    in a read-only segment. I expected a violation.

    The technique sketched by Anton Ertl ought to work.


    Groetjes Albert
  • From albert@21:1/5 to mhx@iae.nl on Mon Oct 23 16:22:49 2023
    In article <b739e7b2-56ce-4020-ab84-e05735241725n@googlegroups.com>,
    Marcel Hendrix <mhx@iae.nl> wrote:
    <SNIP>

    How do you generate native code with separate code (protected for write)
    and data segments, given the assembler is written in Forth. I can't use
    the standard !, C!, @, C@, C, and , to access the code segment.

    Time to proceed to 64 bits, with its flat memory space.
    In fact, even in the 32-bit era it was already behind the times to
    have separate data, code, stack, and extra segments.
    Linus Torvalds could not be bothered. He wouldn't have started
    Linux if he had been obliged to.


    Groetjes Albert
  • From Marcel Hendrix@21:1/5 to none albert on Mon Oct 23 07:51:50 2023
    On Monday, October 23, 2023 at 4:23:22 PM UTC+2, none albert wrote:
    In article <b739e7b2-56ce-4020...@googlegroups.com>,
    Marcel Hendrix <m...@iae.nl> wrote:
    <SNIP>

    How do you generate native code with separate code (write-protected)
    and data segments, given that the assembler is written in Forth? I can't use
    the standard !, C!, @, C@, C, and , to access the code segment.

    Time to proceed to 64 bits, with its flat memory space.
    In fact, even in the 32-bit era it was already behind the times to
    have separate data, code, stack, and extra segments.
    Linus Torvalds could not be bothered. He wouldn't have started
    Linux if he had been obliged to.

    iForth has a 64-bit flat model and has no demonstrable slowdown problems
    when data is close to code. The question above concerns the case where
    segments are write-protected.

    -marcel

  • From albert@21:1/5 to albert on Mon Oct 23 16:16:07 2023
    In article <nnd$23e6e7e7$28df6727@b48c89f815d28223>,
    albert@cherry.(none) (albert) wrote:
    In article <53600953-77c8-466c-a1b2-3044388359e9n@googlegroups.com>,
    Marcel Hendrix <mhx@iae.nl> wrote:

    iForth has long been prepared for separating data and code, but I never enabled
    it because it would mean introducing new/non-standard words for CREATE ...
    DOES> and for , C, etc. Maybe in next year's release.

    I can't see that one has to introduce non-standard words.
    Also, the changes to CODE, ENDCODE and ;CODE don't seem to be
    a big deal either. But you are right: only do it if it has
    benefits.

    I have done it. The benefits are generally cleaner code and
    a preparation in case we are in fact forced to separate them on
    the newest ARM Apple computers.

    As you know, the ciforths are generated from one source file,
    controlled by m4 macros. This is i86 and AMD64 only.
    An addition for separating the code and data sections must keep
    the main builds for Windows and Linux working, i.e. the following
    tests must pass:
    make testlina64
    make testlina32
    make testwina64
    make testwina32

    These are built with fasm, one of the four supported assemblers.
    The additions are made to the GNU assembler version.
    That version is controlled by the same lina64.cfg control file, but
    its target in the Makefile is .s.

    The first step is adding a rule to prelude.m4:
    define( {_SEPARATED_}, _no)dnl
    If you want a feature only with separated code and data, you
    can write
    _SEPARATED({ POP BX _C{ GET INCREMENT} })
    and it makes a difference only with
    define( {_SEPARATED_}, _yes)dnl

    It is important to note that ciforth is a compiler factory.
    It generates an assembler file that is handed to the engineer,
    comparable to the infamous FIGFORTH listings.
    He need not even be aware that a facility for separation existed in the
    first place. For now the tests are bound to pass,
    because the generated assembler file has not even changed.

    As an example, take the modifications to a code definition, DROP
    (the header fields are cdflnsx: code, data, flag, link, name, source, extra).
    Previously, the assumption was that the code (pointed to by the first
    field) directly follows the header.

    In the generic file there is only:
    CODE_HEADER({DROP},{DROP})
    POP AX
    _NEXT

    DROP:
      +--- DQ DROP+HEADSIZE   ; code field
      |    DQ DROP+HEADSIZE   ; data field
      |    DQ 0x0             ; flags
      |    DQ OVER            ; link
      |    DQ N_DROP          ; name
      |    DQ 0               ; source
      |    DQ 0               ; extra
      |
      +--> POP RAX
           LODSQ              ; NEXT
           JMP QWORD[RAX]

    This must be changed to a proper label:

    DROP:
    DQ X_DROP
    DQ DROP+HEADSIZE
    DQ 0x0
    DQ OVER
    DQ N_DROP
    DQ 0
    DQ 0

    X_DROP:
    POP RAX
    LODSQ ; NEXT
    JMP QWORD[RAX]

    Or, if _SEPARATED_, the last part becomes (in gas, # starts a comment):

    .section .forthx
    X_DROP:
    POP RAX
    LODSQ # NEXT
    JMP QWORD[RAX]
    .section .forthd


    In the file header.m4 we changed the macros
    CODE_HEADER and _NEXT. That is all.
    At the end of the _NEXT macro we add _DATA_ to switch to the
    data segment.
    At the end of the CODE_HEADER macro we add _TEXT_ to switch to the
    code segment.

    The generic file does not need to know what the _TEXT_ switching code
    looks like; that is assembler dependent. Luckily, it is separated
    out in the gas.m4 file.
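    The division of labour can be modelled as a tiny code emitter: the two
    generic macros (here code_header and next_) append the switch after the
    header and after NEXT, and only a per-assembler table knows how the switch
    is spelled. A hypothetical Python sketch; the directive strings follow the
    .forthx/.forthd sections shown above, and the empty fasm entries stand for
    a build without separation.

```python
from typing import List

# Assembler-specific part (cf. gas.m4 / fasm.m4): how a section switch is spelled.
SWITCH = {
    "gas": {"_TEXT_": ".section .forthx", "_DATA_": ".section .forthd"},
    "fasm": {"_TEXT_": "", "_DATA_": ""},  # no separation: switches expand to nothing
}


def switch(out: List[str], asm: str, which: str) -> None:
    directive = SWITCH[asm][which]
    if directive:
        out.append(directive)


# Generic part (cf. header.m4): the only two macros that know about switching.
def code_header(out: List[str], asm: str, name: str, link: str) -> None:
    out += [f"{name}:",
            f"    DQ X_{name}",         # code field: a proper label
            f"    DQ {name}+HEADSIZE",  # data field
            "    DQ 0x0",               # flags
            f"    DQ {link}",           # link
            f"    DQ N_{name}",         # name
            "    DQ 0",                 # source
            "    DQ 0"]                 # extra
    switch(out, asm, "_TEXT_")          # end of CODE_HEADER: go to code
    out.append(f"X_{name}:")


def next_(out: List[str], asm: str) -> None:
    out += ["    LODSQ", "    JMP QWORD[RAX]"]
    switch(out, asm, "_DATA_")          # end of _NEXT: back to data


def drop(asm: str) -> List[str]:
    """Expand the generic definition CODE_HEADER(DROP) POP RAX _NEXT."""
    out: List[str] = []
    code_header(out, asm, "DROP", "OVER")
    out.append("    POP RAX")
    next_(out, asm)
    return out


print("\n".join(drop("gas")))
```

    The generic definition of DROP is the same for both targets; only the
    table entry decides whether the output contains section switches.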

    With only a few lines changed in the m4 files,
    the bulk of the work is done.
    There remain situations like the following, where
    an extra _TEXT_ has to be inserted because _NEXT switches to
    _DATA_ mode:

    XCHG RPO,SPO _C{ GET PARAMETER STACK}
    _NEXT
    ******* _TEXT_
    QXDO1: MOV HIP,AX
    _NEXT

    This _TEXT_ can be added to the generic file. It does no
    harm for the fasm build, as long as an appropriate _TEXT_,
    one that does nothing, is defined in the fasm.m4 macro file.
    At this stage all tests must pass.

    In the next stage we handle the defining words.
    They use R> to fill in the code field,
    because that points to the low-level code that follows,
    but that is no longer true in general.
    The changes are not large: replace R> by a literal such
    as DOCOL. Don't forget to put a _TEXT_ in front of the
    DOCOL: label.

    I have bored you enough. Bottom line: there is now a GNU assembler version
    of lina64 that keeps its code and data separate.
    Unsurprisingly, the difference in speed is unnoticeable; all code
    in ciforth is in the innermost cache anyway.

    One last remark: is the generic file now approaching collapse
    with all these edits?
    Far from it. The changes made to the generic file add to its quality.
    The R> trick for defining words saves a NEXT and a cell in one definition.
    That could be worth it ... in the early '80s.
    The (;CODE) word saves 2 cells in 5 cases, but eats up a name, 2 cells,
    and a header of 7 cells.
    You can't use it without ;CODE, and the documentation is awkward.
    And of course it can't be used with separated sections.
    ;CODE and (;CODE) are eliminated, at the cost of 6 cells
    in file size.

    Now look at what the definition of CONSTANT has become after
    eliminating ;CODE:
    : CONSTANT NAME (CREATE) LATEST >DFA ! DOCON LATEST CFA ! ;
    Get a name, use it to create a header, store the item on the stack
    in the data field of the latest definition, and store DOCON as the
    code of the latest definition.
    The decompilation tools have become simpler, because (;CODE) was
    a weird exception.
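    The same steps, modelled in Python to show how little is left once ;CODE
    is gone: CONSTANT only builds a header and fills two fields, and all
    constants share one DOCON routine. Header, dictionary and execute are
    hypothetical names; the name is passed as an argument instead of being
    parsed from the input stream.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Stack = List[int]


@dataclass(eq=False)
class Header:
    name: str
    cfa: Optional[Callable[[Stack, "Header"], None]] = None  # code field
    dfa: int = 0                                              # data field


dictionary: List[Header] = []


def latest() -> Header:
    return dictionary[-1]


def docon(stack: Stack, hdr: Header) -> None:
    """Shared run-time code of every constant: push its data field."""
    stack.append(hdr.dfa)


def constant(stack: Stack, name: str) -> None:
    """( x -- )  : CONSTANT NAME (CREATE) LATEST >DFA ! DOCON LATEST CFA ! ;"""
    dictionary.append(Header(name))  # NAME (CREATE): lay down a header
    latest().dfa = stack.pop()       # LATEST >DFA !
    latest().cfa = docon             # DOCON LATEST CFA !


def execute(stack: Stack, hdr: Header) -> None:
    hdr.cfa(stack, hdr)              # indirect: always through the code field


s: Stack = [42]
constant(s, "ANSWER")  # 42 CONSTANT ANSWER
execute(s, latest())   # ANSWER
print(s)               # [42]
```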

    Groetjes Albert
  • From S Jack@21:1/5 to Marcel Hendrix on Wed Oct 25 05:18:43 2023
    On Monday, October 23, 2023 at 9:51:53 AM UTC-5, Marcel Hendrix wrote:

    when data is close to code. The question above is in case segments are write-protected.


    With an assembler, both a name and attributes can be assigned to a section
    (I recall having named a section "dictionary" long ago). Text and data
    sections have default attributes, but these can be changed. Nowadays?
    Lately I have used an assembler option to override the read-only text.
    Possibly ELF has a bit that could be patched if the option is no longer provided.
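    The bit in question is plausibly one of the segment flags in the ELF
    program headers: each PT_LOAD entry carries PF_R, PF_W and PF_X, and the
    loader maps the segment accordingly. A minimal Python sketch that lists
    them, assuming a 64-bit little-endian ELF file such as a Linux x86-64
    executable (load_segments and rwx are hypothetical helpers; extended
    program-header counts are not handled):

```python
import struct
import sys

PT_LOAD = 1
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4  # p_flags bits


def load_segments(path: str):
    """Yield (vaddr, memsz, flags) for every PT_LOAD entry of a 64-bit LE ELF file."""
    with open(path, "rb") as f:
        ehdr = f.read(64)  # Elf64_Ehdr
        if ehdr[:4] != b"\x7fELF" or ehdr[4] != 2 or ehdr[5] != 1:
            raise ValueError("not a 64-bit little-endian ELF file")
        (phoff,) = struct.unpack_from("<Q", ehdr, 0x20)           # e_phoff
        phentsize, phnum = struct.unpack_from("<HH", ehdr, 0x36)  # e_phentsize, e_phnum
        f.seek(phoff)
        for _ in range(phnum):
            ph = f.read(phentsize)  # Elf64_Phdr
            p_type, p_flags = struct.unpack_from("<II", ph, 0)
            (p_vaddr,) = struct.unpack_from("<Q", ph, 0x10)
            (p_memsz,) = struct.unpack_from("<Q", ph, 0x28)
            if p_type == PT_LOAD:
                yield p_vaddr, p_memsz, p_flags


def rwx(flags: int) -> str:
    return "".join(c if flags & bit else "-"
                   for c, bit in (("R", PF_R), ("W", PF_W), ("X", PF_X)))


for vaddr, size, flags in load_segments(sys.executable):
    print(f"{vaddr:#014x} {size:#10x} {rwx(flags)}")
```

    Making a text segment writable in an existing file would then amount to
    setting PF_W in the p_flags word of that entry; whether the operating
    system honours such a segment is a separate question.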
    --
    me

  • From albert@21:1/5 to albert on Wed Oct 25 15:11:05 2023
    In article <nnd$2c16c062$51f4c8be@ef6686bea0ed6640>,
    albert@cherry.(none) (albert) wrote:
    In article <nnd$23e6e7e7$28df6727@b48c89f815d28223>,
    albert@cherry.(none) (albert) wrote:
    [..]
    These are built with fasm, one of the four supported assemblers.
    The additions are made to the GNU assembler version.
    That version is controlled by the same lina64.cfg control file, but
    its target in the Makefile is .s.
    define( {_SEPARATED_}, _yes)dnl

    ci86.lina64.s --> glina64

    The new executables have 3 sections (.forthx, .forthd and .dict),
    and I expected that the compilation option no longer worked,
    because it patches the ELF header.
    To my surprise:
    ~/PROJECT/ciforths/ciforth: glina64 -c hellow.frt
    ~/PROJECT/ciforths/ciforth: hellow
    Hello world!
    ~/PROJECT/ciforths/ciforth:

    It turns out I had removed the table of sections. It probably has to
    be reinstated if the operating system requires non-writable executable
    sections.

    Groetjes Albert