• Iterating over arrays

    From Anton Ertl@21:1/5 to All on Sat Jan 20 11:05:28 2024
    One way to loop over arrays is to loop over the array index values:

    ( addr u ) 0 ?do ( addr )
    dup i cells + @ ...
    loop
    drop
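
As a concrete, self-contained instance of this idiom (illustrative names, assuming cell-sized elements):

```forth
\ Print every element of a cell array using the index idiom.
create nums 3 , 5 , 1 , -3 ,
: .array ( addr u -- )
  0 ?do ( addr )
    dup i cells + @ .
  loop
  drop ;
nums 4 .array  \ prints: 3 5 1 -3
```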

    A disadvantage of this approach is that one has to keep addr around.
    If you need to access other stack elements, this becomes annoying;
    e.g., if you want to produce the sum, minimum, and maximum of an array:

( addr u ) >r 0 max-n min-n r> 0 ?do ( addr sum min max )
    3 pick i cells + @
    tuck max >r
    tuck min >r
    + r> r>
    loop

    The loop body is a little bit better if I produces the address of the
    element, but now the prelude of the loop becomes bigger:

    ( addr u ) 2>r 0 max-n min-n 2r> cells over + swap ?do
    i @
    tuck max >r
    tuck min >r
    + r> r>
    1 cells +loop

    An additional advantage is that the loop body is shorter and probably
    faster.

    The longer prelude is annoying, especially in the case of walking
    backwards through an array (with U-DO..-LOOP), e.g., if we want to
    print out the numbers in the array backwards:

    ( addr u ) cells over + -1 cells + swap -1 cells + u-do
    i @ .
    1 cells -loop

    So in <2016Dec26.155852@mips.complang.tuwien.ac.at> I outlined:

addr count stride A+DO ... A+LOOP \ i goes from addr...addr+(count-1)*stride
addr count stride A-DO ... A-LOOP \ i goes from addr+(count-1)*stride...addr

    I have not implemented this yet, but have recently discussed it with
    Bernd Paysan, who pointed out that in "addr u" the u is not always the
    number of elements, but sometimes the number of bytes; so my current
    thinking is:

    ( addr count stride ) MEM+DO ... MEM+LOOP
    ( addr count stride ) MEM-DO ... MEM-LOOP

    And maybe have a word

    : ARRAY>MEM ( uelems uelemsize -- ubytes uelemsize )
    tuck * swap ;
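
For example, with 8-byte cells, a 4-element cell array converts like this:

```forth
: array>mem ( uelems uelemsize -- ubytes uelemsize )
  tuck * swap ;
4 8 array>mem  \ leaves 32 8: 4 elements of 8 bytes = 32 bytes, stride 8
```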

(Or maybe have a generic ANYLOOP (whatever its name) that produces
    the right match for the ...DO)

    With that our examples would become:

    ( addr u ) 2>r 0 max-n min-n 2r> 1 cells array>mem mem+do
    i @
    tuck max >r
    tuck min >r
    + r> r>
    mem+loop

    ( addr u ) array>mem mem-do
    i @ .
    mem-loop

    (and maybe replace MEM+LOOP and MEM-LOOP with ANYLOOP).

    Opinions?

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2023: https://euro.theforth.net/2023

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to Hans Bezemer on Sat Jan 20 17:23:03 2024
    Hans Bezemer <the.beez.speaks@gmail.com> writes:
>>Opinions?

    [reformatted for conventional line lengths]

>I fail to see the need to generalize these. First, I (relatively)
>rarely seem to iterate over an entire array.

    MEM+DO and MEM-DO also work for parts of arrays and even for array
    slices (e.g., a diagonal in a matrix) as long as the stride is
    constant.
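
For comparison, walking such a diagonal with the existing address-as-index idiom takes exactly the setup that MEM+DO would absorb. A sketch (hypothetical word; assumes an n*n cell matrix stored row-major, so the diagonal stride is n+1 cells):

```forth
\ Print the main diagonal of an n*n row-major cell matrix.
\ The stride is runtime-dependent, so it is kept on the data
\ stack below the loop and DUPed for +LOOP.
: .diagonal ( addr n -- )
  dup 1+ cells   ( addr n stride )
  swap over *    ( addr stride n*stride )
  rot tuck +     ( stride addr limit )
  swap ?do       ( stride )
    i @ .
  dup +loop
  drop ;
```

With MEM+DO the limit computation and the repeated stride would disappear from such words.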

>In the largest public Forth program I wrote there are almost twice
>the number of BEGIN..REPEAT loops as DO..LOOPS

In the Gforth image we have the following numbers of occurrences:

    138 begin
    142 while
    108 repeat

    19 do
    73 ?do
    5 +do
    25 u+do
    2 -do
    5 u-do

    71 loop
    52 +loop
    7 -loop

    1 leave

Looking at the occurrences of U-DO, 4 out of the 5 look like obvious
    candidates for replacement by MEM-DO and 3 actually are (the fourth is
    the rare case of a data-dependent decrement). These 3 are:

|libcc.fs:141:30: c-addr 1- c-addr u1 + 1- u-do
|mwords.fs:34:43: cr words[] $@ bounds cell- swap cell- U-DO
|locate1.fs:494:40: words[] $@ bounds cell- swap cell- U-DO

    They could be replaced with

    libcc.fs:141:30: c-addr u1 1 mem-do
    mwords.fs:34:43: cr words[] $@ cell mem-do
    locate1.fs:494:40: words[] $@ cell mem-do

    For MEM+DO the benefit is smaller, but if we have MEM-DO, why not also
    have MEM+DO?

    One major benefit of using DO..LOOP is that it gets the index and the
    limit out of the way, reducing the data stack load. The idiom of
    using the element address as DO..LOOP index has the additional
    advantage of getting the start address of the array out of the way,
    but it requires setup code before the ?DO-like word. Using MEM+DO and especially MEM-DO reduces that setup code.

>Second, I already have some generalized high order functions for
>mapping, reducing or FOREACHing complex structures. Can't say I used
>them much since creating them - because my DO..LOOPs are usually so
>simple, I can't see why I should be using them.

That suggests that do..loop variants with better support for dealing
with arrays would be a better idea than these higher-order functions.

>Use case one: raw arrays
>- Set up the structure by address, #items and sizeof (item)
>
>16 constant #items
>32 constant /item
>#items /item * array items
>items #items /item * BOUNDS ?DO I ( do your thing) /item +LOOP

    That could be replaced with

    items #items /item array>mem mem+do i ( do your thing ) mem+loop

>Use case two: indexed arrays
>- Set up the structure by address, #items and sizeof (item)
>- Add a DOES> definition that returns the address
>
>16 constant #items
>32 constant /item
>#items /item * array items does> swap /item * + ;
>#items 0 ?DO I items ( do your thing) LOOP

    Yes, creating a named array and using that name in the loop body is a
    way to avoid the stack load of passing the start address of the array
    around in the indexing idiom, but the loop then becomes specific to
    that one array.

>Use case three: use TH (or equivalent) to do the indexing
>- TH only works with CELLS, but it is useful ( : TH CELLS + ; )

    TH is useful if you have the start address of the array as a stack
    item, but for a loop that walks the array, I prefer to avoid that.

>IMHO it is just some syntactic sugar - which won't speed up things anyway.

    Yes. The point is to make it easier and less error prone to write
do..loops that use the element addresses as index. And using that
    technique is usually faster than computing the element address from
    the start address and the index in each iteration.

>In that regard, I'd rather use FOREACH, since that one exposes the
>address anyway and gets rid of the dirty details as well:
>
>[: ( do your thing) ;] items #items /item FOREACH

    That corresponds to

    items #items /item mem+do i ( do your thing ) mem+loop

    - anton
  • From mhx@21:1/5 to All on Sat Jan 20 19:40:21 2024
I was going to mention that a good (or possibly the only) reason to use
iterators (like FOREACH) is when the data is organized in lists.
However, on reflection, it is possible to avoid lists by adding an
indexing array (at the cost of one indirection). Any idea if that would
be significantly faster?

    The disadvantage of a simple array would be that it becomes too small or
    is too large (in case of lists the number of items is probably dynamic).

    -marcel

  • From Anton Ertl@21:1/5 to mhx on Sun Jan 21 15:36:57 2024
    mhx@iae.nl (mhx) writes:
>I was going to mention that a good (or possibly the only) reason to use
>iterators (like FOREACH) is when the data is organized in lists. However,
>on reflection, it is possible to avoid lists by adding an indexing array
>(at the cost of one indirection). Any idea if that would be significantly
>faster?

    Following a linked list costs 4-5 cycles per element on modern desktop
    CPUs even if the elements are in the D-cache, more if they are further
    away from the core.

    By contrast, dealing with the elements through an array of element
    addresses (I guess that's what you mean with "indexing array") does
not introduce such a latency: if each element is processed independently
    of the others, you can process them arbitrarily in parallel. If there
    is some dependency in the processing of the elements, that dependency
    limits the processing.
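
In Forth terms, an indexing array is just an array of element addresses, walked with the usual address-as-index idiom plus one extra fetch per element. A minimal sketch (hypothetical name):

```forth
\ Sum elements reached through an array of element addresses.
\ addr points to u cells, each holding the address of one element.
: sum-via-index ( addr u -- n )
  0 rot rot             ( 0 addr u )
  cells over + swap ?do ( 0 )
    i @ @ +             \ fetch the pointer, then the element
  cell +loop ;
```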

    However, there is probably a reason why you used a linked list instead
    of an array. If you could easily do your processing with an array,
    you would have chosen the array in the first place.

>The disadvantage of a simple array would be that it becomes too small or
>is too large (in case of lists the number of items is probably dynamic).

    If the array can grow (and thus become too small), my usual approach
    is to ALLOCATE the array with a power-of-two number of elements, and
    RESIZE it to double size when that becomes too small.
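
A minimal sketch of that doubling discipline (hypothetical names; error handling reduced to THROW):

```forth
variable arr       \ base address of the ALLOCATEd array
variable arr-cap   \ capacity in cells, kept a power of two

: init-arr ( -- )
  16 dup cells allocate throw arr !  arr-cap ! ;
: grow-arr ( -- )  \ call when the array is full
  arr-cap @ 2* dup cells  arr @ swap resize throw arr !
  arr-cap ! ;
```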

    As for "too large", that's something I don't worry about. However,
    Bernd Paysan's string library (all the words with "$" in Gforth),
    which has grown into covering all kinds of dynamically allocated
    memory blocks, actually also uses RESIZE to shrink the memory blocks
    it administers. The reason for this is that the library only
    remembers the size of the current memory block in bytes, not also the
    size of the ALLOCATEd/RESIZEd block. The library computes the size of
    the ALLOCATEd block from the current memory block by rounding it up to
    the next power-of-two; when the next memory block size for an
allocation results in a different power-of-two, the allocation is
    RESIZEd, whether that grows or shrinks.

    However, for a buffer where the contents can grow or shrink (or,
    typically, start a new content with a smaller size that then grows), I
    prefer not to shrink and to keep a separate BUFFER-MAXLENGTH field in
    the data structure describing it around. This means that the number
    of RESIZEs is determined by the maximum length that the buffer ever
    has.
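
Such a descriptor might look like this sketch (hypothetical field names, using the Forth-2012 structure words):

```forth
begin-structure buffer%
  field: buf-data     \ address of the ALLOCATEd block
  field: buf-len      \ bytes currently in use; may shrink freely
  field: buf-maxlen   \ bytes ever ALLOCATEd/RESIZEd for the block
end-structure
```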

    Back to linked lists: Typical applications are when you want to insert
    or delete something in the middle or want to have a parent-pointer
    tree (as in fig-Forth's vocabularies that all included the Forth
    vocabulary), or when you don't want to deal with dynamic memory
    allocation (as in wordlists in Forth implementations even without
    fig-Forth's common-ancestor property).

    - anton
  • From Anton Ertl@21:1/5 to FFmike on Thu Jan 25 16:29:11 2024
    oh2aun@gmail.com (FFmike) writes:
>variable p inlined
>: p@ p @ ; inlined
>: p++ 2 p +! ; inlined
>: p@++ p@ p++ ; inlined
>: r>p r@ p ! ; inlined
>: !p>r p @ >r p ! ; inlined
>
>
>( addr u ) swap !p>r
>for
>p@++ ...
>next r>p
>
>Of course the P words are coded efficiently in assembler.

    This reminds me of Bernd Paysan's OOF and Mini-OOF2, where he has a
    current object pointer that he explicitly sets with >O (similar to
    your !P>R), saving the old object pointer on the return stack, and he
    restores the old object pointer with O> (similar to your R>P). Most
    others (including me) prefer the setting of the object pointer
    automatically when a method is entered and restoring it automatically
    when the method is left, though.

    - anton
• From albert@21:1/5 to dxforth@gmail.com on Fri Jan 26 12:30:22 2024
    In article <uouvck$2ghkt$1@dont-email.me>, dxf <dxforth@gmail.com> wrote:
>On 26/01/2024 3:29 am, Anton Ertl wrote:
>>oh2aun@gmail.com (FFmike) writes:
>>>variable p inlined
>>>: p@ p @ ; inlined
>>>: p++ 2 p +! ; inlined
>>>: p@++ p@ p++ ; inlined
>>>: r>p r@ p ! ; inlined
>>>: !p>r p @ >r p ! ; inlined
>>>
>>>
>>>( addr u ) swap !p>r
>>>for
>>>p@++ ...
>>>next r>p
>>>
>>>Of course the P words are coded efficiently in assembler.
>>
>>This reminds me of Bernd Paysan's OOF and Mini-OOF2, where he has a
>>current object pointer that he explicitly sets with >O (similar to
>>your !P>R), saving the old object pointer on the return stack, and he
>>restores the old object pointer with O> (similar to your R>P). Most
>>others (including me) prefer the setting of the object pointer
>>automatically when a method is entered and restoring it automatically
>>when the method is left, though.
>
>Then there's Moore's address registers where saving/restoring is based
>on need rather than at every turn.

    And then is my oo. Whenever an object is invoked it is the current
    object from its class, until another object is selected. Thus you can
    use different objects at the same time and there is less hassle on the
    stack.
    Groetjes Albert

    --
    Don't praise the day before the evening. One swallow doesn't make spring.
    You must not say "hey" before you have crossed the bridge. Don't sell the
    hide of the bear until you shot it. Better one bird in the hand than ten in
    the air. First gain is a cat purring. - the Wise from Antrim -

  • From Anton Ertl@21:1/5 to Anton Ertl on Sun Jan 28 12:15:37 2024
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>( addr count stride ) MEM+DO ... MEM+LOOP
>( addr count stride ) MEM-DO ... MEM-LOOP
>
>And maybe have a word
>
>: ARRAY>MEM ( uelems uelemsize -- ubytes uelemsize )
>tuck * swap ;
>
>(Or maybe have a generic ANYLOOP (whatever its name) that produces
>the right match for the ...DO)

    I decided to use LOOP to finish a MEM+DO or MEM-DO loop. I have now implemented this, and you can see possible uses as follows:

    create v 3 , 5 , 1 , -3 ,
    : test-mem-0 array>mem mem-do i @ . loop ;
    v 4 cell test-mem-0 \ output: -3 1 5 3
    : test-mem+0 array>mem mem+do i @ . loop ;
    v 4 cell test-mem+0 \ output: 3 5 1 -3

    Decompiling this with SEE TEST-MEM-0 shows (reformatted for
    readability):

    : test-mem-0
    array>mem dup negate >l - over + u-[do
    i @ .
    @local0 +loop
    lp+ ;

    So MEM-DO actually uses +LOOP; it includes the lower bound, just as is
    needed for this usage. I have added a matching U-[DO that includes
    the lower bound (while U-DO excludes it).

    The stride is stored on the locals stack in this example, so that J is unchanged, and any uses of J are also unaffected.

    A common usage is to have a known, constant stride. In this case the
    literal stack is used to produce better code:

    : test-mem-1 cell array>mem mem-do i @ . loop ;

    decompiles to:

    : test-mem-1
    cells #-8 + over + u-[do
    i @ .
    #-8 +loop ;

    The more constants we provide, the better, e.g.:

    : test-mem-3 v 4 cell array>mem mem-do i @ . loop ;

    decompiles to:

    : test-mem-3
    v <v+$18> u-[do
    i @ .
    #-8 +loop ;

    The benefit of MEM-DO and MEM+DO is that you can just call

    addr nelems +nelemsize ARRAY>MEM MEM-DO ... LOOP

    or

    addr nbytes +nelemsize MEM-DO ... LOOP

    or either variant with MEM+DO, and don't have to explicitly compute
    the start and limit for the U-[DO or U+DO and arrange them on the
    stack, and then repeat the stride in front of +LOOP. The result is
    fewer opportunities for mistakes and more readable code.

    - anton