• This is why ALLOCATE + SIZE is superior

    From none) (albert@21:1/5 to All on Mon Sep 4 14:14:43 2023
    If you implement ALLOCATE yourself it is easy to incorporate SIZE :

    1000 ALLOCATE THROW SIZE .
    1000 OK

    I have done that in my ALLOCATE and reportedly Hugh has done that
    also. Maybe he even came up with that name.
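    For readers whose ALLOCATE does not record sizes, here is a minimal sketch
    of one way to get the same effect on top of the standard memory words: keep
    the size in a hidden cell in front of the address handed out. It is only an
    illustration, not necessarily how Albert's or Hugh's systems do it, and the
    wrapper names allocate-sized / resize-sized / free-sized are made up.

    \ Sketch: stash the requested size in one extra cell below the address.
    : allocate-sized ( u -- addr ior )
       DUP CELL+ ALLOCATE ?DUP IF NIP NIP 0 SWAP EXIT THEN  \ pass errors on
       TUCK ! CELL+ 0 ;                    \ store size, hand out the rest
    : SIZE   ( addr -- u )   1 CELLS - @ ;
    : resize-sized ( addr u -- addr' ior )
       SWAP 1 CELLS - OVER CELL+ RESIZE ?DUP IF NIP NIP 0 SWAP EXIT THEN
       TUCK ! CELL+ 0 ;
    : free-sized   ( addr -- ior )   1 CELLS - FREE ;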

    You can now define :

    \ To an allocated vector append a vector. Return the new vector.
    : vector-concat ( vec1 vec2 -- vec3 )
       OVER SIZE DUP >R OVER SIZE + >R
       SWAP R> RESIZE THROW ( first vector is now enlarged ) ( vec2 vec1' )
       2DUP R> + OVER SIZE CMOVE NIP ;

    This is applicable to vectors of cells or characters alike.
    If you ALLOCATE your strings you can define
    'vector-concat ALIAS $+
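    A hypothetical round trip, assuming an ALLOCATE/RESIZE/FREE that record the
    size (as in Albert's system or the sketch above); the helper $new, which
    copies a string into a fresh allocation, is made up for the example:

    : $new ( c-addr u -- vec )   DUP ALLOCATE THROW DUP >R SWAP CMOVE R> ;
    : test
       S" Hello, " $new  S" world!" $new   ( vec1 vec2 )
       DUP >R $+                           ( vec3 = enlarged vec1 )
       DUP DUP SIZE TYPE                   \ prints: Hello, world!
       FREE THROW  R> FREE THROW ;         \ note: $+ does not free vec2
    test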

    Groetjes Albert
    --
    Don't praise the day before the evening. One swallow doesn't make spring.
    You must not say "hey" before you have crossed the bridge. Don't sell the
    hide of the bear until you shot it. Better one bird in the hand than ten in
    the air. First gain is a cat spinning. - the Wise from Antrim -

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From minforth@21:1/5 to none albert on Tue Sep 5 01:43:50 2023
    none albert schrieb am Montag, 4. September 2023 um 14:14:48 UTC+2:
    If you implement ALLOCATE yourself it is easy to incorporate SIZE :

    1000 ALLOCATE THROW SIZE .
    1000 OK

    I have done that in my ALLOCATE and reportedly Hugh has done that
    also. Maybe he even came up with that name.

    You can now define :

    \ To an allocated vector append a vector. Return the new vector.
    : vector-concat OVER SIZE DUP >R OVER SIZE + >R
    SWAP R> RESIZE THROW ( first vector is now enlarged ) ( l e )
    2DUP R> + OVER SIZE CMOVE NIP ;

    This is applicable to vectors of cells or characters alike.
    If you ALLOCATE your strings you can define
    'vector-concat ALIAS $+

    What is so special here? There are a zillion ways to manage heap objects,
    including resizing.

    One could use a big OS-preallocated heap and do allocation/resizing of
    Forth objects within that heap without calling further OS functions. But I
    remember reading an essay saying that modern OS allocation functions are
    among the most heavily optimized functions anyway, which means that it
    can be hard to beat their performance.

    Of course in bare metal embedded devices things look different.
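    Reduced to its simplest form, the "big preallocated heap" idea might look
    like the sketch below: grab one block from the OS once and hand out pieces
    with a bump pointer. It covers only allocation (no FREE or RESIZE inside
    the arena), and all names (arena, arena-alloc, ...) are made up; it merely
    illustrates "without calling the OS again", it does not compete with a
    real allocator.

    1024 1024 * CONSTANT /arena       \ one megabyte, chosen arbitrarily
    0 VALUE arena                     \ base of the block fetched from the OS
    0 VALUE arena-top                 \ next free address inside it

    : arena-init  ( -- )   /arena ALLOCATE THROW DUP TO arena TO arena-top ;
    : arena-alloc ( u -- addr )       \ carve u bytes out of the arena
       ALIGNED arena-top SWAP OVER +  ( addr newtop )
       DUP arena /arena + > ABORT" arena exhausted"
       TO arena-top ;

    arena-init  100 arena-alloc  ( addr )   \ no further OS calls after init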

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From none) (albert@21:1/5 to minforth@arcor.de on Tue Sep 5 12:07:34 2023
    In article <e29c5b36-128e-43c6-94f7-5ca5a0f078aen@googlegroups.com>,
    minforth <minforth@arcor.de> wrote:
    none albert schrieb am Montag, 4. September 2023 um 14:14:48 UTC+2:
    If you implement ALLOCATE yourself it is easy to incorporate SIZE :

    1000 ALLOCATE THROW SIZE .
    1000 OK

    I have done that in my ALLOCATE and reportedly Hugh has done that
    also. Maybe he even came up with that name.

    You can now define :

    \ To an allocated vector append a vector. Return the new vector.
    : vector-concat OVER SIZE DUP >R OVER SIZE + >R
    SWAP R> RESIZE THROW ( first vector is now enlarged ) ( l e )
    2DUP R> + OVER SIZE CMOVE NIP ;

    This is applicable to vectors of cells or characters alike.
    If you ALLOCATE your strings you can define
    'vector-concat ALIAS $+

    What is so special here? There are a zillion ways to manage heap objects,
    including resizing.

    One could use a big OS-preallocated heap and do allocation/resizing of
    Forth objects within that heap without calling further OS functions. But I
    remember reading an essay saying that modern OS allocation functions are
    among the most heavily optimized functions anyway, which means that it
    can be hard to beat their performance.

    Of course in bare metal embedded devices things look different.

    What are you talking about? Optimization is the last thing on my mind.
    Ease of programming is. Traditional malloc doesn't allow you to
    retrieve the size.

    Groetjes Albert
    --
    Don't praise the day before the evening. One swallow doesn't make spring.
    You must not say "hey" before you have crossed the bridge. Don't sell the
    hide of the bear until you shot it. Better one bird in the hand than ten in
    the air. First gain is a cat spinning. - the Wise from Antrim -

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From minforth@21:1/5 to none albert on Tue Sep 5 04:29:18 2023
    none albert schrieb am Dienstag, 5. September 2023 um 12:07:37 UTC+2:
    In article <e29c5b36-128e-43c6...@googlegroups.com>,
    minforth <minf...@arcor.de> wrote:
    none albert schrieb am Montag, 4. September 2023 um 14:14:48 UTC+2:
    If you implement ALLOCATE yourself it is easy to incorporate SIZE :

    1000 ALLOCATE THROW SIZE .
    1000 OK

    I have done that in my ALLOCATE and reportedly Hugh has done that
    also. Maybe he even came up with that name.

    You can now define :

    \ To an allocated vector append a vector. Return the new vector.
    : vector-concat OVER SIZE DUP >R OVER SIZE + >R
    SWAP R> RESIZE THROW ( first vector is now enlarged ) ( l e )
    2DUP R> + OVER SIZE CMOVE NIP ;

    This is applicable to vectors of cells or characters alike.
    If you ALLOCATE your strings you can define
    'vector-concat ALIAS $+

    What is so special here? There are a zillion ways to manage heap objects,
    including resizing.

    One could use a big OS-preallocated heap and do allocation/resizing of
    Forth objects within that heap without calling further OS functions. But I
    remember reading an essay saying that modern OS allocation functions are
    among the most heavily optimized functions anyway, which means that it
    can be hard to beat their performance.

    Of course in bare metal embedded devices things look different.
    What are you talking about? Optimization is the last thing on my mind.
    Ease of programming is. Traditional malloc doesn't allow you to
    retrieve the size.

    Okay. malloc/resize take the desired size as an input argument, so it is
    already known. When they fail, they return an error indicator. When they
    succeed, the actually allocated size can be larger (e.g. by memory page
    granularity). Did I overlook something?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From none) (albert@21:1/5 to minforth@arcor.de on Wed Sep 6 11:22:31 2023
    In article <96c68e49-4de7-4538-885f-838692d3d5fdn@googlegroups.com>,
    minforth <minforth@arcor.de> wrote:
    none albert schrieb am Dienstag, 5. September 2023 um 12:07:37 UTC+2:
    In article <e29c5b36-128e-43c6...@googlegroups.com>,
    minforth <minf...@arcor.de> wrote:
    none albert schrieb am Montag, 4. September 2023 um 14:14:48 UTC+2:
    If you implement ALLOCATE yourself it is easy to incorporate SIZE :

    1000 ALLOCATE THROW SIZE .
    1000 OK

    I have done that in my ALLOCATE and reportedly Hugh has done that
    also. Maybe he even came up with that name.

    You can now define :

    \ To an allocated vector append a vector. Return the new vector.
    : vector-concat OVER SIZE DUP >R OVER SIZE + >R
    SWAP R> RESIZE THROW ( first vector is now enlarged ) ( l e )
    2DUP R> + OVER SIZE CMOVE NIP ;

    This is applicable to vectors of cells or characters alike.
    If you ALLOCATE your strings you can define
    'vector-concat ALIAS $+

    What is so special here? There are a zillion ways to manage heap objects,
    including resizing.

    One could use a big OS-preallocated heap and do allocation/resizing of
    Forth objects within that heap without calling further OS functions. But I
    remember reading an essay saying that modern OS allocation functions are
    among the most heavily optimized functions anyway, which means that it
    can be hard to beat their performance.

    Of course in bare metal embedded devices things look different.
    What are you talking about? Optimization is the last thing on my mind.
    Ease of programming is. Traditional malloc doesn't allow you to
    retrieve the size.

    Okay. malloc/resize take the desired size as an input argument, so it is
    already known. When they fail, they return an error indicator. When they
    succeed, the actually allocated size can be larger (e.g. by memory page
    granularity). Did I overlook something?

    A really substantial hassle. You need to keep track of the sizes.
    E.g. concatenate gets 4 stack items instead of two.
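    For comparison, the four-item interface referred to above might look like
    the sketch below (Forth-2012 locals, word name made up). The definition
    itself stays small, but every caller now has to carry both counts around,
    which is exactly the bookkeeping being objected to:

    : concat ( a1 u1 a2 u2 -- a3 u3 )
       {: a1 u1 a2 u2 :}
       a1 u1 u2 + RESIZE THROW        \ grow the first buffer to the total
       a2 OVER u1 + u2 CMOVE          \ copy the second buffer behind it
       u1 u2 + ;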

    Groetjes Albert
    --
    Don't praise the day before the evening. One swallow doesn't make spring.
    You must not say "hey" before you have crossed the bridge. Don't sell the
    hide of the bear until you shot it. Better one bird in the hand than ten in
    the air. First gain is a cat spinning. - the Wise from Antrim -

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From minforth@21:1/5 to none albert on Wed Sep 6 12:07:24 2023
    none albert schrieb am Mittwoch, 6. September 2023 um 11:22:34 UTC+2:
    In article <96c68e49-4de7-4538...@googlegroups.com>,
    minforth <minf...@arcor.de> wrote:
    Did I overlook something?

    A really substantial hassle. You need to keep track of the sizes.
    E.g. concatenate gets 4 stack items instead of two.

    I see, stack juggling is evil

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Hans Bezemer@21:1/5 to none albert on Wed Sep 6 15:23:35 2023
    On Monday, September 4, 2023 at 2:14:48 PM UTC+2, none albert wrote:
    If you implement ALLOCATE yourself it is easy to incorporate SIZE :

    1000 ALLOCATE THROW SIZE .
    1000 OK

    I have done that in my ALLOCATE and reportedly Hugh has done that
    also. Maybe he even came up with that name.

    You can now define :

    \ To an allocated vector append a vector. Return the new vector.
    : vector-concat OVER SIZE DUP >R OVER SIZE + >R
    SWAP R> RESIZE THROW ( first vector is now enlarged ) ( l e )
    2DUP R> + OVER SIZE CMOVE NIP ;

    This is applicable to vectors of cells or characters alike.
    If you ALLOCATE your strings you can define
    'vector-concat ALIAS $+

    Groetjes Albert
    --
    Don't praise the day before the evening. One swallow doesn't make spring.
    You must not say "hey" before you have crossed the bridge. Don't sell the
    hide of the bear until you shot it. Better one bird in the hand than ten in
    the air. First gain is a cat spinning. - the Wise from Antrim -
    4tH has got it too, but it is called ALLOCATED. Like ERROR, I felt SIZE was
    too likely to be used in an application program. And since it isn't
    standardized, I kept it out of harm's way. 4tH doesn't allow implicit
    redefinition.

    Hans Bezemer

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to minforth on Thu Sep 7 07:21:45 2023
    minforth <minforth@arcor.de> writes:
    But I
    remember reading an essay saying that modern OS allocation functions are
    among the most heavily optimized functions anyway, which means that it
    can be hard to beat their performance.

    I have read that many times over the decades. And then I did the
    measurements for Figure 14 of
    <http://euroforth.org/ef17/papers/ertl.pdf>, and found those funny
    kinky lines; I believe they come from the ALLOCATE and FREE calls
    (which used glibc's malloc() and free() implementations). So in
    further work <http://euroforth.org/ef18/papers/ertl-chaining.pdf> I
    implemented a cache of freed vectors. Some time later a new version
    of glibc appeared, and the announcement claimed to reduce the overhead
    of thread synchronization (which apparently also slows down
    single-threaded programs like my benchmarks) of earlier versions by
    using a per-thread cache of freed memory areas.

    So maybe the allocation functions have been heavily optimized at least
    since Doug Lea published his allocator. This does not mean that we
    are close to the optimum.
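    Just to give the flavour of such a cache (this is neither Anton's code nor
    glibc's per-thread cache): keep freed blocks on per-size free lists,
    threaded through their first cell, and fall back to the system allocator
    only on a miss. All names are made up; sizes are rounded up to whole cells
    so the link always fits, and a real implementation would also bound and
    flush the lists.

    64 CONSTANT #classes              \ cache blocks of up to 64 cells
    CREATE free-lists  #classes CELLS ALLOT
    free-lists #classes CELLS ERASE

    : >class ( u -- n )   CELL+ 1- 1 CELLS /  1 MAX ;   \ bytes -> cells
    : cached-allocate ( u -- addr ior )
       >class DUP #classes < IF
          DUP CELLS free-lists + DUP @ ?DUP IF   ( class slot head )
             DUP @ ROT !  NIP 0 EXIT             \ unlink and reuse the head
          THEN DROP
       THEN CELLS ALLOCATE ;                     \ miss: go to the system
    : cached-free ( addr u -- ior )              \ caller supplies the size
       >class DUP #classes < IF
          CELLS free-lists + 2DUP @ SWAP ! ! 0   \ push block onto its list
       ELSE DROP FREE THEN ;

    An allocate/free pattern that stays within one size class then reaches
    malloc()/free() only on the first round.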

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2023: https://euro.theforth.net/2023

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From minforth@21:1/5 to Anton Ertl on Thu Sep 7 01:00:51 2023
    Anton Ertl schrieb am Donnerstag, 7. September 2023 um 09:40:13 UTC+2:
    minforth <minf...@arcor.de> writes:
    But I
    remember reading an essay saying that modern OS allocation functions are
    among the most heavily optimized functions anyway, which means that it
    can be hard to beat their performance.
    I have read that many times over the decades. And then I did the
    measurements for Figure 14 of
    <http://euroforth.org/ef17/papers/ertl.pdf>, and found those funny
    kinky lines; I believe they come from the ALLOCATE and FREE calls
    (which used glibc's malloc() and free() implementations). So in
    further work <http://euroforth.org/ef18/papers/ertl-chaining.pdf> I implemented a cache of freed vectors. Some time later a new version
    of glibc appeared, and the announcement claimed to reduce the overhead
    of thread synchronization (which apparently also slows down
    single-threaded programs like my benchmarks) of earlier versions by
    using a per-thread cache of freed memory areas.

    So maybe the allocation functions have been heavily optimized at least
    since Doug Lea published his allocator. This does not mean that we
    are close to the optimum.

    Obviously. I am guessing that Linux is still mostly a server OS and thus
    optimizations are more targeted towards running huge numbers of parallel
    jobs on multi-CPU machines (e.g. K8s clusters). Not really Forth's domain.

    But this is not my field of expertise. For example, I really can't estimate
    whether your benchmark would show different results between an enterprise
    or desktop variant of the same Linux distribution, or if they all use the
    same kernel module.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From none) (albert@21:1/5 to Anton Ertl on Thu Sep 7 11:39:39 2023
    In article <2023Sep7.092145@mips.complang.tuwien.ac.at>,
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    minforth <minforth@arcor.de> writes:
    But I
    remember reading an essay saying that modern OS allocation functions are
    among the most heavily optimized functions anyway, which means that it
    can be hard to beat their performance.

    I have read that many times over the decades. And then I did the
    measurements for Figure 14 of
    <http://euroforth.org/ef17/papers/ertl.pdf>, and found those funny
    kinky lines; I believe they come from the ALLOCATE and FREE calls
    (which used glibc's malloc() and free() implementations). So in
    further work <http://euroforth.org/ef18/papers/ertl-chaining.pdf> I
    implemented a cache of freed vectors. Some time later a new version
    of glibc appeared, and the announcement claimed to reduce the overhead
    of thread synchronization (which apparently also slows down
    single-threaded programs like my benchmarks) of earlier versions by
    using a per-thread cache of freed memory areas.

    So maybe the allocation functions have been heavily optimized at least
    since Doug Lea published his allocator. This does not mean that we
    are close to the optimum.

    Then there is the notion that allocate/free is a one-size-fits-all
    approach. There are always circumstances where a strategy that benefits
    some applications will harm others.

    That opens an opportunity for Forth to adapt a simple ALLOCATE/FREE
    for the situation at hand. It is not so simple to customize glibc's
    malloc/free.


    - anton
    Groetjes Albert
    --
    Don't praise the day before the evening. One swallow doesn't make spring.
    You must not say "hey" before you have crossed the bridge. Don't sell the
    hide of the bear until you shot it. Better one bird in the hand than ten in
    the air. First gain is a cat spinning. - the Wise from Antrim -

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Marcel Hendrix@21:1/5 to none albert on Thu Sep 7 10:33:33 2023
    On Thursday, September 7, 2023 at 11:39:42 AM UTC+2, none albert wrote:
    [..]
    That opens an opportunity for Forth to adapt a simple ALLOCATE/FREE
    for the situation at hand. It is not so simple to customize glibc's
    malloc/free.

    Most programs I know (e.g. NGSPICE, FFTW, NRC, ...) add something to it.
    In iForth I found it useful to always align on a 256-byte boundary (for
    AVX) and make malloc do zeroing. I guess by now the C compilers make that
    an unnecessary addition.
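    A sketch of that kind of wrapper on top of a standard ALLOCATE (not
    iForth's actual implementation, and the names are made up): over-allocate,
    round up to the alignment, remember the raw pointer in the cell just below
    the aligned address, and clear the block.

    256 CONSTANT /align

    : allocate-aligned ( u -- addr ior )
       DUP /align + CELL+ ALLOCATE ?DUP IF NIP NIP 0 SWAP EXIT THEN
       DUP CELL+ /align 1- + /align 1- INVERT AND   ( u raw addr )
       TUCK 1 CELLS - !                  \ stash the raw pointer below addr
       TUCK SWAP ERASE 0 ;               \ zero u bytes, return addr 0
    : free-aligned ( addr -- ior )   1 CELLS - @ FREE ;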

    -marcel

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to minforth on Fri Sep 8 07:58:33 2023
    minforth <minforth@arcor.de> writes:
    Anton Ertl schrieb am Donnerstag, 7. September 2023 um 09:40:13 UTC+2:
    So maybe the allocation functions have been heavily optimized at least
    since Doug Lea published his allocator. This does not mean that we
    are close to the optimum.

    Obviously. I am guessing that Linux is still mostly a server OS and thus
    optimizations are more targeted towards running huge numbers of parallel
    jobs on multi-CPU machines (e.g. K8s clusters).

    If that was true, what would be its relevance for the topic at hand?

    Not really Forth's domain.

    Really? Forth did multi-user support before Linux even existed.
    Forth did clusters (e.g., in Riad airport, but also in the particle
    accelerator in Munich) also before Linux existed.

    But this is not my field of expertise. For example, I really can't estimate
    whether your benchmark would show different results between an enterprise
    or desktop variant of the same Linux distribution, or if they all use the
    same kernel module.

    Kernel modules don't come into play (at least if the problem was
    really malloc()/free()). malloc() and free() are not implemented by
    the kernel, but by libc. I used glibc, the same libc implementation
    that most distributions, including server distributions, use.

    The typical difference between the server and desktop version of a
    distribution is that the desktop version provides more software, but
    the software provided in the server version is supported longer.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2023: https://euro.theforth.net/2023

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to albert@cherry. on Fri Sep 8 08:36:31 2023
    albert@cherry.(none) (albert) writes:
    That opens an opportunity for Forth to adapt a simple ALLOCATE/FREE
    for the situation at hand. It is not so simple to customize glibc's
    malloc/free.

    I have often read about custom allocators used in C programs, and less
    often in Forth programs (probably because Forth fans still often try
    to do it all with static allocation).

    The goal of optimizations of malloc() and free() is to make such extra
    effort unnecessary, but at least the version (in glibc-2.24 or so) that
    I used in 2017 obviously had not achieved this goal.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2023: https://euro.theforth.net/2023

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)