• This is why ALLOCATE + SIZE is superior

    From none) (albert@21:1/5 to All on Mon Sep 4 14:14:43 2023
    If you implement ALLOCATE yourself it is easy to incorporate SIZE :

    1000 ALLOCATE THROW SIZE .
    1000 OK

    I have done that in my ALLOCATE and reportedly Hugh has done that
    also. Maybe he even came up with that name.
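    For readers whose ALLOCATE does not record sizes, here is a minimal sketch
    of one way to get the same effect on top of the standard memory words: keep
    the size in a hidden cell in front of the address handed out. It is only an
    illustration, not necessarily how Albert's or Hugh's systems do it, and the
    wrapper names allocate-sized / resize-sized / free-sized are made up.

    \ Sketch: stash the requested size in one extra cell below the address.
    : allocate-sized ( u -- addr ior )
       DUP CELL+ ALLOCATE ?DUP IF NIP NIP 0 SWAP EXIT THEN  \ pass errors on
       TUCK ! CELL+ 0 ;                    \ store size, hand out the rest
    : SIZE   ( addr -- u )   1 CELLS - @ ;
    : resize-sized ( addr u -- addr' ior )
       SWAP 1 CELLS - OVER CELL+ RESIZE ?DUP IF NIP NIP 0 SWAP EXIT THEN
       TUCK ! CELL+ 0 ;
    : free-sized   ( addr -- ior )   1 CELLS - FREE ;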

    You can now define :

    \ To an allocated vector append a vector. Return the new vector.
    : vector-concat ( vec1 vec2 -- vec3 )
       OVER SIZE DUP >R OVER SIZE + >R
       SWAP R> RESIZE THROW ( first vector is now enlarged ) ( vec2 vec1' )
       2DUP R> + OVER SIZE CMOVE NIP ;

    This is applicable to vectors of cells or characters alike.
    If you ALLOCATE your strings you can define
    'vector-concat ALIAS $+
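    A hypothetical round trip, assuming an ALLOCATE/RESIZE/FREE that record the
    size (as in Albert's system or the sketch above); the helper $new, which
    copies a string into a fresh allocation, is made up for the example:

    : $new ( c-addr u -- vec )   DUP ALLOCATE THROW DUP >R SWAP CMOVE R> ;
    : test
       S" Hello, " $new  S" world!" $new   ( vec1 vec2 )
       DUP >R $+                           ( vec3 = enlarged vec1 )
       DUP DUP SIZE TYPE                   \ prints: Hello, world!
       FREE THROW  R> FREE THROW ;         \ note: $+ does not free vec2
    test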

    Groetjes Albert
    --
    Don't praise the day before the evening. One swallow doesn't make spring.
    You must not say "hey" before you have crossed the bridge. Don't sell the
    hide of the bear until you shot it. Better one bird in the hand than ten in
    the air. First gain is a cat spinning. - the Wise from Antrim -

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From minforth@21:1/5 to none albert on Tue Sep 5 01:43:50 2023
    none albert schrieb am Montag, 4. September 2023 um 14:14:48 UTC+2:
    If you implement ALLOCATE yourself it is easy to incorporate SIZE :

    1000 ALLOCATE THROW SIZE .
    1000 OK

    I have done that in my ALLOCATE and reportedly Hugh has done that
    also. Maybe he even came up with that name.

    You can now define :

    \ To an allocated vector append a vector. Return the new vector.
    : vector-concat OVER SIZE DUP >R OVER SIZE + >R
    SWAP R> RESIZE THROW ( first vector is now enlarged ) ( l e )
    2DUP R> + OVER SIZE CMOVE NIP ;

    This is applicable to vectors of cells or characters alike.
    If you ALLOCATE your strings you can define
    'vector-concat ALIAS $+

    What is so special here? There are a zillion ways to manage heap objects,
    including resizing.

    One could use a big OS-preallocated heap and do allocation/resizing of
    Forth objects within that heap without calling further OS functions. But I
    remember reading an essay saying that modern OS allocation functions are
    among the most heavily optimized functions anyway, which means that it
    can be hard to beat their performance.

    Of course in bare metal embedded devices things look different.
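    Reduced to its simplest form, the "big preallocated heap" idea might look
    like the sketch below: grab one block from the OS once and hand out pieces
    with a bump pointer. It covers only allocation (no FREE or RESIZE inside
    the arena), and all names (arena, arena-alloc, ...) are made up; it merely
    illustrates "without calling the OS again", it does not compete with a
    real allocator.

    1024 1024 * CONSTANT /arena       \ one megabyte, chosen arbitrarily
    0 VALUE arena                     \ base of the block fetched from the OS
    0 VALUE arena-top                 \ next free address inside it

    : arena-init  ( -- )   /arena ALLOCATE THROW DUP TO arena TO arena-top ;
    : arena-alloc ( u -- addr )       \ carve u bytes out of the arena
       ALIGNED arena-top SWAP OVER +  ( addr newtop )
       DUP arena /arena + > ABORT" arena exhausted"
       TO arena-top ;

    arena-init  100 arena-alloc  ( addr )   \ no further OS calls after init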

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From none) (albert@21:1/5 to minforth@arcor.de on Tue Sep 5 12:07:34 2023
    In article <e29c5b36-128e-43c6-94f7-5ca5a0f078aen@googlegroups.com>,
    minforth <minforth@arcor.de> wrote:
    none albert schrieb am Montag, 4. September 2023 um 14:14:48 UTC+2:
    If you implement ALLOCATE yourself it is easy to incorporate SIZE :

    1000 ALLOCATE THROW SIZE .
    1000 OK

    I have done that in my ALLOCATE and reportedly Hugh has done that
    also. Maybe he even came up with that name.

    You can now define :

    \ To an allocated vector append a vector. Return the new vector.
    : vector-concat OVER SIZE DUP >R OVER SIZE + >R
    SWAP R> RESIZE THROW ( first vector is now enlarged ) ( l e )
    2DUP R> + OVER SIZE CMOVE NIP ;

    This is applicable to vectors of cells or characters alike.
    If you ALLOCATE your strings you can define
    'vector-concat ALIAS $+

    What is so special here? There are a zillion ways to manage heap objects,
    including resizing.

    One could use a big OS-preallocated heap and do allocation/resizing of
    Forth objects within that heap without calling further OS functions. But I
    remember reading an essay saying that modern OS allocation functions are
    among the most heavily optimized functions anyway, which means that it
    can be hard to beat their performance.

    Of course in bare metal embedded devices things look different.

    What are you talking about? Optimization is the last thing on my mind.
    Ease of programming is. Traditional malloc doesn't allow you to
    retrieve the size.

    Groetjes Albert
    --
    Don't praise the day before the evening. One swallow doesn't make spring.
    You must not say "hey" before you have crossed the bridge. Don't sell the
    hide of the bear until you shot it. Better one bird in the hand than ten in
    the air. First gain is a cat spinning. - the Wise from Antrim -

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From minforth@21:1/5 to none albert on Tue Sep 5 04:29:18 2023
    none albert schrieb am Dienstag, 5. September 2023 um 12:07:37 UTC+2:
    In article <e29c5b36-128e-43c6...@googlegroups.com>,
    minforth <minf...@arcor.de> wrote:
    none albert schrieb am Montag, 4. September 2023 um 14:14:48 UTC+2:
    If you implement ALLOCATE yourself it is easy to incorporate SIZE :

    1000 ALLOCATE THROW SIZE .
    1000 OK

    I have done that in my ALLOCATE and reportedly Hugh has done that
    also. Maybe he even came up with that name.

    You can now define :

    \ To an allocated vector append a vector. Return the new vector.
    : vector-concat OVER SIZE DUP >R OVER SIZE + >R
    SWAP R> RESIZE THROW ( first vector is now enlarged ) ( l e )
    2DUP R> + OVER SIZE CMOVE NIP ;

    This is applicable to vectors of cells or characters alike.
    If you ALLOCATE your strings you can define
    'vector-concat ALIAS $+

    What is so special here? There are a zillion ways to manage heap objects,
    including resizing.

    One could use a big OS-preallocated heap and do allocation/resizing of
    Forth objects within that heap without calling further OS functions. But I
    remember reading an essay saying that modern OS allocation functions are
    among the most heavily optimized functions anyway, which means that it
    can be hard to beat their performance.

    Of course in bare metal embedded devices things look different.
    What are you talking about? Optimization is the last thing on my mind.
    Ease of programming is. Traditional malloc doesn't allow you to
    retrieve the size.

    Okay. malloc/resize take the desired size as an input argument, so it is
    already known. When they fail, they return an error indicator. When they
    succeed, the actually allocated size can be larger (e.g. by memory page
    granularity). Did I overlook something?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From none) (albert@21:1/5 to minforth@arcor.de on Wed Sep 6 11:22:31 2023
    In article <96c68e49-4de7-4538-885f-838692d3d5fdn@googlegroups.com>,
    minforth <minforth@arcor.de> wrote:
    none albert schrieb am Dienstag, 5. September 2023 um 12:07:37 UTC+2:
    In article <e29c5b36-128e-43c6...@googlegroups.com>,
    minforth <minf...@arcor.de> wrote:
    none albert schrieb am Montag, 4. September 2023 um 14:14:48 UTC+2:
    If you implement ALLOCATE yourself it is easy to incorporate SIZE :

    1000 ALLOCATE THROW SIZE .
    1000 OK

    I have done that in my ALLOCATE and reportedly Hugh has done that
    also. Maybe he even came up with that name.

    You can now define :

    \ To an allocated vector append a vector. Return the new vector.
    : vector-concat OVER SIZE DUP >R OVER SIZE + >R
    SWAP R> RESIZE THROW ( first vector is now enlarged ) ( l e )
    2DUP R> + OVER SIZE CMOVE NIP ;

    This is applicable to vectors of cells or characters alike.
    If you ALLOCATE your strings you can define
    'vector-concat ALIAS $+

    What is so special here? There are a zillion ways to manage heap objects,
    including resizing.

    One could use a big OS-preallocated heap and do allocation/resizing of
    Forth objects within that heap without calling further OS functions. But I
    remember reading an essay saying that modern OS allocation functions are
    among the most heavily optimized functions anyway, which means that it
    can be hard to beat their performance.

    Of course in bare metal embedded devices things look different.
    What are you talking about? Optimization is the last thing on my mind.
    Ease of programming is. Traditional malloc doesn't allow you to
    retrieve the size.

    Okay. malloc/resize take the desired size as an input argument, so it is
    already known. When they fail, they return an error indicator. When they
    succeed, the actually allocated size can be larger (e.g. by memory page
    granularity). Did I overlook something?

    A really substantial hassle. You need to keep track of the sizes.
    E.g. concatenate gets 4 stack items instead of two.
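    For comparison, the four-item interface referred to above might look like
    the sketch below (Forth-2012 locals, word name made up). The definition
    itself stays small, but every caller now has to carry both counts around,
    which is exactly the bookkeeping being objected to:

    : concat ( a1 u1 a2 u2 -- a3 u3 )
       {: a1 u1 a2 u2 :}
       a1 u1 u2 + RESIZE THROW        \ grow the first buffer to the total
       a2 OVER u1 + u2 CMOVE          \ copy the second buffer behind it
       u1 u2 + ;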

    Groetjes Albert
    --
    Don't praise the day before the evening. One swallow doesn't make spring.
    You must not say "hey" before you have crossed the bridge. Don't sell the
    hide of the bear until you shot it. Better one bird in the hand than ten in
    the air. First gain is a cat spinning. - the Wise from Antrim -

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From minforth@21:1/5 to none albert on Wed Sep 6 12:07:24 2023
    none albert schrieb am Mittwoch, 6. September 2023 um 11:22:34 UTC+2:
    In article <96c68e49-4de7-4538...@googlegroups.com>,
    minforth <minf...@arcor.de> wrote:
    Did I overlook something?

    A really substantial hassle. You need to keep track of the sizes.
    E.g. concatenate gets 4 stack items instead of two.

    I see, stack juggling is evil

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Hans Bezemer@21:1/5 to none albert on Wed Sep 6 15:23:35 2023
    On Monday, September 4, 2023 at 2:14:48 PM UTC+2, none albert wrote:
    If you implement ALLOCATE yourself it is easy to incorporate SIZE :

    1000 ALLOCATE THROW SIZE .
    1000 OK

    I have done that in my ALLOCATE and reportedly Hugh has done that
    also. Maybe he even came up with that name.

    You can now define :

    \ To an allocated vector append a vector. Return the new vector.
    : vector-concat OVER SIZE DUP >R OVER SIZE + >R
    SWAP R> RESIZE THROW ( first vector is now enlarged ) ( l e )
    2DUP R> + OVER SIZE CMOVE NIP ;

    This is applicable to vectors of cells or characters alike.
    If you ALLOCATE your strings you can define
    'vector-concat ALIAS $+

    Groetjes Albert
    --
    Don't praise the day before the evening. One swallow doesn't make spring.
    You must not say "hey" before you have crossed the bridge. Don't sell the
    hide of the bear until you shot it. Better one bird in the hand than ten in
    the air. First gain is a cat spinning. - the Wise from Antrim -
    4tH has got it too, but it is called ALLOCATED. Like ERROR, I felt SIZE was
    too likely to be used in an application program. And since it isn't
    standardized, I kept it out of harm's way. 4tH doesn't allow implicit
    redefinition.

    Hans Bezemer

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to minforth on Thu Sep 7 07:21:45 2023
    minforth <minforth@arcor.de> writes:
    But I
    remember reading an essay saying that modern OS allocation functions are
    among the most heavily optimized functions anyway, which means that it
    can be hard to beat their performance.

    I have read that many times over the decades. And then I did the
    measurements for Figure 14 of
    <http://euroforth.org/ef17/papers/ertl.pdf>, and found those funny
    kinky lines; I believe they come from the ALLOCATE and FREE calls
    (which used glibc's malloc() and free() implementations). So in
    further work <http://euroforth.org/ef18/papers/ertl-chaining.pdf> I
    implemented a cache of freed vectors. Some time later a new version
    of glibc appeared, and the announcement claimed to reduce the overhead
    of thread synchronization (which apparently also slows down
    single-threaded programs like my benchmarks) of earlier versions by
    using a per-thread cache of freed memory areas.

    So maybe the allocation functions have been heavily optimized at least
    since Doug Lea published his allocator. This does not mean that we
    are close to the optimum.
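    Just to give the flavour of such a cache (this is neither Anton's code nor
    glibc's per-thread cache): keep freed blocks on per-size free lists,
    threaded through their first cell, and fall back to the system allocator
    only on a miss. All names are made up; sizes are rounded up to whole cells
    so the link always fits, and a real implementation would also bound and
    flush the lists.

    64 CONSTANT #classes              \ cache blocks of up to 64 cells
    CREATE free-lists  #classes CELLS ALLOT
    free-lists #classes CELLS ERASE

    : >class ( u -- n )   CELL+ 1- 1 CELLS /  1 MAX ;   \ bytes -> cells
    : cached-allocate ( u -- addr ior )
       >class DUP #classes < IF
          DUP CELLS free-lists + DUP @ ?DUP IF   ( class slot head )
             DUP @ ROT !  NIP 0 EXIT             \ unlink and reuse the head
          THEN DROP
       THEN CELLS ALLOCATE ;                     \ miss: go to the system
    : cached-free ( addr u -- ior )              \ caller supplies the size
       >class DUP #classes < IF
          CELLS free-lists + 2DUP @ SWAP ! ! 0   \ push block onto its list
       ELSE DROP FREE THEN ;

    An allocate/free pattern that stays within one size class then reaches
    malloc()/free() only on the first round.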

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2023: https://euro.theforth.net/2023

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From minforth@21:1/5 to Anton Ertl on Thu Sep 7 01:00:51 2023
    Anton Ertl schrieb am Donnerstag, 7. September 2023 um 09:40:13 UTC+2:
    minforth <minf...@arcor.de> writes:
    But I
    remember reading an essay saying that modern OS allocation functions are
    among the most heavily optimized functions anyway, which means that it
    can be hard to beat their performance.
    I have read that many times over the decades. And then I did the
    measurements for Figure 14 of
    <http://euroforth.org/ef17/papers/ertl.pdf>, and found those funny
    kinky lines; I believe they come from the ALLOCATE and FREE calls
    (which used glibc's malloc() and free() implementations). So in
    further work <http://euroforth.org/ef18/papers/ertl-chaining.pdf> I implemented a cache of freed vectors. Some time later a new version
    of glibc appeared, and the announcement claimed to reduce the overhead
    of thread synchronization (which apparently also slows down
    single-threaded programs like my benchmarks) of earlier versions by
    using a per-thread cache of freed memory areas.

    So maybe the allocation functions have been heavily optimized at least
    since Doug Lea published his allocator. This does not mean that we
    are close to the optimum.

    Obviously. I am guessing that Linux is still mostly a server OS and thus
    optimizations are more targeted towards running huge numbers of parallel
    jobs on multi-CPU machines (e.g. K8s clusters). Not really Forth's domain.

    But this is not my field of expertise. For example, I really can't estimate
    whether your benchmark would show different results between an enterprise
    or desktop variant of the same Linux distribution, or if they all use the
    same kernel module.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From none) (albert@21:1/5 to Anton Ertl on Thu Sep 7 11:39:39 2023
    In article <2023Sep7.092145@mips.complang.tuwien.ac.at>,
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    minforth <minforth@arcor.de> writes:
    But I
    remember reading an essay saying that modern OS allocation functions are
    among the most heavily optimized functions anyway, which means that it
    can be hard to beat their performance.

    I have read that many times over the decades. And then I did the
    measurements for Figure 14 of
    <http://euroforth.org/ef17/papers/ertl.pdf>, and found those funny
    kinky lines; I believe they come from the ALLOCATE and FREE calls
    (which used glibc's malloc() and free() implementations). So in
    further work <http://euroforth.org/ef18/papers/ertl-chaining.pdf> I
    implemented a cache of freed vectors. Some time later a new version
    of glibc appeared, and the announcement claimed to reduce the overhead
    of thread synchronization (which apparently also slows down
    single-threaded programs like my benchmarks) of earlier versions by
    using a per-thread cache of freed memory areas.

    So maybe the allocation functions have been heavily optimized at least
    since Doug Lea published his allocator. This does not mean that we
    are close to the optimum.

    Then there is the notion that allocate/free is a one-size-fits-all
    approach. There are always circumstances where a strategy that benefits
    some applications will harm others.

    That opens an opportunity for Forth to adapt a simple ALLOCATE/FREE
    for the situation at hand. It is not so simple to customize glibc's
    malloc/free.


    - anton
    Groetjes Albert
    --
    Don't praise the day before the evening. One swallow doesn't make spring.
    You must not say "hey" before you have crossed the bridge. Don't sell the
    hide of the bear until you shot it. Better one bird in the hand than ten in
    the air. First gain is a cat spinning. - the Wise from Antrim -

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Marcel Hendrix@21:1/5 to none albert on Thu Sep 7 10:33:33 2023
    On Thursday, September 7, 2023 at 11:39:42 AM UTC+2, none albert wrote:
    [..]
    That opens an opportunity for Forth to adapt a simple ALLOCATE/FREE
    for the situation at hand. It is not so simple to customize glibc's
    malloc/free.

    Most programs I know (e.g. NGSPICE, FFTW, NRC, ...) add something to it.
    In iForth I found it useful to always align on a 256-byte boundary (for
    AVX) and make malloc do zeroing. I guess by now the C compilers make that
    an unnecessary addition.
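    A sketch of that kind of wrapper on top of a standard ALLOCATE (not
    iForth's actual implementation, and the names are made up): over-allocate,
    round up to the alignment, remember the raw pointer in the cell just below
    the aligned address, and clear the block.

    256 CONSTANT /align

    : allocate-aligned ( u -- addr ior )
       DUP /align + CELL+ ALLOCATE ?DUP IF NIP NIP 0 SWAP EXIT THEN
       DUP CELL+ /align 1- + /align 1- INVERT AND   ( u raw addr )
       TUCK 1 CELLS - !                  \ stash the raw pointer below addr
       TUCK SWAP ERASE 0 ;               \ zero u bytes, return addr 0
    : free-aligned ( addr -- ior )   1 CELLS - @ FREE ;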

    -marcel

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to minforth on Fri Sep 8 07:58:33 2023
    minforth <minforth@arcor.de> writes:
    Anton Ertl schrieb am Donnerstag, 7. September 2023 um 09:40:13 UTC+2:
    So maybe the allocation functions have been heavily optimized at least
    since Doug Lea published his allocator. This does not mean that we
    are close to the optimum.

    Obviously. I am guessing that Linux is still mostly a server OS and thus
    optimizations are more targeted towards running huge numbers of parallel
    jobs on multi-CPU machines (e.g. K8s clusters).

    If that was true, what would be its relevance for the topic at hand?

    Not really Forth's domain.

    Really? Forth did multi-user support before Linux even existed.
    Forth did clusters (e.g., in Riad airport, but also in the particle
    accelerator in Munich) also before Linux existed.

    But this is not my field of expertise. For example, I really can't estimate
    whether your benchmark would show different results between an enterprise
    or desktop variant of the same Linux distribution, or if they all use the
    same kernel module.

    Kernel modules don't come into play (at least if the problem was
    really malloc()/free()). malloc() and free() are not implemented by
    the kernel, but by libc. I used glibc, the same libc implementation
    that most distributions, including server distributions, use.

    The typical difference between the server and desktop version of a
    distribution is that the desktop version provides more software, but
    the software provided in the server version is supported longer.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2023: https://euro.theforth.net/2023

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to albert@cherry. on Fri Sep 8 08:36:31 2023
    albert@cherry.(none) (albert) writes:
    That opens an opportunity for Forth to adapt a simple ALLOCATE/FREE
    for the situation at hand. It is not so simple to customize glibc's
    malloc/free.

    I have often read about custom allocators used in C programs, and less
    often in Forth programs (probably because Forth fans still often try
    to do it all with static allocation).

    The goal of optimizations of malloc() and free() is to make such extra
    effort unnecessary, but at least the version (in glibc-2.24 or so) that
    I used in 2017 obviously had not achieved this goal.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2023: https://euro.theforth.net/2023

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)