• why is this procedure so slow?

    From Julieta Shem@21:1/5 to All on Thu Feb 8 13:47:51 2024
    First I define x to be a list of bytes of an NNTP article.

    (defvar x (fetch-article "local.test" "28"))

    Now x is a list of integers, the bytes of the article 28, which is a
    MIME message containing a PDF of 610 KiB.

    Now I want to split it line by line. I am able to do it, but it's very
    slow.

    * (time (length (split-sequence (list 13 10) x nil)))
    Evaluation took:
    65.710 seconds of real time
    65.671875 seconds of total run time (47.093750 user, 18.578125 system)
    [ Run times consist of 23.968 seconds GC time, and 41.704 seconds non-GC time. ]
    99.94% CPU
    170,322,749,168 processor cycles
    79,439,358,864 bytes consed

    11585
    *

    Less than 12k lines. The number 79,439,358,864 of bytes is way more
    than I would have expected for anything regarding this invocation above,
    but I have no idea what this number really measures here. I appreciate
    any help. Thank you.

    Here's the procedure:

    (defun split-sequence (delim ls acc &key limit (so-far 1))
      (let* ((len (length ls))
             (delim delim)
             (pos (search delim ls))
             (n-take (or pos len))
             (n-drop (if pos
                         (+ n-take (length delim))
                         n-take)))
        (cond ((zerop len) acc)
              ((and limit (= so-far limit)) (list ls))
              (t (split-sequence
                  delim (drop n-drop ls)
                  (cons (take n-take ls) acc)
                  :limit limit
                  :so-far (1+ so-far))))))

    (defun take (n seq) (subseq seq 0 n))
    (defun drop (n seq) (subseq seq n))

    (*) A sample of the article

    All lines in it should be pretty short.

    --8<---------------cut here---------------start------------->8---
    Message-Id: <tnkqcqnuujaljsvmzvuc@loop>
    Content-Type: multipart/mixed; boundary="------------PeB0GiqcER01ZhCmBvnP2yr6"
    Date: Wed, 7 Feb 2024 22:22:57 -0300
    Mime-Version: 1.0
    User-Agent: Mozilla Thunderbird
    Newsgroups: local.test
    Content-Language: en-US
    From: Someone <someone@somewhere.org>
    Subject: juris hartmanis

    This is a multi-part message in MIME format.

    JVBERi0xLjIKekdf1fnfSqQYt7AjczYfpmRSIEyEcx8KMSAwIG9iago8PAovVHlwZSAvQ2F0
    YWxvZwovUGFnZXMgMyAwIFIKL091dGxpbmVzIDIgMCBSCj4+CmVuZG9iagoyIDAgb2JqCjw8
    [...]
    IDAwMDAwIG4gCjAwMDA2MjI5NTQgMDAwMDAgbiAKdHJhaWxlcgo8PAovU2l6ZSA0NgovUm9v
    dCAxIDAgUgovSW5mbyA0NSAwIFIKPj4Kc3RhcnR4cmVmCjYyMzI4NgolJUVPRgo=

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Kaz Kylheku@21:1/5 to Julieta Shem on Thu Feb 8 19:24:57 2024
    On 2024-02-08, Julieta Shem <jshem@yaxenu.org> wrote:
    First I define x to be a list of bytes of an NNTP article.

    (defvar x (fetch-article "local.test" "28"))

    Now x is a list of integers, the bytes of the article 28, which is a
    MIME message containing a PDF of 610 KiB.

    Now I want to split it line by line. I am able to do it, but it's very
    slow.

    * (time (length (split-sequence (list 13 10) x nil)))
    Evaluation took:
    65.710 seconds of real time

    Your code has O(N^2) behavior because each recursive call performs
    operations that take a time proportional to the length of input
    list, such as taking its length.

    The operation (subseq list n) is required to produce a copy of the
    cell structure. You can skip n elements of a list with (nthcdr n list)
    without allocating memory. It still requires traversing the list
    from the beginning to find the n-th cell.
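    For instance, the sharing is easy to see (an illustrative snippet, not
    from the original post; the variable name xs is made up):

```lisp
;; Illustrative: SUBSEQ copies list structure, NTHCDR shares it.
(let ((xs (list 1 2 3 4 5)))
  (list (subseq xs 2)                     ; (3 4 5), freshly consed
        (nthcdr 2 xs)                     ; (3 4 5), the existing cells
        (eq (nthcdr 2 xs) (cddr xs))      ; T   -- the very same cons
        (eq (subseq xs 2) (cddr xs))))    ; NIL -- a copy
```

    Both walks are O(n) in the start position, but only SUBSEQ allocates.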

    This kind of function is best implemented separately for lists
    and vectors.

    To develop it reasonably efficiently in a generic way, what we can do is
    have some sort of abstract iteration primitives first: a way to get an
    iterator object over any kind of sequence, and then step it. The
    splitting is then done as a state machine. Getting successive items,
    collecting them and watching for the delimiter, collecting the items
    into sequences of the same kind as the input.

    To make it better, you'd really want to specialize this. If the input
    sequence is a list, dispatch code that does efficient list processing
    with minimal consing, and no redundant walking over the list (single
    pass only).

    Otherwise if the sequence is a vector-like sequence, then use subseq.
    But don't wastefully calculate the subseq of the rest of the list.
    Just extract the splits.
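    A list-specialized splitter in that spirit might look like the
    following (a hedged sketch, not Kaz's actual code; the names
    split-list and delim-prefix-p are invented). It walks the input once
    and conses only the output lines:

```lisp
;; Sketch: split a list on a delimiter sublist in a single pass.
(defun delim-prefix-p (delim rest)
  ;; Does REST begin with the elements of DELIM?
  (loop for d in delim
        do (if (and rest (eql d (car rest)))
               (setf rest (cdr rest))
               (return nil))
        finally (return t)))

(defun split-list (delim list)
  (let ((lines '()) (line '()) (rest list))
    (loop
      (cond ((null rest)                    ; end of input: flush last line
             (push (nreverse line) lines)
             (return (nreverse lines)))
            ((delim-prefix-p delim rest)    ; delimiter: start a new line
             (push (nreverse line) lines)
             (setf line '()
                   rest (nthcdr (length delim) rest)))
            (t (push (pop rest) line))))))  ; ordinary element

;; (split-list '(13 10) '(1 2 13 10 3 4)) => ((1 2) (3 4))
```

    Unlike the original split-sequence, a trailing delimiter here yields a
    final empty line.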

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julieta Shem@21:1/5 to Raymond Wiker on Fri Feb 9 10:24:05 2024
    I'd like to thank everyone in this thread. Valuable help. Thank you.

    Raymond Wiker <rwiker@gmail.com> writes:

    Julieta Shem <jshem@yaxenu.org> writes:

    First I define x to be a list of bytes of an NNTP article.

    (defvar x (fetch-article "local.test" "28"))

    Now x is a list of integers, the bytes of the article 28, which is a
    MIME message containing a PDF of 610 KiB.

    Now I want to split it line by line. I am able to do it, but it's very
    slow.

    * (time (length (split-sequence (list 13 10) x nil)))
    Evaluation took:
    65.710 seconds of real time
    65.671875 seconds of total run time (47.093750 user, 18.578125 system)
    [ Run times consist of 23.968 seconds GC time, and 41.704 seconds non-GC time. ]
    99.94% CPU
    170,322,749,168 processor cycles
    79,439,358,864 bytes consed

    11585
    *

    Less than 12k lines. The number 79,439,358,864 of bytes is way more
    than I would have expected for anything regarding this invocation above,
    but I have no idea what this number really measures here. I appreciate
    any help. Thank you.

    Here's the procedure:

    (defun split-sequence (delim ls acc &key limit (so-far 1))
      (let* ((len (length ls))
             (delim delim)
             (pos (search delim ls))
             (n-take (or pos len))
             (n-drop (if pos
                         (+ n-take (length delim))
                         n-take)))
        (cond ((zerop len) acc)
              ((and limit (= so-far limit)) (list ls))
              (t (split-sequence
                  delim (drop n-drop ls)
                  (cons (take n-take ls) acc)
                  :limit limit
                  :so-far (1+ so-far))))))

    (defun take (n seq) (subseq seq 0 n))
    (defun drop (n seq) (subseq seq n))

    It is slow because

    * You're repeatedly calling #'length on whatever is left of the
    list. Computing the length of a list is an O(n) operation; repeatedly
    calling #'length on the remainder of the list is then O(n^2).

    * You are generating new lists for both the initial segments and the
    rest of the list. In order to generate 12k lines you have to generate
    lists containing the lines, but also 12k lists containing the remainder
    (at various points) of the input.

    That's the take and drop, I think.

    * Each character of the input requires a cons cell, which is probably 16
    bytes (assuming a 64-bit Lisp implementation). So, your input now
    requires 610000*16, so about 9.76MB.

    So that's one of the advantages of using a vector, I think.

    * The remainder lists will consume about 0.5*610000*16*12000, so about
    55GB. This is not too far from the 74GB that time reports. (My
    calculations are probably off, but clearly not _too_ far off.)

    Impressive.

    * I'm going to assume that 74GB is going to be a significant chunk of
    the available memory in your computer. That means that there will be
    significant garbage-collection activity, and you may also end up
    swapping to disk.

    What do you mean by ``available memory''? I have 24 GiB of RAM---23.9
    GiB usable, says Windows. I don't know how to show this right now, but
    I think I don't have any swap space (on Windows). I tried to turn it
    off some time ago and I think I succeeded.

    If you used strings (or byte arrays) instead, you would

    * Require much less memory.

    * Computing #'length for a string or array is O(1), so the overall complexity
    becomes O(n).

    Because strings and arrays store their size somewhere in their data
    structure, I suppose, and lists do not.

    * You don't have to create new arrays containing the successive
    remainders; instead, you can use the :start keyword to #'search.

    I tried this. Here's how I rewrote it. First some user interface that
    gets the length just once, even though now it's probably okay to call
    length multiple times. (The /acc/umulator should probably be
    optional here, but I was afraid of mixing &optional with &key. This is
    my first Common Lisp program.)

    (defun split-vector (delim v acc &key limit (so-far 1))
      (let ((len (length v)))
        (split-vector-helper delim v len acc limit so-far 0)))

    Now, find the position of /delim/ between /start/ and /len/. If you
    can't find it, we're done. If we've reached the limit of lines the
    caller requested, we're done. Now we found /delim/: copy the slice of
    the vector that goes from /start/ to the /pos/ition we found /delim/;
    save it in the accumulator. Update /start/ to move along the vector.
    Now repeat everything.

    (defun split-vector-helper (delim v len acc limit so-far start)
      (if (zerop len)
          acc
          (let ((pos (search delim v :start2 start :end2 len)))
            (cond ((or (not pos) (and limit (= so-far limit)))
                   (nreverse (cons (subseq v start len) acc)))
                  (t (split-vector-helper delim
                                          v
                                          len
                                          (cons (subseq v start (or pos len)) acc)
                                          limit
                                          (1+ so-far)
                                          (+ pos (length delim))))))))

    The result is better:

    * (time (length (split-vector (vector 13 10) x nil)))
    Evaluation took:
    0.011 seconds of real time
    0.000000 seconds of total run time (0.000000 user, 0.000000 system)
    0.00% CPU
    30,857,378 processor cycles
    7,000,064 bytes consed

    11576

    (Difficult to understand 0.011 real time against ``total run time''.)

    (I call /length/ so I won't get the output of split-vector, which is
    long.)

    Question. When split-vector-helper is repeated, will the arguments be
    copied over to the new invocation, or will SBCL somehow pass on
    pointers?

    I haven't studied about the /loop/ macro yet.

    * Also, if you used strings, you could just create a string stream from
    the input string and repeatedly call #'read-line to collect the
    lines. The entire program would be something like this (untested):

    (defun split-article (article-as-string)
    (with-input-from-string (str article-as-string)
    (loop for line = (read-line str nil)
    while line
    collect line)))

    As Spiros Bousbouras remarked in the thread, I'd prefer not to try to
    understand article content because the user has the right to use
    whatever encoding they please in NNTP articles. So I've been trying to
    handle them as byte sequences.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julieta Shem@21:1/5 to Spiros Bousbouras on Sat Feb 10 11:47:23 2024
    Spiros Bousbouras <spibou@gmail.com> writes:

    On Fri, 09 Feb 2024 10:24:05 -0300
    Julieta Shem <jshem@yaxenu.org> wrote:

    I don't know if you have learned declaration syntax yet but some
    declarations would help make the code clearer and help the compiler
    do optimisations or catch errors. For example

    (declaim (ftype (function (vector vector list
                               &key (:limit (or null integer)) (:so-far integer))
                              list)
                    split-vector))

    I have not. Very interesting. Sounds like ``gradual typing'', if
    ``gradual typing'' is what it sounds like. :)

    (defun split-vector (delim v acc &key limit (so-far 1))
      (let ((len (length v)))
        (split-vector-helper delim v len acc limit so-far 0)))

    Now, find the position of /delim/ between /start/ and /len/. If you
    can't find it, we're done. If we've reached the limit of lines the
    caller requested, we're done. Now we found /delim/: copy the slice of
    the vector that goes from /start/ to the /pos/ition we found /delim/;
    save it in the accumulator. Update /start/ we along over the vector.
    Now repeat everything.

    (defun split-vector-helper (delim v len acc limit so-far start)
      (if (zerop len)
          acc
          (let ((pos (search delim v :start2 start :end2 len)))
            (cond ((or (not pos) (and limit (= so-far limit)))
                   (nreverse (cons (subseq v start len) acc)))
                  (t (split-vector-helper delim
                                          v
                                          len
                                          (cons (subseq v start (or pos len)) acc)
                                                                ^^^^^^^^^^

    If this branch is taken , you already know that pos is non NIL .

    Thanks!

                                          limit
                                          (1+ so-far)
                                          (+ pos (length delim))))))))

    Question. When split-vector-helper is repeated, will the arguments be
    copied over to the new invocation, or will SBCL somehow pass on
    pointers?

    This has been answered in a different post but you can also write the
    helper function within the body of split-vector in which case the
    helper will inherit the surrounding lexical environment so you won't
    need to pass so many arguments to it.

    Sounds very useful. Your ``&aux'' made me think that's possible.
    Great news.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julieta Shem@21:1/5 to Spiros Bousbouras on Sat Feb 10 11:43:44 2024
    Spiros Bousbouras <spibou@gmail.com> writes:

    [...]

    I haven't studied about the /loop/ macro yet.

    Some people abhor loop, while others find it strangely attractive (or
    even addictive).

    For turning Julieta's recursion into a loop , DO will do just fine.

    Good to know. I haven't studied DO and DO* yet either. Lol. How am I
    writing code over here? Perhaps you'd all be horrified. It's hard to
    learn everything. It seems better to write with the little we have and
    then rewrite it as the years go by. I did learn to use DOLIST.

    [...]

    If it also wants to do spam filtering
    then yes , it will have to decode the body.

    Spam should be the problem of a spam filter---not my problem. :)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julieta Shem@21:1/5 to Raymond Wiker on Sat Feb 10 11:37:32 2024
    Raymond Wiker <rwiker@gmail.com> writes:

    Julieta Shem <jshem@yaxenu.org> writes:

    I'd like to thank everyone in this thread. Valuable help. Thank you.

    Raymond Wiker <rwiker@gmail.com> writes:

    Julieta Shem <jshem@yaxenu.org> writes:

    First I define x to be a list of bytes of an NNTP article.

    (defvar x (fetch-article "local.test" "28"))

    Now x is a list of integers, the bytes of the article 28, which is a
    MIME message containing a PDF of 610 KiB.

    Now I want to split it line by line. I am able to do it, but it's very
    slow.

    * (time (length (split-sequence (list 13 10) x nil)))
    Evaluation took:
    65.710 seconds of real time
    65.671875 seconds of total run time (47.093750 user, 18.578125 system)
    [ Run times consist of 23.968 seconds GC time, and 41.704 seconds non-GC time. ]
    99.94% CPU
    170,322,749,168 processor cycles
    79,439,358,864 bytes consed

    11585
    *

    Less than 12k lines. The number 79,439,358,864 of bytes is way more
    than I would have expected for anything regarding this invocation above,
    but I have no idea what this number really measures here. I appreciate
    any help. Thank you.

    Here's the procedure:

    (defun split-sequence (delim ls acc &key limit (so-far 1))
      (let* ((len (length ls))
             (delim delim)
             (pos (search delim ls))
             (n-take (or pos len))
             (n-drop (if pos
                         (+ n-take (length delim))
                         n-take)))
        (cond ((zerop len) acc)
              ((and limit (= so-far limit)) (list ls))
              (t (split-sequence
                  delim (drop n-drop ls)
                  (cons (take n-take ls) acc)
                  :limit limit
                  :so-far (1+ so-far))))))

    (defun take (n seq) (subseq seq 0 n))
    (defun drop (n seq) (subseq seq n))

    It is slow because

    * You're repeatedly calling #'length on whatever is left of the
    list. Computing the length of a list is an O(n) operation; repeatedly
    calling #'length on the remainder of the list is then O(n^2).

    * You are generating new lists for both the initial segments and the
    rest of the list. In order to generate 12k lines you have to generate
    lists containing the lines, but also 12k lists containing the remainder
    (at various points) of the input.

    That's the take and drop, I think.

    Yup.

    * Each character of the input requires a cons cell, which is probably 16
    bytes (assuming a 64-bit Lisp implementation). So, your input now
    requires 610000*16, so about 9.76MB.

    So that's one of the advantages of using a vector, I think.

    Right. The other is that accessing any element in an array by position (index) is O(1); for a list, it's O(n).

    * The remainder lists will consume about 0.5*610000*16*12000, so about
    55GB. This is not too far from the 74GB that time reports. (My
    calculations are probably off, but clearly not _too_ far off.)

    Impressive.

    Maybe... but it also appears to be wrong :-)

    That's okay. :-)

    * I'm going to assume that 74GB is going to be a significant chunk of
    the available memory in your computer. That means that there will be
    significant garbage-collection activity, and you may also end up
    swapping to disk.

    What do you mean by ``available memory''? I have 24 GiB of RAM---23.9
    GiB usable, says Windows. I don't know how to show this right now, but
    I think I don't have any swap space (on Windows). I tried to turn it
    off some time ago and I think I succeeded.

    What implementation of Common Lisp are you using? Is it a 32-bit or
    64-bit version?

    I believe it's a 64-bit version.

    %file sbcl.exe
    sbcl.exe: PE32+ executable (console) x86-64, for MS Windows

    I got it from

    https://www.sbcl.org/platform-table.html

    and I downloaded the MSI-file from column AMD64, version 2.3.2.

    %./sbcl.exe --version
    SBCL 2.3.2

    If you used strings (or byte arrays) instead, you would

    * Require much less memory.

    * Computing #'length for a string or array is O(1), so the overall complexity
    becomes O(n).

    Because strings and arrays store their size somewhere in their data
    structure, I suppose, and lists do not.

    Right. Lists are literally composed of objects that contain two values
    (cons cells), where the second value is a pointer to the next cons cell.

    * You don't have to create new arrays containing the successive
    remainders; instead, you can use the :start keyword to #'search.

    I tried this. Here's how I rewrote it. First some user interface that
    gets the length just once, even though now it's probably okay to call
    length multiple times. (The /acc/umulator should probably be
    optional here, but I was afraid of mixing &optional with &key. This is
    my first Common Lisp program.)

    (defun split-vector (delim v acc &key limit (so-far 1))
      (let ((len (length v)))
        (split-vector-helper delim v len acc limit so-far 0)))

    Now, find the position of /delim/ between /start/ and /len/. If you
    can't find it, we're done. If we've reached the limit of lines the
    caller requested, we're done. Now we found /delim/: copy the slice of
    the vector that goes from /start/ to the /pos/ition we found /delim/;
    save it in the accumulator. Update /start/ we along over the vector.
    Now repeat everything.

    (defun split-vector-helper (delim v len acc limit so-far start)
      (if (zerop len)
          acc
          (let ((pos (search delim v :start2 start :end2 len)))
            (cond ((or (not pos) (and limit (= so-far limit)))
                   (nreverse (cons (subseq v start len) acc)))
                  (t (split-vector-helper delim
                                          v
                                          len
                                          (cons (subseq v start (or pos len)) acc)
                                          limit
                                          (1+ so-far)
                                          (+ pos (length delim))))))))

    The result is better:

    * (time (length (split-vector (vector 13 10) x nil)))
    Evaluation took:
    0.011 seconds of real time
    0.000000 seconds of total run time (0.000000 user, 0.000000 system)
    0.00% CPU
    30,857,378 processor cycles
    7,000,064 bytes consed

    11576

    (Difficult to understand 0.011 real time against ``total run time''.)

    That's a good second implementation, and a very nice improvement in run
    time and memory use!

    Thanks!

    Note that you do not actually need the "len" parameter: if you don't
    specify :end2 for #'search or the end position for #'subseq, they will
    assume the value nil, which means the end of the sequence.

    Noted.
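    Concretely, that suggestion shrinks the helper to something like this
    (a sketch, not code from the thread; note that an empty input now
    yields a list holding one empty vector instead of the bare
    accumulator):

```lisp
;; Sketch: with no :end2/end arguments, SEARCH and SUBSEQ run to the
;; end of the sequence, so the len parameter can be dropped.
(defun split-vector-helper (delim v acc limit so-far start)
  (let ((pos (search delim v :start2 start)))
    (if (or (not pos) (and limit (= so-far limit)))
        (nreverse (cons (subseq v start) acc))     ; last piece
        (split-vector-helper delim v
                             (cons (subseq v start pos) acc)
                             limit (1+ so-far)
                             (+ pos (length delim))))))
```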

    I haven't studied about the /loop/ macro yet.

    Some people abhor loop, while others find it strangely attractive (or
    even addictive).

    People say loop is hard to understand. I have a feeling I won't like it
    much: hard-to-understand things don't give us a feeling of control.

    ``The details of the interaction matter, ease of use matters, but I
    want more than correct details, more than a system that is easy to
    learn or to use: I want a system that is enjoyable to use. This is
    an important, dominating design philosophy, easier to say than to
    do. It implies developing systems that provide a strong sense of
    understanding and control. This means tools that reveal their
    underlying conceptual model and allow for interaction, tools that
    emphasize comfort, ease, and pleasure of use [...]. A major factor
    in this debate is the feeling of control that the user has over the
    operations that are being performed. A `powerful,' `intelligent'
    system can lead to the well documented problems of `overautomation,'
    causing the user to be a passive observer of operations, no longer
    in control of either what operations take place, or of how they are
    done. On the other hand, systems that are not sufficiently powerful
    or intelligent can leave too large a gap in the mappings from
    intention to action execution and from system state to psychological
    interpretation. The result is that operation and interpretation are
    complex and difficult, and the user again feels out of control,
    distanced from the system.'' -- ``User Centered System Design'',
    chapter 3, ``cognitive engineering'', ``on the quality of
    human-computer interaction'', pages 48--49, Donald A. Norman, CRC
    Press, 1986, ISBN 0-89859-872-9.

    * Also, if you used strings, you could just create a string stream from
    the input string and repeatedly call #'read-line to collect the
    lines. The entire program would be something like this (untested):

    (defun split-article (article-as-string)
    (with-input-from-string (str article-as-string)
    (loop for line = (read-line str nil)
    while line
    collect line)))

    As Spiros Bousbouras remarked in the thread, I'd prefer not to try to
    understand article content because the user has the right to use
    whatever encoding they please in NNTP articles. So I've been trying to
    handle them as byte sequences.

    At some point you will probably have to encode the article as
    characters. In order to do so, you will have to parse the article
    headers to find what encoding is used. I'm not convinced that this is possible, but if you have a 16-bit character encoding, you may have an
    octet 13 from one character, and an octet 10 from the following
    character.

    That's a good question. I wonder what would happen. Spiros Bousbouras
    seems to say that the NNTP protocol is such that if someone uses an
    encoding that could encode a character as CRLF, it's their problem
    now---NNTP came first.

    So far what I have written is what Bousbouras says. I take an article
    as a sequence of bytes. I look for the first sequence of bytes

    13 10 13 10.

    That separates headers and body. The body is everything that follows
    this mark. Headers are everything that precedes it. I do have to
    decode some headers completely---such as /newsgroups/. To make my life
    easy and let me use string functions, I've been decoding all headers,
    though I would rewrite that if I could and let people put whatever
    encoding they want in the headers so long as they respect
    line-termination as CRLF and respect the header-value separator, which
    is the COLON.

    Perhaps this perspective turns the NNTP protocol into a binary protocol.
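    That byte-level split can be sketched as follows (illustrative only;
    split-headers-body is an invented name, and the article is assumed to
    be held as a vector of bytes):

```lisp
;; Sketch: divide an article into headers and body at the first
;; CRLF CRLF (13 10 13 10) mark.
(defun split-headers-body (article)
  (let ((pos (search #(13 10 13 10) article)))
    (if pos
        (values (subseq article 0 pos)       ; headers, without the mark
                (subseq article (+ pos 4)))  ; body, everything after it
        (values article nil))))              ; no separator found
```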

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julieta Shem@21:1/5 to Spiros Bousbouras on Sat Feb 10 14:18:13 2024
    Spiros Bousbouras <spibou@gmail.com> writes:

    On Sat, 10 Feb 2024 11:43:44 -0300
    Julieta Shem <jshem@yaxenu.org> wrote:
    Spiros Bousbouras <spibou@gmail.com> writes:

    [...]

    I haven't studied about the /loop/ macro yet.

    Some people abhor loop, while others find it strangely attractive (or
    even addictive).

    For turning Julieta's recursion into a loop , DO will do just fine.

    Good to know. I haven't studied DO and DO* yet either. Lol. How am I
    writing code over here? Perhaps you'd all be horrified. It's hard to
    learn everything. It seems better to write with the little we have and
    then rewrite it as the years go by. I did learn to use DOLIST.

    It will only take you 10 minutes tops to learn to use DO and it will make your life easier.

    That's what non-beginners think, which includes beginners that didn't
    begin yet---I've thought that too. Things are always more complicated
    than it appears. You have no idea how many stupid things I've done
    here. :) Here's a not-first attempt at using DO.

    (defun nntp-read-line-helper (bs)
      (let ((len (length bs))
            acc)
        (do ((i 0 (1+ i)))
            ((= i len) (nreverse acc))
          (let ((x (svref bs i)))
            (if (= x 10)
                (return (nreverse acc))
                (push x acc))))))

    CL-USER> (nntp-read-line-helper (vector 1 2 3 13 10 4 5 6 13 10))
    (1 2 3 13)

    CL-USER> (nntp-read-line-helper (vector 4 5 6 13 10))
    (4 5 6 13)

    CL-USER> (nntp-read-line-helper (vector))
    NIL

    It'd be used like that. (This one might be successful.) I haven't
    written the nntp-read-line driver of this helper yet. (There are some
    things that I consider non-obvious. Network connections can deliver
    partial lines. What do I do when I get a partial line? Also, for good
    performance, I likely need read-sequence here. So, my first version
    uses read-byte, and now I have a slow reading procedure whose slowness
    shows when you attach a file, say. To be honest, I'm not even
    going to allow files to be attached, but I would like to know what to do
    if I wanted a high-performance reader.)

    Vectors are not convenient when we want to build something one element
    at a time. I think coercing a list into a vector is O(n), but that
    seems inevitable because to make a vector we need to know the size and
    we don't know the size up front.

    It must be hard trying to write everything as recursion. Just be
    careful that the CLHS says do iterates over a group of statements
    while a test condition holds.

    which is the opposite of the correct behaviour: the iteration
    terminates when the test condition becomes true. That's an
    embarrassing mistake in the specification. All the implementations
    implement the correct behaviour.

    I don't read the CLHS at first. I first, say, consult Paul Graham's
    ANSI Common Lisp.

    [...]

    If it also wants to do spam filtering
    then yes , it will have to decode the body.

    Spam should be the problem of a spam filter---not my problem. :)

    That depends on whether you are planning to use your server "properly"
    i.e. put it online and peer with other servers , etc. People won't peer
    with you unless you have antispam measures in place. So perhaps spam
    filtering won't be implemented within the server but you probably
    should give some thought on how well your server would integrate with
    preexisting spam filters. I've seen peering policies specifying that
    in order to peer with a server , it has to use a piece of software
    called cleanfeed .

    I'm not going to join the USENET---that would likely be a lot of work.
    The idea of this server is to offer Lisp programmers a little private
    club in which they can play with things. The server will be
    closed---you need an account to join. Accounts can be created by any
    member, so any member can invite other members---giving members a sense
    they own the software that runs the community. (Instead of mailing
    lists, say, you can run this little NNTP guy.) That's the idea---a
    playground. It's also a small-community tool.

    I've got a skeleton of it running. It's been great fun. I've never had
    so much fun in programming. There was a time I was seriously
    considering stopping programming, but then I discovered Lisp. In
    choosing a dialect, I chose Racket and spent considerable time studying
    it. I've written small things in Racket (for the web). I loved their web-server based on continuations. But it seems that life is much
    easier and more fun in Common Lisp---not by a small delta. I always
    felt that pretty much any Lisp would be more or less the same fun, but
    that's not true.

    Big thanks to the people here who directly or indirectly made me
    consider Common Lisp. There's a thread here that began with Racket and
    was eventually renamed to ``Choosing between Lisps and Schemes''. I
    believe it was the POSIX discussion that gave me the idea that maybe I
    would find the Common Lisp world easier to understand than Racket's.
    Maybe that's what made me decide to give it a try. I'm very grateful to Racket, though, in that their books were what I needed to stop thinking
    so much like a C programmer and start thinking more like a functional programmer. The project now is to think like a language-oriented
    programmer and, ironically, I gave up on Racket for that. (Maybe I am
    the problem.)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Alan Bawden@21:1/5 to All on Sun Feb 11 02:00:48 2024
    Julieta Shem <jshem@yaxenu.org> writes:

    Vectors are not convenient when we want to build something one element
    at a time. I think coercing a list into a vector is O(n), but that
    seems inevitable because to make a vector we need to know the size and
    we don't know the size up front.

    This is why we have arrays with fill pointers. You should read up on
    how arrays (and vectors) really work, with an eye on eventually being
    able to use VECTOR-PUSH-EXTEND on an array that's specialized to contain
    just 8-bit bytes.
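    A minimal sketch of what that looks like (assumed usage, not Alan's
    code):

```lisp
;; Sketch: an adjustable vector specialized to (unsigned-byte 8),
;; with a fill pointer, grown one byte at a time.
(let ((buf (make-array 0 :element-type '(unsigned-byte 8)
                         :adjustable t :fill-pointer 0)))
  (dolist (b '(13 10 13 10))
    (vector-push-extend b buf))
  buf)   ; a byte vector of length 4 containing 13 10 13 10
```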

    - Alan

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jeff Barnett@21:1/5 to Alan Bawden on Sun Feb 11 00:33:29 2024
    On 2/11/2024 12:00 AM, Alan Bawden wrote:
    Julieta Shem <jshem@yaxenu.org> writes:

    Vectors are not convenient when we want to build something one element
    at a time. I think coercing a list into a vector is O(n), but that
    seems inevitable because to make a vector we need to know the size and
    we don't know the size up front.

    This is why we have arrays with fill pointers. You should read up on
    how arrays (and vectors) really work, with an eye on eventually being
    able to use VECTOR-PUSH-EXTEND on an array that's specialized to contain
    just 8-bit bytes.

    N.B. The use of vector-push-extend, where extend happens many (for many definitions of "many") times, will consume O(n^2) time AND probably
    memory too. There is no free (or even cheap) lunch when you don't have a
    good estimate of final total size up front.

    I think the best you can do is some sort of balanced tree, e.g., a heap,
    that will consume total build time of O(n*log n) and just O(n) space.
    The average/median retrieval time of a single element by index: O(1) for vector; O(log n) for heap; O(n) for linked list.
    --
    Jeff Barnett

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ben Bacarisse@21:1/5 to Jeff Barnett on Sun Feb 11 10:38:45 2024
    Jeff Barnett <jbb@notatt.com> writes:

    On 2/11/2024 12:00 AM, Alan Bawden wrote:
    Julieta Shem <jshem@yaxenu.org> writes:
    Vectors are not convenient when we want to build something one element
    at a time. I think coercing a list into a vector is O(n), but that
    seems inevitable because to make a vector we need to know the size and
    we don't know the size up front.
    This is why we have arrays with fill pointers. You should read up on
    how arrays (and vectors) really work, with an eye on eventually being
    able to use VECTOR-PUSH-EXTEND on an array that's specialized to contain
    just 8-bit bytes.

    N.B. The use of vector-push-extend, where extend happens many (for many definitions of "many") times, will consume O(n^2) time AND probably memory too. There is no free (or even cheap) lunch when you don't have a good estimate of final total size up front.

    Is this true for real implementations? I would have thought they would typically use an exponential allocation strategy to give what is often
    called "amortised O(n)" growth.

    That is certainly what I found in a quick experiment using clisp:

    (defun build-vec (size)
      (let ((a (make-array 0 :adjustable t :fill-pointer t)))
        (dotimes (i size (length a)) (vector-push-extend i a))))

    shows linear growth in execution time.

    I think the best you can do is some sort of balanced tree, e.g., a heap,
    that will consume total build time of O(n*log n) and just O(n) space. The average/median retrieval time of a single element by index: O(1) for
    vector; O(log n) for heap; O(n) for linked list.

    --
    Ben.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ben Bacarisse@21:1/5 to Spiros Bousbouras on Sun Feb 11 22:00:56 2024
    Spiros Bousbouras <spibou@gmail.com> writes:

    On Sun, 11 Feb 2024 10:38:45 +0000
    Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
    Jeff Barnett <jbb@notatt.com> writes:
    N.B. The use of vector-push-extend, where extend happens many (for many
    definitions of "many") times, will consume O(n^2) time AND probably memory
    too. There is no free (or even cheap) lunch when you don't have a good
    estimate of final total size up front.

    Is this true for real implementations? I would have thought they would
    typically use an exponential allocation strategy to give what is often
    called "amortised O(n)" growth.

    That is certainly what I found in a quick experiment using clisp:

    (defun build-vec (size)
      (let ((a (make-array 0 :adjustable t :fill-pointer t)))
        (dotimes (i size (length a)) (vector-push-extend i a))))

    shows linear growth in execution time.

    What did you measure?

    I did essentially this

    (dolist (i '(16 17 18 19 20 21 22 23)) (time (build-vec (expt 2 i))))

    though I was actually running it interactively to see what gave
    reliable timings.

    --
    Ben.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Alan Bawden@21:1/5 to Alan Bawden on Sun Feb 11 21:06:20 2024
    Jeff Barnett <jbb@notatt.com> writes:

    On 2/11/2024 12:00 AM, Alan Bawden wrote:
    > Julieta Shem <jshem@yaxenu.org> writes:
    >
    > Vectors are not convenient when we want to build something one element
    > at a time. I think coercing a list into a vector is O(n), but that
    > seems inevitable because to make a vector we need to know the size and
    > we don't know the size up front.
    >
    > This is why we have arrays with fill pointers. You should read up on
    > how arrays (and vectors) really work, with an eye on eventually being
    > able to use VECTOR-PUSH-EXTEND on a array that's specialized to contain
    > just 8-bit bytes.

    N.B. The use of vector-push-extend, where extend happens many (for many
    definitions of "many") times, will consume O(n^2) time AND probably memory
    too. There is no free (or even cheap) lunch when you don't have a good
    estimate of final total size up front.

    I think the best you can do is some sort of balanced tree, e.g., a heap,
    that will consume total build time of O(n*log n) and just O(n) space. The
    average/median retrieval time of a single element by index: O(1) for
    vector; O(log n) for heap; O(n) for linked list.

    No. (Well technically you're correct, but this is not what the OP needs
    to hear.)

    Look, we added fill pointers, and VECTOR-PUSH and VECTOR-PUSH-EXTEND to
    Common Lisp PRECISELY TO COVER SITUATIONS LIKE THIS. There's no need
    here for balanced trees or other complicated data structures.

    First you allocate an array with sufficient capacity for an estimated
    typical case. If you're reading bytes from a stream, a size of
    something like 16K is probably enough. Set the fill pointer to zero
    and start accumulating bytes using VECTOR-PUSH-EXTEND.

    Note that the third argument to VECTOR-PUSH-EXTEND is the amount to
    extend the buffer if you get to the point of having more than 16K (or
    whatever) elements. Pick something big enough so that extension is
    rare. Like another 16K for example.

    If you're doing this multiple times, just reset the fill pointer and
    reuse this same array for the next buffer load of input. This way even
    if you estimated wrong about the size of a typical buffer load, you'll
    only pay for as many extensions as are needed for the largest buffer load
    you see.

    This is pretty much the way you would handle things in C, and there is
    no need to get any more complicated than that.

    Yes, technically this is O(n^2) -- if you set the initial buffer
    capacity to 1, and extend it by 1 element each time, this will matter.
    But if you set the initial buffer capacity and extension amount to
    something reasonable, you'll extend your buffer so few times that you'll
    never know it isn't O(n).

    - Alan

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
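    [A sketch of the reusable-buffer pattern Alan describes. The constant and
    function names here are illustrative; the optional third argument to
    VECTOR-PUSH-EXTEND is the standard extension amount:]

    ```lisp
    (defconstant +chunk-size+ 16384) ; initial capacity and extension step

    ;; One buffer, allocated once and reused for every buffer load.
    (defvar *buffer*
      (make-array +chunk-size+ :element-type '(unsigned-byte 8)
                               :adjustable t
                               :fill-pointer 0))

    (defun slurp-bytes (stream)
      "Reset the fill pointer and accumulate bytes from STREAM.
    Extension, if it ever happens, grows the array by +CHUNK-SIZE+."
      (setf (fill-pointer *buffer*) 0)
      (loop for byte = (read-byte stream nil nil)
            while byte
            do (vector-push-extend byte *buffer* +chunk-size+))
      *buffer*)
    ```

    Because the fill pointer is reset rather than the array being freed, a
    second call pays for no allocation at all unless this load is larger
    than the largest one seen so far.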
  • From Jeff Barnett@21:1/5 to Ben Bacarisse on Sun Feb 11 21:39:34 2024
    On 2/11/2024 3:38 AM, Ben Bacarisse wrote:
    Jeff Barnett <jbb@notatt.com> writes:

    On 2/11/2024 12:00 AM, Alan Bawden wrote:
    Julieta Shem <jshem@yaxenu.org> writes:
    Vectors are not convenient when we want to build something one element
    at a time. I think coercing a list into a vector is O(n), but that
    seems inevitable because to make a vector we need to know the size and
    we don't know the size up front.
    This is why we have arrays with fill pointers. You should read up on
    how arrays (and vectors) really work, with an eye on eventually being
    able to use VECTOR-PUSH-EXTEND on an array that's specialized to contain
    just 8-bit bytes.

    N.B. The use of vector-push-extend, where extend happens many (for many
    definitions of "many") times, will consume O(n^2) time AND probably memory
    too. There is no free (or even cheap) lunch when you don't have a good
    estimate of final total size up front.

    Is this true for real implementations? I would have thought they would typically use an exponential allocation strategy to give what is often
    called "amortised O(n)" growth.

    That is certainly what I found in a quick experiment using clisp:

    (defun build-vec (size)
      (let ((a (make-array 0 :adjustable t :fill-pointer t)))
        (dotimes (i size (length a)) (vector-push-extend i a))))

    shows linear growth in execution time.

    Every time you do a rebuild initiated by a push-extend you end up
    copying all the current items in the vector; and each time you do the
    copy, more items are copied than for the last copy. This leads, finally,
    to O(n^2) complexity. And as to storage, you must look at all the
    allocations you have done and the overhead to either GC or tack them
    into your heap. Usually, you only notice this degradation if you push a
    whole lot of items - recall, O(.) is really an asymptotic measure. Lots
    of tricks will hide the problem for a while but one usually gets bit in
    the end.

    When you ran build-vec, did you let size be a few billion? What happened
    to run time? My guess is that most storage heap management schemes take
    about the same amount of time and storage to allocate a few bytes and a
    few thousand bytes. The growth effect, if it is real in the
    implementation you are using for testing, must build vectors many many
    times longer than the internal allocation chunk size to see the
    quadratic effect.

    I think the best you can do is some sort of balanced tree, e.g., a heap,
    that will consume total build time of O(n*log n) and just O(n) space. The
    average/median retrieval time of a single element by index: O(1) for
    vector; O(log n) for heap; O(n) for linked list.
    --
    Jeff Barnett

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
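    [The disagreement between Jeff and Ben comes down to the growth policy.
    Counting copied elements under the two policies makes it concrete -- a
    back-of-the-envelope model, not any particular implementation's actual
    behavior:]

    ```lisp
    (defun total-copies (n grow)
      "Elements copied while pushing N items into a vector that starts
    with capacity 1 and, when full, is reallocated to (funcall GROW cap),
    copying its current contents each time."
      (loop with capacity = 1
            with copied = 0
            for size from 0 below n
            when (= size capacity)
              do (incf copied size)
                 (setf capacity (funcall grow capacity))
            finally (return copied)))

    ;; Fixed-increment growth copies O(n^2) elements in total:
    (total-copies 1000000 (lambda (cap) (+ cap 1024)))   ; hundreds of millions
    ;; Doubling copies fewer than 2n elements in total -- amortised O(n):
    (total-copies 1000000 (lambda (cap) (* cap 2)))      ; about a million
    ```

    This is consistent with both positions: Jeff's O(n^2) holds for
    constant-step extension, while Ben's linear measurements suggest his
    implementations double (or similarly scale) the capacity.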
  • From Ben Bacarisse@21:1/5 to Jeff Barnett on Mon Feb 12 20:53:05 2024
    Jeff Barnett <jbb@notatt.com> writes:

    On 2/11/2024 3:38 AM, Ben Bacarisse wrote:
    Jeff Barnett <jbb@notatt.com> writes:

    On 2/11/2024 12:00 AM, Alan Bawden wrote:
    Julieta Shem <jshem@yaxenu.org> writes:
    Vectors are not convenient when we want to build something one element
    at a time. I think coercing a list into a vector is O(n), but that
    seems inevitable because to make a vector we need to know the size and
    we don't know the size up front.
    This is why we have arrays with fill pointers. You should read up on
    how arrays (and vectors) really work, with an eye on eventually being
    able to use VECTOR-PUSH-EXTEND on an array that's specialized to contain
    just 8-bit bytes.

    N.B. The use of vector-push-extend, where extend happens many (for many
    definitions of "many") times, will consume O(n^2) time AND probably memory
    too. There is no free (or even cheap) lunch when you don't have a good
    estimate of final total size up front.
    Is this true for real implementations? I would have thought they would
    typically use an exponential allocation strategy to give what is often
    called "amortised O(n)" growth.
    That is certainly what I found in a quick experiment using clisp:
    (defun build-vec (size)
      (let ((a (make-array 0 :adjustable t :fill-pointer t)))
        (dotimes (i size (length a)) (vector-push-extend i a))))
    shows linear growth in execution time.

    Every time you do a rebuild initiated by a push-extend you end up copying
    all the current items in the vector;

    Why? Having to copy everything, even in a tight reallocation loop like
    the one I used, will be a problem.

    and each time you do the copy, more
    items are copied than for the last copy. This leads, finally, to O(n^2) complexity. And as to storage, you must look at all the allocations you
    have done and the overhead to either GC or tack them into your
    heap. Usually, you only notice this degradation if you push a whole lot of items - recall, O(.) is really an asymptotic measure. Lots of tricks will hide the problem for a while but one usually gets bit in the end.

    When you ran build-vec, did you let size be a few billion? What
    happened to run time?

    No, I could only go up to (expt 2 23) before clisp seg faulted (surely a
    bug?). That's only about 8 million items, but 134 million bytes of
    storage. The time taken was almost perfectly linear.

    My guess is that most storage heap management schemes take about
    the same amount of time and storage to allocate a few bytes and a few thousand bytes. The growth effect, if it is real in the implementation you are using for testing, must build vectors many many times longer than the internal allocation chunk size to see the quadratic effect.

    Can you post an example? I don't know how to get either clisp or SBCL
    to go much larger. They both have options which would seem to control
    this but I can't seem to get them to cooperate! Any help would be much appreciated.

    --
    Ben.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jeff Barnett@21:1/5 to Ben Bacarisse on Mon Feb 12 21:13:56 2024
    On 2/12/2024 1:53 PM, Ben Bacarisse wrote:
    Jeff Barnett <jbb@notatt.com> writes:

    On 2/11/2024 3:38 AM, Ben Bacarisse wrote:
    Jeff Barnett <jbb@notatt.com> writes:

    On 2/11/2024 12:00 AM, Alan Bawden wrote:
    Julieta Shem <jshem@yaxenu.org> writes:
    Vectors are not convenient when we want to build something one element
    at a time. I think coercing a list into a vector is O(n), but that
    seems inevitable because to make a vector we need to know the size and
    we don't know the size up front.

    This is why we have arrays with fill pointers. You should read up on
    how arrays (and vectors) really work, with an eye on eventually being
    able to use VECTOR-PUSH-EXTEND on an array that's specialized to contain
    just 8-bit bytes.

    N.B. The use of vector-push-extend, where extend happens many (for many
    definitions of "many") times, will consume O(n^2) time AND probably memory
    too. There is no free (or even cheap) lunch when you don't have a good
    estimate of final total size up front.
    Is this true for real implementations? I would have thought they would
    typically use an exponential allocation strategy to give what is often
    called "amortised O(n)" growth.
    That is certainly what I found in a quick experiment using clisp:
    (defun build-vec (size)
      (let ((a (make-array 0 :adjustable t :fill-pointer t)))
        (dotimes (i size (length a)) (vector-push-extend i a))))
    shows linear growth in execution time.

    Every time you do a rebuild initiated by a push-extend you end up copying
    all the current items in the vector;

    Why? Having to copy everything, even in a tight reallocation loop like
    the one I used, will be a problem.

    and each time you do the copy, more
    items are copied than for the last copy. This leads, finally, to O(n^2)
    complexity. And as to storage, you must look at all the allocations you
    have done and the overhead to either GC or tack them into your
    heap. Usually, you only notice this degradation if you push a whole lot of
    items - recall, O(.) is really an asymptotic measure. Lots of tricks will
    hide the problem for a while but one usually gets bit in the end.

    When you ran build-vec, did you let size be a few billion? What
    happened to run time?

    No, I could only go up to (expt 2 23) before clisp seg faulted (surely a bug?). That's only about 8 million items, but 134 million bytes of
    storage. The time taken was almost perfectly linear.

    My guess is that most storage heap management schemes take about
    the same amount of time and storage to allocate a few bytes and a few
    thousand bytes. The growth effect, if it is real in the implementation you
    are using for testing, must build vectors many many times longer than the
    internal allocation chunk size to see the quadratic effect.

    Can you post an example? I don't know how to get either clisp or SBCL
    to go much larger. They both have options which would seem to control
    this but I can't seem to get them to cooperate! Any help would be much appreciated.

    At the moment I have no system to build executable code. About a year
    ago, I contacted Franz Lisp to see what they wanted for a non-commercial
    license - what they used to lease to me years ago as some sort of
    academic thing. Unfortunately, the price was more than I wanted to pay
    to putter around so I passed. At present, I'm working from the back of
    an envelope:

    I don't know if it will help but I'll tell you a few of the assumptions
    I made leading to what I wrote above.

    1. Most variants of push-extend want to maintain the use of indexing to
    find the nth entry. That basically means that you demand that all memory
    used be contiguous (in the address space). There are ways around this by
    using heap (as in heap sort) algorithms but retrieval times go from O(1)
    to O(log size).

    2. Many variants of Malloc/Free set a minimum size, under the table, for
    the blocks they actually maintain: a size like 256 bytes or the same
    size as the virtual address mechanism handles, e.g., 4k bytes. If this
    is the case, you need REALLY big test cases that get to the point where
    you go quadratic in these size chunks.

    3. If you have a good handle on the structure (its ultimate size, growth characteristics, and reference patterns) you can develop or find
    algorithms that are better than I'm estimating above. However, and this
    is where the gotchas come in, using these specialized approaches in
    something like a general purpose Lisp push-extend primitive is all
    wrong: the overhead will eat little applications alive and cause one to
    wonder what's wrong with the system. Microseconds will turn into
    milliseconds, etc. Of course one could take the same approach taken by
    most Lisp hash packages: implement hash as an object type that can not
    only grow but morph its type as the table gets larger. For example, a
    hash table is originally implemented (under the table) as an assoc list;
    when the number of items crosses a threshold another set of methods
    provide hash functionality. And so on.
    --
    Jeff Barnett

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ben Bacarisse@21:1/5 to Jeff Barnett on Tue Feb 13 14:13:49 2024
    Jeff Barnett <jbb@notatt.com> writes:

    On 2/12/2024 1:53 PM, Ben Bacarisse wrote:
    Jeff Barnett <jbb@notatt.com> writes:

    On 2/11/2024 3:38 AM, Ben Bacarisse wrote:
    Jeff Barnett <jbb@notatt.com> writes:

    On 2/11/2024 12:00 AM, Alan Bawden wrote:
    Julieta Shem <jshem@yaxenu.org> writes:
    Vectors are not convenient when we want to build something one element
    at a time. I think coercing a list into a vector is O(n), but that
    seems inevitable because to make a vector we need to know the size and
    we don't know the size up front.

    This is why we have arrays with fill pointers. You should read up on
    how arrays (and vectors) really work, with an eye on eventually being
    able to use VECTOR-PUSH-EXTEND on an array that's specialized to contain
    just 8-bit bytes.

    N.B. The use of vector-push-extend, where extend happens many (for many
    definitions of "many") times, will consume O(n^2) time AND probably memory
    too. There is no free (or even cheap) lunch when you don't have a good
    estimate of final total size up front.

    Is this true for real implementations? I would have thought they would
    typically use an exponential allocation strategy to give what is often
    called "amortised O(n)" growth.

    That is certainly what I found in a quick experiment using clisp:

    (defun build-vec (size)
      (let ((a (make-array 0 :adjustable t :fill-pointer t)))
        (dotimes (i size (length a)) (vector-push-extend i a))))
    shows linear growth in execution time.

    Every time you do a rebuild initiated by a push-extend you end up copying
    all the current items in the vector;
    Why? Having to copy everything, even in a tight reallocation loop like
    the one I used, will be a problem.

    and each time you do the copy, more
    items are copied than for the last copy. This leads, finally, to O(n^2)
    complexity. And as to storage, you must look at all the allocations you
    have done and the overhead to either GC or tack them into your
    heap. Usually, you only notice this degradation if you push a whole lot of
    items - recall, O(.) is really an asymptotic measure. Lots of tricks will
    hide the problem for a while but one usually gets bit in the end.

    When you ran build-vec, did you let size be a few billion? What
    happened to run time?
    No, I could only go up to (expt 2 23) before clisp seg faulted (surely a
    bug?). That's only about 8 million items, but 134 million bytes of
    storage. The time taken was almost perfectly linear.

    My guess is that most storage heap management schemes take about
    the same amount of time and storage to allocate a few bytes and a few
    thousand bytes. The growth effect, if it is real in the implementation you
    are using for testing, must build vectors many many times longer than the
    internal allocation chunk size to see the quadratic effect.
    Can you post an example? I don't know how to get either clisp or SBCL
    to go much larger. They both have options which would seem to control
    this but I can't seem to get them to cooperate! Any help would be much
    appreciated.
    ...
    I don't know if it will help but I'll tell you a few of the assumptions I made leading to what I wrote above.

    Thanks for what you wrote, but I was hoping for some help to extend my
    tests so I could see the effects you think I should be seeing but which I
    was not. I saw linear growth up to the point where the implementations
    crapped out and I could not see how to extend them.

    By the way, I don't think one can assume that in-place reallocation with
    an exponential growth strategy is always going to be in place. I was
    just suggesting that it's likely to be what is used in some
    implementations.

    --
    Ben.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lawrence D'Oliveiro@21:1/5 to Spiros Bousbouras on Tue Feb 13 22:14:16 2024
    On Fri, 9 Feb 2024 21:13:18 -0000 (UTC), Spiros Bousbouras wrote:

    (declaim (ftype (function (vector vector list
                               &key (:limit (or null integer))
                                    (:so-far integer))
                              list)
                    split-vector))

    Not a fan of “parenthesis-pileup” layout. This helps make it clearer
    to me:

    (declaim
    (ftype
    (function
    (vector vector list &key (:limit (or null integer)) (:so-far integer))
    list
    )
    split-vector
    )
    )

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jeff Barnett@21:1/5 to Ben Bacarisse on Tue Feb 13 14:56:10 2024
    On 2/13/2024 7:13 AM, Ben Bacarisse wrote:
    Jeff Barnett <jbb@notatt.com> writes:

    On 2/12/2024 1:53 PM, Ben Bacarisse wrote:
    Jeff Barnett <jbb@notatt.com> writes:

    On 2/11/2024 3:38 AM, Ben Bacarisse wrote:
    Jeff Barnett <jbb@notatt.com> writes:

    On 2/11/2024 12:00 AM, Alan Bawden wrote:
    Julieta Shem <jshem@yaxenu.org> writes:
    Vectors are not convenient when we want to build something one element
    at a time. I think coercing a list into a vector is O(n), but that
    seems inevitable because to make a vector we need to know the size and
    we don't know the size up front.

    This is why we have arrays with fill pointers. You should read up on
    how arrays (and vectors) really work, with an eye on eventually being
    able to use VECTOR-PUSH-EXTEND on an array that's specialized to contain
    just 8-bit bytes.

    N.B. The use of vector-push-extend, where extend happens many (for many
    definitions of "many") times, will consume O(n^2) time AND probably memory
    too. There is no free (or even cheap) lunch when you don't have a good
    estimate of final total size up front.

    Is this true for real implementations? I would have thought they would
    typically use an exponential allocation strategy to give what is often
    called "amortised O(n)" growth.

    That is certainly what I found in a quick experiment using clisp:

    (defun build-vec (size)
      (let ((a (make-array 0 :adjustable t :fill-pointer t)))
        (dotimes (i size (length a)) (vector-push-extend i a))))
    shows linear growth in execution time.

    Every time you do a rebuild initiated by a push-extend you end up copying
    all the current items in the vector;
    Why? Having to copy everything, even in a tight reallocation loop like
    the one I used, will be a problem.

    and each time you do the copy, more
    items are copied than for the last copy. This leads, finally, to O(n^2)
    complexity. And as to storage, you must look at all the allocations you
    have done and the overhead to either GC or tack them into your
    heap. Usually, you only notice this degradation if you push a whole lot of
    items - recall, O(.) is really an asymptotic measure. Lots of tricks will
    hide the problem for a while but one usually gets bit in the end.

    When you ran build-vec, did you let size be a few billion? What
    happened to run time?
    No, I could only go up to (expt 2 23) before clisp seg faulted (surely a
    bug?). That's only about 8 million items, but 134 million bytes of
    storage. The time taken was almost perfectly linear.

    My guess is that most storage heap management schemes take about
    the same amount of time and storage to allocate a few bytes and a few
    thousand bytes. The growth effect, if it is real in the implementation you
    are using for testing, must build vectors many many times longer than the
    internal allocation chunk size to see the quadratic effect.
    Can you post an example? I don't know how to get either clisp or SBCL
    to go much larger. They both have options which would seem to control
    this but I can't seem to get them to cooperate! Any help would be much
    appreciated.
    ...
    I don't know if it will help but I'll tell you a few of the assumptions I
    made leading to what I wrote above.

    Thanks for what you wrote, but I was hoping for some help to extend my
    tests so I could see the effects you think I should be seeing but which I
    was not. I saw linear growth up to the point where the implementations crapped out and I could not see how to extend them.

    By the way, I don't think one can assume that in-place reallocation with
    an exponential growth strategy is always going to be in place. I was
    just suggesting that it's likely to be what is used in some implementations.

    I wasn't assuming an in-place reallocation. My thoughts were more along
    the following lines:

    An implementation of a vector with extend potential must still properly
    allow all vector methods (as in object methods) to meet their original contract. So, for example, the vector must present at least the illusion
    of elements being consecutive in the address space. In Lisp it also
    means that one can (after doing appropriate index validation) access the
    vector by computing its address in most implementations. One can do
    better by ignoring such "requirements" but then the implementation will
    display some kinks. It's just easier to assign new contiguous address
    space to the vector, copy the old contents, then put a magic forwarding
    pointer in the original header, and leave the rest of the original so
    the GC can have at.

    I'm not sure if I understood what you were trying to get at. If I
    flubbed it, please try again.

    PS I hope all goes well with you. It seems we've both been lurking in
    murky parts of USENET.
    --
    Jeff Barnett

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ben Bacarisse@21:1/5 to Jeff Barnett on Wed Feb 14 20:45:00 2024
    Jeff Barnett <jbb@notatt.com> writes:

    On 2/13/2024 7:13 AM, Ben Bacarisse wrote:
    Jeff Barnett <jbb@notatt.com> writes:

    On 2/12/2024 1:53 PM, Ben Bacarisse wrote:
    Jeff Barnett <jbb@notatt.com> writes:

    On 2/11/2024 3:38 AM, Ben Bacarisse wrote:
    Jeff Barnett <jbb@notatt.com> writes:

    On 2/11/2024 12:00 AM, Alan Bawden wrote:
    Julieta Shem <jshem@yaxenu.org> writes:
    Vectors are not convenient when we want to build something one element
    at a time. I think coercing a list into a vector is O(n), but that
    seems inevitable because to make a vector we need to know the size and
    we don't know the size up front.

    This is why we have arrays with fill pointers. You should read up on
    how arrays (and vectors) really work, with an eye on eventually being
    able to use VECTOR-PUSH-EXTEND on an array that's specialized to contain
    just 8-bit bytes.

    N.B. The use of vector-push-extend, where extend happens many (for many
    definitions of "many") times, will consume O(n^2) time AND probably memory
    too. There is no free (or even cheap) lunch when you don't have a good
    estimate of final total size up front.

    Is this true for real implementations? I would have thought they would
    typically use an exponential allocation strategy to give what is often
    called "amortised O(n)" growth.

    That is certainly what I found in a quick experiment using clisp:

    (defun build-vec (size)
      (let ((a (make-array 0 :adjustable t :fill-pointer t)))
        (dotimes (i size (length a)) (vector-push-extend i a))))
    shows linear growth in execution time.

    Every time you do a rebuild initiated by a push-extend you end up copying
    all the current items in the vector;

    Why? Having to copy everything, even in a tight reallocation loop like
    the one I used, will be a problem.

    and each time you do the copy, more
    items are copied than for the last copy. This leads, finally, to O(n^2)
    complexity. And as to storage, you must look at all the allocations you
    have done and the overhead to either GC or tack them into your
    heap. Usually, you only notice this degradation if you push a whole lot of
    items - recall, O(.) is really an asymptotic measure. Lots of tricks will
    hide the problem for a while but one usually gets bit in the end.

    When you ran build-vec, did you let size be a few billion? What
    happened to run time?
    No, I could only go up to (expt 2 23) before clisp seg faulted (surely a
    bug?). That's only about 8 million items, but 134 million bytes of
    storage. The time taken was almost perfectly linear.

    My guess is that most storage heap management schemes take about
    the same amount of time and storage to allocate a few bytes and a few
    thousand bytes. The growth effect, if it is real in the implementation you
    are using for testing, must build vectors many many times longer than the
    internal allocation chunk size to see the quadratic effect.
    Can you post an example? I don't know how to get either clisp or SBCL
    to go much larger. They both have options which would seem to control
    this but I can't seem to get them to cooperate! Any help would be much
    appreciated.
    ...
    I don't know if it will help but I'll tell you a few of the assumptions I
    made leading to what I wrote above.
    Thanks for what you wrote, but I was hoping for some help to extend my
    tests so I could see the effects you think I should be seeing but which I
    was not. I saw linear growth up to the point where the implementations
    crapped out and I could not see how to extend them.
    By the way, I don't think one can assume that in-place reallocation with
    an exponential growth strategy is always going to be in place. I was
    just suggesting that it's likely to be what is used in some
    implementations.

    I wasn't assuming an in-place reallocation.

    Yes, I know. I was just remarking, based on my own observations, that just
    because at least two implementations appear to offer amortised O(n) cost,
    that result can't be relied upon, as the language does not stipulate any
    underlying strategy.

    I'm not sure if I understood what you were trying to get at. If I flubbed
    it, please try again.

    I was simply reporting the result of a measurement up to the maximum
    size I could manage.

    --
    Ben.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)