Julieta Shem <jshem@yaxenu.org> writes:
First I define x to be a list of bytes of an NNTP article.
(defvar x (fetch-article "local.test" "28"))
Now x is a list of integers, the bytes of the article 28, which is a
MIME message containing a PDF of 610 KiB.
Now I want to split it line by line. I am able to do it, but it's very
slow.
* (time (length (split-sequence (list 13 10) x nil)))
Evaluation took:
65.710 seconds of real time
65.671875 seconds of total run time (47.093750 user, 18.578125 system)
[ Run times consist of 23.968 seconds GC time, and 41.704 seconds non-GC time. ]
99.94% CPU
170,322,749,168 processor cycles
79,439,358,864 bytes consed
11585
*
Less than 12k lines. The number 79,439,358,864 of bytes is way more
than I would have expected for anything regarding this invocation above,
but I have no idea what this number really measures here. I appreciate
any help. Thank you.
Here's the procedure:
(defun split-sequence (delim ls acc &key limit (so-far 1))
  (let* ((len (length ls))
         (delim delim)
         (pos (search delim ls))
         (n-take (or pos len))
         (n-drop (if pos
                     (+ n-take (length delim))
                     n-take)))
    (cond ((zerop len) acc)
          ((and limit (= so-far limit)) (list ls))
          (t (split-sequence
              delim (drop n-drop ls)
              (cons (take n-take ls) acc)
              :limit limit
              :so-far (1+ so-far))))))

(defun take (n seq) (subseq seq 0 n))
(defun drop (n seq) (subseq seq n))
It is slow because
* You're repeatedly calling #'length on whatever is left of the
list. Computing the length of a list is an O(n) operation; repeatedly
calling #'length on the remainder of the list is then O(n^2).
* You are generating new lists for both the initial segments and the
rest of the list. In order to generate 12k lines you have to generate
lists containing the lines, but also 12k lists containing the remainder
(at various points) of the input.
* Each character of the input requires a cons cell, which is probably 16
bytes (assuming a 64-bit Lisp implementation). So, your input now
requires 610000*16, so about 9.76MB.
* The remainder lists will consume about 0.5*610000*16*12000, so about
55GB. This is not too far from the 74GB that time reports. (My
calculations are probably off, but clearly not _too_ far off.)
* I'm going to assume that 74GB is going to be a significant chunk of
the available memory in your computer. That means that there will be
significant garbage-collection activity, and you may also end up
swapping to disk.
If you used strings (or byte arrays) instead, you would
* Require much less memory.
* Get O(1) #'length for a string or array, so the overall complexity
becomes O(n).
* Not have to create new arrays containing the successive
remainders; instead, you can use the :start keyword to #'search.
* Also, if you used strings, you could just create a string stream from
the input string and repeatedly call #'read-line to collect the
lines. The entire program would be something like this (untested):
(defun split-article (article-as-string)
  (with-input-from-string (str article-as-string)
    (loop for line = (read-line str nil)
          while line
          collect line)))
On Fri, 09 Feb 2024 10:24:05 -0300
Julieta Shem <jshem@yaxenu.org> wrote:
I don't know if you have learned declaration syntax yet, but some
declarations would help make the code clearer and help the compiler do
optimisations or catch errors. For example:
(declaim (ftype (function (vector vector list
                           &key (:limit (or null integer)) (:so-far integer))
                          list)
                split-vector))

(defun split-vector (delim v acc &key limit (so-far 1))
  (let ((len (length v)))
    (split-vector-helper delim v len acc limit so-far 0)))
Now, find the position of /delim/ between /start/ and /len/. If you
can't find it, we're done. If we've reached the limit of lines the
caller requested, we're done. Now we found /delim/: copy the slice of
the vector that goes from /start/ to the /pos/ition we found /delim/;
save it in the accumulator.  Update /start/ as we move along the vector.
Now repeat everything.
(defun split-vector-helper (delim v len acc limit so-far start)
  (if (zerop len)
      acc
      (let ((pos (search delim v :start2 start :end2 len)))
        (cond ((or (not pos) (and limit (= so-far limit)))
               (nreverse (cons (subseq v start len) acc)))
              (t (split-vector-helper delim
                                      v
                                      len
                                      ;; If this branch is taken, you already
                                      ;; know that pos is non-NIL.
                                      (cons (subseq v start (or pos len)) acc)
                                      limit
                                      (1+ so-far)
                                      (+ pos (length delim))))))))
Question. When split-vector-helper is repeated, will the arguments be
copied over to the new invocation, or will SBCL somehow pass on
pointers?
This has been answered in a different post but you can also write the
helper function within the body of split-vector in which case the
helper will inherit the surrounding lexical environment so you won't
need to pass so many arguments to it.
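For illustration, a sketch of that suggestion (untested; I've also folded
the acc and so-far parameters into the helper's initial call, so the
public signature differs from Julieta's):

```lisp
;; Sketch: the helper as a local function via LABELS.  It closes over
;; DELIM, V and LEN, so only the loop state is passed between calls.
(defun split-vector (delim v &key limit)
  (let ((len (length v)))
    (labels ((helper (acc so-far start)
               (let ((pos (search delim v :start2 start :end2 len)))
                 (if (or (not pos) (and limit (= so-far limit)))
                     (nreverse (cons (subseq v start len) acc))
                     (helper (cons (subseq v start pos) acc)
                             (1+ so-far)
                             (+ pos (length delim)))))))
      (helper '() 1 0))))
```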
I haven't studied about the /loop/ macro yet.
Some people abhor loop, while others find it strangely attractive (or
even addictive).
For turning Julieta's recursion into a loop, DO will do just fine.
If it also wants to do spam filtering, then yes, it will have to decode
the body.
Julieta Shem <jshem@yaxenu.org> writes:
I'd like to thank everyone in this thread. Valuable help. Thank you.
Raymond Wiker <rwiker@gmail.com> writes:
It is slow because
* You're repeatedly calling #'length on whatever is left of the
list. Computing the length of a list is an O(n) operation; repeatedly
calling #'length on the remainder of the list is then O(n^2).
* You are generating new lists for both the initial segments and the
rest of the list. In order to generate 12k lines you have to generate
lists containing the lines, but also 12k lists containing the remainder
(at various points) of the input.
That's the take and drop, I think.
Yup.
* Each character of the input requires a cons cell, which is probably 16
bytes (assuming a 64-bit Lisp implementation). So, your input now
requires 610000*16, so about 9.76MB.
So that's one of the advantages of using a vector, I think.
Right. The other is that accessing any element in an array by position (index) is O(1); for a list, it's O(n).
* The remainder lists will consume about 0.5*610000*16*12000, so about
55GB. This is not too far from the 74GB that time reports. (My
calculations are probably off, but clearly not _too_ far off.)
Impressive.
Maybe... but it also appears to be wrong :-)
* I'm going to assume that 74GB is going to be a significant chunk of
the available memory in your computer. That means that there will be
significant garbage-collection activity, and you may also end up
swapping to disk.
What do you mean by ``available memory''?  I have 24 GiB of RAM---23.9
GiB usable, says Windows.  I don't know how to show this right now, but
I think I don't have any swap space (on Windows).  I tried to turn it
off some time ago, and I think I succeeded.
What implementation of Common Lisp are you using? Is it a 32-bit or
64-bit version?
If you used strings (or byte arrays) instead, you would
* Require much less memory.
* Get O(1) #'length for a string or array, so the overall complexity
becomes O(n).
Because strings and arrays store their size somewhere in their data
structure, I suppose, and lists do not.
Right. Lists are literally composed of objects that contain two values
(cons cells), where the second value is a pointer to the next cons cell.
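For example:

```lisp
;; A list is a chain of cons cells linked through their CDRs:
(equal (list 1 2 3)
       (cons 1 (cons 2 (cons 3 nil))))  ; => T
;; LENGTH has to walk every cell to find the final NIL, hence O(n).
```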
* You don't have to create new arrays containing the successive
remainders; instead, you can use the :start keyword to #'search.
I tried this. Here's how I rewrote it. First some user interface that
gets the length just once, even though now it's probably okay to call
length multiple times.  (The /acc/umulator should probably be optional
here, but I was afraid of mixing &optional with &key.  This is my first
Common Lisp program.)
(defun split-vector (delim v acc &key limit (so-far 1))
  (let ((len (length v)))
    (split-vector-helper delim v len acc limit so-far 0)))
Now, find the position of /delim/ between /start/ and /len/. If you
can't find it, we're done. If we've reached the limit of lines the
caller requested, we're done. Now we found /delim/: copy the slice of
the vector that goes from /start/ to the /pos/ition we found /delim/;
save it in the accumulator.  Update /start/ as we move along the vector.
Now repeat everything.
(defun split-vector-helper (delim v len acc limit so-far start)
  (if (zerop len)
      acc
      (let ((pos (search delim v :start2 start :end2 len)))
        (cond ((or (not pos) (and limit (= so-far limit)))
               (nreverse (cons (subseq v start len) acc)))
              (t (split-vector-helper delim
                                      v
                                      len
                                      (cons (subseq v start (or pos len)) acc)
                                      limit
                                      (1+ so-far)
                                      (+ pos (length delim))))))))
The result is better:
* (time (length (split-vector (vector 13 10) x nil)))
Evaluation took:
0.011 seconds of real time
0.000000 seconds of total run time (0.000000 user, 0.000000 system)
0.00% CPU
30,857,378 processor cycles
7,000,064 bytes consed
11576
(Difficult to understand 0.011 real time against ``total run time''.)
That's a good second implementation, and a very nice improvement in run
time and memory use!
Note that you do not actually need the "len" parameter: if you don't
specify :end2 for #'search or the end position for #'subseq, they will
assume the value nil, which means the end of the sequence.
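For example, the helper might then shrink to something like this
(untested sketch; note that on an empty input it returns a list holding
one empty vector, rather than returning the accumulator as the original
does):

```lisp
;; Sketch: SEARCH and SUBSEQ default their end arguments to NIL,
;; i.e. the end of the sequence, so LEN need not be threaded through.
(defun split-vector-helper (delim v acc limit so-far start)
  (let ((pos (search delim v :start2 start)))
    (if (or (not pos) (and limit (= so-far limit)))
        (nreverse (cons (subseq v start) acc))
        (split-vector-helper delim v
                             (cons (subseq v start pos) acc)
                             limit (1+ so-far)
                             (+ pos (length delim))))))
```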
I haven't studied about the /loop/ macro yet.
Some people abhor loop, while others find it strangely attractive (or
even addictive).
* Also, if you used strings, you could just create a string stream from
the input string and repeatedly call #'read-line to collect the
lines. The entire program would be something like this (untested):
(defun split-article (article-as-string)
  (with-input-from-string (str article-as-string)
    (loop for line = (read-line str nil)
          while line
          collect line)))
As Spiros Bousbouras remarked in the thread, I'd prefer not to try to
understand article content because the user has the right to use
whatever encoding they please in NNTP articles.  So I've been trying to
handle them as byte sequences.
At some point you will probably have to decode the article into
characters.  In order to do so, you will have to parse the article
headers to find what encoding is used.  I'm also not convinced that
splitting on raw octets is always possible: if you have a 16-bit
character encoding, you may have an octet 13 from one character, and an
octet 10 from the following character.
On Sat, 10 Feb 2024 11:43:44 -0300
Julieta Shem <jshem@yaxenu.org> wrote:
Spiros Bousbouras <spibou@gmail.com> writes:
[...]
I haven't studied about the /loop/ macro yet.
Some people abhor loop, while others find it strangely attractive (or
even addictive).
For turning Julieta's recursion into a loop, DO will do just fine.
Good to know. I haven't studied DO and DO* yet either. Lol. How am I
writing code over here? Perhaps you'd all be horrified. It's hard to
learn everything. It seems better to write with the little we have and
then rewrite it as the years go by. I did learn to use DOLIST.
It will only take you 10 minutes tops to learn to use DO and it will make your life easier.
It must be hard trying to write everything as recursion.  Just be
careful: the CLHS says DO "iterates over a group of statements while a
test condition holds", which is the opposite of the correct behaviour,
namely that the iteration terminates when the test condition becomes
true.  That's an embarrassing mistake in the specification.  All the
implementations implement the correct behaviour.
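For instance, the splitting loop could be written with DO roughly like
this (untested sketch; split-vector-do is just an illustrative name):

```lisp
;; Sketch: splitting with DO.  The iteration stops when the end-test
;; (NOT POS) becomes true, despite the CLHS wording quoted above.
(defun split-vector-do (delim v)
  (let ((step (length delim))
        (acc '()))
    (do ((start 0)
         (pos (search delim v)
              (search delim v :start2 start)))
        ((not pos)
         (nreverse (cons (subseq v start) acc)))
      (push (subseq v start pos) acc)
      (setq start (+ pos step)))))
```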
[...]
If it also wants to do spam filtering
then yes , it will have to decode the body.
Spam should be the problem of a spam filter---not my problem. :)
That depends on whether you are planning to use your server "properly",
i.e. put it online and peer with other servers, etc.  People won't peer
with you unless you have antispam measures in place.  So perhaps spam
filtering won't be implemented within the server, but you probably should
give some thought to how well your server would integrate with
preexisting spam filters.  I've seen peering policies specifying that, in
order to peer with a server, it has to use a piece of software called
cleanfeed.
Julieta Shem <jshem@yaxenu.org> writes:
Vectors are not convenient when we want to build something one element
at a time. I think coercing a list into a vector is O(n), but that
seems inevitable because to make a vector we need to know the size and
we don't know the size up front.
This is why we have arrays with fill pointers.  You should read up on
how arrays (and vectors) really work, with an eye on eventually being
able to use VECTOR-PUSH-EXTEND on an array that's specialized to contain
just 8-bit bytes.
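Something like this (untested sketch):

```lisp
;; Sketch: an adjustable vector specialized to 8-bit bytes, grown one
;; octet at a time with VECTOR-PUSH-EXTEND.
(let ((buf (make-array 0 :element-type '(unsigned-byte 8)
                         :adjustable t
                         :fill-pointer 0)))
  (dolist (b '(72 101 108 108 111))  ; pretend these octets arrive one by one
    (vector-push-extend b buf))
  buf)                               ; => #(72 101 108 108 111)
```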
On 2/11/2024 12:00 AM, Alan Bawden wrote:
This is why we have arrays with fill pointers.  You should read up on
how arrays (and vectors) really work, with an eye on eventually being
able to use VECTOR-PUSH-EXTEND on an array that's specialized to contain
just 8-bit bytes.
N.B. The use of vector-push-extend, where extend happens many (for many definitions of "many") times, will consume O(n^2) time AND probably memory too. There is no free (or even cheap) lunch when you don't have a good estimate of final total size up front.
I think the best you can do is some sort of balanced tree, e.g., a heap,
that will consume total build time of O(n*log n) and just O(n) space. The average/median retrieval time of a single element by index: O(1) for
vector; O(log n) for heap; O(n) for linked list.
On Sun, 11 Feb 2024 10:38:45 +0000
Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
Jeff Barnett <jbb@notatt.com> writes:
N.B. The use of vector-push-extend, where extend happens many (for many
definitions of "many") times, will consume O(n^2) time AND probably
memory too.  There is no free (or even cheap) lunch when you don't have
a good estimate of final total size up front.
Is this true for real implementations? I would have thought they would
typically use an exponential allocation strategy to give what is often
called "amortised O(n)" growth.
That is certainly what I found in a quick experiment using clisp:
(defun build-vec (size)
  (let ((a (make-array 0 :adjustable t :fill-pointer t)))
    (dotimes (i size (length a)) (vector-push-extend i a))))
shows linear growth in execution time.
What did you measure?
On 2/11/2024 3:38 AM, Ben Bacarisse wrote:
Jeff Barnett <jbb@notatt.com> writes:
That is certainly what I found in a quick experiment using clisp:
(defun build-vec (size)
  (let ((a (make-array 0 :adjustable t :fill-pointer t)))
    (dotimes (i size (length a)) (vector-push-extend i a))))
shows linear growth in execution time.
Every time you do a rebuild initiated by a push-extend you end up copying
all the current items in the vector; and each time you do the copy, more
items are copied than for the last copy.  This leads, finally, to O(n^2)
complexity.  And as to storage, you must look at all the allocations you
have done and the overhead to either GC or tack them into your heap.
Usually, you only notice this degradation if you push a whole lot of
items - recall, O(.) is really an asymptotic measure.  Lots of tricks
will hide the problem for a while but one usually gets bit in the end.
When you ran build-vec, did you let size be a few billion? What
happened to run time?
My guess is that most storage heap management schemes take about the same
amount of time and storage to allocate a few bytes and a few thousand
bytes.  The growth effect, if it is real in the implementation you are
using for testing, must build vectors many, many times longer than the
internal allocation chunk size to see the quadratic effect.
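One way to see the point is to count copied elements rather than time
anything (illustrative sketch; copies-doubling and copies-fixed are
made-up names, and both assume every regrow copies the whole current
contents):

```lisp
;; Sketch: total elements copied while growing a vector to N slots.
(defun copies-doubling (n)
  ;; capacity doubles: 1, 2, 4, ... -- total copies stay O(n)
  (loop for cap = 1 then (* 2 cap)
        while (< cap n)
        sum cap))

(defun copies-fixed (n chunk)
  ;; capacity grows by a fixed chunk -- total copies are O(n^2 / chunk)
  (loop for cap from chunk below n by chunk
        sum cap))
```

With n in the millions, copies-doubling stays near n while copies-fixed
with a small chunk is orders of magnitude larger - which is why an
experiment like build-vec can look linear if the implementation grows
capacity exponentially.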
Jeff Barnett <jbb@notatt.com> writes:
Every time you do a rebuild initiated by a push-extend you end up copying
all the current items in the vector;
Why? Having to copy everything, even in a tight reallocation loop like
the one I used, will be a problem.
and each time you do the copy, more items are copied than for the last
copy.  This leads, finally, to O(n^2) complexity.  And as to storage, you
must look at all the allocations you have done and the overhead to either
GC or tack them into your heap.  Usually, you only notice this
degradation if you push a whole lot of items - recall, O(.) is really an
asymptotic measure.  Lots of tricks will hide the problem for a while but
one usually gets bit in the end.
When you ran build-vec, did you let size be a few billion? What
happened to run time?
No, I could only go up to (expt 2 23) before clisp seg faulted (surely a bug?). That's only about 8 million items, but 134 million bytes of
storage. The time taken was almost perfectly linear.
My guess is that most storage heap management schemes take about the same
amount of time and storage to allocate a few bytes and a few thousand
bytes.  The growth effect, if it is real in the implementation you are
using for testing, must build vectors many, many times longer than the
internal allocation chunk size to see the quadratic effect.
Can you post an example? I don't know how to get either clisp or SBCL
to go much larger. They both have options which would seem to control
this but I can't seem to get them to cooperate! Any help would be much appreciated.
On 2/12/2024 1:53 PM, Ben Bacarisse wrote:
Can you post an example?  I don't know how to get either clisp or SBCL
to go much larger.  They both have options which would seem to control
this but I can't seem to get them to cooperate!  Any help would be much
appreciated.
I don't know if it will help but I'll tell you a few of the assumptions I made leading to what I wrote above.
Jeff Barnett <jbb@notatt.com> writes:
I don't know if it will help but I'll tell you a few of the assumptions I
made leading to what I wrote above.
Thanks for what you wrote, but I was hoping for some help to extend my
tests so I could see the effects you think I should be seeing but which I
was not. I saw linear growth up to the point where the implementations crapped out and I could not see how to extend them.
By the way, I don't think one can assume that reallocation with an
exponential growth strategy is always going to be in place.  I was just
suggesting that it's likely to be what is used in some implementations.
On 2/13/2024 7:13 AM, Ben Bacarisse wrote:
By the way, I don't think one can assume that reallocation with an
exponential growth strategy is always going to be in place.  I was just
suggesting that it's likely to be what is used in some implementations.
I wasn't assuming an in-place reallocation.
I'm not sure if I understood what you were trying to get at. If I flubbed
it, please try again.