• Re: Why to sort an array of numbers

    From none) (albert@21:1/5 to siarczek83@gmail.com on Thu Aug 18 13:07:52 2022
    In article <4bba2005-0bb4-46b4-9535-0dbff5d6348en@googlegroups.com>,
    MichaĆ Kasprzak <siarczek83@gmail.com> wrote:
    <SNIP>
    Mouse teaches us that we can append additional information to a numeric key to create a new larger number.
    We do this by shifting the key as many bits to the left as the tied information has. In the obtained place we insert the
    related information.
    This additional information could be a pointer to related data. This pointer may be an index instead of an address.

    Note that the locality property of quicksort goes down the drain with pointers. This is why my QSORT accept execution tokes for comparison and exchange words.


    The above solution has become more possible in today's 64-bit era. 32 bits used to have fewer opportunities to pack
    essential information into our numerical mini-records.
    And this idea has a great future ahead of it. Have you heard about AVX-512 and 512 bit ZMM0-31 registers?

    A more generic QSORT (see above) has no problem with 2 CELL records in a
    32 bits Forth. Of course packing using a 64 bits Forth may be slightly faster.

    Groetjes Albert.
    --
    "in our communism country Viet Nam, people are forced to be
    alive and in the western country like US, people are free to
    die from Covid 19 lol" duc ha
    albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From =?UTF-8?Q?Micha=C5=82_Kasprzak?=@21:1/5 to All on Thu Aug 18 03:39:16 2022
    I have to start a new thread because important things are happening that you don't notice. After understanding the importance of this topic, an avalanche of comments may be flooded, obscuring the thread with the ugly Heapsort algorithm.

    The following dates are important because they show that no one but Zombie saw the big idea and even Zombie needed a week.

    On August 7, 2022 Waldek Hebisch wrote: "Note 1: I pack one number and sum of cubes into a word. From then one can reconstruct the second number."

    On August 14, 2022 Anton Ertl commented on the above as: "Clever!"

    Professor Waldek Hebisch is like a clever little gray mouse trying to stay invisible that goes out at night to eat something, so I suggest a nickname for Waldek: Mouse ;)

    We are happy that thanks to Mouse we have the heapsort word which implements the Heap Sort Algorithm called Ugly which sorts an array of numbers.
    We tested the heapsort word (even Marcel did!) sorting out the example. The example was an array of numbers. An array of numbers is a very simple data structure.
    And now the question arises: what else can we sort with the heapsort word?
    Due to the difficulty of answering, this question quickly turns into another question: why to sort an array of numbers at all?

    It is not a question of why to sort at all. It is very easy to give many examples of when sorting is useful. Probably all the examples that come to your mind are record sorting, in which the sorted numbers appear as the record keys.
    That is, there are keys associated with additional information that must be rearranged simultaneously during sorting.
    If the data structures take up a lot of memory then it is better to leave them untouched and bind numeric keys and pointers to their addresses in memory.
    But such a data structure is no longer an array of numbers for which our heapsort word could apply.

    So do you know any reason why you should sort an array of numbers with no information tied to it?
    Maybe ideas like SQL Aggregate Functions come to your mind: Max, Min, Sum, Avg, Count. But you implement these functions faster without sorting. After further reflection, you may come to the conclusion that if you take the numbers away from the
    information related to them, their order will probably not matter to you either.
    I recall that in some of the IT Olympics tasks there was a need to sort an array of numbers, but I can't recall the details.
    Are these really the only training reasons for sorting an array of numbers, such as testing the written heapsort word or benchmarks?
    We had a taxi which our clever Mouse got into.

    We conclude that sorting numbers makes sense almost only when there is data associated with the numbers.
    Does that mean our heapsort word doesn't make sense because it won't have anything to sort? It would be a pity because this word sorts very quickly, because the data structure is simple and fits well with the CPU. When the data structures are more
    complicated, the sort speed drops and Pony knows something about it because he wrote his MERGE-SORT ;)

    Mouse teaches us that we can append additional information to a numeric key to create a new larger number.
    We do this by shifting the key as many bits to the left as the tied information has. In the obtained place we insert the related information.
    This additional information could be a pointer to related data. This pointer may be an index instead of an address.
    We can now use our very quick heapsort word.

    The above solution has become more possible in today's 64-bit era. 32 bits used to have fewer opportunities to pack essential information into our numerical mini-records.
    And this idea has a great future ahead of it. Have you heard about AVX-512 and 512 bit ZMM0-31 registers?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Paul Rubin@21:1/5 to siarczek83@gmail.com on Thu Aug 18 11:00:05 2022
    MichaƂ Kasprzak <siarczek83@gmail.com> writes:
    Due to the difficulty of answering, this question quickly turns into
    another question: why to sort an array of numbers at all?

    One obvious reason is to check the array for duplicates. The C++ code
    that I posted for the taxicab problem did that. Another might be to
    find the median. Sorting is a convenient though not optimal way to do
    that. Or, you might want to know what the largest or smallest N numbers
    are. Again, sorting is a straightforward method.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Marcel Hendrix@21:1/5 to Paul Rubin on Thu Aug 18 12:36:59 2022
    On Thursday, August 18, 2022 at 8:00:26 PM UTC+2, Paul Rubin wrote:
    MichaƂ Kasprzak <siarc...@gmail.com> writes:
    Due to the difficulty of answering, this question quickly turns into another question: why to sort an array of numbers at all?
    One obvious reason is to check the array for duplicates. The C++ code
    that I posted for the taxicab problem did that. Another might be to
    find the median. Sorting is a convenient though not optimal way to do
    that. Or, you might want to know what the largest or smallest N numbers
    are. Again, sorting is a straightforward method.

    Finding duplicates and largest/smaller need only a single pass through
    all elements. The median can be done in O(n) in a few lines?

    I found 1863 mentions of SORT in iForth, but their only use seems
    to be benchmarks. What may occur more frequently is inserting
    elements in an ordered list, but I don't know how to search for that :-)

    -marcel

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Paul Rubin@21:1/5 to Marcel Hendrix on Thu Aug 18 21:10:34 2022
    Marcel Hendrix <mhx@iae.nl> writes:
    Finding duplicates and largest/smaller need only a single pass through
    all elements.

    Finding duplicates in a single pass needs some kind of lookup table,
    right? So there is more storage overhead than just sorting in place.

    Similarly, the "right" way to find the N largest (say N=100) is to use a
    heap as a priority queue and pull off N elements, but calling a general
    purpose sorting routine is simpler.

    The median can be done in O(n) in a few lines?

    I wasn't aware of that and I thought it was more complicated. I know of
    a guaranteed O(n) method that is fairly intricate, and an O(n)
    average-case method that amounts to doing quicksort but figuring out
    after each pivot which partition contains the median, and only recursing
    on that partition instead of on both. Again though, sorting takes 1
    line instead of "a few".

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to Paul Rubin on Fri Aug 19 11:11:07 2022
    Paul Rubin <no.email@nospam.invalid> writes:
    Marcel Hendrix <mhx@iae.nl> writes:
    Finding duplicates and largest/smaller need only a single pass through
    all elements.

    Finding duplicates in a single pass needs some kind of lookup table,
    right? So there is more storage overhead than just sorting in place.

    Not necessarily. In the lookup table, you need only memory for the
    unique elements, while with sorting, you need memory for all elements.

    Similarly, the "right" way to find the N largest (say N=100) is to use a
    heap as a priority queue and pull off N elements,

    That's slow, because creating a heap for all the elements is slow (bad locality). I would use a variant of quicksort that only recurses into
    the partitions that include the requested ranks. Another alternative
    would be to use mergesort and stop merging after N elements in each
    step.

    The median can be done in O(n) in a few lines?

    I wasn't aware of that and I thought it was more complicated. I know of
    a guaranteed O(n) method that is fairly intricate, and an O(n)
    average-case method that amounts to doing quicksort but figuring out
    after each pivot which partition contains the median and only recursing
    on that partition instead of on both.

    It's trivial to "figure out" which partition contains a rank: After partitioning the elements equal to the pivot are at their final ranks,
    so you just need to compare these ranks (indexes) with the desired rank.

    Again though, sorting takes 1
    line instead of "a few".

    Only if you have a sort library call and no
    search-for-element-with-rank library call.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2022: https://euro.theforth.net

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From none) (albert@21:1/5 to siarczek83@gmail.com on Tue Aug 23 12:30:50 2022
    In article <4bba2005-0bb4-46b4-9535-0dbff5d6348en@googlegroups.com>,
    MichaĆ Kasprzak <siarczek83@gmail.com> wrote:
    <SNIP>
    This additional information could be a pointer to related data.
    This pointer may be an index instead of an address.
    We can now use our very quick heapsort word.

    An important note is that pointers have a detrimental effect on the
    locality of memory access (caching issues).
    I'd say that quicksort using pointers is about as efficient as heapsort
    without pointers.

    Groetjes Albert
    --
    "in our communism country Viet Nam, people are forced to be
    alive and in the western country like US, people are free to
    die from Covid 19 lol" duc ha
    albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Bernd Linsel@21:1/5 to albert on Tue Aug 23 17:34:08 2022
    On 23.08.2022 12:30, albert wrote:
    In article <4bba2005-0bb4-46b4-9535-0dbff5d6348en@googlegroups.com>,
    Micha? Kasprzak <siarczek83@gmail.com> wrote: <SNIP>
    This additional information could be a pointer to related data.
    This pointer may be an index instead of an address. We can now use
    our very quick heapsort word.

    An important note is that pointers have a detrimental effect on the
    locality of memory access (caching issues). I'd say that quicksort
    using pointers is about as efficient as heapsort without pointers.

    Groetjes Albert

    That's firstly a question of the locality of the pointed-to data. If
    they are organized in an array, and you have another array of pointers
    into that array, the data to be sorted will be cached alike.
    There is even an advantage, as swap operations only have to swap
    pointers, not whole array elements.

    Your point is only valid if the pointed-to data is scattered over the
    whole memory, so that nearly every access (eg for getting the keys to
    compare) misses the cache.

    Regards
    Bernd

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to Bernd Linsel on Tue Aug 23 17:33:23 2022
    Bernd Linsel <bl1-removethis@gmx.com> writes:
    On 23.08.2022 12:30, albert wrote:
    In article <4bba2005-0bb4-46b4-9535-0dbff5d6348en@googlegroups.com>,
    Micha? Kasprzak <siarczek83@gmail.com> wrote: <SNIP>
    This additional information could be a pointer to related data.
    This pointer may be an index instead of an address. We can now use
    our very quick heapsort word.

    An important note is that pointers have a detrimental effect on the
    locality of memory access (caching issues). I'd say that quicksort
    using pointers is about as efficient as heapsort without pointers.
    ...
    That's firstly a question of the locality of the pointed-to data. If
    they are organized in an array, and you have another array of pointers
    into that array, the data to be sorted will be cached alike.

    Only at the start. After a few steps of a sorting algorithm with
    spatial locality the pointers are rearranged and walking through them
    linearly will access the elements in a way that has little spatial
    locality.

    But still, I don't see any validity in his point. Sure we can throw
    away the spatial locality of quicksort and mergesort, and then it
    might do as badly as heapsort (which does not have to take such
    measures to have bad spatial locality), but why would one do that?

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2022: https://euro.theforth.net

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)