• Vector operations for 64-bit Forths

    From Brad Eckert@21:1/5 to All on Thu Sep 14 15:56:26 2023
    Hi All,

    As a 32-bit throwback, I am reluctant to write 64-bit Forth applications. However, 64-bit Forth is becoming a thing anyway. Moreover, 64-bit data is becoming a thing.

    64-bit cells are much wider than most data in the application. This opens up the possibility of vector operations. I don't mean SSE and such, I mean treating 64-bit words as vectors. For example, as 4-element groups of 16-bit numbers you can AND, OR,
    INVERT, or XOR directly.

    Vector addition would be a nice-to-have. Add the top two stack elements but break the carry chain every 16 or 32 bits. Being a little lazy today, I'm just going to ask if the i86 architecture supports this stuff. I assume it does. 64-bit Forth will have
    to standardize on some kind of vector wordset.
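
As a rough illustration of the "break the carry chain" idea, here is a minimal sketch in C (chosen only because it compiles anywhere; nothing in this thread prescribes it). It treats one 64-bit cell as four independent 16-bit lanes. The names add16x4 and LANE_MSB are invented for this example. The trick is to add the low 15 bits of every lane normally, then patch in the top bit of each lane with XOR so that no carry crosses a lane boundary.

```c
#include <stdint.h>

/* Top bit of each 16-bit lane (hypothetical name for this sketch). */
#define LANE_MSB 0x8000800080008000ULL

/* Add four packed 16-bit lanes; each lane wraps modulo 2^16 on its own. */
static uint64_t add16x4(uint64_t a, uint64_t b) {
    /* Sum the low 15 bits of every lane; a carry can reach bit 15 of its
       own lane but can never spill into the neighbouring lane. */
    uint64_t low = (a & ~LANE_MSB) + (b & ~LANE_MSB);
    /* Fold in the top bits without generating a carry out of the lane. */
    return low ^ ((a ^ b) & LANE_MSB);
}
```

For example, adding the lanes (0x0001, 0xFFFF, 0x0002, 0x8000) to (0x0001, 0x0001, 0x0003, 0x8000) gives (0x0002, 0x0000, 0x0005, 0x0000): the second and fourth lanes wrap without disturbing their neighbours. AND, OR, XOR and INVERT need no such care because they never produce carries.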

    Has this been explored before?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Marcel Hendrix@21:1/5 to Brad Eckert on Thu Sep 14 21:46:39 2023
    On Friday, September 15, 2023 at 12:56:28 AM UTC+2, Brad Eckert wrote:
    [..]
    > 64-bit cells are much wider than most data in the application. This opens up the possibility of vector operations. I don't mean SSE
    > and such, I mean treating 64-bit words as vectors. For example, as 4-element groups of 16-bit numbers you can AND, OR, INVERT,
    > or XOR directly.
    [..]
    > Has this been explored before?

    What kind of application would use this? If speed is important, the programmer will use SSE or whatever anyway.
    If speed is not important, maybe the application cries out for the approach (easier to understand or shorter?)
    In my experience SIMD code is not easy to write or debug, and if you don't pay attention the performance
    is subpar, or the code grows/is difficult to use because of alignment issues. Hence my initial question.

    -marcel

  • From minforth@21:1/5 to Brad Eckert on Fri Sep 15 01:18:42 2023
    Brad Eckert wrote on Friday, September 15, 2023 at 00:56:28 UTC+2:
    > Hi All,
    >
    > As a 32-bit throwback, I am reluctant to write 64-bit Forth applications. However, 64-bit Forth is becoming a thing anyway. Moreover, 64-bit data is becoming a thing.
    >
    > 64-bit cells are much wider than most data in the application. This opens up the possibility of vector operations. I don't mean SSE and such, I mean treating 64-bit words as vectors. For example, as 4-element groups of 16-bit numbers you can AND, OR,
    > INVERT, or XOR directly.
    >
    > Vector addition would be a nice-to-have. Add the top two stack elements but break the carry chain every 16 or 32 bits. Being a little lazy today, I'm just going to ask if the i86 architecture supports this stuff. I assume it does. 64-bit Forth will
    > have to standardize on some kind of vector wordset.
    >
    > Has this been explored before?

    You seem to focus on packed integers. Not uncommon for some graphics applications
    or e.g. crypto, where speed is often needed.

    But IME the complexities of SSE/AVX don't pay off where speed is a secondary concern.

    For signal processing you can fit analog-to-digital converter results easily into 32-bits
    in raw mode, or after scaling and noise filtering into small floating-point number vectors.
    Unsurprisingly, many SSE/AVX instructions deal with such fp-vectors.

    From a Forth perspective it is also important to have a good library to deal with serial data in
    heap memory and on mass storage.

  • From none albert@21:1/5 to hwfwguy@gmail.com on Fri Sep 15 11:21:32 2023
    In article <e8fedb6c-a47c-4c5f-ac04-7ab6fe70ab16n@googlegroups.com>,
    Brad Eckert <hwfwguy@gmail.com> wrote:
    > Hi All,
    >
    > As a 32-bit throwback, I am reluctant to write 64-bit Forth
    > applications. However, 64-bit Forth is becoming a thing anyway.
    > Moreover, 64-bit data is becoming a thing.

    2008 was some 20 years after ISO, and by that time
    everybody used CELL+ CELLS .
    I changed my Linux Forth to 64 bits at that time.
    No programs had to be changed.

    Now we are 15 years later. Your applications should run on every
    hosted Forth, no need to write "64-bit Forth".

    Groetjes Albert
    --
    Don't praise the day before the evening. One swallow doesn't make spring.
    You must not say "hey" before you have crossed the bridge. Don't sell the
    hide of the bear until you shot it. Better one bird in the hand than ten in
    the air. First gain is a cat spinning. - the Wise from Antrim -

  • From Heinrich Hohl@21:1/5 to none albert on Fri Sep 15 11:43:05 2023
    On Friday, September 15, 2023 at 11:21:37 AM UTC+2, none albert wrote:
    > Your applications should run on every hosted Forth, no need to write "64-bit Forth".

    It is true that you do not need to rewrite every 32-bit Forth application from scratch,
    but you may need to adjust a few things. The following comes to my mind:

    - If your 32-bit application uses assembly code, you must adjust register names
    (e.g. EBX --> RBX)

    - If you make use of truncated multiplication (e.g. in an LCG random number generator),
    you must truncate the result using a bit mask

    - If your 32-bit application makes use of double length math operators, you may be able
    to simplify math operations using single length operators in the 64-bit Forth system

    I recommend checking the 32-bit Forth source code carefully for possible traps before
    running it in a 64-bit Forth system.
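
The truncated-multiplication trap can be sketched as follows, again in C for convenience. This is only an illustration under stated assumptions: the multiplier and increment are the widely published Numerical Recipes LCG constants, picked as an example rather than taken from this thread, and the function names are invented. On a 32-bit system the product wraps at 32 bits by itself; in a 64-bit cell the upper bits survive unless they are masked off.

```c
#include <stdint.h>

/* LCG step as a 32-bit system computes it: the arithmetic wraps
   modulo 2^32 without any extra work. */
static uint32_t lcg_step_32(uint32_t x) {
    return x * 1664525u + 1013904223u;
}

/* The same step in a 64-bit cell. Without the final mask the result
   would keep bits 32..63 and the sequence would diverge from the
   32-bit one. */
static uint64_t lcg_step_64(uint64_t x) {
    return (x * 1664525u + 1013904223u) & 0xFFFFFFFFu;
}
```

With the mask in place both functions produce the same sequence from the same seed; dropping the mask changes the very first result for any seed large enough to overflow 32 bits.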

    Henry

  • From Brad Eckert@21:1/5 to Heinrich Hohl on Sun Sep 17 11:48:47 2023
    On Friday, September 15, 2023 at 11:43:08 AM UTC-7, Heinrich Hohl wrote:
    > On Friday, September 15, 2023 at 11:21:37 AM UTC+2, none albert wrote:
    >> Your applications should run on every hosted Forth, no need to write "64-bit Forth".
    > It is true that you do not need to rewrite every 32-bit Forth application from scratch,
    > but you may need to adjust a few things. The following comes to my mind:
    >
    > - If your 32-bit application uses assembly code, you must adjust register names
    > (e.g. EBX --> RBX)
    >
    > - If you make use of truncated multiplication (e.g. in an LCG random number generator),
    > you must truncate the result using a bit mask
    >
    > - If your 32-bit application makes use of double length math operators, you may be able
    > to simplify math operations using single length operators in the 64-bit Forth system
    >
    > I recommend checking the 32-bit Forth source code carefully for possible traps before
    > running it in a 64-bit Forth system.
    >
    > Henry

    Thanks for the responses but I am thinking more along the lines of Forth computers, not i64 monsters. I tend to agree with Marcel that the added complexity probably isn't worth it. It would be interesting to experiment with primitives like >< (swap
    halves of the top of stack) and H+ (add the top two stack elements as 2D vectors). Adding two vectors would be simple compared to the usual stackrobatics.
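
To make the two proposed primitives concrete, here is a hedged sketch in C of what >< and H+ might compute. It assumes a 64-bit cell holding a 2D vector as two 32-bit components, each wrapping modulo 2^32 independently; that reading, and the names swap_halves and hadd, are assumptions made for this example, not a definition from the post.

```c
#include <stdint.h>

/* ><  : exchange the upper and lower 32-bit halves of a cell. */
static uint64_t swap_halves(uint64_t x) {
    return (x << 32) | (x >> 32);
}

/* H+  : add two cells as pairs of 32-bit components, with no carry
   from the low component into the high one. */
static uint64_t hadd(uint64_t a, uint64_t b) {
    const uint64_t msb = 0x8000000080000000ULL; /* top bit of each half */
    /* Add the low 31 bits of each half, then restore the top bits
       with XOR so the carry chain stops at bit 31. */
    return ((a & ~msb) + (b & ~msb)) ^ ((a ^ b) & msb);
}
```

For instance, hadd applied to (1, 0xFFFFFFFF) and (1, 1) yields (2, 0): the low component wraps and the high component is unaffected. In software this costs a handful of logical operations per addition; a hardware primitive would simply gate the carry at bit 31.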

    The cathedrals of modern computing have already been built to suit the C programming paradigm. Apps are written for the hardware and hardware is designed for the apps. The prevailing computing religion is the only game in town unless you want to go live
    in a tent. It's like old times but they don't burn you for heresy. So, 32-bit and 64-bit paradigms are here to stay. The FPUs came, everyone loved IEEE754 doubles, and it became a good idea to move data in 64-bit chunks. 64-bit also gave segmented
    address haters their linear address space. The hardware guys knew their market better than Anheuser Busch.

    Forth strongly favors integer arithmetic for reasons of ideological purity at this point. Maybe cells should fit floating point numbers. How wide should floating point numbers really be? 32-bit seems a bit small. 64-bit seems a bit big. Where is the
    Goldilocks point? The B5500 that inspired Chuck Moore had a 48-bit word.



  • From Marcel Hendrix@21:1/5 to Brad Eckert on Sun Sep 17 12:22:54 2023
    On Sunday, September 17, 2023 at 8:48:50 PM UTC+2, Brad Eckert wrote:
    > How wide should floating point numbers really be? 32-bit seems a bit small. 64-bit seems a bit big.
    > Where is the Goldilocks point? The B5500 that inspired Chuck Moore had a 48-bit word.

    128 bits. The 64-bit doubles and 80-bits extended are enough, but there are algorithms
    that only work when you have twice the width.

    -marcel

  • From dxf@21:1/5 to Marcel Hendrix on Mon Sep 18 14:13:48 2023
    On 18/09/2023 5:22 am, Marcel Hendrix wrote:
    > On Sunday, September 17, 2023 at 8:48:50 PM UTC+2, Brad Eckert wrote:
    >> How wide should floating point numbers really be? 32-bit seems a bit small. 64-bit seems a bit big.
    >> Where is the Goldilocks point? The B5500 that inspired Chuck Moore had a 48-bit word.
    >
    > 128 bits. The 64-bit doubles and 80-bits extended are enough, but there are algorithms
    > that only work when you have twice the width.
    >
    > -marcel

    Moore would find a better algorithm.

  • From minforth@21:1/5 to Brad Eckert on Sun Sep 17 22:43:22 2023
    Brad Eckert wrote on Sunday, September 17, 2023 at 20:48:50 UTC+2:
    > On Friday, September 15, 2023 at 11:43:08 AM UTC-7, Heinrich Hohl wrote:
    >> On Friday, September 15, 2023 at 11:21:37 AM UTC+2, none albert wrote:
    >>> Your applications should run on every hosted Forth, no need to write "64-bit Forth".
    >> It is true that you do not need to rewrite every 32-bit Forth application from scratch,
    >> but you may need to adjust a few things. The following comes to my mind:
    >>
    >> - If your 32-bit application uses assembly code, you must adjust register names
    >> (e.g. EBX --> RBX)
    >>
    >> - If you make use of truncated multiplication (e.g. in an LCG random number generator),
    >> you must truncate the result using a bit mask
    >>
    >> - If your 32-bit application makes use of double length math operators, you may be able
    >> to simplify math operations using single length operators in the 64-bit Forth system
    >>
    >> I recommend checking the 32-bit Forth source code carefully for possible traps before
    >> running it in a 64-bit Forth system.
    >>
    >> Henry
    > Thanks for the responses but I am thinking more along the lines of Forth computers, not i64 monsters. I tend to agree with Marcel that the added complexity probably isn't worth it. It would be interesting to experiment with primitives like >< (swap
    > halves of the top of stack) and H+ (add the top two stack elements as 2D vectors). Adding two vectors would be simple compared to the usual stackrobatics.
    >
    > The cathedrals of modern computing have already been built to suit the C programming paradigm. Apps are written for the hardware and hardware is designed for the apps. The prevailing computing religion is the only game in town unless you want to go
    > live in a tent. It's like old times but they don't burn you for heresy. So, 32-bit and 64-bit paradigms are here to stay. The FPUs came, everyone loved IEEE754 doubles, and it became a good idea to move data in 64-bit chunks. 64-bit also gave segmented
    > address haters their linear address space. The hardware guys knew their market better than Anheuser Busch.
    >
    > Forth strongly favors integer arithmetic for reasons of ideological purity at this point. Maybe cells should fit floating point numbers. How wide should floating point numbers really be? 32-bit seems a bit small. 64-bit seems a bit big. Where is the
    > Goldilocks point? The B5500 that inspired Chuck Moore had a 48-bit word.

    Even IBM Z mainframes are flexible, with preference for 32 bits: https://www.ibm.com/docs/en/cics-ts/5.4?topic=basics-24-bit-31-bit-64-bit-addressing

    When the CPU does not support your proposed >< or H+ operations
    I would be surprised if they were faster than standard register-cached
    stack operations.

  • From Marcel Hendrix@21:1/5 to dxf on Mon Sep 18 10:34:55 2023
    On Monday, September 18, 2023 at 6:13:52 AM UTC+2, dxf wrote:
    [..]
    >> The 64-bit doubles and 80-bits extended are enough, but there are algorithms
    >> that only work when you have twice the width.
    [..]
    > Moore would find a better algorithm.

    I doubt it. It is not that the whole algorithm switches to 128 bits to avoid the problem.
    For some equations/problems there can be a mix of very small and very big numbers where floating-point performs badly. By successive refinement the
    issue can be eliminated, but in its critical operation higher-than-default precision is needed. A typical use-case is throwing a switch.
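
A small, hedged illustration of how a mix of very large and very small numbers defeats plain double arithmetic, and how carrying extra bits through the critical step rescues it. The technique shown is compensated (Neumaier) summation, picked here as a familiar stand-in; it is not the successive-refinement method described above, and the function names are invented for this sketch.

```c
#include <stddef.h>

/* Absolute value without pulling in the math library. */
static double abs_d(double v) { return v < 0 ? -v : v; }

/* Plain left-to-right summation in double precision. */
static double naive_sum(const double *v, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* Neumaier's compensated summation: c accumulates the low-order bits
   that each addition rounds away, and is added back at the end. */
static double compensated_sum(const double *v, size_t n) {
    double sum = 0.0, c = 0.0;
    for (size_t i = 0; i < n; i++) {
        double t = sum + v[i];
        if (abs_d(sum) >= abs_d(v[i]))
            c += (sum - t) + v[i];   /* low bits of v[i] were lost */
        else
            c += (v[i] - t) + sum;   /* low bits of sum were lost */
        sum = t;
    }
    return sum + c;
}
```

For the input {1e16, 1.0, -1e16} the exact answer is 1. With IEEE 754 doubles and no extended intermediate precision, naive_sum typically returns 0 because 1.0 is below the rounding granularity of 1e16, while compensated_sum recovers 1. Only the accumulation step carries the extra bits; the rest of the computation stays in ordinary doubles.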

    -marcel

  • From Brad Eckert@21:1/5 to Marcel Hendrix on Sat Sep 23 17:02:53 2023
    On Monday, September 18, 2023 at 10:34:58 AM UTC-7, Marcel Hendrix wrote:
    > On Monday, September 18, 2023 at 6:13:52 AM UTC+2, dxf wrote:
    > [..]
    >>> The 64-bit doubles and 80-bits extended are enough, but there are algorithms
    >>> that only work when you have twice the width.
    > [..]
    >> Moore would find a better algorithm.
    > I doubt it. It is not that the whole algorithm switches to 128 bits to avoid the problem.
    > For some equations/problems there can be a mix of very small and very big numbers where floating-point performs badly. By successive refinement the issue can be eliminated, but in its critical operation higher-than-default precision is needed. A typical use-case is throwing a switch.
    >
    > -marcel

    I can imagine Tim the Tool Time Guy saying "MORE BITS!".

    As long as we are blue-skying, I would propose that arbitrary precision floating point as well as arbitrary precision (bignum) integers be supported in hardware. It would be nice to have IEEE standards for both. Maybe we will see that someday.

  • From none albert@21:1/5 to hwfwguy@gmail.com on Sun Sep 24 12:09:27 2023
    In article <63976a7d-f3d4-4f15-958d-eef28880044cn@googlegroups.com>,
    Brad Eckert <hwfwguy@gmail.com> wrote:
    > On Monday, September 18, 2023 at 10:34:58 AM UTC-7, Marcel Hendrix wrote:
    >> On Monday, September 18, 2023 at 6:13:52 AM UTC+2, dxf wrote:
    >> [..]
    >>>> The 64-bit doubles and 80-bits extended are enough, but there are algorithms
    >>>> that only work when you have twice the width.
    >> [..]
    >>> Moore would find a better algorithm.
    >> I doubt it. It is not that the whole algorithm switches to 128 bits to avoid the problem.
    >> For some equations/problems there can be a mix of very small and very big
    >> numbers where floating-point performs badly. By successive refinement the
    >> issue can be eliminated, but in its critical operation higher-than-default precision is needed. A typical use-case is throwing a switch.
    >>
    >> -marcel
    > I can imagine Tim the Tool Time Guy saying "MORE BITS!".
    >
    > As long as we are blue-skying, I would propose that arbitrary precision floating point as well as arbitrary precision (bignum) integers be
    > supported in hardware. It would be nice to have IEEE standards for both. Maybe we will see that someday.

    Most of the problematic problems are "stiff" problems where there are
    orders of magnitude between the small and large eigenvalues of a
    matrix. It is relatively easy to predict the position of the earth
    years into the future, but the position of the moon relative to the
    earth is far less precise. It is related to chaotic problems.
    Boosting the precision of the floating point is of little avail
    and is surely not an alternative to numerical analysis.

    There are a few problems that warrant high precision.
    We are now encountering problems with the magnetic moment of the muon.
    The experimental value doesn't agree with the theoretical value,
    but wait, a different calculation gives a different theoretical value.
    These experiments are insanely precise, and require more than
    IEEE double precision.

    Groetjes Albert

  • From Marcel Hendrix@21:1/5 to none albert on Sun Sep 24 04:23:40 2023
    On Sunday, September 24, 2023 at 12:09:30 PM UTC+2, none albert wrote:
    > In article <63976a7d-f3d4-4f15...@googlegroups.com>,
    > Brad Eckert <hwf...@gmail.com> wrote:
    >> On Monday, September 18, 2023 at 10:34:58 AM UTC-7, Marcel Hendrix wrote:
    [..]
    > Most of the problematic problems are "stiff" problems where there are
    > orders of magnitude between the small and large eigenvalues of a
    > matrix. It is relatively easy to predict the position of the earth
    > years into the future, but the position of the moon relative to the
    > earth is far less precise. It is related to chaotic problems.
    > Boosting the precision of the floating point is of little avail
    > and is surely not an alternative to numerical analysis.

    Of course. But there are problems (like circuit simulation) where the
    balance between accuracy and speed is important. More bits is
    generally (not always) slower, therefore algorithms are used that
    won't crash when the condition number explodes after an unfortunate
    refactor, while still having acceptable accuracy with double precision.

    -marcel

  • From Zbig@21:1/5 to All on Sun Sep 24 05:33:28 2023
    > 64-bit cells are much wider than most data in the application.
    > This opens up the possibility of vector operations. I don't mean
    > SSE and such, I mean treating 64-bit words as vectors.

    Maybe a specialized version of Forth that exclusively uses
    32-bit complex numbers?
