• Re: Double precision /MOD

    From minforth@21:1/5 to none albert on Tue Mar 21 05:04:31 2023
    none albert schrieb am Dienstag, 21. März 2023 um 12:22:19 UTC+1:
    Suppose we have a fast single precision /MOD , say unsigned.
    Does that help with building a double precision /MOD ,
    or must we resort to a shift and subtract algorithm?

    /mod ( u u -- remainder quotient )
    um/mod ( du u - remainder quotient )

    Results are single precision unsigned.

    Not quite clear what you intend. The maximum
    UD/MOD ( udnom uddenom -- udrem udquot) ??
    on a 32-bit or 64-bit system ??

    or like https://groups.google.com/g/comp.lang.forth/c/RlszeFOHIB4/m/ijpQHZ0huR0J

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From none) (albert@21:1/5 to All on Tue Mar 21 12:22:16 2023
    Suppose we have a fast single precision /MOD , say unsigned.
    Does that help with building a double precision /MOD ,
    or must we resort to a shift and subtract algorithm?

    /mod ( u u -- remainder quotient )
    um/mod ( du u - remainder quotient )

    Results are single precision unsigned.

    Groetjes Albert
    --
    Don't praise the day before the evening. One swallow doesn't make spring.
    You must not say "hey" before you have crossed the bridge. Don't sell the
    hide of the bear until you shot it. Better one bird in the hand than ten in
    the air. First gain is a cat spinning. - the Wise from Antrim -

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From minforth@21:1/5 to minforth on Tue Mar 21 07:01:17 2023
    minforth schrieb am Dienstag, 21. März 2023 um 13:04:33 UTC+1:
    none albert schrieb am Dienstag, 21. März 2023 um 12:22:19 UTC+1:
    Suppose we have a fast single precision /MOD , say unsigned.
    Does that help with building a double precision /MOD ,
    or must we resort to a shift and subtract algorithm?

    /mod ( u u -- remainder quotient )
    um/mod ( du u - remainder quotient )

    Results are single precision unsigned.
    Not quite clear what you intend. The maximum
    UD/MOD ( udnom uddenom -- udrem udquot) ??
    on a 32-bit or 64-bit system ??

    or like https://groups.google.com/g/comp.lang.forth/c/RlszeFOHIB4/m/ijpQHZ0huR0J

    P.S. after weeding out all reduction possibilities for special cases, there are two other options than shift-and-subract:
    - on certain CPUs use floating-point division with 32-bit systems when du has < 56 bits
    - Newton-Raphson division is perhaps faster

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From minforth@21:1/5 to dxforth on Tue Mar 21 08:07:45 2023
    dxforth schrieb am Dienstag, 21. März 2023 um 15:35:33 UTC+1:
    On 22/03/2023 1:01 am, minforth wrote:
    minforth schrieb am Dienstag, 21. März 2023 um 13:04:33 UTC+1:
    none albert schrieb am Dienstag, 21. März 2023 um 12:22:19 UTC+1:
    Suppose we have a fast single precision /MOD , say unsigned.
    Does that help with building a double precision /MOD ,
    or must we resort to a shift and subtract algorithm?

    /mod ( u u -- remainder quotient )
    um/mod ( du u - remainder quotient )

    Results are single precision unsigned.
    Not quite clear what you intend. The maximum
    UD/MOD ( udnom uddenom -- udrem udquot) ??
    on a 32-bit or 64-bit system ??

    or like
    https://groups.google.com/g/comp.lang.forth/c/RlszeFOHIB4/m/ijpQHZ0huR0J

    P.S. after weeding out all reduction possibilities for special cases, there are
    two other options than shift-and-subract:
    - on certain CPUs use floating-point division with 32-bit systems when du has < 56 bits
    - Newton-Raphson division is perhaps faster
    I think he's asking how to make UM/MOD from U/MOD on the assumption the latter is faster. Problem is they are essentially the same algorithm.

    Only if you do shift-and-subtract on both. But who would?

    The question to decompose higher order divisions into lower order CPU-supported efficient division primitives pops up now and then. I am also among those on the
    lookout for simple answers. ;-)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From dxforth@21:1/5 to minforth on Wed Mar 22 01:35:31 2023
    On 22/03/2023 1:01 am, minforth wrote:
    minforth schrieb am Dienstag, 21. März 2023 um 13:04:33 UTC+1:
    none albert schrieb am Dienstag, 21. März 2023 um 12:22:19 UTC+1:
    Suppose we have a fast single precision /MOD , say unsigned.
    Does that help with building a double precision /MOD ,
    or must we resort to a shift and subtract algorithm?

    /mod ( u u -- remainder quotient )
    um/mod ( du u - remainder quotient )

    Results are single precision unsigned.
    Not quite clear what you intend. The maximum
    UD/MOD ( udnom uddenom -- udrem udquot) ??
    on a 32-bit or 64-bit system ??

    or like
    https://groups.google.com/g/comp.lang.forth/c/RlszeFOHIB4/m/ijpQHZ0huR0J

    P.S. after weeding out all reduction possibilities for special cases, there are
    two other options than shift-and-subract:
    - on certain CPUs use floating-point division with 32-bit systems when du has < 56 bits
    - Newton-Raphson division is perhaps faster

    I think he's asking how to make UM/MOD from U/MOD on the assumption the
    latter is faster. Problem is they are essentially the same algorithm.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From bl1@gmx.com@21:1/5 to minforth on Tue Mar 21 09:13:29 2023
    minforth schrieb am Dienstag, 21. März 2023 um 16:07:46 UTC+1:
    dxforth schrieb am Dienstag, 21. März 2023 um 15:35:33 UTC+1:
    On 22/03/2023 1:01 am, minforth wrote:
    minforth schrieb am Dienstag, 21. März 2023 um 13:04:33 UTC+1:
    none albert schrieb am Dienstag, 21. März 2023 um 12:22:19 UTC+1:
    Suppose we have a fast single precision /MOD , say unsigned.
    Does that help with building a double precision /MOD ,
    or must we resort to a shift and subtract algorithm?

    /mod ( u u -- remainder quotient )
    um/mod ( du u - remainder quotient )

    Results are single precision unsigned.
    Not quite clear what you intend. The maximum
    UD/MOD ( udnom uddenom -- udrem udquot) ??
    on a 32-bit or 64-bit system ??

    or like
    https://groups.google.com/g/comp.lang.forth/c/RlszeFOHIB4/m/ijpQHZ0huR0J

    P.S. after weeding out all reduction possibilities for special cases, there are
    two other options than shift-and-subract:
    - on certain CPUs use floating-point division with 32-bit systems when du has < 56 bits
    - Newton-Raphson division is perhaps faster
    I think he's asking how to make UM/MOD from U/MOD on the assumption the latter is faster. Problem is they are essentially the same algorithm.
    Only if you do shift-and-subtract on both. But who would?

    The question to decompose higher order divisions into lower order CPU-supported
    efficient division primitives pops up now and then. I am also among those on the
    lookout for simple answers. ;-)

    Maybe Donald Knuth's Long Division Algorithm is helpful. Here is an examination about some implementations:
    https://skanthak.homepage.t-online.de/division.html

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to albert@cherry. on Tue Mar 21 17:21:13 2023
    albert@cherry.(none) (albert) writes:
    Suppose we have a fast single precision /MOD , say unsigned.
    Does that help with building a double precision /MOD ,
    or must we resort to a shift and subtract algorithm?

    /mod ( u u -- remainder quotient )

    The standard word /MOD works with signed numbers. Gforth calls this
    word U/MOD.

    um/mod ( du u - remainder quotient )

    Results are single precision unsigned.

    If you have a fast U/MOD, you get a faster UM/MOD by using U/MOD than
    by using shift-and-subtract. But it's not straightforward. Gforth
    copied a large chunk of code from libgcc for this.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2022: https://euro.theforth.net

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From minforth@21:1/5 to b...@gmx.com on Tue Mar 21 11:53:25 2023
    b...@gmx.com schrieb am Dienstag, 21. März 2023 um 17:13:31 UTC+1:
    minforth schrieb am Dienstag, 21. März 2023 um 16:07:46 UTC+1:
    dxforth schrieb am Dienstag, 21. März 2023 um 15:35:33 UTC+1:
    On 22/03/2023 1:01 am, minforth wrote:
    minforth schrieb am Dienstag, 21. März 2023 um 13:04:33 UTC+1:
    none albert schrieb am Dienstag, 21. März 2023 um 12:22:19 UTC+1: >>> Suppose we have a fast single precision /MOD , say unsigned.
    Does that help with building a double precision /MOD ,
    or must we resort to a shift and subtract algorithm?

    /mod ( u u -- remainder quotient )
    um/mod ( du u - remainder quotient )

    Results are single precision unsigned.
    Not quite clear what you intend. The maximum
    UD/MOD ( udnom uddenom -- udrem udquot) ??
    on a 32-bit or 64-bit system ??

    or like
    https://groups.google.com/g/comp.lang.forth/c/RlszeFOHIB4/m/ijpQHZ0huR0J

    P.S. after weeding out all reduction possibilities for special cases, there are
    two other options than shift-and-subract:
    - on certain CPUs use floating-point division with 32-bit systems when du has < 56 bits
    - Newton-Raphson division is perhaps faster
    I think he's asking how to make UM/MOD from U/MOD on the assumption the latter is faster. Problem is they are essentially the same algorithm.
    Only if you do shift-and-subtract on both. But who would?

    The question to decompose higher order divisions into lower order CPU-supported
    efficient division primitives pops up now and then. I am also among those on the
    lookout for simple answers. ;-)
    Maybe Donald Knuth's Long Division Algorithm is helpful. Here is an examination about some implementations:
    https://skanthak.homepage.t-online.de/division.html

    Nice overview by 'Skanthak'. But I am astonished at those complications he is demonstrating there, and at those required code lengths.

    I think Forth can do better. FWIW what I am using so far without trouble:

    \ Non standard:
    \ Primitive: _MU/MOD \ ( ud v - r udq ) mixed unsigned division (gforth: UD/MOD)
    \ : PLUCK 2 pick ;

    : D/REM \ ( du dv -- drem dquot ) MF double number signed symmetric division
    pluck >r pluck over xor >r \ remember signs
    dabs 2swap dabs 2swap dup IF
    C mfUCell ul=mffth, uh=mfthd, vl=mfsec, vh=mftos; // copy Forth stack to C unsigned variables
    C mfUCell q=0, m=1;
    C if ((uh==vh)&&(ul==vl)) uh=ul=0, q=1;
    C else { // shift and subtract
    C while(vh==uh?vl<=ul:vh<=uh)
    C vh=(vh<<1)|(vl>>63), vl<<=1, m<<=1;
    C while(m>1) {
    C vl=(vl>>1)|(vh<<63), vh>>=1, m>>=1;
    C if (uh==vh?ul>=vl:uh>=vh) {
    C uh-=vh+(vl>ul), ul-=vl, q|=m; } } }
    C mffth=ul, mfthd=uh, mfsec=q, mftos=0;
    ELSE drop _mu/mod 0 -rot THEN
    r> 0< IF dnegate THEN \ apply signs
    r> 0< IF 2swap dnegate 2swap THEN ;

    Don't bother about this seemingly wild mix of Forth and C code. All I want to show here
    is how compact it can be.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From dxforth@21:1/5 to minforth on Wed Mar 22 11:42:09 2023
    On 22/03/2023 2:07 am, minforth wrote:
    dxforth schrieb am Dienstag, 21. März 2023 um 15:35:33 UTC+1:
    On 22/03/2023 1:01 am, minforth wrote:
    minforth schrieb am Dienstag, 21. März 2023 um 13:04:33 UTC+1:
    none albert schrieb am Dienstag, 21. März 2023 um 12:22:19 UTC+1:
    Suppose we have a fast single precision /MOD , say unsigned.
    Does that help with building a double precision /MOD ,
    or must we resort to a shift and subtract algorithm?

    /mod ( u u -- remainder quotient )
    um/mod ( du u - remainder quotient )

    Results are single precision unsigned.
    Not quite clear what you intend. The maximum
    UD/MOD ( udnom uddenom -- udrem udquot) ??
    on a 32-bit or 64-bit system ??

    or like
    https://groups.google.com/g/comp.lang.forth/c/RlszeFOHIB4/m/ijpQHZ0huR0J >>>
    P.S. after weeding out all reduction possibilities for special cases, there are
    two other options than shift-and-subract:
    - on certain CPUs use floating-point division with 32-bit systems when du has < 56 bits
    - Newton-Raphson division is perhaps faster
    I think he's asking how to make UM/MOD from U/MOD on the assumption the
    latter is faster. Problem is they are essentially the same algorithm.

    Only if you do shift-and-subtract on both. But who would?

    The question to decompose higher order divisions into lower order CPU-supported
    efficient division primitives pops up now and then. I am also among those on the
    lookout for simple answers. ;-)

    What pops up every now and again are C users asking why Intel didn't provide double by double and single by single division instructions.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to minforth on Wed Mar 22 09:38:26 2023
    minforth <minforth@arcor.de> writes:
    b...@gmx.com schrieb am Dienstag, 21. M=C3=A4rz 2023 um 17:13:31 UTC+1:
    https://skanthak.homepage.t-online.de/division.html

    Nice overview by 'Skanthak'. But I am astonished at those complications he = >is
    demonstrating there, and at those required code lengths.

    I think Forth can do better. FWIW what I am using so far without trouble:

    \ Non standard:
    \ Primitive: _MU/MOD \ ( ud v - r udq ) mixed unsigned division (gforth: U= >D/MOD)
    \ : PLUCK 2 pick ;

    : D/REM \ ( du dv -- drem dquot ) MF double number signed symmetric divisi= >on
    pluck >r pluck over xor >r \ remember signs
    dabs 2swap dabs 2swap dup IF=20
    C mfUCell ul=3Dmffth, uh=3Dmfthd, vl=3Dmfsec, vh=3Dmftos; // copy Forth=
    stack to C unsigned variables
    C mfUCell q=3D0, m=3D1;=20
    C if ((uh=3D=3Dvh)&&(ul=3D=3Dvl)) uh=3Dul=3D0, q=3D1;
    C else { // shift and subtract
    C while(vh=3D=3Duh?vl<=3Dul:vh<=3Duh)
    C vh=3D(vh<<1)|(vl>>63), vl<<=3D1, m<<=3D1;
    C while(m>1) {
    C vl=3D(vl>>1)|(vh<<63), vh>>=3D1, m>>=3D1;
    C if (uh=3D=3Dvh?ul>=3Dvl:uh>=3Dvh) {
    C uh-=3Dvh+(vl>ul), ul-=3Dvl, q|=3Dm; } } }=20
    C mffth=3Dul, mfthd=3Duh, mfsec=3Dq, mftos=3D0;
    ELSE drop _mu/mod 0 -rot THEN
    r> 0< IF dnegate THEN \ apply signs
    r> 0< IF 2swap dnegate 2swap THEN ;

    Don't bother about this seemingly wild mix of Forth and C code. All I want = >to show here
    is how compact it can be.

    Except that it does not solve the problem "none albert" asked for:
    Implementing double/single->single with single/single->single, faster
    than shift-and-subtract.

    What you provide is an implementation of double/double->double that
    uses shift-and-subtract in the general case and calls a pre-existing implementation of double/single->double in Forth, without showing how
    that is implemented. In Gforth the latter is implemented as

    : ud/mod ( ud1 u2 -- urem udquot ) \ gforth
    \G divide unsigned double @i{ud1} by @i{u2}, resulting in a unsigned double
    \G quotient @i{udquot} and a single remainder @i{urem}.
    over 0= if nip u/mod 0 exit then
    dup >r u/mod r> swap >r um/mod r> ;

    And this implementation uses UM/MOD, which is the word that "none
    albert" has asked about.

    So we have an onion of division words, starting from the core:

    u/mod (called by) um/mod (called by) ud/mod (called by) d/rem

    The first part can be skipped if the architecture directly supports
    UM/MOD, or if it does not have division instructions.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2022: https://euro.theforth.net

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From none) (albert@21:1/5 to Anton Ertl on Wed Mar 22 11:58:03 2023
    In article <2023Mar21.182113@mips.complang.tuwien.ac.at>,
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    albert@cherry.(none) (albert) writes:
    Suppose we have a fast single precision /MOD , say unsigned.
    Does that help with building a double precision /MOD ,
    or must we resort to a shift and subtract algorithm?

    /mod ( u u -- remainder quotient )

    The standard word /MOD works with signed numbers. Gforth calls this
    word U/MOD.

    um/mod ( du u - remainder quotient )

    Results are single precision unsigned.

    If you have a fast U/MOD, you get a faster UM/MOD by using U/MOD than
    by using shift-and-subtract. But it's not straightforward. Gforth
    copied a large chunk of code from libgcc for this.

    I suspected that much. Thanks.
    Knuth based a multiprecision divide on a single division divide.
    A 2-precision divide is a special case of this, but it is equally
    cumbersome. I remember that in implementing this multiprecision
    divide on the transputer (we did number theoretic work on it)
    Marcel Hendrix asked my help for debugging, a rare event.


    - anton

    Groetjes Albert
    --
    Don't praise the day before the evening. One swallow doesn't make spring.
    You must not say "hey" before you have crossed the bridge. Don't sell the
    hide of the bear until you shot it. Better one bird in the hand than ten in
    the air. First gain is a cat spinning. - the Wise from Antrim -

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Paul Rubin@21:1/5 to Anton Ertl on Wed Mar 22 10:56:30 2023
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    : ud/mod ( ud1 u2 -- urem udquot ) \ gforth
    \G divide unsigned double @i{ud1} by @i{u2}, resulting in a unsigned double
    \G quotient @i{udquot} and a single remainder @i{urem}.
    over 0= if nip u/mod 0 exit then
    dup >r u/mod r> swap >r um/mod r> ;

    In gforth 0.7.3 on amd64, I get:

    see ud/mod
    : ud/mod
    >r 0 i um/mod r> swap >r um/mod r> ; ok

    I haven't tried to figure out what either of these are doing.

    I had been wondering what happens if the quotient won't fit in a single.
    0 1 1 ud/mod gives 0 0 1 on my system, which doesn't seem so great.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to Paul Rubin on Wed Mar 22 18:49:29 2023
    Paul Rubin <no.email@nospam.invalid> writes:
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    : ud/mod ( ud1 u2 -- urem udquot ) \ gforth
    \G divide unsigned double @i{ud1} by @i{u2}, resulting in a unsigned double
    \G quotient @i{udquot} and a single remainder @i{urem}.
    over 0= if nip u/mod 0 exit then
    dup >r u/mod r> swap >r um/mod r> ;

    In gforth 0.7.3 on amd64, I get:

    see ud/mod
    : ud/mod
    >r 0 i um/mod r> swap >r um/mod r> ; ok

    I haven't tried to figure out what either of these are doing.

    They are doing what is documented. The implementation I cited is more optimized than the 0.7.3 one (with the intention of getting a faster
    "#" (and words that call it, like ".", especially on architectures
    that do not provide UM/MOD, such as ARM and RISC-V)).

    I had been wondering what happens if the quotient won't fit in a single.
    0 1 1 ud/mod gives 0 0 1 on my system, which doesn't seem so great.

    What you are doing (on a 64-bit system) is:

    18446744073709551616/1

    And the result you get is the quotient 18446744073709551616 and
    remainder 0.

    What's not so great in your opinion?

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2022: https://euro.theforth.net

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From minforth@21:1/5 to Anton Ertl on Wed Mar 22 13:32:49 2023
    Anton Ertl schrieb am Mittwoch, 22. März 2023 um 10:51:54 UTC+1:
    minforth <minf...@arcor.de> writes:
    b...@gmx.com schrieb am Dienstag, 21. M=C3=A4rz 2023 um 17:13:31 UTC+1:
    https://skanthak.homepage.t-online.de/division.html

    Nice overview by 'Skanthak'. But I am astonished at those complications he = >is
    demonstrating there, and at those required code lengths.

    I think Forth can do better. FWIW what I am using so far without trouble:

    \ Non standard:
    \ Primitive: _MU/MOD \ ( ud v - r udq ) mixed unsigned division (gforth: U= >D/MOD)
    \ : PLUCK 2 pick ;

    : D/REM \ ( du dv -- drem dquot ) MF double number signed symmetric divisi= >on
    pluck >r pluck over xor >r \ remember signs
    dabs 2swap dabs 2swap dup IF=20
    C mfUCell ul=3Dmffth, uh=3Dmfthd, vl=3Dmfsec, vh=3Dmftos; // copy Forth=
    stack to C unsigned variables
    C mfUCell q=3D0, m=3D1;=20
    C if ((uh=3D=3Dvh)&&(ul=3D=3Dvl)) uh=3Dul=3D0, q=3D1;
    C else { // shift and subtract
    C while(vh=3D=3Duh?vl<=3Dul:vh<=3Duh)
    C vh=3D(vh<<1)|(vl>>63), vl<<=3D1, m<<=3D1;
    C while(m>1) {
    C vl=3D(vl>>1)|(vh<<63), vh>>=3D1, m>>=3D1;
    C if (uh=3D=3Dvh?ul>=3Dvl:uh>=3Dvh) {
    C uh-=3Dvh+(vl>ul), ul-=3Dvl, q|=3Dm; } } }=20
    C mffth=3Dul, mfthd=3Duh, mfsec=3Dq, mftos=3D0;
    ELSE drop _mu/mod 0 -rot THEN
    0< IF dnegate THEN \ apply signs
    0< IF 2swap dnegate 2swap THEN ;

    Don't bother about this seemingly wild mix of Forth and C code. All I want = >to show here
    is how compact it can be.
    Except that it does not solve the problem "none albert" asked for: Implementing double/single->single with single/single->single, faster
    than shift-and-subtract.

    What you provide is an implementation of double/double->double that
    uses shift-and-subtract in the general case and calls a pre-existing implementation of double/single->double in Forth, without showing how
    that is implemented. In Gforth the latter is implemented as

    : ud/mod ( ud1 u2 -- urem udquot ) \ gforth
    \G divide unsigned double @i{ud1} by @i{u2}, resulting in a unsigned double \G quotient @i{udquot} and a single remainder @i{urem}.
    over 0= if nip u/mod 0 exit then
    dup >r u/mod r> swap >r um/mod r> ;

    And this implementation uses UM/MOD, which is the word that "none
    albert" has asked about.

    So we have an onion of division words, starting from the core:

    u/mod (called by) um/mod (called by) ud/mod (called by) d/rem

    The first part can be skipped if the architecture directly supports
    UM/MOD, or if it does not have division instructions.

    Of course. IMO it boils down to either you have CPU mixed division support
    or not. If not, the question remains: is there a faster/simpler division algorithm
    than binary shift-subtract?

    F.ex. given fast integer log2 and pow2 operators, division can be solved with multiplication and some additions. But only timing can tell if it is worth the hassle.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From dxforth@21:1/5 to Paul Rubin on Thu Mar 23 10:51:14 2023
    On 23/03/2023 4:56 am, Paul Rubin wrote:
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    : ud/mod ( ud1 u2 -- urem udquot ) \ gforth
    \G divide unsigned double @i{ud1} by @i{u2}, resulting in a unsigned double
    \G quotient @i{udquot} and a single remainder @i{urem}.
    over 0= if nip u/mod 0 exit then
    dup >r u/mod r> swap >r um/mod r> ;

    In gforth 0.7.3 on amd64, I get:

    see ud/mod
    : ud/mod
    >r 0 i um/mod r> swap >r um/mod r> ; ok

    I haven't tried to figure out what either of these are doing.

    I had been wondering what happens if the quotient won't fit in a single.

    As is potentially the case in UM/MOD ? That would be an overflow/exception. Don't do that. Intelligent use of UM/MOD is better than default use of UD/MOD.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to minforth on Thu Mar 23 07:54:44 2023
    minforth <minforth@arcor.de> writes:
    Of course. IMO it boils down to either you have CPU mixed division support
    or not. If not, the question remains: is there a faster/simpler division al= >gorithm
    than binary shift-subtract?=20

    F.ex. given fast integer log2 and pow2 operators, division can be solved wi= >th
    multiplication and some additions. But only timing can tell if it is worth = >the hassle.

    Yes, a lot of the bulk of the stuff we took from libgcc is due to
    variations for making use of some feature present on some
    architectures to get a faster result, plus the code for an efficient implementation on architectures where the feature is not present
    (which may mean that you use a different approach rather than just
    replacing the code for the feature with a software implementation of
    that feature). With several such features in play, the result bloats considerably.

    You can code a simple implementation that uses only minimal features,
    and you can code it up in Forth, and get a much slimmer result and
    then claim "Forth simplicity", but in that case the slimness is not
    due to Forth, but due to preferencing slimness over efficiency.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2022: https://euro.theforth.net

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From minforth@21:1/5 to Anton Ertl on Thu Mar 23 01:36:50 2023
    Anton Ertl schrieb am Donnerstag, 23. März 2023 um 09:03:45 UTC+1:
    You can code a simple implementation that uses only minimal features,
    and you can code it up in Forth, and get a much slimmer result and
    then claim "Forth simplicity", but in that case the slimness is not
    due to Forth, but due to preferencing slimness over efficiency.

    Due to caching effects on CPUs or being able to make better usage of registers, very often the slimmer code wins the race despite being mathematically less efficient.

    And it saves a lot of debugging time. ;-)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to minforth on Thu Mar 23 09:20:05 2023
    minforth <minforth@arcor.de> writes:
    Anton Ertl schrieb am Donnerstag, 23. M=C3=A4rz 2023 um 09:03:45 UTC+1:
    You can code a simple implementation that uses only minimal features,=20
    and you can code it up in Forth, and get a much slimmer result and=20
    then claim "Forth simplicity", but in that case the slimness is not=20
    due to Forth, but due to preferencing slimness over efficiency.

    Due to caching effects on CPUs or being able to make better usage of regist= >ers,
    very often the slimmer code wins the race despite being mathematically less=
    efficient.

    For every architecture only the variant for its architectural features
    results is compiled (using conditional compilation). This results in
    the following code lengths for UM/MOD (excluding the final jump in the
    NEXT):

    Bytes Architecture
    30 AMD64
    292 RISC-V
    308 Aarch64

    If you come up with shorter code that runs on all the architectures
    Gforth supports and also results in shorter and faster object code, I
    will happily replace the current Gforth implementation. Until then I
    continue to believe that no significant source code size improvement
    is possible that does not result in a significant disadvantage in
    performance.

    And it saves a lot of debugging time. ;-)

    Yes, that's a disadvantage of all this special-case code, but in this
    case the debugging was performed by the gcc developers, I just dropped
    it in as-is.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2022: https://euro.theforth.net

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)