• FPU (x87) code debugging.

    From Frank Kotler@21:1/5 to R.Wieser on Fri Aug 6 13:57:16 2021
    On 08/06/2021 01:11 PM, R.Wieser wrote:
    Moderator, Frank : I'm not sure if questions about the x87 FPU are permitted here. If not then please just discard. If they are then please remove this line. :-)

    Hi Rudy,
    Consider the line removed.
    I think x87 is on topic. If necessary, I so rule it. :)
    I don't know the answer, though...
    Best.
    Frank

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From R.Wieser@21:1/5 to All on Fri Aug 6 19:11:21 2021
    Moderator, Frank : I'm not sure if questions about the x87 FPU are permitted here. If not then please just discard. If they are then please remove this line. :-)

    Hello all,

    I've just been writing some basic code to parse a simple float, and realized that I had no idea how to check if the x87 FPU stack was empty after I was done - as a simple measure to check if my code cleaned up correctly.

    I've been looking at using the ST bits in the FPU status word, but found that they (unexpectedly) didn't end at zero after I did my thing :

    minimal example:

    fld1 ;Load
    fld1

    fstp st(2) ;Swap ST(0) and ST(1) <-- this is the culprit

    fstp st(0) ;Discard
    fstp st(0)

    At this point all the ST bits are set, indicating a minus one, not zero.

    My questions at this point are:

    1) Have I done anything wrong in the above ? I don't think so, but "you never know" ....

    2) How do I, for debugging purposes, check the FPU stack ?

    Regards,
    Rudy Wieser

  • From R.Wieser@21:1/5 to All on Fri Aug 6 20:55:10 2021
    Frank,

    Moderator, Frank : I'm not sure if questions about the x87 FPU are permitted here. If not then please just discard. If they are then please remove this line. :-)

    Hi Rudy,
    Consider the line removed.

    To be honest, I had forgotten all about you whitelisting (pardon me if that isn't PC) people and assumed you would see the message before it would go into the newsgroup. But hey, now I know I'm on your whitelist too. :-)

    I think x87 is on topic. If necessary, I so rule it. :)

    I wasn't quite sure, as almost everything here is 16-bit assembly, and that is from a time when x87 FPUs were add-on chips. But thanks.

    I don't know the answer, though...

    No problem. Hopefully someone else here has an idea.

    Regards,
    Rudy Wieser

  • From DJ Delorie@21:1/5 to R.Wieser on Fri Aug 6 21:14:03 2021
    "R.Wieser" <address@nospicedham.not.available> writes:
    2) How do I, for debugging purposes, check the FPU stack ?

    If your debugger doesn't support it, you can at least use FSAVE/FRSTOR
    to fill in a chunk of data which you can then inspect.

  • From R.Wieser@21:1/5 to All on Sat Aug 7 09:51:16 2021
    DJ,

    If your debugger doesn't support it

    No debugger here (never liked them).

    you can at least use FSAVE/FRSTOR to fill in a chunk
    of data which you can then inspect.

    Thanks. That one does give quite a bit of information.

    It does have a drawback though: it re-initializes the FPU stack, meaning it cannot be used while in the middle of a calculation. Any idea for some non-destructive probing ?

    Regards,
    Rudy Wieser
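
    One candidate for non-destructive probing, offered here as an assumption rather than something proposed in the thread, is FNSTENV: it writes only the 28-byte environment (32-bit protected-mode layout) and leaves the data registers alone, although it masks all exceptions afterwards, so the control word would have to be reloaded (FLDCW). The tag word in that image marks each physical register as empty (11b) or not, which answers the "is the stack empty" question directly. The Python below only decodes such an image; `parse_fnstenv32` and `stack_is_empty` are hypothetical names.

```python
import struct

TAG_EMPTY = 0b11  # x87 tag value for an empty register


def parse_fnstenv32(env: bytes) -> dict:
    """Decode control, status and tag words from a 28-byte FNSTENV image."""
    if len(env) < 28:
        raise ValueError("32-bit protected-mode environment is 28 bytes")
    # Each 16-bit word sits in the low half of a 32-bit slot.
    fcw, fsw, ftw = (struct.unpack_from("<H", env, off)[0] for off in (0, 4, 8))
    tags = [(ftw >> (2 * r)) & 0b11 for r in range(8)]  # physical R0..R7
    return {
        "control_word": fcw,
        "status_word": fsw,
        "top": (fsw >> 11) & 0b111,  # status-word bits 11-13
        "tags": tags,
        "depth": sum(t != TAG_EMPTY for t in tags),
    }


def stack_is_empty(env: bytes) -> bool:
    """True when every physical register is tagged empty."""
    return parse_fnstenv32(env)["depth"] == 0
```

    The emptiness test looks only at the tags, so it gives the same answer wherever TOP happens to point.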

  • From wolfgang kern@21:1/5 to R.Wieser on Sat Aug 7 10:40:30 2021
    On 07.08.2021 09:51, R.Wieser wrote:
    DJ,

    If your debugger doesn't support it

    No debugger here (never liked them).

    you can at least use FSAVE/FRSTOR to fill in a chunk
    of data which you can then inspect.

    Thanks. That one does give quite a bit of information.

    It does have a drawback though: it re-initializes the FPU stack, meaning it cannot be used while in the middle of a calculation. Any idea to some non-destructive probing ?

    FXSAVE
    __
    wolfgang

  • From wolfgang kern@21:1/5 to R.Wieser on Sat Aug 7 10:34:10 2021
    On 06.08.2021 19:11, R.Wieser wrote:

    2) How do I, for debugging purposes, check the FPU stack ?

    not every debug tool supports FPU. I had to write my own debugger
    anyway and it uses FXSAVE to show registers and all status bits.

    but how did you check 1) FSTCW ? FXAM/r ? FSTENV ? FSTSW AX ?
    too many consecutive fstp will cause stack errors.

    The FNSTSW instruction does not check for pending floating-point
    exceptions before storing the x87 status word.

    FCLEX or FXSAVE followed by FINIT work fine for me to clean up.
    and FFREE/r is my way to empty a specific register.

    I actually hate this stupid stack-up/dn design, an overall ST(n)
    would work just fine with far fewer doubtful quirks.
    meanwhile we got SSE/AVX and AMD may remove FPU from chip soon.
    __
    wolfgang

  • From R.Wieser@21:1/5 to All on Sat Aug 7 12:11:43 2021
    Wolfgang,

    but how did you check 1)

    I read the Status Word, using FNSTSW. From there I isolated the ST bits.

    Thanks for mentioning FXAM. Something I already thought of being handy to have, but didn't know the name of. :-)

    FCLEX or FXSAVE followed by FINIT work fine for me to clean up.
    and FFREE/r is my way to empty a specific register.

    I already found (and used, just before the code I posted) FNINIT. But that just drops all "left over" variables and error flags. Not something I want to finish a calculation with ...

    As for FFREE ? I'm not sure I understand its worth - other than to perhaps delete the bottom-of-stack variable (and even then), as in all other cases it would create a "hole" on the stack, which I then still would have to reckon with. :-|

    I actually hate this stupid stack-up/dn design, an overall ST(n)
    would work just fine with much lesser doubtful quirks.

    :-) Agreed. But as I have to work with what the 'puter offers me I have
    no other choice than to deal with it.

    [in regard to FSAVE]

    It does have a drawback though: it re-initializes the FPU stack, meaning
    it cannot be used while in the middle of a calculation. Any idea to some
    non-destructive probing ?

    FXSAVE

    Thanks again.


    Blimey! I just realized (did some "that's quaint, what happens if I do {this}" probing) that the "ST(x)" argument is relative to the "Stack top" (status word, ST bits). In hindsight that makes sense, but it wasn't expected.
    It does make the "Stack Top" value useless for a quick "is it empty" test though.

    Regards,
    Rudy Wieser

  • From Robert@21:1/5 to R.Wieser on Sat Aug 7 14:19:22 2021
    R.Wieser <address@nospicedham.not.available> wrote in part:
    Moderator, Frank : I'm not sure if questions about the x87
    FPU are permitted here. If not than please just discard.
    If they are than please remove this line. :-)

    Hello all,

    I've just been writing some basic code to parse a simple
    float, and realized that I had no idea how to check if the
    x87 FPU was empty after I was done - as a simple measure
    to check if my code cleaned up correctly.

    You will need FSAVE/FRSTOR (and variants) if you use
    the x87. Your first FLD will clobber the stack top,
    which might be OK only if it is empty.


    I've been looking at using the ST bits in the FPU status word, but had to find that they (unexpectedly) didn't end at zero after I done my thing :

    minimal example:

    fld1 ;Load
    fld1

    fstp st(2) ;Swap ST(0) and ST(1) <-- this is the culprit

    fstp st(0) ;Discard
    fstp st(0)

    At this point all the ST bits are set, indicating a minus one, not zero.

    As another poster has said, I don't think the x87 automagically
    sets value flags (as x86 does) and needs FXAM. FSTSW=FF sounds
    like an empty x87.


    My questions at this point are:

    1) Have I done anything wrong in the above ? I don't think
    so, but "you never know" ....

    2) How do I, for debugging purposes, check the FPU stack ?

    Dump and examine in main memory. Like the Hewlett-Packard
    Reverse Polish Notation calculators it was modelled on,
    the x87 is meant for crunching together, not picking apart.

    -- Robert

  • From R.Wieser@21:1/5 to All on Sat Aug 7 17:51:41 2021
    Robert,

    You will need FSAVE/FRSTOR (and varients)

    Wolfgang gave that suggestion too. Alas, the F(N)SAVE resets the FPU stack, and for some reason I can't get the FXSAVE to work (my assembler shows its age by not knowing the opcode, and trying to use a "db 0Fh,0AEH, ....." sequence crashes the program).

    Your first FLD will clobber the stack top,

    I don't get that - why only the first one, and why would it clobber (the
    value at) the stack top ?

    As another poster has said, I don't think the x87 automagically
    sets value flags

    I don't quite get this either. Value flags ? I'm reading the "Status Word" and in it look at the ST bits (at 11-13).

    Remark : I later found out/realized that the "Stack Top" is just the starting offset for the ST(x) arguments. IOW : what's in it isn't really relevant.

    Dump and examine in main memory.

    :-) The problem was that I had no idea that I could or how to do that .

    Of course it didn't help that I got confused by (and by it focussed on) the "Stack Top" value. :-\

    Regards,
    Rudy Wieser

  • From Robert@21:1/5 to R.Wieser on Sun Aug 8 03:22:58 2021
    R.Wieser <address@nospicedham.not.available> wrote in part:
    Robert,
    You will need FSAVE/FRSTOR (and varients)
    Wolfgang gave that suggestion too. Alas, the F(N)SAVE resets
    the FPU stack, and for some reason I can't get the FXSAVE to work
    (my assembler shows its age by not knowing the opcode, and trying
    to use a "db 0Fh,0AEH, ....." sequence crashes the program).

    It might have some safeguards against executing data :)

    Your first FLD will clobber the stack top,

    I don't get that - why only the first one, and why would
    it clobber (the value at) the stack top ?

    The stack is eight FP registers, any load pushes the one on
    the top into the bit bucket. Actually, I believe the registers
    are a circular file, and the load overwrites and decrements TOS.

    As another poster has said, I don't think the x87 automagically
    sets value flags

    I don't quite get this either. Value flags ? I'm reading the
    "Status Word" and in it look at the ST bits (at 11-13).

    Aren't those three bits (values 0-7) the Top-of-Stack pointer?
    People sometimes compare the FPSW with the x86 flags register.
    It is not.

    Remark : I later found out/realized that the "Stack Top"
    is just the starting offset for the ST(x) arguments. IOW :
    whats in it isn't really relevant.

    Exactly.

    Dump and examine in main memory.

    :-) The problem was that I had no idea that I could or how to do that .

    Well, debugging always requires more space. x86 assumes
    sufficient stack space (or switches to privileged memory).

    34 years ago I wrote an extension to MS-DOS DEBUG.COM to
    examine the x87. Converting binary FP to decimal FP was hard.

    Ofcourse it didn't help that I got confused by (and by it focussed on)
    the "Stack Top" value. :-\

    Well, quite forgivable. The x87 is focussed on the stack.

    -- Robert
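
    The conversion mentioned above starts with splitting the 80-bit extended format into a sign bit, a 15-bit biased exponent and a 64-bit significand with an explicit integer bit. A minimal Python sketch, assuming a 10-byte little-endian register image such as FSAVE or FXSAVE stores; `decode_extended80` is a hypothetical helper, and exact rational arithmetic is used so that nothing is rounded before the value is formatted.

```python
from fractions import Fraction

EXPONENT_BIAS = 16383


def decode_extended80(raw: bytes) -> Fraction:
    """Return the exact value of a 10-byte x87 extended-precision image."""
    if len(raw) != 10:
        raise ValueError("an extended-precision value is 10 bytes")
    significand = int.from_bytes(raw[0:8], "little")   # 64 bits, explicit integer bit
    sign_exp = int.from_bytes(raw[8:10], "little")
    sign = -1 if sign_exp & 0x8000 else 1
    exponent = sign_exp & 0x7FFF
    if exponent == 0x7FFF:
        raise ValueError("infinity or NaN has no finite value")
    # Zero and denormals use the minimum exponent instead of exponent 0.
    unbiased = (exponent if exponent else 1) - EXPONENT_BIAS
    # The significand is a fixed-point number with 63 fraction bits.
    return sign * Fraction(significand) * Fraction(2) ** (unbiased - 63)


one = bytes.fromhex("0000000000000080ff3f")   # 1.0 as FLD1 would leave it
print(float(decode_extended80(one)))
```

    Turning the resulting fraction into decimal digits is then ordinary integer arithmetic, which is the part that is awkward to do by hand in assembly.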

  • From R.Wieser@21:1/5 to All on Sun Aug 8 10:11:47 2021
    Robert,

    It might have some safeguards against executing data :)

    I've used the "trick" before, so I don't think so. Currently I'm torn between the possibilities that the processor I'm using might not have that command, that I'm simply bungling it up, or that there is some kind of memory alignment involved (the latter would not be the first time I've run into it).

    Is there any possibility you could take a look at and post what code gets generated for an "FXSAVE {register pointer}" ?

    I don't get that - why only the first one, and why would
    it clobber (the value at) the stack top ?

    The stack is eight FP registers, any load pushes the one
    on the top into the bit bucket.

    True. But such a push would only clobber anything if the (circular) stack
    is completely full.

    Actually, I believe the registers are a circular file,

    It has to be, as my example code works : after the second FLD1 the TOS is 6. But I can still execute a FSTP ST(2) ,which seemingly points at 6+2 = 8.

    and the load overwrites and decrements TOS.

    The documentation for FLD, for instance, mentions decrementing first, then storing
    (which is why I didn't understand your "clobbering" remark).

    Aren't those three bits (0-7) the Top-of-Stack pointer?

    Yep. I was assuming that that value would (implicitly) tell me how many values were placed on the stack. Turns out it doesn't. :-\

    People sometimes compare the FPSW with the x86 flags register.
    It is not.

    Similar perhaps (both contain status flags), but (of course) not the same.

    34 years ago I wrote an extention to MS-DOS DEBUG.COM
    to examine the x87.

    I'm not sure what you mean with an 'extension' (wasn't aware that Debug supported such a thing), but years ago I wrote something for it (using
    memory patching) so it could deal with a few more opcodes.

    Converting binaryFP to decimal FP was hard.

    That's something I still have to take a look at. Just not at this moment. :-)

    Regards,
    Rudy Wieser
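
    The circular behaviour discussed in this post can be modelled in a few lines. The sketch below is a deliberately simplified Python model (the class `X87Stack` and its method names are made up for illustration, and real hardware reports a stack fault through the status word rather than raising an exception): a load decrements TOP and then writes, ST(i) maps to physical register (TOP + i) mod 8, and emptiness is tracked per register, the way the tag word does, instead of being derived from TOP.

```python
class X87Stack:
    """Toy model of the x87 register file: 8 slots, TOP pointer, empty tags."""

    def __init__(self):
        self.top = 0                 # value after FNINIT
        self.regs = [None] * 8       # None stands for an "empty" tag

    def phys(self, i: int) -> int:
        """Physical register index addressed by ST(i)."""
        return (self.top + i) & 7

    def fld(self, value: float) -> None:
        """Push: decrement TOP first, then store into the new ST(0)."""
        new_top = (self.top - 1) & 7
        if self.regs[new_top] is not None:
            raise OverflowError("stack overflow: target register not empty")
        self.top = new_top
        self.regs[new_top] = value

    def fstp(self, i: int) -> None:
        """Copy ST(0) to ST(i), then pop (mark old ST(0) empty, increment TOP)."""
        value = self.regs[self.top]
        if value is None:
            raise IndexError("stack underflow: ST(0) is empty")
        self.regs[self.phys(i)] = value
        self.regs[self.top] = None
        self.top = (self.top + 1) & 7

    def depth(self) -> int:
        """Number of non-empty registers, independent of TOP."""
        return sum(r is not None for r in self.regs)


fpu = X87Stack()
fpu.fld(1.0)
fpu.fld(1.0)
print(fpu.top, fpu.phys(2))   # TOP after two loads, and where ST(2) lands
```

    Replaying the minimal example from the start of the thread with this model shows ST(2) wrapping around to physical register 0, which is why the "6+2 = 8" store succeeds, and shows that the register count has to come from the tags, since TOP only records where ST(0) currently is.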

  • From wolfgang kern@21:1/5 to R.Wieser on Sun Aug 8 12:03:39 2021
    On 07.08.2021 17:51, R.Wieser wrote:
    ...
    and for some reason I can't get the FXSAVE to work (my assembler shows its age by not knowing the opcode, and trying to use a "db 0Fh,0AEH, ....." sequence crashes the program).

    on older CPUs 0F AE xx will raise exception 6 [illegal opcode] if:

    1) bit 5 of xx is 1 (xx 20..3F, 60..7F, A0..BF)
       newer CPU may show a few valid instructions (see sandpile.org)

    2) mod=3 aka register operand (C0..FF) [memory only!]

    3) may raise EXC_6 if not supported
       0F AE 90..97 / 98..9F mean LDMXCSR / STMXCSR [support specific]

    so I'd recommend either
       0F AE 06 00 xx  FXSAVE [xx00h]  (needs 512 byte DS: buffer !)
    or shorter
       0F AE 00        FXSAVE [bx+si]  (ditto)
    or HLL styled :)
       0F AE 46 00     FXSAVE [bp+0]   (needs 512 byte on SS: stack)
    __
    wolfgang

  • From Robert Prins@21:1/5 to R.Wieser on Sun Aug 8 11:26:25 2021
    You still haven't told us what OS (DOS, Windoze, Linux) or CPU (32/64 bit) you're running this code on....

    David Lindauer's GRDB (DOS) can show the contents of FPU registers, and as you are/were a Pascal user, so can, I think, Delphi. Virtual Pascal can definitely do it, I use the (sadly) wrapping code below:

    {************** Copyright (C) Robert AH Prins 2018-2018 ****************
    *                                                                      *
    * This program is free software; you can redistribute it and/or modify *
    * it under the terms of the GNU General Public License as published by *
    * the Free Software Foundation; either version 3, or (at your option)  *
    * any later version.                                                   *
    *                                                                      *
    * This program is distributed in the hope that it will be useful,      *
    * but WITHOUT ANY WARRANTY; without even the implied warranty of       *
    * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the         *
    * GNU General Public License for more details.                         *
    *                                                                      *
    * You should have received a copy of the GNU General Public License    *
    * along with this program; if not, write to the Free Software          *
    * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110, USA   *
    ************************************************************************
    +------------+---------------------------------------------------------+
    | Date       | Major changes                                           |
    +------------+---------------------------------------------------------+
    |            |                                                         |
    +------------+---------------------------------------------------------+
    | 2018-09-30 | Add x_int3 to selectively enable debug code             |
    +------------+---------------------------------------------------------+
    | 2018-08-31 | Initial version                                         |
    +------------+---------------------------------------------------------+
    ************************************************************************
    * DEBUG.PAS                                                            *
    *                                                                      *
    * This unit contains some code that enables viewing of extended (XMM & *
    * YMM) registers in various formats.                                   *
    ***********************************************************************}
    unit debug;

    {============================} interface {=============================}
    const x_int3: boolean = false;

    type
    r_fpu = record { 16}
    st : extended;
    zz : array [0..5] of byte;
    end;

    r_mmx = record { 16}
    case integer of
    1: (_by: array [0..7] of byte;
    z1 : array [0..7] of byte);
    2: (_in: array [0..3] of shortint;
    z2 : array [0..7] of byte);
    3: (_lo: array [0..1] of longint;
    z3 : array [0..7] of byte);
    4: (_si: array [0..1] of single;
    z4 : array [0..7] of byte);
    5: (_do: array [0..0] of double;
    z5 : array [0..7] of byte);
    6: (_ch: array [0..7] of char;
    z0 : array [0..7] of byte);
    end;

    r_xmm = record { 16}
    case integer of
    1: (_by: array [0..15] of byte);
    2: (_in: array [0.. 7] of shortint);
    3: (_lo: array [0.. 3] of longint);
    4: (_si: array [0.. 3] of single);
    5: (_do: array [0.. 1] of double);
    6: (_ch: array [0..15] of char);
    end;

    xsave_hdr = array [0..63] of byte; { 64}

    fpu = array [0..7] of r_fpu; { 128}
    mmx = array [0..7] of r_mmx; { 128}
    xmm = array [0..7] of r_xmm; { 128}

    xsptr = ^a_xs;
    a_xs = record
      case integer of
        1: (legacy   : array [0..159] of char; { 160} // raw legacy data
            xmm_32   : xmm;                    { 128} // XMM0-7 (low part of YMM0-7)
            xmm_64   : xmm;                    { 128} // XMM8-15 (low part of YMM8-15) (AMD64)
            xsave_hdr: xsave_hdr;              {  64} // Storage bitmap for additional data
            ymm_32   : xmm;                    { 128} // YMM0-7 (high part, low in XMM0-XMM7)
            ymm_64   : xmm);                   { 128} // YMM8-15 (high part, low in XMM8-XMM15) (AMD64)

        2: (fcw      : smallword; {   2} // x87 FPU control word
            fsw      : smallword; {   2} // x87 status word
            ftw      : byte;      {   1} // x87
            res_1    : byte;      {   1}
            fop      : smallword; {   2} // x87 last opcode
            fip      : longint;   {   4} // x87 EIP
            fcs      : smallword; {   2} // x87 CS:
            res_1_x64: smallword; {   2} // + previous: RIP (AMD64)
            fdp      : longint;   {   4} // x87 data pointer
            fds      : smallword; {   2} // x87 DS:
            res_2_x64: smallword; {   2} // + previous: DIP (AMD64)
            mxcsr    : longint;   {   4} // SSE control word
            mxcsr_msk: longint;   {   4}

            case integer of
              3: (fpu: fpu);      { 128} // x87 FPU registers
              4: (mmx: mmx));     { 128} // x86 MMX registers

        3: (raw      : array [0..1023] of byte); { 1024} // just raw data
    end;

    procedure xsave;

    {==========================} implementation {==========================}

    {***********************************************************************
    * XSAVE:                                                               *
    *                                                                      *
    * Save the entire processor state for debugging purposes               *
    ***********************************************************************}
    procedure xsave; assembler; {&uses none} {&frame+}
    var xs: array [0..2047] of char;
    var xp: xsptr;

    asm
    //a-in xsave
    cmp x_int3, true
    jne @99

    pushad

    //------------------------------------------------------------------
    // clear out save area
    //------------------------------------------------------------------
    lea edi, xs
    xor eax, eax
    mov ecx, type xs / 4
    rep stosd

    //------------------------------------------------------------------
    // save area must be aligned on 64-byte boundary
    //------------------------------------------------------------------
    lea edi, xs
    add edi, 63
    and edi, -64
    mov xp, edi

    //------------------------------------------------------------------
    // save everything that can be saved
    //------------------------------------------------------------------
    or eax, -1
    or edx, -1
    { xsave [edi] } db $0f,$ae,$27

    //------------------------------------------------------------------
    // display data in "Watches" window
    // - xp^ : all
    // - xp^.fpu : all FPU registers as extended
    // - xmm_32[0]._lo: contents of XMM0 as 4 longints
    // - etc...
    //------------------------------------------------------------------
    int 3

    popad

    @99:
    //a-out
    end; {xsave}

    end.

    Robert
    --
    Robert AH Prins
    robert(a)prino(d)org
    The hitchhiking grandfather - https://prino.neocities.org/indez.html
    Some REXX code for use on z/OS - https://prino.neocities.org/zOS/zOS-Tools.html

  • From R.Wieser@21:1/5 to All on Sun Aug 8 12:47:11 2021
    Robert,

    You still haven't told us what OS (DOS, Windoze, Linux) or CPU (32/64 bit) you're running this code on....

    My apologies, I did not think that it would matter (still don't, but ...).

    The OS is Windows, XP pro sp3, 32 bit. The used environment is Borlands
    Tasm v5.2 (Assembler).

    David Lindauer's GRDB (DOS) can show the contents of FPU registers

    The idea is that I would be able to write such FPU debugging code myself. Somehow I like it that way. :-)

    // save area must be aligned on 64-byte boundary
    ...
    { xsave [edi] } db $0f,$ae,$27

    Both were what I was looking for. Thanks.

    Alas, I still can't get it to work :

    lea edi,[@@Foo] ;size is 2000h. Plenty of space.
    add edi,003Fh ;[1]
    and edi,not 003Fh
    or eax,-1 ;Not mentioned in my docs, but ...
    or edx,-1
    db 0Fh,0AEh,27h ;xsave [edi]

    It still "crashes" ("{program.exe} has encountered a problem and needs to close. We are sorry for the inconvenience.")

    [1] My "The IA-32 Intel Architecture Software Developer's Manual, Volume 2" mentions an alignment of 16.

    Any ideas ?

    Regards,
    Rudy Wieser
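
    The add/and pair in the snippet above rounds an address up to the next multiple of 64. The same arithmetic as a small Python sketch (`align_up` is a hypothetical name, and the addresses are invented) shows the pointer that results for either requirement mentioned in the thread: 64 bytes for XSAVE, 16 for FXSAVE.

```python
def align_up(address: int, alignment: int) -> int:
    """Round address up to the next multiple of a power-of-two alignment."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    return (address + alignment - 1) & ~(alignment - 1)


buffer_start = 0x00403001             # made-up address of an unaligned buffer
print(hex(align_up(buffer_start, 64)))
print(hex(align_up(buffer_start, 16)))
```

    Rounding up can skip as many as alignment - 1 bytes, so the buffer has to be that much larger than the save area itself; the 2000h bytes reserved above leave ample room.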

  • From wolfgang kern@21:1/5 to wolfgang kern on Sun Aug 8 13:41:35 2021
    you seem to work with 32 bit:

    0F AE 07 FXSAVE [edi]

    you used 27, so I was confused and had a look at my AMD docs,
    it says: FXSAVE mem512env 0F AE /0, where the zero is the value of bits 3..5 (the reg field)
    and I also checked on sandpile.org.
    0F AE /4 means XSAVE (a different instruction: it saves the extended CPU state, not just the FPU)
    __
    wolfgang
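
    The /0 versus /4 distinction can be read straight off the byte that follows 0F AE. The Python sketch below maps that byte to a mnemonic for the unprefixed forms, following the opcode maps as commonly documented; `classify_0f_ae` is a hypothetical name, and whether a given CPU and operating system actually accept each instruction is a separate question.

```python
# reg field (bits 3..5) of the ModRM byte selects the instruction in group 0F AE.
MEMORY_FORMS = {
    0: "FXSAVE", 1: "FXRSTOR", 2: "LDMXCSR", 3: "STMXCSR",
    4: "XSAVE", 5: "XRSTOR", 6: "XSAVEOPT", 7: "CLFLUSH",
}
# With mod = 3 (register operand) only the fences are defined without a prefix.
REGISTER_FORMS = {5: "LFENCE", 6: "MFENCE", 7: "SFENCE"}


def classify_0f_ae(modrm: int) -> str:
    """Name the unprefixed instruction encoded by 0F AE <modrm>."""
    if not 0 <= modrm <= 0xFF:
        raise ValueError("ModRM is a single byte")
    mod = modrm >> 6
    reg = (modrm >> 3) & 7
    if mod == 3:
        return REGISTER_FORMS.get(reg, "undefined without a prefix")
    return MEMORY_FORMS[reg]


for byte in (0x07, 0x27):
    print(f"0F AE {byte:02X} -> {classify_0f_ae(byte)}")
```

    Both 07 and 27 address the same memory operand; they differ only in the reg field, which is what turns FXSAVE into XSAVE.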

  • From R.Wieser@21:1/5 to All on Sun Aug 8 13:35:46 2021
    Wolfgang,

    so I'd recommend either
    ...
    or shorter
    0F AE 00 FXSAVE [bx+si] (ditto)

    For testing purposes I tend to go with the most basic one first, so I took that one.
    Remark: I'm on XP 32 bit, so the registers are EBX and ESI respectively.

    Alas, same problem : crash.

    Aligning [EBX+ESI] on a 64 byte boundary (as suggested by Robert Prins) did not make a difference.

    I'm starting to lean towards the possibility that the command is refused (does not exist). Is there any way to check it ?

    Regards,
    Rudy Wieser

  • From wolfgang kern@21:1/5 to R.Wieser on Sun Aug 8 14:16:43 2021
    On 08.08.2021 13:35, R.Wieser wrote:

    so I'd recommend either
    ...
    or shorter
    0F AE 00 FXSAVE [bx+si] (ditto)

    For testing purposes I tend to go with the most basic one first, so I took that one.
    Remark: I'm on XP 32 bit, so the registers are EBX and ESI respectivily.

    Alas, same problem : crash.

    Aligning [EBX+ESI] on a 64 byte boundary (as suggested by robert prins) did not make a difference.

    I'm starting to lean towards the possibility that the command is refused (does not exist). Is there any way to check it ?

    look up CPUID, one of the returned bits tells if present or not.
    __
    wolfgang
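
    For reference, the relevant bits come back from CPUID with EAX=1: FXSAVE/FXRSTOR support is EDX bit 24, XSAVE support is ECX bit 26, and ECX bit 27 (OSXSAVE) reports whether the operating system has enabled XSAVE. The Python sketch below only interprets register values that were obtained elsewhere; `save_support` is a hypothetical helper and the sample values are invented.

```python
EDX_FXSR = 1 << 24      # FXSAVE / FXRSTOR supported
EDX_SSE = 1 << 25       # SSE supported
ECX_XSAVE = 1 << 26     # XSAVE family supported by the CPU
ECX_OSXSAVE = 1 << 27   # XSAVE enabled by the operating system


def save_support(edx: int, ecx: int) -> dict:
    """Interpret CPUID leaf 1 feature flags relevant to saving FPU state."""
    cpu_has_xsave = bool(ecx & ECX_XSAVE)
    return {
        "fxsave": bool(edx & EDX_FXSR),
        "sse": bool(edx & EDX_SSE),
        "xsave_in_cpu": cpu_has_xsave,
        "xsave_usable": cpu_has_xsave and bool(ecx & ECX_OSXSAVE),
    }


# Invented example: a CPU with FXSR and SSE but without XSAVE.
print(save_support(EDX_FXSR | EDX_SSE, 0))
```

    An XSAVE executed without the OSXSAVE bit set raises an invalid-opcode exception even when the CPU itself implements the instruction, so both bits matter.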

  • From wolfgang kern@21:1/5 to R.Wieser on Sun Aug 8 14:57:01 2021
    On 08.08.2021 13:35, R.Wieser wrote:

    0F AE 00 FXSAVE [bx+si] (ditto)

    For testing purposes I tend to go with the most basic one first, so I took that one.
    Remark: I'm on XP 32 bit, so the registers are EBX and ESI respectivily.

    Alas, same problem : crash.

    Aligning [EBX+ESI] on a 64 byte boundary (as suggested by robert prins) did not make a difference.

    within 32 bit:
    0F AE 00 is FXSAVE [eax] uses DS:
    __
    wolfgang

  • From R.Wieser@21:1/5 to All on Sun Aug 8 16:00:47 2021
    Wolfgang,

    you seem to work with 32 bit:

    I am. Didn't think it would matter much.

    0F AE 07 FXSAVE [edi]

    I just tried that one, and it worked ! (got 288 bytes of data though, not 512) As a result I'm now thoroughly confused in regard to the mod, reg, r/m encoding. I tried different ones, but only got crashes.

    you used 27, so I were confused and had you look at my AMD docs,

    That value was suggested by Robert (in his code). And as I didn't get
    anywhere ...


    Oh blimey - I don't know how I did it, but I just noticed that I somehow mixed up the 16 and 32-bit mod/reg/rm encodings. With the MOD and REG both being zero the registers targeted by R/M are rather different between them. :-|

    Bottom line: I made a stupid mistake, created non-working code and got myself confused as a result. And as I presumptuously forgot to mention the basics of what I was busy with (32-bit coding) I did really help you guys find the cause of it. My apologies for that.

    Regards,
    Rudy Wieser
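
    The mix-up described here is easy to reproduce on paper: the same ModRM byte names different registers depending on the address size. A minimal Python sketch of the two decoding tables (`decode_modrm` is a hypothetical helper; SIB bytes and displacements are only named, not parsed):

```python
RM16 = ("bx+si", "bx+di", "bp+si", "bp+di", "si", "di", "bp", "bx")
RM32 = ("eax", "ecx", "edx", "ebx", "SIB", "ebp", "esi", "edi")


def decode_modrm(modrm: int, address_bits: int) -> tuple:
    """Split a ModRM byte into (mod, reg, operand description)."""
    if address_bits not in (16, 32):
        raise ValueError("address size must be 16 or 32")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod == 3:
        return mod, reg, f"register #{rm}"
    if address_bits == 16:
        if mod == 0 and rm == 6:          # special case: direct address
            return mod, reg, "[disp16]"
        disp = ("", "+disp8", "+disp16")[mod]
        return mod, reg, f"[{RM16[rm]}{disp}]"
    if mod == 0 and rm == 5:              # special case: direct address
        return mod, reg, "[disp32]"
    disp = ("", "+disp8", "+disp32")[mod]
    return mod, reg, f"[{RM32[rm]}{disp}]"


for bits in (16, 32):
    print(bits, decode_modrm(0x00, bits), decode_modrm(0x07, bits))
```

    Read this way, 0F AE 00 is FXSAVE [bx+si] only in 16-bit code; in a 32-bit program the same bytes mean FXSAVE [eax], so loading EBX and ESI has no effect on where the data goes.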

  • From wolfgang kern@21:1/5 to R.Wieser on Sun Aug 8 16:35:02 2021
    On 08.08.2021 16:00, R.Wieser wrote:

    you seem to work with 32 bit:

    I am. Didn't think it would matter much.

    0F AE 07 FXSAVE [edi]

    I just tried that one, and it worked ! (got 288 bytes of data though, not 512) As a result I'm now thoroughly confused in regard to the mod, reg, r/m encoding. I tried different ones, but only got crashes.

    IIRC we got 288 bytes with FSAVE long, 512 bytes may be just the
    required buffer size.

    you used 27, so I were confused and had you look at my AMD docs,
    That value was suggested by Robert (in his code). And as I didn't get anywhere ...

    Oh blimy - I don't know how I did it, but I just noticed that I somehow
    mixed up the 16 and 32-bit mod/reg/rm encodings. With the MOD and REG both being zero the by R/M targetted registers are rather different between them. :-|

    Bottom line: I made a stupid mistake, created non-working code and got
    myself confused as a result. And as I presumptiously forgot to mention the basics of what I was busy with (32-bit coding) I did really help you guys find the cause of it. My apologies for that.

    I was once there as well :) experience can't be bought!
    just fine that we could help, no need for apology.
    __
    wolfgang

  • From R.Wieser@21:1/5 to All on Sun Aug 8 16:30:38 2021
    I did really help you guys find the cause of it. My apologies for that.

    Ehrms ... "I did *not* really help" of course.

    Regards,
    Rudy Wieser

  • From R.Wieser@21:1/5 to All on Sun Aug 8 18:55:24 2021
    Wolfgang,

    IIRC we got 288 bytes with FSAVE long, 512 bytes may be just the required buffer size.

    Those 512 bytes do not (currently) seem to be /required/. I initialized the buffer with a specific byte value, and that way I could see that nothing from offset 288
    upward was touched (the "reserved" areas below it, however, were).

    Perhaps that 288-and-up "reserved" area is meant for future generations of
    the x87 FPU.

    I was once there as well :) experience can't be bought!

    I can only hope that I remember it for quite a while.

    just fine that we could help,

    And thanks for that.

    no need for apology.

    :-) In that case you may regard it as an explanation of what the problem actually was. I know that when I try to help someone I often get curious
    about it.

    Regards,
    Rudy Wieser
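
For reference, a rough sketch of where the 288 comes from, based on the FXSAVE memory image as documented by Intel and AMD for non-64-bit modes: a 32-byte header, eight 16-byte x87/MMX slots and eight 16-byte XMM registers. The fill-byte check mirrors the approach described in the post. `written_extent` is a made-up helper, and the buffer below is overwritten by hand as a stand-in for a real FXSAVE, which Python cannot execute. (The older FSAVE image, by comparison, is 108 bytes in 32-bit protected mode.)

```python
FXSAVE_AREA_SIZE = 512  # size of the save area the instruction is defined on

# (start offset, end offset exclusive, contents) in 32-bit mode
LAYOUT_32BIT = [
    (0, 32, "FCW, FSW, tag, opcode, instruction/data pointers, MXCSR"),
    (32, 160, "ST(0)-ST(7) / MM0-MM7, one 16-byte slot each"),
    (160, 288, "XMM0-XMM7, 16 bytes each"),
    (288, 512, "XMM8-XMM15 slots (64-bit mode only) and reserved bytes"),
]


def written_extent(buffer, fill):
    """Return one past the highest offset whose byte differs from `fill`."""
    for offset in range(len(buffer) - 1, -1, -1):
        if buffer[offset] != fill:
            return offset + 1
    return 0


# Stand-in for a real FXSAVE: pre-fill, then overwrite the first three regions.
FILL = 0xCC
area = bytearray([FILL] * FXSAVE_AREA_SIZE)
used_end = LAYOUT_32BIT[2][1]
area[:used_end] = bytes(used_end)  # zeros, which differ from the fill byte

print(written_extent(area, FILL))  # 288
```

A byte that happens to be written with the fill value looks untouched, so the result is a lower bound; running the test with two different fill values narrows that down. Separately, the FXSAVE operand has to be 16-byte aligned, so the buffer address is worth checking as well.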

  • From Robert@21:1/5 to R.Wieser on Sun Aug 8 22:50:02 2021
    R.Wieser <address@nospicedham.not.available> wrote in part:
    Robert,
    It might have some safeguards against executing data :)

    I've used the "trick" before, so I don't think so. Currently I'm
    torn between the possibilities that the processor I'm using might
    not have that command, that I'm simply bungling it or that
    there is some kind of memory alignment involved (the latter one
    would not be the first time I've run into it).

    Well, please make sure the pointer is correct (trash easily
    gets caught in the upper bits in mixed-mode) and your pgm owns
    the memory it points at. Otherwise, segfault.

    Actually, I believe the registers are a circular file,
    It has to be, as my example code works : after the second FLD1 the TOS is 6. But I can still execute a FSTP ST(2), which seemingly points at 6+2 = 8.

    Ah, but circularity is achieved by masking, 8=0 when masked at 3bits.

    34 years ago I wrote an extension to MS-DOS DEBUG.COM
    to examine the x87.

    I'm not sure what you mean with an 'extension' (wasn't aware that
    Debug supported such a thing), but years ago I wrote something for it
    (using memory patching) so it could deal with a few more opcodes.

    Very similar. I added code and patched the command jump table
    to enter it when commanded.

    -- Robert
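
The masking argument can be written out as a small Python sketch. It models only the 3-bit TOP field of the status word and the mapping from ST(i) to a physical register. `push` and `physical_register` are names invented for this illustration, and the starting value of 0 assumes a freshly initialized FPU (FINIT).

```python
STACK_SLOTS = 8                # the x87 has eight physical registers
TOP_MASK = STACK_SLOTS - 1     # 0b111; masking works because 8 is a power of two


def push(top):
    """A load such as FLD1 decrements TOP, wrapping from 0 to 7."""
    return (top - 1) & TOP_MASK


def physical_register(top, i):
    """Physical register number that ST(i) refers to for a given TOP."""
    return (top + i) & TOP_MASK


top = 0                  # TOP after FINIT
top = push(push(top))    # two FLD1 instructions
print(top)                            # 6
print(physical_register(top, 2))      # 0, i.e. 6 + 2 = 8 wraps to 0
```

The AND with 7 gives the same result as a modulo 8 only because the register file starts at index 0 and its size is a power of two, which is the condition both posters hint at.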

  • From R.Wieser@21:1/5 to All on Mon Aug 9 09:27:16 2021
    Robert,

    Well, please make sure the pointer is correct

    :-) And how do you propose that should be done ? It sounds like a great idea, but ...

    (trash easily gets caught in the upper bits in mixed-mode)

    Somewhere along the line I forgot to mention that I was programming in
    32-bit mode (under Win XP). So, no mixed mode and no trash in the upper
    bits.

    Ah, but circularity is achieved by masking, 8=0 when masked at 3bits.

    Well ... It /can/ be achieved that way, but only under certain conditions (related to origin and size). :-)

    The problem has been located though : I simply used the wrong R/M value
    while hand-encoding the FXSAVE command (likely mixing up the 16 bit table
    with the 32 bit one). IOW, I was providing the target address in a certain register while the command expected it in another register/form.

    Regards,
    Rudy Wieser

  • From R.Wieser@21:1/5 to All on Mon Aug 9 15:50:18 2021
    Robert

    Well, please make sure the pointer is correct

    :-) And how do you propose that should be done ?
    It sounds like a great idea, but ...
    ...
    use MOV.

    How would that change anything ? If the target for an FXSAVE is wrong
    enough that it causes an exception, how /wouldn't/ that be wrong in the same way for a MOV ? (let's forget about alignment for a moment)

    It would even make the problem larger, as you would then need to pick a REG value too - and wonder if it perhaps is having a negative influence on
    the result.

    FWIW, I tried several R/M values, none of which wanted to work. Bad luck I guess.

    In retrospect I should perhaps have tried loading all the common registers
    with the same value and tried all R/M values until something worked. On success it would be a case of determining which register is the source, and then looking back at the instruction set to find a match - and from it figuring
    out what the/my mistake was.

    Zero origin, power-of-two size. Check on both.
    Ever wonder why there are so many buffers this way?

    No, never. Really ... <whistle>

    Debugging with MOV test (hand-assembled) could have caught it.

    I doubt it. See above.

    Regards,
    Rudy Wieser

  • From Robert@21:1/5 to R.Wieser on Mon Aug 9 13:08:37 2021
    R.Wieser <address@nospicedham.not.available> wrote in part:
    Robert,
    Well, please make sure the pointer is correct

    :-) And how do you propose that should be done ?
    It sounds like a great idea, but ...

    Walk before you run, when in trouble, drop back. Before trying
    a potentially troublesome instruction like FXSAVE, use MOV.
    Even hand-assemble from hex if those facilities are in doubt:

    MOV EAX, [pointer] ; read test: can the location be read?
    MOV [pointer], EAX ; write test: can the location be written?


    (trash easily gets caught in the upper bits in mixed-mode)

    Somewhere along the line I forgot to mention that I was programming in 32-bit mode (under Win XP). So, no mixed mode and no trash in the upper bits.

    I don't think XP does 64, but the CPU might. The upper-upper could
    get trash. ISTR needing to set something to get IN/OUT to work.


    Ah, but circularity is achieved by masking, 8=0 when masked at 3bits.

    Well ... It /can/ be achieved that way, but only under
    certain conditions (related to origin and size). :-)

    Zero origin, power-of-two size. Check on both.
    Ever wonder why there are so many buffers this way?

    The problem has been located though : I simply used the wrong R/M value
    while hand-encoding the FXSAVE command (likely mixing up the 16 bit table with the 32 bit one). IOW, I was providing the target address in a certain register while the command expected it in another register/form.

    Debugging with MOV test (hand-assembled) could have caught it.

    -- Robert
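
A possible illustration of why a hand-assembled MOV can serve as a probe here: in 32-bit code a MOV through [EDI] with EAX as the register operand uses the same ModR/M byte as FXSAVE [EDI], so a wrong ModR/M value would show up with the simpler instruction too. The sketch below only assembles the bytes; `with_modrm` is a made-up helper, and the choice of EDI follows the working encoding quoted earlier in the thread.

```python
# MOD=00, REG=000, R/M=111: memory operand [EDI] under 32-bit addressing.
# REG=000 names EAX for MOV and selects the /0 opcode extension for 0F AE.
MODRM_EDI = 0x07


def with_modrm(opcode_bytes, modrm):
    """Append a ModR/M byte to an opcode."""
    return bytes(opcode_bytes) + bytes([modrm])


encodings = {
    "mov eax, [edi]": with_modrm([0x8B], MODRM_EDI),
    "mov [edi], eax": with_modrm([0x89], MODRM_EDI),
    "fxsave [edi]": with_modrm([0x0F, 0xAE], MODRM_EDI),
}

for text, code in encodings.items():
    print(f"{text:16} {code.hex().upper()}")
```

All three encodings end in 07; only the opcode bytes in front of it differ.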

  • From Robert Redelmeier@21:1/5 to R.Wieser on Mon Aug 9 14:56:52 2021
    R.Wieser <address@nospicedham.not.available> wrote in part:
    Robert
    Well, please make sure the pointer is correct
    :-) And how do you propose that should be done ?
    It sounds like a great idea, but ...
    use MOV.

    How would that change anything ? If the target for
    an FXSAVE is wrong enough that it causes an exception,
    how /wouldn't/ that be wrong in the same way for a MOV ?
    (let's forget about alignment for a moment)

    It is a purer memory test. I thought there was question
    of whether FXSAVE was available or supported on your CPU.
    This checks opcode encoding too.

    It would even make the problem larger, as you would
    then need to pick a REG value too - and wonder if it perhaps
    is having a negative influence on the result.

    All GP registers should be available at all times.

    FWIW, I tried several R/M values, none of which wanted
    to work. Bad luck I guess.

    Encoding should not be a guessing game.
    The odds are bad, <1% .

    In retrospect I should perhaps have tried loading all the
    common registers with the same value and tried all R/M
    values until something worked. On success it would be
    a case of determining which register is the source, and
    then looking back at the instruction set to find a match -
    and from it figuring out what the/my mistake was.

    x86 has quirky indirect addressing modes that
    are unlikely to yield to trial-and-error.

    -- Robert
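
One possible reading of the "<1%" remark, offered as an assumption since the post does not say how the figure was reached: if the whole ModR/M byte were a blind guess, only one of 256 values gives FXSAVE [EDI]. The Python sketch below counts the candidates for 32-bit addressing; the names are invented for this illustration.

```python
def fields(byte):
    """Return the (mod, reg, rm) fields of a ModR/M byte."""
    return byte >> 6, (byte >> 3) & 7, byte & 7


# 0F AE /0 is FXSAVE only with REG = 000 and a memory operand (MOD != 11).
fxsave_forms = [b for b in range(256)
                if fields(b)[1] == 0 and fields(b)[0] != 3]

# With MOD = 00, R/M = 100 needs a SIB byte and R/M = 101 a 32-bit
# displacement, so those two are not complete as a single byte.
standalone = [b for b in fxsave_forms
              if fields(b)[0] == 0 and fields(b)[2] not in (4, 5)]

print(len(fxsave_forms))                     # 24
print([f"{b:02X}" for b in standalone])      # ['00', '01', '02', '03', '06', '07']
print(f"{1 / 256:.2%}")                      # 0.39%
```

The two excluded R/M values are examples of the addressing-mode quirks mentioned above: they change how many bytes the instruction occupies, so a wrong guess can also throw off the decoding of whatever follows.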

  • From R.Wieser@21:1/5 to All on Mon Aug 9 18:11:26 2021
    Robert,

    It is a purer memory test.

    In what way ? And mind you, I already addressed that.

    I thought there was question of whether FXSAVE was available
    or supported on your CPU.

    As I could not get a working FXSAVE encoding I started to doubt.

    This checks opcode encoding too.

    No need for that, as those two bytes came from an opcode list. The only unknown part was the addressing of the target memory.

    All GP registers should be available at all times.

    Agreed. But it is an extra factor, and as such interference.

    Encoding should not be a guessing game.

    What makes you think I was ? I tried a few different R/M encodings
    (while providing different registers), and none of them wanted to work.
    Hence my (above) described doubt as to whether the command was available on my 'puter/processor. (read: I was quite certain I did it "by the book")

    But when you /know/ something ought to work and you cannot make it so, then a pragmatic approach is called for. Which includes throwing everything
    and the kitchen sink at it to see if /something/ will work. And from that
    try to reason back why it does and where you went wrong with the first attempts.

    x86 has quirky indirect addressing modes that
    are unlikely to yield to trial-and-error.

    True. But I would not be looking for those. Just a simple one that
    /does/ function. From that foot-in-the-door the rest often follows.

    And that is effectively what happened when Wolfgang supplied me with a
    working encoding for FXSAVE [EDI] : while trying to match the 0x07 to the mod,reg,r/m tables I had used I realized I had been using the wrong one. It was as simple as that.

    Regards,
    Rudy Wieser

  • From Robert Prins@21:1/5 to R.Wieser on Tue Aug 10 15:54:47 2021
    On 2021-08-09 16:11, R.Wieser wrote:

    And that is effectively what happened when Wolfgang supplied me with a working encoding for FXSAVE [EDI] : while trying to match the 0x07 to the mod,reg,r/m tables I had used I realized I had been using the wrong one. It was as simple as that.
    Use

    <https://defuse.ca/online-x86-assembler.htm>

    for all your "db" needs. I use it "all the time" to get P5+ opcodes for the Virtual Pascal in-line assembler. I've become a huge fan of using AVX instructions, and miraculously, most of the data structures I was using in 1985 (TP3), then 16-bit, now 32-bit, are almost perfectly suited for XMM and YMM code, go figure!

    Robert
    --
    Robert AH Prins
    robert(a)prino(d)org
    The hitchhiking grandfather - https://prino.neocities.org/indez.html
    Some REXX code for use on z/OS - https://prino.neocities.org/zOS/zOS-Tools.html

  • From R.Wieser@21:1/5 to All on Tue Aug 10 16:46:26 2021
    Robert,

    Use

    <https://defuse.ca/online-x86-assembler.htm>

    for all your "db" needs.

    Thank you very much. It will certainly come in handy. :-)

    ... and it doesn't even need JS to "do its thing". <thumbs up>

    Regards,
    Rudy Wieser

  • From Kerr-Mudd, John@21:1/5 to R.Wieser on Tue Aug 10 20:53:41 2021
    On Tue, 10 Aug 2021 16:46:26 +0200
    "R.Wieser" <address@nospicedham.not.available> wrote:

    Robert,

    Use

    <https://defuse.ca/online-x86-assembler.htm>

    for all your "db" needs.

    Thank you very much. It will certainly come in handy. :-)

    ... and it doesn't even need JS to "do its thing". <thumbs up>

    I tried mov ax,bx and got
    6689D8

    I guess x86 means 32bit nowadays!

    --
    Bah, and indeed Humbug.
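
The extra 66 byte is the operand-size override prefix: in code whose default operand size is 32 bits, a 16-bit register move needs it, while 16-bit code does not. A minimal Python sketch of that rule follows; the function name `mov_r16_r16` and its `default_operand_size` parameter are made up for this illustration, and the online assembler's actual internals are not known here.

```python
OPERAND_SIZE_PREFIX = 0x66
MOV_RM_R = 0x89  # MOV r/m, r
REG16 = {"ax": 0, "cx": 1, "dx": 2, "bx": 3,
         "sp": 4, "bp": 5, "si": 6, "di": 7}


def mov_r16_r16(dst, src, default_operand_size):
    """Encode MOV dst, src for 16-bit registers in 16-bit or 32-bit code."""
    modrm = 0xC0 | (REG16[src] << 3) | REG16[dst]   # MOD=11: register operand
    body = bytes([MOV_RM_R, modrm])
    if default_operand_size == 16:
        return body
    return bytes([OPERAND_SIZE_PREFIX]) + body


print(mov_r16_r16("ax", "bx", 32).hex().upper())  # 6689D8
print(mov_r16_r16("ax", "bx", 16).hex().upper())  # 89D8
```

So the output quoted above is what a 32-bit assembler is expected to produce for mov ax,bx; assembled as 16-bit code, the same instruction is just 89 D8.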

  • From R.Wieser@21:1/5 to All on Wed Aug 11 10:00:00 2021
    John,

    I tried mov ax,bx and got
    6689D8

    I guess x86 means 32bit nowadays!

    Not for the people in this newsgroup perhaps, but for the majority of users
    out there ? Certainly.

    But yes, I noticed that too. A 16-bit option would have been nice to have.

    Regards,
    Rudy Wieser

  • From Anton Ertl@21:1/5 to John" on Wed Aug 11 07:59:34 2021
    "Kerr-Mudd, John" <admin@nospicedham.127.0.0.1> writes:
    I guess x86 means 32bit nowadays!

    That's the problem with "x86": People use it to mean any of several
    different ISAs. So better avoid that term, and use:

    8086 (rarely called IA-16) when you mean that instruction set.
    IA-32 when you mean that instruction set (first implementation: 80386)
    AMD64 when you mean that instruction set (first implementation: AMD K8
    (Opteron, Athlon 64))

    And then there are extensions, like the additional 80186 and 80286
    instructions (plus the 80286 offers protected mode), or SSE, SSE2,
    AVX, ...

    Now what does that mean for the name of this newsgroup.

    - anton
    --
    M. Anton Ertl  Some things have to be seen to be believed
    anton@mips.complang.tuwien.ac.at  Most things have to be believed to be seen
    http://www.complang.tuwien.ac.at/anton/home.html

  • From wolfgang kern@21:1/5 to Anton Ertl on Wed Aug 11 10:45:44 2021
    On 11.08.2021 09:59, Anton Ertl wrote:
    "Kerr-Mudd, John" <admin@nospicedham.127.0.0.1> writes:
    I guess x86 means 32bit nowadays!

    That's the problem with "x86": People use it to mean any of several
    different ISAs. So better avoid that term, and use:

    8086 (rarely called IA-16) when you mean that instruction set.
    IA-32 when you mean that instruction set (first implementation: 80386)
    AMD64 when you mean that instruction set (first implementation: AMD K8
    (Opteron, Athlon 64))

    And then there are extensions, like the additional 80186 and 80286 instructions (plus the 80286 offers protected mode), or SSE, SSE2,
    AVX, ...

    Now what does that mean for the name of this newsgroup.

    I wouldn't recommend splitting our CLAX into several CPU-related groups.
    The Intel/AMD 16-bit instruction sets already differ between CPU families.
    And almost all regular readers of this group are aware of this anyway.
    __
    wolfgang

  • From wolfgang kern@21:1/5 to John on Wed Aug 11 10:34:11 2021
    On 10.08.2021 21:53, Kerr-Mudd, John wrote:

    <https://defuse.ca/online-x86-assembler.htm>

    I tried mov ax,bx and got
    6689D8
    I guess x86 means 32bit nowadays!

    :) of course!
    16 bit code will soon just belong to history.

    I'm happy to have my own 16/32/64bit disassembler although it already
    needs many updates now, but it saves me from internet access.

    89 D8 is the STORE variant, which should only be used for memory writes.

    8B C3 would be the correct LOAD opcode. I have been fighting for this for decades
    but no one ever listened, so Intel and AMD will never get rid of these
    duplicates and can't make space for 64 other instructions with 89 mod 3,
    like the added valid opcodes for the formerly illegal 8F08 and 8F10.
    __
    wolfgang
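
The two encodings being compared can be derived side by side with a short Python sketch. Opcode 89 puts the source register in the REG field and the destination in R/M, opcode 8B does the opposite, and with MOD = 11 both describe the same register-to-register move. The function names are invented for this illustration, and any operand-size prefix is left out.

```python
REG = {"ax": 0, "cx": 1, "dx": 2, "bx": 3,
       "sp": 4, "bp": 5, "si": 6, "di": 7}


def mov_store_form(dst, src):
    """89 /r, MOV r/m, r: REG holds the source, R/M the destination."""
    return bytes([0x89, 0xC0 | (REG[src] << 3) | REG[dst]])


def mov_load_form(dst, src):
    """8B /r, MOV r, r/m: REG holds the destination, R/M the source."""
    return bytes([0x8B, 0xC0 | (REG[dst] << 3) | REG[src]])


print(mov_store_form("ax", "bx").hex().upper())  # 89D8
print(mov_load_form("ax", "bx").hex().upper())   # 8BC3

# Number of ModR/M values with MOD = 11, i.e. register-register forms.
print(sum(1 for b in range(256) if b >> 6 == 3))  # 64
```

The count of 64 matches the figure in the post: every combination of the eight REG and eight R/M values with MOD = 11 is a register-to-register form that exists under both opcodes.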
