• High level - low level

    From Robert Prins@21:1/5 to All on Sun Aug 8 16:42:43 2021
    Again a post that never showed up. Replies to clax86 seem to work, new postings do not???


    -------- Forwarded Message --------
    Subject: High level - low level
    Date: Sun, 8 Aug 2021 11:07:28 +0000
    From: Robert Prins <robert(*)prino.org>
    Newsgroups: comp.lang.asm.x86

    After nearly six months, I've started having another look at the code in my hitchhike statistics programs, see lift32bit.rar, password "lift" @ <https://goo.gl/ZN3XAB>.

    The programs were originally written in Turbo Pascal V3.01a, changed to use its "inline()", byte-encoded assembler, converted to pure Pascal again for Turbo Pascal 6.0, then changed to use inline assembler, converted to Virtual Pascal (with a surprisingly small effort to convert the inline assembler code), and right now they are probably well over 95% inline assembler, which in many cases no longer bears any resemblance to the original compiler generated code.

    The reason for returning to the code? One heading needed updating, and jQuery, used in one of the generated html files, has been upgraded to V3.6.0. Obviously I decided to have another look at the code, and while doing so, I realised that there are three places where I do min/max compares on three contiguous longint variables, and hey, weren't there some new instructions, "vp[max/min]sd" to do those in parallel? A bit of restructuring, to move a fourth irrelevant longint after the three already there, allowed me to use them, and that makes me wonder:

    Would a modern C(++) compiler be able to do such a transformation? Obviously not by moving the fields in the structure, but in this case using a (V)PMAXSD variant to compare just two of the three, even if the code first does max/min compares on a per-field basis?
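
    To make the question concrete, here is a minimal C sketch of the two forms being compared: a per-field maximum, and a packed one that maps onto PMAXSD through the SSE4.1 intrinsic `_mm_max_epi32`. The `quad` struct, its field names and both function names are hypothetical; the real record layout in the lift programs is not shown in this post. The packed path is only compiled when the compiler defines `__SSE4_1__` (GCC and Clang do so with `-msse4.1`); otherwise the sketch falls back to the scalar form.

```c
#include <assert.h>
#include <stdint.h>
#ifdef __SSE4_1__
#include <smmintrin.h>  /* _mm_max_epi32, i.e. PMAXSD */
#endif

/* Hypothetical layout: the three compared fields, followed by the
   unrelated fourth longint, so that all four fill one 128-bit vector. */
typedef struct {
    int32_t a, b, c;  /* fields that take part in the max compares */
    int32_t other;    /* irrelevant fourth field, must be left alone */
} quad;

/* Per-field form, as the high-level source would express it. */
static void max3_scalar(quad *hi, const quad *x)
{
    if (x->a > hi->a) hi->a = x->a;
    if (x->b > hi->b) hi->b = x->b;
    if (x->c > hi->c) hi->c = x->c;
}

/* Packed form: one PMAXSD over all four lanes, after which the fourth
   lane is restored because it is not part of the comparison. */
static void max3_packed(quad *hi, const quad *x)
{
    assert(sizeof(quad) == 16);  /* no padding expected in this layout */
#ifdef __SSE4_1__
    int32_t keep = hi->other;
    __m128i h = _mm_loadu_si128((const __m128i *)hi);
    __m128i v = _mm_loadu_si128((const __m128i *)x);
    _mm_storeu_si128((__m128i *)hi, _mm_max_epi32(h, v));
    hi->other = keep;
#else
    max3_scalar(hi, x);          /* fallback when SSE4.1 is not enabled */
#endif
}
```

    Whether a given compiler turns `max3_scalar` into the packed form on its own depends on the compiler, its version and the flags used, so the generated assembler would have to be inspected to answer that. The minimum case is analogous, with `_mm_min_epi32` (PMINSD).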

    Which of course leads to the more general question: if I were to convert the "Pure Pascal" or PL/I versions of the program to C (I still keep copies of both, the PL/I one also on my Google Drive as "lift.pli"), how would the generated code (MSVC, GCC, Intel, all "32-bit + opt(max)") compare to what I've been hacking together, given that my assembler skills might just be a bit over average? Maybe someone could show me what a C compiler would generate for

    var n   : double; {Local temporary}
    var c   : double; {Local temporary}
    var yyyy: longint;
    var mm  : longint;
    var dd  : longint;
    var jdn : longint;

    n:= 1.0 * jdn - 1721119.2;
    c:= trunc(n / 36524.25);

    if jdn < 2299161 then
      n:= n + 2
    else
      n:= n + c - trunc(c / 4);

    yyyy:= trunc(n / 365.25);
    n   := n - trunc(365.25 * yyyy) - 0.3;
    mm  := trunc(n / 30.6);
    dd  := trunc(n - 30.6 * mm + 1);

    if mm > 9 then
      begin
        dec(mm, 9);
        inc(yyyy);
      end
    else
      inc(mm, 3);

    Would these compilers be smart enough to? ;)
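
    For anyone who wants to feed the fragment above to a C compiler, a direct translation could look like the sketch below. It assumes that Pascal's `trunc()` corresponds to a C cast from `double` to `int32_t` (both truncate towards zero), and the `ymd` struct and the function name `jdn_to_ymd` are hypothetical wrappers added only so that the result can be returned and checked.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type; the Pascal fragment uses three variables. */
typedef struct {
    int32_t yyyy, mm, dd;
} ymd;

/* Line-by-line translation of the Pascal fragment: Julian Day Number
   to calendar date. Each (int32_t) cast stands in for trunc(). */
static ymd jdn_to_ymd(int32_t jdn)
{
    ymd r;
    double n, c;

    assert(jdn > 0);  /* this sketch only considers positive day numbers */

    n = 1.0 * jdn - 1721119.2;
    c = (double)(int32_t)(n / 36524.25);

    if (jdn < 2299161)
        n = n + 2;
    else
        n = n + c - (int32_t)(c / 4);

    r.yyyy = (int32_t)(n / 365.25);
    n      = n - (int32_t)(365.25 * r.yyyy) - 0.3;
    r.mm   = (int32_t)(n / 30.6);
    r.dd   = (int32_t)(n - 30.6 * r.mm + 1);

    if (r.mm > 9) {
        r.mm -= 9;
        r.yyyy++;
    } else {
        r.mm += 3;
    }
    return r;
}
```

    As a check, `jdn_to_ymd(2451545)` yields 2000-01-01 and `jdn_to_ymd(2459435)` yields 2021-08-08. What a particular compiler generates for it at "32-bit + opt(max)" is not something this sketch claims to show.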

    And how would you code this in 32-bit assembler, when you can use your grey-cell based RI processor?

    Or a Heapsort where the elements to be sorted are pointers in an array on the heap, pointing to the sort fields, i.e. something like

    while (not ready) and
          (_j <= n) do
      begin
        if _j < n then
          begin
            _k:= succ(_j);

            if (sort^[_j]^.major < sort^[_k]^.major) or
               (sort^[_j]^.major = sort^[_k]^.major) and
               (sort^[_j]^.minor < sort^[_k]^.minor) then
              inc(_j);
          end;

        if (rra^.major < sort^[_j]^.major) or
           (rra^.major = sort^[_j]^.major) and
           (rra^.minor < sort^[_j]^.minor) then
          begin
            sort^[_i]:= sort^[_j];
            _i       := _j;
            _j       := _j * 2;
          end
        else
          _j:= succ(n);
      end;

    where major and minor are longint fields, and again as 32-bit code.
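
    A C rendering of that loop, as a starting point for comparing compiler output, might look like the sketch below. Several things in it are assumptions, because the Pascal fragment does not show them: the array is taken to be 1-based (slot 0 unused), the start value `j = 2 * i` and the final store of `rra` follow the classic textbook heapsort, the `ready` flag is dropped since the fragment never sets it, and the names `key`, `key_less`, `sift_down` and `heapsort_ptrs` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int32_t major, minor;  /* the two longint sort fields */
} key;

/* a < b on (major, minor). Pascal's "and" binds tighter than "or",
   which gives the same grouping as written here. */
static int key_less(const key *a, const key *b)
{
    return a->major < b->major ||
           (a->major == b->major && a->minor < b->minor);
}

/* The loop from the post: sift rra down from hole i in heap sort[1..n]. */
static void sift_down(const key **sort, int32_t i, int32_t n, const key *rra)
{
    int32_t j = i * 2;                 /* assumed start value */

    assert(i >= 1 && i <= n);
    while (j <= n) {
        if (j < n && key_less(sort[j], sort[j + 1]))
            j++;                       /* pick the larger child */
        if (key_less(rra, sort[j])) {
            sort[i] = sort[j];         /* move child up, descend */
            i = j;
            j = j * 2;
        } else {
            j = n + 1;                 /* done: force loop exit */
        }
    }
    sort[i] = rra;                     /* assumed final store */
}

/* Assumed driver: sorts the pointers sort[1..n] in ascending key order. */
static void heapsort_ptrs(const key **sort, int32_t n)
{
    int32_t l, ir;

    for (l = n / 2; l >= 1; l--)       /* build the heap */
        sift_down(sort, l, n, sort[l]);
    for (ir = n; ir > 1; ir--) {       /* repeatedly extract the maximum */
        const key *rra = sort[ir];
        sort[ir] = sort[1];
        sift_down(sort, 1, ir - 1, rra);
    }
}
```

    Only the pointers are moved; the records they point to stay where they are, as in the Pascal version.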

    Let's just say that the code generated by VP for the above is not much better than the code originally generated by TP 3.01a...

    So where do compilers stand today, if you've read this snippet from Paul Hsieh: <http://www.azillionmonkeys.com/qed/optimize.html#asmdebate>?

    For example, how much faster and/or smaller would an x86/AMD64 assembler implementation of <http://www.jhauser.us/arithmetic/SoftFloat.html> be?

    Robert
    --
    Robert AH Prins
    robert(a)prino(d)org
    The hitchhiking grandfather - https://prino.neocities.org/indez.html
    Some REXX code for use on z/OS - https://prino.neocities.org/zOS/zOS-Tools.html

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Frank Kotler@21:1/5 to Robert Prins on Sun Aug 8 17:31:29 2021
    On 08/08/2021 12:42 PM, Robert Prins wrote:
    Again a post that never showed up. Replies to clax86 seem to work, new postings do not???

    Sorry, Robert. I'm "pretty" sure posts that make it to the submission
    mailbox do get posted. Nothing I can do, if not.

    Best,
    Frank
