• Fwd: ITOA in 65 bytes

    From Robert Prins@21:1/5 to Terje Mathisen on Sun May 16 21:43:10 2021
    -------- Forwarded Message --------
    Subject: Re: ITOA in 65 bytes
    Date: Sat, 15 May 2021 19:01:58 +0000
    From: Robert Prins <robert@prino.org>
    Newsgroups: comp.lang.asm.x86
    References: <s7m8nd$f4p$1@dont-email.me> <s7mkfs$196d$1@gioia.aioe.org>

    On 2021-05-14 19:53, Terje Mathisen wrote:
    Robert Prins wrote:
    Is it possible to fit the conversion of a 16 bit signed integer to
    left-aligned ASCII without leading zeroes in just 65 bytes?

    Naively I would say yes: I assume you don't care about speed here?

    You're getting soft... ;)

    Anyway, this is the original TP3 RTL code, which uses repeated subtractions, as disassembled by Hex-Rays' IDA Pro:

    111B intasc proc near
    111B 0B C0 or ax, ax
    111D 79 06 jns short iapos
    111F F7 D8 neg ax
    1121 C6 07 2D mov byte ptr [bx], '-'
    1124 43 inc bx
    1125
    1125 iapos:
    1125 32 ED xor ch, ch
    1127 BA 10 27 mov dx, 10000
    112A E8 15 00 call iadigit
    112D BA E8 03 mov dx, 1000
    1130 E8 0F 00 call iadigit
    1133 BA 64 00 mov dx, 100
    1136 E8 09 00 call iadigit
    1139 B2 0A mov dl, 10
    113B E8 04 00 call iadigit
    113E 8A C8 mov cl, al
    1140 EB 14 jmp short iadput
    1140 intasc endp
    1140
    1142
    1142 iadigit proc near
    1142
    1142 32 C9 xor cl, cl
    1144
    1144 iadsub:
    1144 FE C1 inc cl
    1146 2B C2 sub ax, dx
    1148 73 FA jnb short iadsub
    114A 03 C2 add ax, dx
    114C FE C5 inc ch
    114E FE C9 dec cl
    1150 75 04 jnz short iadput
    1152 FE CD dec ch
    1154 74 06 jz short iadnoput
    1156
    1156 iadput:
    1156 80 C1 30 add cl, '0'
    1159 88 0F mov [bx], cl
    115B 43 inc bx
    115C
    115C iadnoput:
    115C C3 retn
    115C iadigit endp
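
    For readers who don't read 8086 listings fluently, the routine above can be
    modelled roughly in C as follows. This is only a sketch: the function name
    itoa_subtract, the returned length and the trailing NUL are assumptions added
    for illustration (the RTL just advances BX), and "seen" stands in for the
    CH flag that suppresses leading zeroes.

```c
#include <assert.h>   /* only needed for the checks in the usage note */
#include <stddef.h>
#include <stdint.h>
#include <string.h>   /* only needed for the checks in the usage note */

/* Hypothetical C model of intasc/iadigit; not the RTL code itself. */
size_t itoa_subtract(int16_t value, char *out)
{
    static const uint16_t powers[] = { 10000, 1000, 100, 10 };
    char *p = out;                      /* plays the role of BX */
    uint16_t ax = (uint16_t)value;
    int seen = 0;                       /* plays the role of CH */

    if (value < 0) {
        ax = (uint16_t)(0u - ax);       /* neg ax; -32768 becomes 32768 */
        *p++ = '-';
    }
    for (size_t i = 0; i < 4; i++) {
        unsigned digit = 0;             /* plays the role of CL */
        while (ax >= powers[i]) {       /* sub ax,dx until it would borrow */
            ax = (uint16_t)(ax - powers[i]);
            digit++;
        }
        if (digit != 0 || seen) {       /* skip leading zeroes only */
            *p++ = (char)('0' + digit);
            seen = 1;
        }
    }
    *p++ = (char)('0' + ax);            /* units digit is always stored */
    *p = '\0';                          /* assumption: NUL terminator added */
    return (size_t)(p - out);
}
```

    With an 8-byte buffer buf, assert(itoa_subtract(1005, buf) == 4 &&
    strcmp(buf, "1005") == 0) should hold, as should the same check for
    -32768 giving "-32768" with length 6.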


    ;; AX has the value to be converted to ascii
    ;; Store the string to the buffer pointed to by DI

      xor cx,cx    ; Count how many digits we find
      test ax,ax    ; Positive?
       jge next

    ;; Negative input value, so print a '-' sign
      mov byte ptr [di],'-'
      neg ax
      inc di

    next:
      xor dx,dx
      mov bx,10
      div bx
      push dx    ; Remainder is the digit
      inc cx
      test ax,ax    ; Is it zero yet?
       jnz next

    dump_digits:
      pop ax
      add al,'0'
      stosb
       loop dump_digits

    That looks like 17 instructions, most of them two-byte, 5 one-byte and a couple
    that are longer, so 32-35 bytes?
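
    The same divide-and-stack idea can be sketched in C to check its behaviour
    at the edges of the 16-bit range. The name itoa_divide, the local array
    standing in for the CPU stack, the returned length and the trailing NUL are
    all assumptions made for this illustration; the assembly above has none of
    them.

```c
#include <assert.h>   /* only needed for the checks in the usage note */
#include <stddef.h>
#include <stdint.h>
#include <string.h>   /* only needed for the checks in the usage note */

/* Hypothetical C model of the DIV-based snippet; not a drop-in replacement. */
size_t itoa_divide(int16_t value, char *out)
{
    char stack[5];                      /* a 16-bit value has at most 5 digits */
    size_t count = 0;                   /* plays the role of CX */
    char *p = out;                      /* plays the role of DI */
    uint16_t ax = (uint16_t)value;

    if (value < 0) {
        *p++ = '-';
        ax = (uint16_t)(0u - ax);       /* neg ax; -32768 becomes 32768 */
    }
    do {
        stack[count++] = (char)('0' + ax % 10);  /* push dx (remainder) */
        ax = (uint16_t)(ax / 10);                /* div bx (quotient)   */
    } while (ax != 0);

    while (count > 0)
        *p++ = stack[--count];          /* pop ax / add al,'0' / stosb */

    *p = '\0';                          /* assumption: NUL terminator added */
    return (size_t)(p - out);
}
```

    Because the loop body runs at least once, an input of 0 still produces the
    single digit "0", and since the division is unsigned, -32768 comes out as
    "-32768".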

    I've done a check, and found that multiplying eax by 0x1999_999a (taking the
    quotient from edx, the high half of the product) divides by 10 over the full
    16-bit range without requiring additional shifts. Of course it needs a
    back-multiply and subtraction to actually get the digit. However, even the
    above code using a divide will quite likely be (significantly) faster than the
    multiple subtraction loops in the original RTL. (And having mentioned
    multiplying eax, it should be clear that I don't really care about keeping the
    RTL compatible with the 8086/286, and replacing Int 21h/AH=2Ch with "RDTSC" is
    also an option...)
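
    The reciprocal-multiply claim is easy to check exhaustively. The C sketch
    below models the 32x32->64 bit MUL with a 64-bit product and takes the high
    half as the quotient, then recovers the digit by back-multiply and
    subtraction. The function names are made up for this illustration.

```c
#include <assert.h>   /* only needed for the checks in the usage note */
#include <stdint.h>

/* Quotient n / 10 as the high 32 bits of n * 0x1999999A (EDX after MUL). */
static uint32_t div10_mul(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0x1999999Au) >> 32);
}

/* Digit n % 10 via back-multiply and subtraction. */
static uint32_t mod10_mul(uint32_t n)
{
    return n - div10_mul(n) * 10u;
}

/* Counts the 16-bit inputs where the trick disagrees with plain / and %. */
uint32_t count_mismatches_16bit(void)
{
    uint32_t bad = 0;
    for (uint32_t n = 0; n <= 0xFFFFu; n++) {
        if (div10_mul(n) != n / 10u || mod10_mul(n) != n % 10u)
            bad++;
    }
    return bad;
}
```

    Here assert(count_mismatches_16bit() == 0) should pass. The reason is that
    0x1999999A is 2^32/10 rounded up, an excess of 0.4, so the accumulated
    error for n below 65536 stays far below the 1/10 needed to bump the
    quotient.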

    The really [b|s]ad parts of the RTL are the compares of REALs; there are six procedures like this one:

    1A0E realeq proc near
    1A0E 8F 06 86 01 pop errpos
    1A12 59 pop cx
    1A13 5E pop si
    1A14 5F pop di
    1A15 58 pop ax
    1A16 5B pop bx
    1A17 5A pop dx
    1A18 E8 99 FE call recmp
    1A1B FF 36 86 01 push errpos
    1A1F B8 01 00 mov ax, 1 ; setz al
    1A22 74 01 jz short realeq1 ; and ax, 1
    1A24 48 dec ax
    1A25
    1A25 realeq1:
    1A25 0B C0 or ax, ax
    1A27 C3 retn
    1A27 realeq endp

    The last part of all of them can be replaced with "SETcc al / and ax, 1", and
    using 32-bit registers it's likely that the first nine instructions can also
    be commoned out (a bit). Similar code is used for comparing strings (6x: >, <,
    =, <>, >=, <=) and pointers (2x: =, <>).
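
    To see why the shorter tail is equivalent for realeq, both sequences can be
    modelled in C as functions of the zero flag left by recmp. The regs struct
    and both function names are hypothetical; only AX and ZF are modelled, and
    the other five procedures would use a different SETcc condition.

```c
#include <assert.h>   /* only needed for the checks in the usage note */
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint16_t ax; bool zf; } regs;   /* hypothetical model type */

/* Original tail: mov ax,1 / jz realeq1 / dec ax / realeq1: or ax,ax */
regs tail_original(bool zf_from_recmp)
{
    regs r;
    r.ax = 1;                        /* mov ax,1 leaves the flags alone   */
    if (!zf_from_recmp)
        r.ax--;                      /* dec ax is skipped when ZF was set */
    r.zf = (r.ax == 0);              /* or ax,ax sets ZF from the result  */
    return r;
}

/* Suggested tail: setz al / and ax,1 */
regs tail_setcc(bool zf_from_recmp, uint16_t ax_before)
{
    regs r;
    /* setz al writes only AL; AH keeps whatever recmp left behind */
    r.ax = (uint16_t)((ax_before & 0xFF00u) | (zf_from_recmp ? 1u : 0u));
    r.ax &= 1u;                      /* and ax,1 clears AH ...            */
    r.zf = (r.ax == 0);              /* ... and sets ZF from the result   */
    return r;
}
```

    For either value of the incoming flag and any previous AX, both models
    return the same AX (1 for equal, 0 otherwise) and the same final ZF.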

    Of course all of this is strangely silly, fiddling with the RTL of a
    35-year-old compiler, but it keeps me off the street (Lithuania is officially
    still locked down until 31 May), and it's just as much fun as hitchhiking to
    Nordkapp, which I did in 1982 for more or less the same reason: because it's
    there.

    Robert
    --
    Robert AH Prins
    robert(a)prino(d)org
    The hitchhiking grandfather - https://prino.neocities.org/indez.html
    Some REXX code for use on z/OS - https://prino.neocities.org/zOS/zOS-Tools.html

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Kerr-Mudd, John@21:1/5 to Robert Prins on Sun May 16 21:01:05 2021
    On Sun, 16 May 2021 21:43:10 +0000
    Robert Prins <robert.ah.prins@nospicedham.gmail.com> wrote:

    []

    Anyway, this is the original TP3 RTL code, which uses repeated subtractions, as
    disassembled by Hex-Rays' IDA Pro

    I guess on the original PC's 8088 the subtractions *were* faster?



    --
    Bah, and indeed Humbug.
