• Optimize the speed of the 8086 instructions "rep movsb" and "rep stosb"

    From Phu Tran Hoang@21:1/5 to All on Thu Jul 21 22:19:55 2022
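    ;Replace "rep movsb" by the following code
    ;(assumes cx > 0: with cx = 0 and di odd, dec cx wraps to 0FFFFh)
    test di,1 ; align di to a word boundary
    jz $+4 ; di already even: skip movsb and dec cx
    movsb
    dec cx

    shr cx,1 ; cx = word count, CF = leftover byte
    rep movsw ; does not change flags, so CF survives
    jnc $+3 ; no leftover byte: skip movsb
    movsb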
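
As a rough cross-check of the sequence above, here is a minimal C sketch that models it step by step. The function name `copy_word_aligned` and the flat `mem` array standing in for one segment are assumptions made for illustration, not part of the post. It assumes DF=0 and non-overlapping regions, and it adds a zero-count guard that the posted code does not have.

```c
#include <stdint.h>

/* Hypothetical model: mem stands in for one segment, si/di are offsets. */
static void copy_word_aligned(uint8_t *mem, uint16_t si, uint16_t di, uint16_t cx)
{
    if (cx == 0)                 /* guard NOT in the posted code */
        return;
    if (di & 1) {                /* test di,1 / movsb / dec cx */
        mem[di++] = mem[si++];
        cx--;
    }
    int carry = cx & 1;          /* shr cx,1: CF = leftover byte */
    cx >>= 1;                    /* cx = number of words */
    while (cx--) {               /* rep movsw: read a word, then write it */
        uint8_t lo = mem[si], hi = mem[si + 1];
        mem[di] = lo;
        mem[di + 1] = hi;
        si += 2;
        di += 2;
    }
    if (carry)                   /* jnc $+3 / movsb */
        mem[di] = mem[si];
}
```

Only the destination is aligned here, as in the post; the source offset may still be odd.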



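    ;Replace "rep stosb" by the following code
    ;(assumes cx > 0: with cx = 0 and di odd, dec cx wraps to 0FFFFh)
    mov ah, al ; ax = fill byte in both halves, for stosw
    test di,1 ; align di to a word boundary
    jz $+4 ; di already even: skip stosb and dec cx
    stosb
    dec cx

    shr cx,1 ; cx = word count, CF = leftover byte
    rep stosw ; does not change flags, so CF survives
    jnc $+3 ; no leftover byte: skip stosb
    stosb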
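
The fill variant can be modelled the same way. This is a minimal C sketch under the same assumptions (DF=0, a flat `mem` array standing in for one segment); the name `fill_word_aligned` is hypothetical, and the zero-count guard is an addition not present in the posted code.

```c
#include <stdint.h>

/* Hypothetical model of the "rep stosb" replacement. */
static void fill_word_aligned(uint8_t *mem, uint16_t di, uint16_t cx, uint8_t al)
{
    uint16_t ax = (uint16_t)(al | (al << 8)); /* mov ah, al */
    if (cx == 0)                 /* guard NOT in the posted code */
        return;
    if (di & 1) {                /* test di,1 / stosb / dec cx */
        mem[di++] = al;
        cx--;
    }
    int carry = cx & 1;          /* shr cx,1: CF = leftover byte */
    cx >>= 1;                    /* cx = number of words */
    while (cx--) {               /* rep stosw, low byte first */
        mem[di] = (uint8_t)(ax & 0xFF);
        mem[di + 1] = (uint8_t)(ax >> 8);
        di += 2;
    }
    if (carry)                   /* jnc $+3 / stosb */
        mem[di] = al;
}
```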

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From wolfgang kern@21:1/5 to Phu Tran Hoang on Fri Jul 22 15:37:33 2022
    On 22/07/2022 07:19, Phu Tran Hoang wrote:
    > ;Replace "rep movsb" by the following code
    > test di,1 ; align by word
    > jz $+4
    > movsb
    > dec cx
    >
    > shr cx,1
    > rep movsw
    > jnc $+3
    > movsb
    >
    > ;Replace "rep stosb" by the following code
    > mov ah, al
    > test di,1 ; align by word
    > jz $+4
    > stosb
    > dec cx
    >
    > shr cx,1
    > rep stosw
    > jnc $+3
    > stosb

    [jnc +1? stosb/stosw are only one-byte opcodes, "AA/AB"]
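
The two notations may well describe the same jump. Assuming `$` denotes the address of the start of the current instruction (as in NASM and MASM), and given that a short conditional jump is two bytes with a displacement counted from the next instruction, `jnc $+3` would encode a displacement of +1. The helper name `encode_short_jcc` below is hypothetical; it is a sketch of the arithmetic only.

```c
#include <stdint.h>

/* Encode a short conditional jump written as "jcc $+dollar_offset".
 * opcode: 0x73 for jnc, 0x74 for jz.
 * The rel8 field is relative to the end of the 2-byte instruction. */
static void encode_short_jcc(uint8_t opcode, int dollar_offset, uint8_t out[2])
{
    out[0] = opcode;
    out[1] = (uint8_t)(int8_t)(dollar_offset - 2);
}
```

Under that reading, `jnc $+3` assembles to `73 01` and `jz $+4` to `74 02`, which skip one and two one-byte instructions respectively.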

    Yes, pre- and post-aligning string operations give
    the main speed gain in my OS. It works with 32-bit
    reduction/extension for any odd start and size.

    But I also align source or destination to 4-byte (dword) boundaries.

    TEST esi,3
    JZ isAligned
    ... ;adjust for an aligned loop start here
    isAligned:
    SHR ecx,1 ;no action at all if ecx=0
    JNC +1
    LODSB
    SHR ecx,1
    JNC +2 ; +2 for use32
    LODSW ; because prefix required here
    REP LODSD ;falls through if ECX=Zero

    With similar dummy reads at the front and at the end, it
    can read part of a disk sector at any offset and size.
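
The two SHR steps in the code above split the byte count by its low bits: bit 0 selects one byte access, bit 1 one word access, and the remaining bits give the dword count. A minimal C sketch of that decomposition follows; the names `split_t` and `split_count` are hypothetical, and the elided head-alignment step is deliberately left out.

```c
#include <stdint.h>

/* How many byte, word and dword accesses a count of ecx bytes turns into. */
typedef struct {
    uint32_t bytes;   /* 0 or 1: CF after the first  SHR ecx,1 */
    uint32_t words;   /* 0 or 1: CF after the second SHR ecx,1 */
    uint32_t dwords;  /* what is left in ecx for REP LODSD */
} split_t;

static split_t split_count(uint32_t ecx)
{
    split_t s;
    s.bytes = ecx & 1;
    ecx >>= 1;
    s.words = ecx & 1;
    ecx >>= 1;
    s.dwords = ecx;
    return s;
}
```

For any count, `bytes + 2 * words + 4 * dwords` adds back up to the original value, so no byte is skipped or read twice.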
    __
    wolfgang
