On 22/07/2022 07:19, Phu Tran Hoang wrote:
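;Replace "rep movsb" with the following code (assumes cx >= 1 when di is odd)
test di,1 ; align by word
jz $+4 ; di already even: skip the single-byte copy
movsb
dec cx
shr cx,1 ; cx = word count, CF = leftover byte
rep movsw
jnc $+3 ; no leftover byte: skip the final movsb
movsb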
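;Replace "rep stosb" with the following code (assumes cx >= 1 when di is odd)
mov ah, al ; duplicate the fill byte so stosw stores it twice
test di,1 ; align by word
jz $+4 ; di already even: skip the single-byte store
stosb
dec cx
shr cx,1 ; cx = word count, CF = leftover byte
rep stosw
jnc $+3 ; no leftover byte: skip the final stosb
stosb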
[jnc +1? stosb and stosw are one-byte opcodes, "AA" and "AB"]
Yes, pre- and post-aligning string operations is
the main speed gain in my OS. It works with 32-bit
reduction/extension for any odd start and size.
But I also align source or destination to 4-byte (dword) bounds.
TEST esi,3
JZ isAligned
... ;adjust for an aligned loop start here
isAligned:
SHR ecx,1 ;no action at all if ecx=0
JNC +1
LODSB
SHR ecx,1
JNC +2 ; +2 in use32, because
LODSW ; LODSW needs a 66h operand-size prefix here
REP LODSD ;falls through if ECX=Zero
With similar dummy reads at the start and at the end, it
can partially read disk sectors at any offset and size.
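For anyone who wants to check the count arithmetic without an assembler,
here is a rough C model of it. It is only a sketch: the names split_count
and head_to_align4 are made up for illustration, it moves no data, and it
just shows how the two SHR/JNC steps cut a count into at most one byte, at
most one word and a run of dwords, plus how many leading bytes an address
needs before it sits on a 4-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; a model of the count split only, not the posted code. */
struct split {
    uint32_t bytes;   /* 0 or 1: consumed after the first SHR/JNC (LODSB)  */
    uint32_t words;   /* 0 or 1: consumed after the second SHR/JNC (LODSW) */
    uint32_t dwords;  /* what remains in the counter for REP LODSD         */
};

static struct split split_count(uint32_t n)
{
    struct split s;
    s.bytes  = n & 1u;         /* carry out of the first SHR  */
    s.words  = (n >> 1) & 1u;  /* carry out of the second SHR */
    s.dwords = n >> 2;         /* counter value after both shifts */
    /* The three pieces always add back up to the original count. */
    assert(s.bytes + 2u * s.words + 4u * s.dwords == n);
    return s;
}

/* Bytes to step over before addr is a multiple of 4 (0 if already aligned). */
static uint32_t head_to_align4(uintptr_t addr)
{
    return (uint32_t)((4u - (addr & 3u)) & 3u);
}
```

A count of 7 splits into one byte, one word and one dword, and a count of
0 splits into nothing at all, which matches the fall-through behaviour
noted in the comments of the assembly.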
__
wolfgang