Replacement of cmov?? instruction


vlad59
10th April 2002, 08:15
Hi,

Yesterday I was trying to test some 3DNow! IDCTs on my old K6-II 400, and I found that Farok had already added two 3DNow! IDCTs to DVD2AVI.
When I tested one of them I got a crash, because it uses the cmovl instruction, which is not available on the K6-II.
I'm a total newbie in asm, but I tried to replace:

cmovl eax, ecx

with:

jge TST1
mov eax, ecx
TST1:

Is this the best solution? I'm quite sure it isn't!
Can I write something more generic, like this:
jge IP+4
mov eax, ecx

Thanks in advance

PS: Sorry for my bad English
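
Editor's sketch, not part of the original post: a small C model of why jge is the right jump to pair with cmovl. It assumes the flags come from a preceding `cmp a, b`; the names `Flags`, `cmp_flags`, `with_cmovl` and `with_jge_mov` are made up for illustration. cmovl moves when SF != OF, and jge is taken when SF == OF, so jumping over the mov on jge should give the same result.

```c
#include <stdint.h>
#include <stdbool.h>

/* The two flags that signed conditions (l, ge) look at after `cmp a, b`. */
typedef struct { bool sf, of; } Flags;

static Flags cmp_flags(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t r = ua - ub;                        /* wraps like the CPU does */
    Flags f;
    f.sf = (r >> 31) != 0;                       /* sign of the result */
    /* signed overflow: operands differ in sign and the result's sign
       differs from the sign of a */
    f.of = (((ua ^ ub) & (ua ^ r)) >> 31) != 0;
    return f;
}

/* cmovl eax, ecx : the move happens when SF != OF */
static int32_t with_cmovl(Flags f, int32_t eax, int32_t ecx)
{
    return (f.sf != f.of) ? ecx : eax;
}

/* jge skip / mov eax, ecx / skip: : jge is taken when SF == OF */
static int32_t with_jge_mov(Flags f, int32_t eax, int32_t ecx)
{
    if (f.sf == f.of) goto skip;
    eax = ecx;
skip:
    return eax;
}
```

By the same reasoning, the counterpart for cmovg would be a jump on the opposite condition, jle.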

LigH
10th April 2002, 08:30
(As far as I know - don't kill me if I'm wrong here...)

That is how the executable binary code works anyway: relative short jumps take the jump distance as their operand. Therefore, a binary replacement patch would be relatively easy as long as both instruction sequences have the same code size (the complicated part would only be to ensure that the bytes you are patching really represent a 'cmovl' instruction and nothing else). If the replacement is longer than the original instruction, binary replacement will of course not work.
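
Editor's sketch, not part of the original post: the size argument can be checked against the 32-bit encodings of the instructions from the first post. The array and function names are hypothetical; the bytes are the standard encodings (the assembler may pick the equivalent 89 C8 form for the mov).

```c
#include <stddef.h>

/* cmovl eax, ecx : opcode 0F 4C, ModRM C1 (reg = eax, r/m = ecx) */
static const unsigned char cmovl_eax_ecx[] = { 0x0F, 0x4C, 0xC1 };

/* jge over the mov, then mov eax, ecx */
static const unsigned char jge_then_mov[] = {
    0x7D, 0x02,   /* jge rel8: distance 2, counted from the end of the jge */
    0x8B, 0xC1    /* mov eax, ecx */
};

static size_t size_original(void)    { return sizeof cmovl_eax_ecx; }
static size_t size_replacement(void) { return sizeof jge_then_mov; }
```

The replacement is one byte longer than the cmovl (4 bytes against 3), so it does not fit as an in-place binary patch. The distance byte is 2 because it is counted from the end of the 2-byte jge, which is the same target as "IP+4" counted from its start.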

If you know where to get them, you might enjoy tools like HIEW (Hacker's View; the latest versions are unfortunately commercial) or BIEW as an alternative. Both are text-mode file editors (looking like a Norton Commander viewer) which can also disassemble and allow binary editing.

vlad59
10th April 2002, 09:56
In fact, that's not exactly my problem.
I have to replace cmov in an unrolled loop, so I have to replace 64 cmovl and 64 cmovg.
That's not a very interesting task.
And creating 128 labels (from TST1: to TST128:) will make the code totally unreadable.
So I'm looking for a trick to keep my code as clean as possible.

I hope I'm clear enough

LigH
10th April 2002, 11:18
So you really need to search and replace text in your assembler source, and the best solution would be to fill part of the replacement text with a running count... Such a rather complex task sounds made for tools like the UNIX commands grep/sed. I'm afraid I can't help you with those tools; I'm not a specialist with them (although they should be available at least as Win32 ports, usually as command line tools). The only help I could offer is to hack up a tool which helps in this special case; otherwise you may have to search for an application which allows some kind of template-based search and replace.
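
Editor's sketch, not part of the original post: a "tool for this special case" could be as small as the C function below, called once per source line. It assumes one instruction per line with plain register operands such as `cmovl eax, ecx` (an operand like `dword ptr [esi]` would not parse), and the name `rewrite_cmov` and the TSTn label scheme are only illustrative.

```c
#include <stdio.h>
#include <string.h>

/* Rewrites one source line into out.
   "cmovl dst, src" becomes "jge TSTn / mov dst, src / TSTn:" and
   "cmovg dst, src" becomes "jle TSTn / mov dst, src / TSTn:",
   where n is a running count kept in *counter.
   Any other line is copied unchanged.
   Returns 1 if the line was rewritten, 0 otherwise. */
static int rewrite_cmov(const char *line, char *out, size_t outsz, int *counter)
{
    char mnem[16], dst[16], src[16];
    const char *jump = NULL;

    /* mnemonic, destination up to the comma, then source */
    if (sscanf(line, " %15s %15[^, \t] , %15s", mnem, dst, src) == 3) {
        if (strcmp(mnem, "cmovl") == 0)
            jump = "jge";               /* opposite condition of "l" */
        else if (strcmp(mnem, "cmovg") == 0)
            jump = "jle";               /* opposite condition of "g" */
    }
    if (jump == NULL) {
        snprintf(out, outsz, "%s", line);
        return 0;
    }
    ++*counter;
    snprintf(out, outsz, "%s TST%d\nmov %s, %s\nTST%d:\n",
             jump, *counter, dst, src, *counter);
    return 1;
}
```

A driver would read the source with fgets, pass each line through the function, and write `out` to the new file, starting with a counter of 0.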

vinouz
11th April 2002, 00:15
I'm a total newbie in asm, but I tried to replace:

cmovl eax, ecx

with:

jge TST1
mov eax, ecx
TST1:


First quick and dirty idea:
Well, the point would be to change the test so that it sets the carry flag, then:
push ebx
push edx
sbb ebx, ebx ; ebx = FFFFFFFFh if carry was set, else 0
mov edx, ebx
not edx ; edx = the opposite mask
and ebx, ecx ; ecx or 0
and eax, edx ; 0 or the old eax
add eax, ebx ; eax = ecx if carry was set, else unchanged
pop edx
pop ebx
Or something along those lines (it's been 4 years since I touched an x86... and I always preferred to keep only memories of Motorola asm ;) ): some way to clear a register or set it to 1111111..., so that an AND gives you either the value or 0 in that register. Repeat with another register set on the opposite value of the same condition, add the two, and you have your data in the register.
Verify my code: I'm pretty sure it's crap (was it addc or adc? subc or sbb?... I just remember that there was no addx on x86, eh!)
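
Editor's sketch, not part of the original post: the mask idea written in C, assuming the condition is already available as a 0/1 value. The function name `select_by_mask` is hypothetical, and a compiler is free to turn this into a cmov or a branch anyway, so it only illustrates the arithmetic.

```c
#include <stdint.h>

/* Returns ecx when cond is nonzero and eax otherwise, using the
   "all ones or all zeros" mask instead of a jump. */
static uint32_t select_by_mask(int cond, uint32_t eax, uint32_t ecx)
{
    uint32_t mask = 0u - (uint32_t)(cond != 0);  /* FFFFFFFFh or 0 */
    return (ecx & mask) + (eax & ~mask);         /* one side is always 0 */
}
```

Because one of the two ANDed values is always 0, the final add could just as well be an or.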

Second idea, just a bit less crappy:
Before the cmov you perform a test, some sort of subtraction. For an l/ge condition, the sign bit of that subtraction's result is equal to the value of the condition (or its opposite... well, it represents it), as long as the subtraction does not overflow.
So with an arithmetic shift right by 31 you get either 000000... or 11111....
So if there was a cmp, put a sub instead, then the following code:
; here I suppose the result of your sub is in ebx
sar ebx, 31 ; ebx = FFFFFFFFh if the result was negative, else 0 (was it #31? ;p )
mov edx, ebx
not edx ; edx = the opposite mask
and eax, edx ; 0 or the old eax
and ebx, ecx ; ecx or 0
add eax, ebx ; eax = ecx if the result was negative, else unchanged

Take care of your ebx and edx (and correct my code!)
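
Editor's sketch, not part of the original post: the sign-bit variant in C, with hypothetical names `sign_mask` and `select_if_less`. It assumes the compare was `a` against `b`, and it shows the one case where the sign bit disagrees with "less than", namely when the subtraction overflows.

```c
#include <stdint.h>

/* Models "sub a, b" followed by "sar result, 31":
   FFFFFFFFh when the wrapped difference is negative, else 0. */
static uint32_t sign_mask(int32_t a, int32_t b)
{
    uint32_t diff = (uint32_t)a - (uint32_t)b;   /* wraps like sub */
    return 0u - (diff >> 31);                    /* replicate the sign bit */
}

/* eax = ecx when the difference a - b is negative, else eax unchanged. */
static uint32_t select_if_less(int32_t a, int32_t b,
                               uint32_t eax, uint32_t ecx)
{
    uint32_t mask = sign_mask(a, b);
    return (ecx & mask) + (eax & ~mask);
}
```

For example, with a = INT32_MIN and b = 1 the difference wraps to a positive number, so the mask is 0 even though a is less than b. Whether that matters depends on the range of the values being compared; cmovl itself gets this case right because it checks SF against OF.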

Vincent

Edit: on superpipelined processors (every modern x86 clone), this method has the same advantage that cmov has over the jump version: no mispredicted branch and no pipeline flush (on today's P4 that must cost, I would say, some 8 or 10 cycles, and in one cycle you can perform multiple operations). Here, for example, setting up the 0000.../value results in the two registers can easily be parallelized by the processor, and only the final add cannot be, so it would take some 4 cycles instead of (1 or 8+)...