How to start learning assembler/MMX/...?
E-Male
14th August 2005, 21:22
Currently I only know C++ (more or less), enough for simple AviSynth plug-ins.
I'm interested in learning optimization techniques (assembler/MMX/integer SSE/...) to speed up my plug-ins.
I'm looking for assembler/MMX beginners' guides, guides to using assembler/MMX from C++, and a reference manual that covers all the instructions.
I had no luck with Google, so I'm hoping for some help here.
Thanks
hank315
14th August 2005, 23:20
Link: http://developer.intel.com/design/Pentium4/documentation.htm (look in the manuals section).
Another one: http://www.tommesani.com/Docs.html
jstelly
15th August 2005, 22:03
Personally I'd treat learning assembly as an interim step towards learning intrinsics (functions provided by the compiler that map more or less directly onto MMX/SSE instructions). I don't know if that makes a ton of sense, but what I mean is that I wouldn't worry about syntax: I'd learn the ins and outs of the processors, then start looking at intrinsics.
I'm not much of an assembly guy. I know enough to be dangerous, but not much more, so maybe someone who's more into assembly can comment on my plan. I just figure that for a C/C++ developer, intrinsics can be easier to use and just as effective as hand-written assembly.
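To illustrate the intrinsics route, here is a minimal sketch (not from the thread) of a byte-wise saturating add written with SSE2 intrinsics instead of inline assembly. `_mm_adds_epu8` does the same clamped addition as the MMX instruction `paddusb`, but 16 bytes at a time. It assumes an x86 compiler that ships `<emmintrin.h>` and defines `__SSE2__` (GCC and Clang do on x86-64; MSVC does not define that macro, so there the scalar path runs unless the guard is adapted). The function name `add_saturate_u8` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: dst[i] = min(dst[i] + src[i], 255) for n bytes,
   i.e. unsigned saturating addition like MMX paddusb. */
void add_saturate_u8(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* 16 bytes per iteration; loadu/storeu accept unaligned pointers. */
    for (; i + 16 <= n; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(dst + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(a, b));
    }
#endif
    /* Scalar tail, or the whole buffer when SSE2 is not available. */
    for (; i < n; ++i) {
        unsigned sum = (unsigned)dst[i] + src[i];
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

The compiler picks the registers, and there is no `emms` to forget because SSE2 registers are separate from the x87 stack. An MMX intrinsic, `_mm_adds_pu8`, also exists, but code using it must call `_mm_empty()` before any floating-point code runs.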
Wilbert
16th August 2005, 00:33
Don't forget http://www.avisynth.org/AssemblerOptimizing.
E-Male
16th August 2005, 03:23
Thanks, guys.
My first "oh my god, that looks complicated" shock is starting to fade away...
But I'm starting to wonder in which cases I can actually use this:
I now know that if the same math operation is performed on 8 pixels in a row (adding/subtracting a constant value, or values from another row of pixels, ...), I can do it in one go.
But in most cases (at least for me) every pixel needs its own treatment: a lookup table [GiCoCu], copying data from a struct to an AviSynth-style pixel array [immaavs], getting values from other pixels based on distance [logo removal], using values from neighbours [blur, UnDot, RemoveGrain, MaskTools in/expand and so on], ...
In short: I think I'll be able to learn the instructions as I use them, but I wonder how to do the real optimization in most of these cases.
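Neighbour-based filters are often still SIMD-friendly, because every output pixel applies the same formula, just to shifted inputs. As a minimal sketch (assuming SSE2 as in the previous sketch, with a scalar fallback), here is a one-dimensional, horizontal-only "expand" in which each output pixel is the maximum of itself and its left and right neighbours; three overlapping unaligned loads give every lane its own neighbourhood. The name `expand_row` and the edge handling (missing neighbours are simply ignored) are assumptions for illustration, not taken from MaskTools.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

static uint8_t max_u8(uint8_t a, uint8_t b) { return a > b ? a : b; }

/* Hypothetical helper: dst[x] = max(src[x-1], src[x], src[x+1]).
   Out-of-range neighbours are ignored at the row ends.
   dst and src must not overlap. */
void expand_row(uint8_t *dst, const uint8_t *src, size_t width)
{
    if (width == 0)
        return;
    if (width == 1) {
        dst[0] = src[0];
        return;
    }
    dst[0] = max_u8(src[0], src[1]);

    size_t x = 1;
#if defined(__SSE2__)
    /* Each pass reads src[x-1 .. x+16], so it needs x + 16 <= width - 1. */
    for (; x + 17 <= width; x += 16) {
        __m128i left   = _mm_loadu_si128((const __m128i *)(src + x - 1));
        __m128i center = _mm_loadu_si128((const __m128i *)(src + x));
        __m128i right  = _mm_loadu_si128((const __m128i *)(src + x + 1));
        __m128i m = _mm_max_epu8(_mm_max_epu8(left, center), right);
        _mm_storeu_si128((__m128i *)(dst + x), m);
    }
#endif
    /* Remaining interior pixels (all of them without SSE2). */
    for (; x + 1 < width; ++x)
        dst[x] = max_u8(max_u8(src[x - 1], src[x]), src[x + 1]);

    dst[width - 1] = max_u8(src[width - 2], src[width - 1]);
}
```

A full 2-D expand would also take the maximum over the rows above and below, which is the same pattern applied to three row pointers.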
Maybe someone can recommend a non-trivial but not too complicated open-source plug-in that I can study.
Thanks again for your help.
I hope I can use it to give something back that is really useful to the community.
EDIT:
Something else I find interesting is that operations like division are often replaced by algorithms that look more complicated (to me) but are faster, often including shifts, which IIRC mean multiplication or division by 2 to the power of n.
Is there a site that lists the most common of those replacements?
squid_80
16th August 2005, 09:10
Similar to the shifting trick, you can use the lea instruction to multiply a general-purpose register by a number of the form 2^n+1, where 0 <= n <= 3 (i.e. 2, 3, 5 or 9). For example, to multiply eax by 5 (5 = 2^2+1) you could use "lea eax, [eax+4*eax]".
Also, if you use shifting to divide, be aware that negative numbers behave differently: an arithmetic right shift (sar) rounds toward negative infinity instead of truncating toward zero like division in C.
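Both points can be checked in plain C. The sketch below (function names hypothetical) computes x*5 as x + (x << 2), the same arithmetic as `lea eax, [eax+4*eax]`, and contrasts a plain right shift with a truncating divide by 2^n that first adds a bias of 2^n - 1 to negative values. It assumes `>>` on a negative `int32_t` is an arithmetic (sign-filling) shift: the C standard leaves that implementation-defined, although GCC, Clang and MSVC all behave this way.

```c
#include <stdint.h>

/* x * 5 computed as x + 4*x, like "lea eax, [eax+4*eax]" (wraps mod 2^32). */
static inline uint32_t mul5(uint32_t x) { return x + (x << 2); }

/* Plain shift: rounds toward negative infinity, e.g. -7 >> 1 == -4.
   Assumes an arithmetic shift for negative values (implementation-defined). */
static inline int32_t shr_div(int32_t x, int n) { return x >> n; }

/* Truncating division by 2^n, matching C's x / (1 << n), for 0 <= n <= 30.
   (x >> 31) is all ones for negative x, so bias = 2^n - 1 only then. */
static inline int32_t trunc_div_pow2(int32_t x, int n)
{
    int32_t bias = (x >> 31) & ((1 << n) - 1);
    return (x + bias) >> n;
}
```

Compilers typically emit this same bias sequence for `x / 4` on a signed int, which is why dividing an unsigned (or known non-negative) value by a power of two can compile to a single shift.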
bill_baroud
17th August 2005, 10:32
Quote (E-Male):
Something else I find interesting is that operations like division are often replaced by algorithms that look more complicated (to me) but are faster, often including shifts, which IIRC mean multiplication or division by 2 to the power of n.
Is there a site that lists the most common of those replacements?
Look in AMD's optimization guide: it describes a little tool called "udiv" that gives you the multiplier and the shift count needed to divide by any integer constant.
Example:
Unsigned division by constant
=============================
enter divisor: 9
; dividend: register other than EAX or memory location
MOV EAX, 038E38E39h
MUL dividend
SHR EDX, 1
; quotient now in EDX
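In C, the udiv recipe above amounts to one widening multiply and a shift: MUL leaves the 64-bit product in EDX:EAX, taking EDX already drops the low 32 bits, and SHR EDX, 1 shifts once more, so the total shift is 33. A minimal sketch (the name `div9` is hypothetical; the constant is the one from the udiv output above):

```c
#include <stdint.h>

/* Unsigned 32-bit x / 9 without a DIV instruction.
   0x38E38E39 = ceil(2^33 / 9). Using the high half of the 64-bit product
   (EDX after MUL) is a shift by 32; SHR EDX, 1 adds one more: 33 in total. */
static inline uint32_t div9(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0x38E38E39u) >> 33);
}
```

Current compilers usually perform this transformation themselves when the divisor is a compile-time constant, so a tool like udiv matters mostly when writing the assembly by hand.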
And here are the documents I'm currently using on the subject:
"The Art of Assembly" - Randall Hyde - 1996
http://webster.cs.ucr.edu/
"IA-32 Intel Architecture Optimization Reference Manual" - Intel white paper No. 24896611
"MMX Technology Developer's Guide" - Intel - March 1996
"MMX Technology Programmer's Reference Manual" - Intel white paper No. 243007 - March 1996
"How To Optimize for the Pentium Family of Microprocessors" - Agner Fog - April 2004
http://www.agner.org/assem/pentopt.pdf
And this one, which is actually not a very good example, but it can give you some ideas:
http://www.inv3rsion.com/Whitepapers/SobelFilter/
I'm writing some material about MMX and its uses (that's my internship subject), so if you don't find anything else, I can translate some parts if you want (it's in French).
Oh yeah, and remember: you can't push everything through MMX, and you often have to rethink the whole algorithm if you want to use it (and often that's not a good idea).
E-Male
17th August 2005, 20:51
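From avisynth.org IntermediateMmxOptimization (http://www.avisynth.org/IntermediateMmxOptimization):

void additiveblend4(unsigned char *dst, const unsigned char *src, int quads) {
  __asm {
    mov ecx, src          ; source pointer
    mov edx, dst          ; destination pointer
    mov eax, quads        ; counter in 4-byte units
    pxor mm7, mm7
  top:
    movq mm0, [ecx]       ; load 8 source bytes
    paddusb mm0, [edx]    ; add 8 destination bytes with unsigned saturation
    movq [edx], mm0       ; store the result back to dst
    add ecx, 8
    add edx, 8
    sub eax, 2            ; 8 bytes = 2 quads per pass, so quads must be even and > 0
    jne top               ; loop until the counter reaches zero
    emms                  ; clear MMX state so x87 floating-point code works again
  }
}

I wonder if the "pxor mm7, mm7" is just a copy-and-paste error or if it is needed for something I missed (it does make sense to me in additiveblend2).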
Wilbert
17th August 2005, 22:38
Ask Sh0dan to look at it, because he wrote it :)
squid_80
18th August 2005, 01:32
Quote (E-Male):
I wonder if the "pxor mm7, mm7" is just a copy-and-paste error or if it is needed for something I missed (it does make sense to me in additiveblend2).
Definitely looks leftover from a copy-paste to me. Not much point zeroing mm7 if it never gets used for anything.
Kopernikus
23rd August 2005, 18:22
Quote (E-Male):
Something else I find interesting is that operations like division are often replaced by algorithms that look more complicated (to me) but are faster, often including shifts, which IIRC mean multiplication or division by 2 to the power of n.
Is there a site that lists the most common of those replacements?
There are some crazy tricks: http://www.aggregate.org/MAGIC/
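As one example of the kind of bit trick collected on pages like that (this packed-byte version is a sketch, not copied from the page), two integers can be averaged without overflow as (a & b) + ((a ^ b) >> 1). Masking with 0xFEFEFEFE before the shift stops bits from leaking between bytes, so four 8-bit pixels packed into one `uint32_t` are averaged at once, with no MMX at all. The rounding-up variant gives (a + b + 1) >> 1 per byte, the same rounding as the integer-SSE `pavgb` instruction. The function names are hypothetical.

```c
#include <stdint.h>

/* Per-byte floor((a + b) / 2) for four bytes packed into a uint32_t.
   a & b keeps the shared bits; (a ^ b) >> 1 adds half of the differing bits.
   The 0xFE mask clears each byte's lowest bit so the shift cannot move it
   into the top bit of the byte below. Each per-byte result fits in 8 bits,
   so the addition never carries across byte boundaries. */
static inline uint32_t avg_floor_u8x4(uint32_t a, uint32_t b)
{
    return (a & b) + (((a ^ b) & 0xFEFEFEFEu) >> 1);
}

/* Per-byte (a + b + 1) / 2, the rounding used by pavgb.
   a | b equals (a & b) + (a ^ b), so subtracting half the differing bits
   rounds up; no byte can borrow from its neighbour. */
static inline uint32_t avg_round_u8x4(uint32_t a, uint32_t b)
{
    return (a | b) - (((a ^ b) & 0xFEFEFEFEu) >> 1);
}
```

Packing several pixels into one integer register like this is often called SWAR (SIMD within a register); with `uint64_t` and 0xFEFEFEFEFEFEFEFE the same expressions average eight pixels at once.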