22nd September 2005, 21:04 | #1 | Link |
Clouded
Join Date: Jul 2003
Location: Cambridge, UK
Posts: 1,148
|
Inline assembly and ebx (and fast function calls)
So, while trying to speed up filters in VC 2003, I get this warning every time I use ebx in inline assembly:
warning C4731: 'UnmaskedRangeTranslateDifference_P4::_match' : frame pointer register 'ebx' modified by inline assembly code
As I understand it, that means that ebx is being used in the same way as ebp, to (approximately) point to the local variables. Two questions:
1. Why is ebx needed for this as well as ebp?
2. Is there something sensible to do to get round the warning -- i.e. to flag to the compiler that ebx is not needed in a particular instance? (I am aware that you can suppress warnings by number, but this is prevention rather than cure...)
Thanks! M.
Last edited by mg262; 7th November 2005 at 15:04. |
22nd September 2005, 21:40 | #2 | Link |
Registered User
Join Date: Jan 2002
Location: San Jose, CA
Posts: 216
|
ebx is used by the compiler to keep track of aligned variables (stack aligned to 8 or 16 bytes).
This is commonly used when declaring local variables with __declspec(align) and/or __m64/__m128 data types. |
23rd September 2005, 01:40 | #3 | Link |
Avisynth Developer
Join Date: Jan 2003
Location: Melbourne, Australia
Posts: 3,167
|
Sulik has the right answer. The warning is really a serious error, but because of the credo "assembler programmers know what they are doing" it is classed only as a warning. If you push/pop ebx in your asm code the compiler will forgive you, but this sucks. When doing __asm, always ask for an asm compiler listing and check what stupidity the compiler is committing.
As a hacky workaround, when I absolutely have to put aligned variables on the stack, I declare a stack-based char array 7, 15 or 63 bytes bigger than I need, then mask its address and assign it to a stack-based pointer to the type I need.
Code:
// Room for N __int64 values plus up to 7 bytes of alignment slack.
char dummy[(N+1)*sizeof(__int64)-1];
// Round the address up to the next multiple of sizeof(__int64).
__int64 *var1 = (__int64*)(((int)dummy+sizeof(__int64)-1)&(-sizeof(__int64)));
IanB |
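For readers on a current compiler, the over-allocate-and-mask idea above can be sketched portably. This is only an illustration, not IanB's code: `align_up` and `sum_via_aligned_scratch` are hypothetical names chosen here, and `std::uintptr_t` stands in for the `(int)` cast so the arithmetic also holds on 64-bit targets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: round p up to the next multiple of `alignment`,
// which must be a power of two.
inline void* align_up(void* p, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    return reinterpret_cast<void*>((addr + mask) & ~mask);
}

// Carves a 16-byte-aligned block of N 64-bit values out of a plain
// stack buffer, fills it with 1..N and returns the sum.
inline std::int64_t sum_via_aligned_scratch()
{
    const std::size_t N = 4;
    const std::size_t alignment = 16;

    // Over-allocate by alignment-1 bytes so an aligned block of
    // N values always fits somewhere inside the buffer.
    unsigned char raw[N * sizeof(std::int64_t) + alignment - 1];
    std::int64_t* vars = static_cast<std::int64_t*>(align_up(raw, alignment));
    assert(reinterpret_cast<std::uintptr_t>(vars) % alignment == 0);

    std::int64_t total = 0;
    for (std::size_t i = 0; i < N; ++i) {
        vars[i] = static_cast<std::int64_t>(i) + 1;
        total += vars[i];
    }
    return total;
}
```

The buffer itself needs no alignment attribute, which is the point of the workaround: the compiler sees only an ordinary char array and an ordinary pointer.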
23rd September 2005, 09:16 | #4 | Link |
Clouded
Join Date: Jul 2003
Location: Cambridge, UK
Posts: 1,148
|
Thank you both. I don't follow why aligned variables are kept separately (improve packing?), but it doesn't in itself matter; the problem is more that I need the extra register. I think there's an option like "omit frame pointer" in the compiler... I'm going to look into it and see if it frees up ebp.
|
23rd September 2005, 11:42 | #5 | Link |
Registered User
Join Date: Nov 2001
Posts: 291
|
When you declare a variable aligned and it is not static, the compiler cannot fix its address at compile time; it is allocated at run time, so the compiler reserves a register (ebx) as a pointer to it.
Solutions: the one IanB has pointed out; use static if you know your variable will be a constant in every instance in which your plugin is called; otherwise pass the parameter in such a way that you keep it in a register, and you can forget about its alignment. And finally, don't use the ebx register.
I forgot, there is a trick I don't like but which works: declare your variable static and aligned but without any value, pass the parameter to your inline code, and fill the variable within your own code with the values you want at run time. That way you are sure you can have several instances of your plugin.
I hope this can be useful.
ARDA |
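The static-and-aligned trick described above can be sketched without inline assembly. The names `g_offsets`, `set_offsets` and `offsets_are_aligned` are hypothetical, and `alignas(16)` is the C++11 spelling; on VC 2003 the equivalent would presumably be `__declspec(align(16))`. Because the storage is static, every instance of a filter shares it, so this sketch assumes instances never run concurrently.

```cpp
#include <cstddef>
#include <cstdint>

// Static, 16-byte aligned, deliberately declared without an initial value.
alignas(16) static std::int16_t g_offsets[8];

// Fill the shared block at run time, before the inner loop reads it.
inline void set_offsets(std::int16_t value)
{
    for (std::size_t i = 0; i < 8; ++i)
        g_offsets[i] = value;
}

// Reports whether the block really sits on a 16-byte boundary.
inline bool offsets_are_aligned()
{
    return reinterpret_cast<std::uintptr_t>(g_offsets) % 16 == 0;
}
```

Each caller would invoke `set_offsets` with its own value just before its processing loop, which is what lets several instances reuse the one block in turn.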
23rd September 2005, 12:00 | #6 | Link | ||
Clouded
Join Date: Jul 2003
Location: Cambridge, UK
Posts: 1,148
|
Quote:
Quote:
I guess one alternative is not to try and use any non-aligned variables in the main part of the code... so I can push/pop ebp and use that in place of ebx.
Edit: looks like it may not always be safe to rely on the omit frame pointer compiler option: http://groups.google.com/group/micro...1fdaad1d22b5d4
Last edited by mg262; 23rd September 2005 at 12:06. |
||
23rd September 2005, 12:18 | #7 | Link |
Registered User
Join Date: Nov 2001
Posts: 291
|
Quote:
Unless they are both trying to run simultaneously? One could try and change the static variable while the other was halfway through using it?
If I am not wrong, Avisynth just delivers the frame to the next filter in the chain once it has finished the one it is processing.
Quote:
I guess one alternative is not to try and use any non-aligned variables in the main part of the code... so I can push/pop ebp and use that in place of ebx.
You are right, but that is not always possible, and passing the variable and managing to keep it in a register is still better: you will avoid a memory access. Anyway, a safe way is to declare your inline assembler as static void __declspec(naked); look at the inline assembler of Tweak by Dividee as an example.
Regards
ARDA |
23rd September 2005, 12:29 | #8 | Link | ||
Clouded
Join Date: Jul 2003
Location: Cambridge, UK
Posts: 1,148
|
Quote:
At the moment, I'm not accessing any variables from inside the main body of the assembly... but in a long inner loop, one xmm register is permanently tied up to hold a constant (at least, constant per class whose member function is being called), and since movdqa is fast I thought I'd try freeing up that register.
Quote:
Last edited by mg262; 23rd September 2005 at 12:32. |
||
23rd September 2005, 12:50 | #9 | Link |
Registered User
Join Date: Nov 2001
Posts: 291
|
Sincerely, I've not read anything about tsp's multithreading filter; I am not up to date.
movdqa reg,memory is fast, but movdqa reg,reg is faster, and if you repeatedly access memory inside your loop, an operation on a register is always faster than one on memory, especially on the Pentium 4 architecture. Obviously, as I don't know your algorithm, this is a general consideration; you have to work out how best to implement it and benchmark. There is always a tension between the registers you need and can have free and the performance you want; many times it is not easy. In future 64-bit editions we shall have more registers and this question will probably be overcome.
With static void __declspec(naked) you will be obliged to push/pop registers, and parameters will be passed through the stack pointer (esp). That will free not only ebx but also ebp.
With a more specific reference to your code, maybe some guru developer could help you better than me.
Once more, regards
ARDA |
23rd September 2005, 13:16 | #10 | Link | ||
Clouded
Join Date: Jul 2003
Location: Cambridge, UK
Posts: 1,148
|
Quote:
Quote:
Code:
class MeasureCycles
{
    int begincycles;
    int endcycles;
    int scale;
    unsigned int cycles();
public:
    MeasureCycles(int _scale=1000): scale(_scale) { begincycles = cycles(); }
    void reset() { begincycles = cycles(); }
    int mark()
    {
        endcycles = cycles();
        int cyclesused = endcycles - begincycles;
        // loga << cyclesused << "\t" << float(cyclesused)/scale << "\n";
        // loga.flush();
        begincycles = endcycles;
        return cyclesused;
    }
    operator int() { return mark(); }
};

inline unsigned int MeasureCycles::cycles()
{
    int result;
    __asm
    {
        xor ebx,ebx
        rdtsc
        xor ebx,ebx
        mov [result], eax   // keep only the low 32 bits of the counter
    }
    return result;
}
Edit: The static void __declspec(naked) does look straightforward, but in this case the function is inline... or at least is flagged with the inline keyword. I wish the inline assembler had the features of a proper macro assembler! (I'm aware that you can use standard macros, but it's very awkward to use...)
Last edited by mg262; 23rd September 2005 at 13:24. |
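As an aside for readers on newer toolchains, the same kind of timer can be sketched with the `__rdtsc` compiler intrinsic instead of hand-written asm, so no register is named explicitly in the source. This is a swapped-in technique, not the code above: `read_ticks`, `TickSource` and `TickTimer` are hypothetical names, the sketch does not assume the intrinsic exists in VC 2003, and on non-x86 targets it falls back to `std::chrono::steady_clock`, which counts clock ticks rather than CPU cycles.

```cpp
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
  #include <intrin.h>
  #define SKETCH_HAVE_RDTSC 1
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
  #include <x86intrin.h>
  #define SKETCH_HAVE_RDTSC 1
#else
  #include <chrono>
  #define SKETCH_HAVE_RDTSC 0
#endif

// Reads the full 64-bit time-stamp counter where available.
inline std::uint64_t read_ticks()
{
#if SKETCH_HAVE_RDTSC
    return __rdtsc();
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// The tick source is injectable so the timer logic can be checked
// with a fake counter.
typedef std::uint64_t (*TickSource)();

class TickTimer
{
    TickSource source_;
    std::uint64_t begin_;
public:
    explicit TickTimer(TickSource source = read_ticks)
        : source_(source), begin_(source()) {}

    void reset() { begin_ = source_(); }

    // Returns ticks elapsed since construction, reset() or the last mark(),
    // then restarts the interval.
    std::uint64_t mark()
    {
        const std::uint64_t now = source_();
        const std::uint64_t used = now - begin_;
        begin_ = now;
        return used;
    }
};
```

Keeping the counter at 64 bits also avoids the wrap-around that a 32-bit `int` result runs into after a couple of seconds on a multi-GHz CPU.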
||
25th September 2005, 03:06 | #11 | Link | |
Avisynth Developer
Join Date: Jan 2003
Location: Melbourne, Australia
Posts: 3,167
|
@mg262,
You and Arda seem to have it mostly in hand. As I said earlier:
Quote:
Try variants of this (it's untested, off the top of my head):
Code:
__declspec(naked) inline unsigned int MeasureCycles::cycles ()
{
    __asm
    {
        mov ecx,ebx   // save ebx in ecx; rdtsc overwrites edx:eax, so edx would not survive
        xor ebx,ebx
        rdtsc         // low 32 bits of the counter are returned in eax
        mov ebx,ecx   // restore ebx
        ret           // a naked function gets no compiler-generated epilogue
    }
} |
|
25th September 2005, 09:54 | #12 | Link | |
Clouded
Join Date: Jul 2003
Location: Cambridge, UK
Posts: 1,148
|
Ah! I had read (on another forum) that __declspec(naked) inline was an illegitimate combination. Good to know otherwise. Thank you!
Quote:
Edit: Looking back at that old code again, I just checked one case, and using movdqa to load a register with a __declspec(align(16)) value used ebp rather than ebx. [ebx is being used for something completely different.] That is in a perfectly normal non-inline non-declspec class member function using VC 2003.
Edit 2: Neither switching on omit frame pointers nor using a __declspec(naked) function (with no particular initialising code) results in the use of esp rather than ebp. I'm going to keep testing.
Edit: I can't find a way to stop it from using ebp.
Last edited by mg262; 25th September 2005 at 12:38. |
|
25th September 2005, 13:50 | #13 | Link | |
Avisynth Developer
Join Date: Jan 2003
Location: Melbourne, Australia
Posts: 3,167
|
Quote:
When using __declspec(naked) you are expected to write ALL the code for the routine in asm. When referencing stack local variables from __asm, the compiler only seems to know how to reference them via EBP, and this re-enables frame pointer code mode. It's a cruel world.
To make life easy when I want frame-pointerless code, I knock up a framework routine with just the C code I need and reference all the variables I am going to need as "var++" in the middle. I compile with omit frame pointer and ask for an asm listing, swipe all the code the compiler generates, paste it into a __declspec(naked) routine and then add the code I need. All the var++ references I laced in generate samples for the stack offsets, so I don't need to calculate them.
IanB |
|
25th September 2005, 14:58 | #14 | Link |
Clouded
Join Date: Jul 2003
Location: Cambridge, UK
Posts: 1,148
|
Ian, thank you for all the help -- it really is appreciated.
In this case I think I'm not going to try that trick -- doing it now is one thing, but reading it three months later when I want to make a small change to the 200 or so lines of assembly is another. In this case, it would only save a couple of push/pop/movd instructions... and I'm sure I can find latencies to squish those into!
Warnings aside, ebx really doesn't seem to be being used as a frame pointer with aligned variables (none are static, because I want them to be allocated close to each other) -- so I shall just continue as I have been going and use ebx.
Again, Ian, ARDA, thanks for all the guidance.
Edit: From what I have now seen, I am beginning to suspect that ebx is used with aligned variables in functions that are not class member functions. I haven't made any systematic attempt to check that yet.
Last edited by mg262; 25th September 2005 at 20:47. |
7th November 2005, 15:12 | #15 | Link | |
Clouded
Join Date: Jul 2003
Location: Cambridge, UK
Posts: 1,148
|
I have just found something else for speeding up function calls in the vein of __declspec(naked) ... Ian et al, you've probably seen this, but in case it's useful to someone:
Under the MSVC project properties, look under C++ and then under Advanced; the first entry is Calling Convention, and it can be switched to __fastcall, which makes all functions default to taking their first two arguments via registers rather than the stack. You can also add the __fastcall (Microsoft) keyword to individual function declarations.
More here: http://msdn.microsoft.com/library/de...__fastcall.asp
Edit: Also...
Quote:
__________________
a.k.a. Clouded. Come and help by making sure your favourite AVISynth filters and scripts are listed. Last edited by mg262; 7th November 2005 at 15:17. |
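A minimal sketch of the per-function form mentioned in the post above. `FASTCALL` and `weighted_sum` are hypothetical names used only here; the macro expands to nothing on compilers other than MSVC so the same source still builds there. On 32-bit x86 MSVC the first two integer arguments of a `__fastcall` function travel in ecx and edx and the rest go on the stack.

```cpp
// FASTCALL is a hypothetical macro for this sketch: __fastcall on MSVC,
// empty elsewhere so the code still compiles.
#if defined(_MSC_VER)
  #define FASTCALL __fastcall
#else
  #define FASTCALL
#endif

// With __fastcall on 32-bit x86, a arrives in ecx, b in edx, c on the stack.
int FASTCALL weighted_sum(int a, int b, int c)
{
    return a + 2 * b + 3 * c;
}
```

Marking only the hot, frequently called functions this way leaves the project-wide default untouched, which matters for callbacks whose calling convention is fixed by someone else's header.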
|
7th November 2005, 15:59 | #16 | Link |
Registered User
Join Date: Nov 2001
Posts: 291
|
@MG262
Hi mate, I've been out for more than a month on business travel, but many times I remembered this subject, and as I knew you would search for better solutions, I was curious. I have read your post and links too quickly. Can you summarize your conclusions and, if you have some tests, what you think in general terms? And if you have used any of this, please show us your solutions with an example. This could probably be useful for many new and young developers, but I guess not only for them.
Regards
ARDA
Last edited by ARDA; 7th November 2005 at 16:17. |
7th November 2005, 19:00 | #17 | Link | |
developer wannabe
Join Date: Nov 2001
Location: Brooklyn, NY
Posts: 1,211
|
FYI -
Quote:
|
|
8th November 2005, 19:57 | #18 | Link |
Clouded
Join Date: Jul 2003
Location: Cambridge, UK
Posts: 1,148
|
ARDA,
I'm not sure I'm the best person to answer this! In any case, I'm afraid I don't have that much to report... I got distracted onto other things and only just came back to this. Part of the difficulty is that getting to the bottom of this requires sitting down and repeatedly analysing the assembly output of the compiler... which is something I don't want to get into doing right now. I can give you some scattered thoughts, for what they are worth:
Sorry not to be able to be more helpful...
__________________
a.k.a. Clouded. Come and help by making sure your favourite AVISynth filters and scripts are listed. |
8th November 2005, 20:38 | #19 | Link |
Registered User
Join Date: Nov 2001
Posts: 291
|
@MG262
You needn't apologize at all. The mere fact that you've been searching for options is a good step, including for discarding them. I will read carefully and maybe make some tests. Don't know when. I've seen you developing some filters. Go ahead, and thanks for your report.
Regards
Arda |
|
|