Old 15th November 2011, 21:05   #21  |  Link
cretindesalpes
͡҉҉ ̵̡̢̛̗̘̙̜̝̞̟̠͇̊̋̌̍̎̏̿̿
 
Join Date: Feb 2009
Location: No support in PM
Posts: 712
Quote:
Originally Posted by redfordxx View Post
For pointers and counters, do I have anything else available to use than eax,ebx,ecx,edx,edi,esi?
Short of some wizardry, I don't think so.

Quote:
Also, the compiler always warns me: frame pointer register 'ebx' modified by inline assembly code.
Everything seems to work OK though, so should that bother me?
http://msdn.microsoft.com/en-us/library/k1a8ss06.aspx
Quote:
Some SSE types require eight-byte stack alignment, forcing the compiler to emit dynamic stack-alignment code. To be able to access both the local variables and the function parameters after the alignment, the compiler maintains two frame pointers. If the compiler performs frame pointer omission (FPO), it will use EBP and ESP. If the compiler does not perform FPO, it will use EBX and EBP. To ensure code runs correctly, do not modify EBX in asm code if the function requires dynamic stack alignment as it could modify the frame pointer. Either move the eight-byte aligned types out of the function, or avoid using EBX.
Also: http://forum.doom9.org/showthread.php?t=100374
__________________
dither 1.28.1 for AviSynth | avstp 1.0.4 for AviSynth development | fmtconv r30 for Vapoursynth & Avs+ | trimx264opt segmented encoding
Old 16th November 2011, 00:21   #22  |  Link
redfordxx
Registered User
 
Join Date: Jan 2005
Location: Praha (not that one in Texas)
Posts: 863
Well, sorry, I never learned C or asm anywhere except now for the purpose of Avisynth, so I am a little slow. But after slowly getting the hang of things, I wonder whether this would be enough to safely use all registers for my purposes, as long as I do not use POP and PUSH?
Code:
int saveesp;
__asm
{	
pushad
mov saveesp, esp
....
mov esp, saveesp
emms
popad
}
EDIT: I have now realized that ebp seems to be used whenever I read a variable from memory. So maybe I could use ebp, but every time I reference memory, e.g. mov eax, [variable], the ebp value will be overwritten... and I should keep that in mind.

Last edited by redfordxx; 16th November 2011 at 01:04.
Old 16th November 2011, 00:38   #23  |  Link
redfordxx
Registered User
 
Join Date: Jan 2005
Location: Praha (not that one in Texas)
Posts: 863
cretindesalpes, by the way, as you are the 16-bit guy, I would appreciate your opinion on this (or anyone else's):
When I extend the Average plugin to 16 bits, the formula should be
Code:
int(bias+sum(clip_i*mask_i)/65536+0.5)       i=1...n
But I believe doing it like this:
Code:
int(bias+0.5)+sum(int(clip_i*mask_i/65536))       i=1...n
would be simpler and faster code. However, there will be some inaccuracy in the LSB. What do you think of it?
Old 16th November 2011, 02:47   #24  |  Link
IanB
Avisynth Developer
 
Join Date: Jan 2003
Location: Melbourne, Australia
Posts: 3,167
No, ESP always needs to point at a valid stack, except when interrupts are disabled.

No, leaving out the proper rounding causes problems.

Implement as :-
Code:
int K=(bias<<16)+32768;
....
(K + sum(clip_i*mask_i) ) >> 16;       i=1...n
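As a minimal scalar sketch of the difference (plain C++ rather than SIMD; the function names and the 64-bit accumulator are assumptions of this example, not code from the plugin), the following compares rounding once at the end, starting from a precalculated K, against truncating every term:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round once at the end: (K + sum(clip_i * mask_i)) >> 16,
// where K = bias * 65536 + 32768 is computed before the loop.
inline int average_round_once(const uint16_t* clip, const uint16_t* mask,
                              std::size_t n, int bias)
{
    assert(n > 0);
    // 64-bit accumulator (an assumption of this sketch) so that several
    // 16-bit x 16-bit products cannot overflow.
    int64_t acc = static_cast<int64_t>(bias) * 65536 + 32768;  // K
    for (std::size_t i = 0; i < n; ++i)
        acc += static_cast<int64_t>(clip[i]) * mask[i];
    // Assumes the total is non-negative, so the shift is a plain division.
    return static_cast<int>(acc >> 16);
}

// Truncate every term separately, as in the proposed shortcut.
inline int average_truncate_each(const uint16_t* clip, const uint16_t* mask,
                                 std::size_t n, int bias)
{
    assert(n > 0);
    int result = bias;  // int(bias + 0.5) for an integer bias
    for (std::size_t i = 0; i < n; ++i)
        result += static_cast<int>((static_cast<uint32_t>(clip[i]) * mask[i]) >> 16);
    return result;
}
```

With two clips of value 1 and masks of 32768 each, the exact sum is 1.0; rounding once returns 1, while per-term truncation returns 0.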
Old 16th November 2011, 02:59   #25  |  Link
redfordxx
Registered User
 
Join Date: Jan 2005
Location: Praha (not that one in Texas)
Posts: 863
Well, I reached one of my benchmarks. One of the functions, with the following parameters:
Code:
RAverageW(c1,1,c2,-1,bias=128)
does the same as mt_makediff, and I believe it is a tiny bit faster on xmm. But, of course, you can choose any number of clips and different weights.
There is one thing I am fighting with:
It is calculated scaled to signed short with scale 32*256, and as soon as the weight is 4, it switches to non-asm.
I need a variable scale, which I tried with the following, but it does not work:
Code:
if (maxweight<4) {
#define SCALE (256*32)
#define SCALEPOWER (8+5)
} else {
#define SCALE (256/2)
#define SCALEPOWER (8-1)
}
Either this is the wrong approach or I have a bug somewhere. The bug is up to me to find, but please tell me whether this kind of if/else #define is possible.
Old 16th November 2011, 03:49   #26  |  Link
redfordxx
Registered User
 
Join Date: Jan 2005
Location: Praha (not that one in Texas)
Posts: 863
Quote:
Originally Posted by IanB View Post
Code:
int K=(bias<<16)+32768;
....
(K + sum(clip_i*mask_i) ) >> 16;       i=1...n
Well then it would be slow or even slower.
As far as I know, there are two instructions I can use:
pmaddwd, however, there can still be an error in the last bit. Because this instruction is signed, I have to do
Code:
(K + sum((clip_i>>1)*(mask_i>>1)) ) >> 14;       i=1...n
Or pmuludq, which would be precise but processes four times less data... and will be four times slower.

I don't see other options but maybe there are instructions I can't think of.
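To get a feel for what the halved-operand trick costs in precision, here is a scalar C++ sketch of the two formulas for a single product. The function names are made up for this example, and it only models the arithmetic, not the actual pmaddwd code path:

```cpp
#include <cassert>
#include <cstdint>

// Exact 16-bit product, rounded: (clip * mask + 32768) >> 16.
inline uint32_t mul16_exact(uint16_t clip, uint16_t mask)
{
    const uint32_t product = static_cast<uint32_t>(clip) * mask;
    // 64-bit add so the rounding constant can never wrap.
    return static_cast<uint32_t>((static_cast<uint64_t>(product) + 32768u) >> 16);
}

// Signed-safe variant: both operands drop their lowest bit so that they fit
// in 15 bits, and the final shift goes from 16 to 14 to compensate.
inline uint32_t mul16_halved(uint16_t clip, uint16_t mask)
{
    const uint32_t product =
        static_cast<uint32_t>(clip >> 1) * static_cast<uint32_t>(mask >> 1);
    assert(product < 0x40000000u);  // always fits a positive signed dword
    return (product + 8192u) >> 14;
}
```

For mid-range inputs such as 32768 * 32768 both functions return 16384. At full range, 65535 * 65535, the exact version returns 65534 and the halved version 65532, so in this model the deviation can reach two units rather than one.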
Old 16th November 2011, 08:20   #27  |  Link
IanB
Avisynth Developer
 
Join Date: Jan 2003
Location: Melbourne, Australia
Posts: 3,167
Quote:
Originally Posted by redfordxx View Post
...
It is calculated scaled to signed short with scale 32*256, and as soon as the weight is 4, it switches to non-asm.
I need a variable scale, which I tried with the following, but it does not work:
Code:
if (maxweight<4) {
#define SCALE (256*32)
#define SCALEPOWER (8+5)
} else {
#define SCALE (256/2)
#define SCALEPOWER (8-1)
}
Either this is the wrong approach or I have a bug somewhere. The bug is up to me to find, but please tell me whether this kind of if/else #define is possible.
I take it you actually mean something like this :-
Code:
if (maxweight<4) {
#define SCALE (256*32)
#define SCALEPOWER (8+5)
#include "common_include_code.hpp"
#undef SCALEPOWER
#undef SCALE
} else {
#define SCALE (256/2)
#define SCALEPOWER (8-1)
#include "common_include_code.hpp"
#undef SCALEPOWER
#undef SCALE
}
You cannot mix program flow with macro substitution. The C preprocessor does all the macro parsing then the compiler compiles the resulting source. If you ask for an ASM listing with C code as the comments you can see what you actually compiled.

Perhaps you need the power of Softwire or similar to generate dynamic assembly on the fly.
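Another way to get a run-time choice between two compile-time scales, without the preprocessor, is a template parameter. The following is only a sketch under assumptions: apply_weight and apply_weight_dispatch are hypothetical names, and the scalar arithmetic stands in for the real asm kernel.

```cpp
#include <cassert>
#include <cmath>

// The scale is a template parameter, so each instantiation sees it as a
// compile-time constant, much like a #define would provide.
template <int ScalePower>
inline int apply_weight(int pixel, double weight)
{
    const int weight_scaled =
        static_cast<int>(std::lround(weight * (1 << ScalePower)));
    // Multiply, add half of the scale for rounding, then scale back down.
    return (pixel * weight_scaled + (1 << (ScalePower - 1))) >> ScalePower;
}

// The run-time decision is made once, here, and selects one of the two
// variants that the compiler generated.
inline int apply_weight_dispatch(int pixel, double weight, int maxweight)
{
    assert(pixel >= 0 && weight >= 0.0);  // keeps the shift well defined here
    if (maxweight < 4)
        return apply_weight<8 + 5>(pixel, weight);
    return apply_weight<8 - 1>(pixel, weight);
}
```

The if/else now lives in ordinary program flow, and the compiler produces both versions of the kernel, which is the same effect as including the common code twice with different macro values.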
Old 16th November 2011, 08:34   #28  |  Link
IanB
Avisynth Developer
 
Join Date: Jan 2003
Location: Melbourne, Australia
Posts: 3,167
Quote:
Originally Posted by redfordxx View Post
Quote:
Originally Posted by IanB View Post
Code:
int K=(bias<<16)+32768;
....
(K + sum(clip_i*mask_i) ) >> 16;       i=1...n
Well then it would be slow or even slower.
As far as I know, there are two instructions I can use:
pmaddwd, however, there can still be an error in the last bit. Because this instruction is signed, I have to do
Code:
(K + sum((clip_i>>1)*(mask_i>>1)) ) >> 14;       i=1...n
Or pmuludq, which would be precise but processes four times less data... and will be four times slower.

I don't see other options but maybe there are instructions I can't think of.
The assumption came from your earlier code, where you zeroed a register and then looped, summing the clip_i*mask_i values. The significant hint here was to precalculate K and start with K instead of zero.

Really, this guessing in the dark is not helpful. Please post complete code fragments and ask direct questions about that code.

My guess is that your problem is to do sum(clip_i * mask_i) quickly.

If so, you need code to do DWORD=K, Loop{UWORD=BYTE*BYTE, DWORD+=UWORD}, ScaleAndRound(DWORD)

Am I close with the above pseudocode guess?
Old 16th November 2011, 09:24   #29  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,316
Quote:
Originally Posted by redfordxx View Post
Small question. For pointers and counters, do I have anything else available to use than eax,ebx,ecx,edx,edi,esi?
Under 32 bits, no. On 64 bits, yes, you have r8 to r15 available.
Old 16th November 2011, 13:03   #30  |  Link
redfordxx
Registered User
 
Join Date: Jan 2005
Location: Praha (not that one in Texas)
Posts: 863
Quote:
Originally Posted by IanB View Post
If so you need code to do DWORD=K, Loop{UWORD=BYTE*BYTE, DWORD+=UWORD}, ScaleAndRound(DWORD)
In 16bits I need two unsigned words to multiply. If UWORD means unsigned word and UDWORD means unsigned doubleword:

DWORD=K, Loop{UDWORD=UWORD*UWORD, DWORD+=UDWORD}, ScaleAndRound(UDWORD)

Clip and Mask would be numbers in < 0 ; 65535 > range. So the sum before scaling would be in < 0 ; 256^4 )

However bias can be negative also and has no real boundaries, although the only range which makes sense is ( -65536*n ; 65535 > where n is number of input clips.

I don't have code yet but the first idea with the one bit inaccuracy would be something like
Code:
movd xmm7, dword ptr [bias_scaled]     //here I cannot scale 16 bits up because the range is outside signed word, so I will scale 8 bits up

loop:
...
psrlw   xmm0, 1   //xmm0 contains packed interleaved clip_i and clip_j where j=i+1
psrlw   xmm1, 1   //xmm1 contains packed interleaved mask_i and mask_j (I need these operations to clear the sign bit because pmaddwd expects signed arguments)
pmaddwd xmm0, xmm1
psrld   xmm0, 6   //extra op to get on the scale of K in xmm7
paddd   xmm7, xmm0
...
jnz loop
...
psrad   xmm7, 8   //the accumulator holds doublewords
...
pack with unsigned saturation to word
Now I see there are a lot of psr* instructions, which I don't believe are very fast ops.
Old 16th November 2011, 13:47   #31  |  Link
redfordxx
Registered User
 
Join Date: Jan 2005
Location: Praha (not that one in Texas)
Posts: 863
Read EDIT2 first!
I am trying something else while I have problems with this if {#define}: using a variable instead of a constant. But I got really confused, so please correct my error:
These are the definitions:
Code:
int scaleI;
scaleI=13;
#define SCALE 13
__asm{
mov eax, SCALE
mov ebx, scaleI
Now, the following instructions give different results and I don't know why:
Code:
psrad   xmm0, SCALE  //only this one is correct
psrad   xmm0, scaleI
psrad   xmm0, eax
psrad   xmm0, ebx
EDIT: I think the root cause can be found in this asm listing, which...well what's happening there is beyond my understanding:
Code:
; 75   : 	scalepowerA=13;

	mov	DWORD PTR [edi+16], 13			; 0000000dH

; 76   : 
; 77   : 	__asm
; 78   : 	 {	
; 79   : 	pushad

	pushad

; 82   : 	mov		eax, scalepowerA

	mov	eax, 16					; 00000010H

; 110  : 			  psrad   xmm0, eax 

	psrad	xmm0, eax

; 111  : 			  psrad   xmm1, scalepowerA 

	psrad	xmm1, 16				; 00000010H
I deleted some lines, but nowhere in them was eax or scalepowerA changed or accessed.

EDIT2:
I seem to have figured it out: is it true that inline asm can access only local variables? And when the variable is defined in a *.h file, it is silently replaced with the number 16...

EDIT3:
But again:
Code:
// This does not work:
mov	eax, 13
psrad   xmm0, eax
// This does:
psrad   xmm0, 13
So, in short:
how can I shift an xmm register by a count held in a variable I create in C++ code? (hopefully it won't be too slow compared to an immediate value...)

Last edited by redfordxx; 16th November 2011 at 17:42.
Old 17th November 2011, 00:37   #32  |  Link
IanB
Avisynth Developer
 
Join Date: Jan 2003
Location: Melbourne, Australia
Posts: 3,167
Again with the keyhole view. If you want help post enough so the full context is available to us.

PSRAD has no Reg32 variant. Only Immediate and MMreg versions.
Code:
Opcode Instruction Description
0F E2 /r        PSRAD mm, mm/m64       Shift doublewords in mm right by mm/m64 while shifting in sign bits.
66 0F E2 /r     PSRAD xmm1, xmm2/m128  Shift doubleword in xmm1 right by xmm2 /m128 while shifting in sign bits.
0F 72 /4 ib     PSRAD mm, imm8         Shift doublewords in mm right by imm8 while shifting in sign bits.
66 0F 72 /4 ib  PSRAD xmm1, imm8       Shift doublewords in xmm1 right by imm8 while shifting in sign bits.
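For a shift count held in a C++ variable, the xmm/m128 form above can be used by first moving the count into an xmm register. A minimal sketch with SSE2 intrinsics instead of inline asm follows; the helper name shift_right_arith_4 is made up for this example, while _mm_cvtsi32_si128 and _mm_sra_epi32 correspond to movd and the PSRAD xmm1, xmm2 form:

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Arithmetic right shift of four signed doublewords by a run-time count.
inline void shift_right_arith_4(const int32_t in[4], int count, int32_t out[4])
{
    assert(count >= 0);
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
    // movd: the count goes into the low doubleword, the rest is zeroed.
    const __m128i c = _mm_cvtsi32_si128(count);
    // psrad xmm, xmm: every doubleword is shifted by the same count.
    v = _mm_sra_epi32(v, c);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), v);
}
```

The count register only has to be loaded once before the loop, so the per-pixel cost should be close to that of the immediate form, although this sketch makes no measured claim about speed.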
Old 17th November 2011, 09:30   #33  |  Link
redfordxx
Registered User
 
Join Date: Jan 2005
Location: Praha (not that one in Texas)
Posts: 863
Hi, please, how do I change dimensions of the video? I tried:
Code:
PVideoFrame __stdcall RAverageM::GetFrame(int n, IScriptEnvironment *env) {...
vi.width  = dst_width;
vi.height = dst_height;
return WeightedAverageM16(env, vi);
...}


PVideoFrame RAverageM::WeightedAverageM16(IScriptEnvironment* env, VideoInfo vi)
{
	result = env->NewVideoFrame (vi,PitchAlign);

	if (y==3) AveragePlaneM16(PLANAR_Y, env);
	if (u==3) AveragePlaneM16(PLANAR_U, env);
	if (v==3) AveragePlaneM16(PLANAR_V, env);

	return (result);
}
This idea I copied from somewhere, but it causes crashes.
Old 17th November 2011, 10:43   #34  |  Link
Gavino
Avisynth language lover
 
Join Date: Dec 2007
Location: Spain
Posts: 3,431
Dimensions should only be changed in the filter's constructor, not in GetFrame(). All frames of a clip are assumed to have the same dimensions.
__________________
GScript and GRunT - complex Avisynth scripting made easier
Old 17th November 2011, 18:22   #35  |  Link
Youka
Registered User
 
Join Date: Mar 2011
Location: Germany
Posts: 64
From Filter SDK documentation.
Old 17th November 2011, 18:26   #36  |  Link
redfordxx
Registered User
 
Join Date: Jan 2005
Location: Praha (not that one in Texas)
Posts: 863
Thanks guys. I am just not sure: if I have multiple input clips, how do I check the dimensions of all of them in the constructor? I know how to do it in the GetFrame function...
Old 17th November 2011, 18:30   #37  |  Link
Gavino
Avisynth language lover
 
Join Date: Dec 2007
Location: Spain
Posts: 3,431
All input clips are available in the constructor (if you pass them in as arguments), so what's the problem?
The GetFrame() function has no more information available than the constructor has.
Old 17th November 2011, 18:54   #38  |  Link
SEt
Registered User
 
Join Date: Aug 2007
Posts: 374
Writing in assembler means that you know what you are doing, so don't worry about the compiler warning if you are sure your code is correct.
If you really want registers you can use all 8 of them:
1) no problems with 6 you mentioned;
2) after you modify ebp in inline assembler you won't be able to access most of your C/C++ variables by name, so load it last and restore at the end;
3) you can even use esp in extreme cases - just need to save it somewhere and restore at the end, don't worry about interrupts - they'll switch to their own stack.

Also, it's important to understand that not everything has to be put in registers. For example, keeping the counters of outer loops in memory is perfectly fine and won't change your program's speed in any noticeable way.
Old 17th November 2011, 23:18   #39  |  Link
redfordxx
Registered User
 
Join Date: Jan 2005
Location: Praha (not that one in Texas)
Posts: 863
Quote:
Originally Posted by Gavino View Post
The GetFrame() function has no more information available than the constructor has.
Just to be clear, so I should call GetFrame in the constructor for every input clip, just to know its size?
Old 17th November 2011, 23:44   #40  |  Link
cretindesalpes
͡҉҉ ̵̡̢̛̗̘̙̜̝̞̟̠͇̊̋̌̍̎̏̿̿
 
Join Date: Feb 2009
Location: No support in PM
Posts: 712
No need to call GetFrame:

Code:
PClip p = ...;
const VideoInfo & v = p->GetVideoInfo ();
area = v.width * v.height;
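Building on that, a constructor could compare the dimensions of every input clip before accepting them. The sketch below is self-contained, so it uses a stand-in struct ClipInfo holding just width and height instead of the real VideoInfo; the struct and the function name are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the two VideoInfo fields used here (hypothetical type,
// not the one from avisynth.h).
struct ClipInfo
{
    int width;
    int height;
};

// Returns true when every clip has the same dimensions as the first one.
inline bool same_dimensions(const std::vector<ClipInfo>& clips)
{
    assert(!clips.empty());
    for (std::size_t i = 1; i < clips.size(); ++i)
    {
        if (clips[i].width != clips[0].width ||
            clips[i].height != clips[0].height)
            return false;
    }
    return true;
}
```

In a real filter the values would come from p->GetVideoInfo() for each PClip, and a mismatch would be reported from the constructor, for example with env->ThrowError.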