15th November 2011, 21:05 | #21 | Link | |||
͡҉҉ ̵̡̢̛̗̘̙̜̝̞̟̠͇̊̋̌̍̎̏̿̿
Join Date: Feb 2009
Location: No support in PM
Posts: 712
|
Quote:
Quote:
Quote:
__________________
dither 1.28.1 for AviSynth | avstp 1.0.4 for AviSynth development | fmtconv r30 for Vapoursynth & Avs+ | trimx264opt segmented encoding |
|||
16th November 2011, 00:21 | #22 | Link |
Registered User
Join Date: Jan 2005
Location: Praha (not that one in Texas)
Posts: 863
|
Well, sorry, I never learned C or asm anywhere until now, and only for the purpose of Avisynth, so I am a little slow. But now that I am slowly getting the hang of it, I wonder whether this should be enough to safely use all registers for my purposes, as long as I do not use POP and PUSH?
Code:
int saveesp;
__asm {
    pushad
    mov saveesp, esp
    ....
    mov esp, saveesp
    emms
    popad
}
Last edited by redfordxx; 16th November 2011 at 01:04. |
16th November 2011, 00:38 | #23 | Link |
Registered User
Join Date: Jan 2005
Location: Praha (not that one in Texas)
Posts: 863
|
cretindesalpes, by the way, since you are the 16-bit guy, I would appreciate your opinion on this (or anyone else's):
When I extend the Average plugin to 16 bits, the formula should be
Code:
int(bias+sum(clip_i*mask_i)/65536+0.5) i=1...n
Code:
int(bias+0.5)+sum(int(clip_i*mask_i/65536)) i=1...n |
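To make the difference between the two formulas concrete, here is a minimal scalar sketch in C++. The function names and the test values are illustrative assumptions, not code from the Average plugin. The first function scales and rounds once after summing, the second truncates every term before summing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// First formula: accumulate the exact products, then scale and round once.
// static_cast<int> truncates toward zero, like int() in the post.
int average_round_once(const std::vector<int>& clip,
                       const std::vector<int>& mask, double bias) {
    assert(clip.size() == mask.size());
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < clip.size(); ++i)
        sum += static_cast<std::int64_t>(clip[i]) * mask[i];
    return static_cast<int>(bias + static_cast<double>(sum) / 65536.0 + 0.5);
}

// Second formula: round only the bias, truncate every scaled product.
// Each term can lose almost one unit, and the losses add up.
int average_truncate_each(const std::vector<int>& clip,
                          const std::vector<int>& mask, double bias) {
    assert(clip.size() == mask.size());
    int result = static_cast<int>(bias + 0.5);
    for (std::size_t i = 0; i < clip.size(); ++i) {
        const std::int64_t product = static_cast<std::int64_t>(clip[i]) * mask[i];
        result += static_cast<int>(product / 65536);
    }
    return result;
}
```

With two clips at 32768 and two masks at 49151, each scaled product is 24575.5, so the first function returns 49151 while the second returns 49150. The gap can grow with the number of clips.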
16th November 2011, 02:47 | #24 | Link |
Avisynth Developer
Join Date: Jan 2003
Location: Melbourne, Australia
Posts: 3,167
|
No, ESP always needs to point at a valid stack, except when interrupts are disabled.
No, leaving out the proper rounding causes problems. Implement as:
Code:
int K=(bias<<16)+32768;
....
(K + sum(clip_i*mask_i) ) >> 16; i=1...n |
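A scalar C++ sketch of the formula above, for reference. The function name, the 64-bit accumulator and the use of std::vector are assumptions made for the example; real plugin code would work on packed pixels.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Integer-only weighted average: K carries the bias and the 0.5 rounding term,
// both expressed with 16 fractional bits.
int average_fixed_point(const std::vector<std::uint16_t>& clip,
                        const std::vector<std::uint16_t>& mask, int bias) {
    assert(clip.size() == mask.size());
    // bias * 65536 instead of bias << 16: shifting a negative value left is
    // undefined behaviour before C++20.
    std::int64_t acc = static_cast<std::int64_t>(bias) * 65536 + 32768;
    for (std::size_t i = 0; i < clip.size(); ++i)
        acc += static_cast<std::int64_t>(clip[i]) * mask[i];
    // >> on a negative value is an arithmetic shift on mainstream compilers
    // (guaranteed from C++20 on), which rounds toward minus infinity.
    return static_cast<int>(acc >> 16);
}
```

The accumulator is 64 bits wide here because a single 16-bit by 16-bit product already needs up to 32 bits, so a sum of several products does not fit in a DWORD.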
16th November 2011, 02:59 | #25 | Link |
Registered User
Join Date: Jan 2005
Location: Praha (not that one in Texas)
Posts: 863
|
Well, I reached one of my benchmarks. One of the functions, with the following parameters:
Code:
RAverageW(c1,1,c2,-1,bias=128)
There is one thing I am fighting with: the calculation is scaled to signed short with scale 32*256, and as soon as a weight reaches 4, it switches to non-asm. I need a variable scale, which I tried with the following, but it does not work:
Code:
if (maxweight<4) {
#define SCALE (256*32)
#define SCALEPOWER (8+5)
}
else {
#define SCALE (256/2)
#define SCALEPOWER (8-1)
} |
16th November 2011, 03:49 | #26 | Link | |
Registered User
Join Date: Jan 2005
Location: Praha (not that one in Texas)
Posts: 863
|
Quote:
As far as I know, I see two instructions I can use: pmaddwd; however, there can still be an error in the last bit. Because this instruction is signed, I have to do
Code:
(K + sum((clip_i>>1)*(mask_i>>1)) ) >> 14; i=1...n
I don't see other options, but maybe there are instructions I can't think of. |
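The size of that error can be checked with a scalar emulation. The sketch below is an assumption-laden illustration: it handles a single clip/mask pair, picks K as 0.5 in each scale (32768 for the exact form, 8192 for the halved form), and uses invented function names.

```cpp
#include <cassert>
#include <cstdint>

// Reference: full-precision product, rounded once, 16 fractional bits.
std::uint32_t scale_exact(std::uint16_t clip, std::uint16_t mask) {
    const std::uint64_t k = 32768;  // 0.5 with 16 fractional bits
    const std::uint64_t product = static_cast<std::uint64_t>(clip) * mask;
    return static_cast<std::uint32_t>((k + product) >> 16);
}

// Halved operands: dropping the low bit of each operand makes both fit in a
// signed 16-bit word, as pmaddwd requires. The product is then at a quarter of
// the scale, hence the shift by 14 instead of 16.
std::uint32_t scale_halved(std::uint16_t clip, std::uint16_t mask) {
    const std::uint32_t k = 8192;  // 0.5 with 14 fractional bits
    const std::uint32_t c = static_cast<std::uint32_t>(clip) >> 1;
    const std::uint32_t m = static_cast<std::uint32_t>(mask) >> 1;
    return (k + c * m) >> 14;
}
```

For even inputs such as 32768 and 32768 both functions agree (16384). For the full-scale input 65535 and 65535 the exact form gives 65534 and the halved form gives 65532, so in this emulation the loss can be larger than one bit.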
|
16th November 2011, 08:20 | #27 | Link | |
Avisynth Developer
Join Date: Jan 2003
Location: Melbourne, Australia
Posts: 3,167
|
Quote:
Code:
if (maxweight<4) {
#define SCALE (256*32)
#define SCALEPOWER (8+5)
#include "common_include_code.hpp"
#undef SCALEPOWER
#undef SCALE
}
else {
#define SCALE (256/2)
#define SCALEPOWER (8-1)
#include "common_include_code.hpp"
#undef SCALEPOWER
#undef SCALE
}
Perhaps you need the power of Softwire or similar to generate dynamic assembly on the fly. |
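Some background on why the plain if/else with #define cannot work: #define is handled by the preprocessor before the if is compiled, so the second pair of definitions simply redefines the first, whatever maxweight is at run time. The include trick above gets around this by compiling the shared code twice. A possible C++ alternative, sketched below with hypothetical names (it is not from the thread and uses plain C++ rather than inline asm), is to make the shift amount a template parameter and pick the instantiation at run time.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical kernel: scales a sum down by a compile-time power of two,
// with rounding. ScalePower is a constant inside each instantiation, so the
// compiler can emit an immediate shift.
template <int ScalePower>
std::int32_t scale_down(std::int32_t sum) {
    static_assert(ScalePower > 0 && ScalePower < 31, "shift out of range");
    const std::int32_t half = 1 << (ScalePower - 1);
    return (sum + half) >> ScalePower;
}

// Run-time dispatch between the two compiled variants, mirroring the
// maxweight test from the posts above.
std::int32_t scale_down_for_weight(std::int32_t sum, int maxweight) {
    assert(maxweight >= 0);  // assumed to be the largest absolute weight
    if (maxweight < 4)
        return scale_down<8 + 5>(sum);
    return scale_down<8 - 1>(sum);
}
```

Both variants exist in the binary, just as with the double include, but the shared code is written once and type-checked by the compiler.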
|
16th November 2011, 08:34 | #28 | Link | ||
Avisynth Developer
Join Date: Jan 2003
Location: Melbourne, Australia
Posts: 3,167
|
Quote:
Really, this guessing in the dark is not helpful. Please post complete code fragments and ask direct questions about that code.
As I understand it, your problem is to compute sum(clip_i * mask_i) quickly. If so, you need code to do
DWORD=K, Loop{UWORD=BYTE*BYTE, DWORD+=UWORD}, ScaleAndRound(DWORD)
Am I close with the above pseudo code guess? |
||
16th November 2011, 13:03 | #30 | Link | |
Registered User
Join Date: Jan 2005
Location: Praha (not that one in Texas)
Posts: 863
|
Quote:
DWORD=K, Loop{UDWORD=UWORD*UWORD, DWORD+=UDWORD}, ScaleAndRound(UDWORD)
Clip and Mask would be numbers in the < 0 ; 65535 > range, so the sum before scaling would be in < 0 ; 256^4 ). However, bias can also be negative and has no real bounds, although the only range which makes sense is ( -65536*n ; 65535 >, where n is the number of input clips. I don't have code yet, but the first idea, with the one-bit inaccuracy, would be something like
Code:
movd xmm7, dword ptr [bias_scaled] // I cannot scale 16 bits up here because the range would be outside a signed word, so I scale 8 bits up
loop:
...
psrlw xmm0, 1 // xmm0 contains packed interleaved clip_i and clip_j, where j=i+1
psrlw xmm1, 1 // xmm1 contains packed interleaved mask_i and mask_j (these shifts clear the sign bit because pmaddwd expects signed arguments)
pmaddwd xmm0, xmm1
psrld xmm0, 6 // extra op to get to the scale of K in xmm7
paddd xmm7, xmm0
...
jnz loop
...
psrad xmm7, 8 // was psraw; the accumulators are dwords, so a dword shift is needed
...
pack with unsigned saturation to word |
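A scalar C++ emulation of one output sample can help check the scaling of the loop sketched above before writing the real SSE2 code. Everything here is an assumption for illustration: the function name, the 64-bit accumulator (the asm uses 32-bit lanes), and the choice of bias*256+128 as the pre-scaled, pre-rounded starting value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Clips are consumed in pairs because pmaddwd multiplies two word pairs and
// adds the two products into one dword.
std::uint16_t average_pairs_emulated(const std::vector<std::uint16_t>& clip,
                                     const std::vector<std::uint16_t>& mask,
                                     int bias) {
    assert(clip.size() == mask.size() && clip.size() % 2 == 0);
    // Accumulator with 8 fractional bits: bias scaled by 256, plus 0.5.
    std::int64_t acc = static_cast<std::int64_t>(bias) * 256 + 128;
    for (std::size_t i = 0; i < clip.size(); i += 2) {
        // Emulates the two shifts by 1: halve every operand.
        const std::int64_t c0 = clip[i] >> 1, c1 = clip[i + 1] >> 1;
        const std::int64_t m0 = mask[i] >> 1, m1 = mask[i + 1] >> 1;
        // Emulates pmaddwd: each product is below 2^30, the pair below 2^31.
        const std::int64_t pair = c0 * m0 + c1 * m1;
        // Emulates the shift by 6: from 14 down to 8 fractional bits.
        acc += pair >> 6;
    }
    if (acc < 0) return 0;  // lower bound of the unsigned saturation
    const std::int64_t scaled = acc >> 8;
    return static_cast<std::uint16_t>(std::min<std::int64_t>(scaled, 65535));
}
```

With two white clips (65535) and two half-weight masks (32768) this returns 65534, one below the 65535 that exact arithmetic with a single final rounding would give. With both masks at 65535 the result saturates at 65535.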
|
16th November 2011, 13:47 | #31 | Link |
Registered User
Join Date: Jan 2005
Location: Praha (not that one in Texas)
Posts: 863
|
Read EDIT2 first!
I am trying another approach, since I have problems with this if {#define}: using a variable instead of a constant. But I got really confused, so please correct my error. These are the definitions:
Code:
int scaleI;
scaleI=13;
#define SCALE 13
__asm{
mov eax, SCALE
mov ebx, scaleI
Code:
psrad xmm0, SCALE //only this one is correct
psrad xmm0, scaleI
psrad xmm0, eax
psrad xmm0, ebx
Code:
; 75 : scalepowerA=13;
mov DWORD PTR [edi+16], 13 ; 0000000dH
; 76 :
; 77 : __asm
; 78 : {
; 79 : pushad
pushad
; 82 : mov eax, scalepowerA
mov eax, 16 ; 00000010H
; 110 : psrad xmm0, eax
psrad xmm0, eax
; 111 : psrad xmm1, scalepowerA
psrad xmm1, 16 ; 00000010H
EDIT2: I seem to have figured that out: is it so that inline asm can access only local variables? And when the variable is defined in a *.h file, it is silently replaced with the number 16...
EDIT3: But again:
Code:
This does not work:
mov eax, 13
psrad xmm0, eax
This does:
psrad xmm0, 13
How can I shift an xmm register based on some variable I create in C++ code? (Hopefully it won't be too slow compared to an immediate value...)
Last edited by redfordxx; 16th November 2011 at 17:42. |
17th November 2011, 00:37 | #32 | Link |
Avisynth Developer
Join Date: Jan 2003
Location: Melbourne, Australia
Posts: 3,167
|
Again with the keyhole view. If you want help, post enough so the full context is available to us.
PSRAD has no Reg32 variant. Only Immediate and MMreg versions.
Code:
Opcode            Instruction              Description
0F E2 /r          PSRAD mm, mm/m64         Shift doublewords in mm right by mm/m64 while shifting in sign bits.
66 0F E2 /r       PSRAD xmm1, xmm2/m128    Shift doubleword in xmm1 right by xmm2 /m128 while shifting in sign bits.
0F 72 /4 ib       PSRAD mm, imm8           Shift doublewords in mm right by imm8 while shifting in sign bits.
66 0F 72 /4 ib    PSRAD xmm1, imm8         Shift doublewords in xmm1 right by imm8 while shifting in sign bits.
|
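Given the encodings above, a run-time shift count has to travel through an XMM register: move the integer into one (MOVD xmm, r/m32) and use the register form of PSRAD. Below is a minimal sketch with SSE2 intrinsics rather than inline asm; the function name and the portable fallback are assumptions added so the example builds on any target. As a side note, the 16 in the earlier listing appears to be the offset of the class member (compare the [edi+16] store), which would explain why the name did not yield its value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define SHIFT_DEMO_HAVE_SSE2 1
#else
#define SHIFT_DEMO_HAVE_SSE2 0
#endif

// Arithmetic right shift of four dwords by a count known only at run time.
std::array<std::int32_t, 4> shift_right_arithmetic(
        const std::array<std::int32_t, 4>& in, int count) {
    assert(count >= 0);
    std::array<std::int32_t, 4> out{};
#if SHIFT_DEMO_HAVE_SSE2
    const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in.data()));
    const __m128i n = _mm_cvtsi32_si128(count);  // movd xmm, r32
    const __m128i r = _mm_sra_epi32(v, n);       // psrad xmm, xmm
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()), r);
#else
    // Portable fallback: counts above 31 fill with the sign bit, like PSRAD.
    const int c = count > 31 ? 31 : count;
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = in[i] >> c;
#endif
    return out;
}
```

In a loop the count register would be loaded once before the loop, so the extra cost compared to an immediate shift should stay small.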
17th November 2011, 09:30 | #33 | Link |
Registered User
Join Date: Jan 2005
Location: Praha (not that one in Texas)
Posts: 863
|
Hi, please, how do I change the dimensions of the video? I tried:
Code:
PVideoFrame __stdcall RAverageM::GetFrame(int n, IScriptEnvironment *env)
{...
    vi.width = dst_width;
    vi.height = dst_height;
    return WeightedAverageM16(env, vi);
...}
PVideoFrame RAverageM::WeightedAverageM16(IScriptEnvironment* env, VideoInfo vi)
{
    result = env->NewVideoFrame (vi,PitchAlign);
    if (y==3) AveragePlaneM16(PLANAR_Y, env);
    if (u==3) AveragePlaneM16(PLANAR_U, env);
    if (v==3) AveragePlaneM16(PLANAR_V, env);
    return (result);
} |
17th November 2011, 18:22 | #35 | Link |
Registered User
Join Date: Mar 2011
Location: Germany
Posts: 64
|
From Filter SDK documentation.
|
17th November 2011, 18:30 | #37 | Link |
Avisynth language lover
Join Date: Dec 2007
Location: Spain
Posts: 3,431
|
All input clips are available in the constructor (if you pass them in as arguments), so what's the problem?
The GetFrame() function has no more information available than the constructor has. |
17th November 2011, 18:54 | #38 | Link |
Registered User
Join Date: Aug 2007
Posts: 374
|
Writing in assembler means that you know what you are doing, so don't worry about compiler warnings if you are sure your code is correct.
If you really want registers, you can use all 8 of them:
1) no problems with the 6 you mentioned;
2) after you modify ebp in inline assembler you won't be able to access most of your C/C++ variables by name, so load it last and restore it at the end;
3) you can even use esp in extreme cases: you just need to save it somewhere and restore it at the end; don't worry about interrupts, they'll switch to their own stack.
Also, it's important to understand that not everything has to be put in registers. For example, keeping the counters of outer loops in memory is perfectly fine and won't change your program's speed in any noticeable way. |
17th November 2011, 23:44 | #40 | Link |
͡҉҉ ̵̡̢̛̗̘̙̜̝̞̟̠͇̊̋̌̍̎̏̿̿
Join Date: Feb 2009
Location: No support in PM
Posts: 712
|
No need to call GetFrame:
Code:
PClip p = ...;
const VideoInfo & v = p->GetVideoInfo ();
area = v.width * v.height;
|
|