1st June 2021, 11:54 | #1001 | Link |
Registered User
Join Date: Oct 2002
Location: France
Posts: 2,316
|
Unfortunately I don't have time right now, but a quick test could be this:
Code:
static void Sobel_16(const unsigned char *psrc,unsigned char *pdst,const int32_t src_pitch,
    const int32_t dst_pitch, const int32_t src_height,int32_t dst_row_size, int32_t thresh,uint8_t bit_pixel)
{
    const int32_t i = (dst_row_size + 3) >> 2;
    const int32_t i0 = (dst_row_size + 3-2) >> 2;

    dst_row_size >>= 1;
    thresh <<= (bit_pixel-8);

    if (aWarpSharp_Enable_AVX)
    {
        for (int32_t y=0; y<src_height; y++)
        {
            uint16_t *dst=(uint16_t *)pdst;

            if (y==0) JPSDR_Sobel_16_AVX(psrc+2,pdst+2,src_pitch,y,src_height,i0,thresh);
            else
            {
                if (y==src_height-1) JPSDR_Sobel_16_AVX(psrc,pdst,src_pitch,y,src_height,i0,thresh);
                else JPSDR_Sobel_16_AVX(psrc,pdst,src_pitch,y,src_height,i,thresh);
            }
            dst[0]=dst[1];
            dst[dst_row_size-1]=dst[dst_row_size-2];
            psrc += src_pitch;
            pdst += dst_pitch;
        }
    }
    else
    {
        for (int32_t y=0; y<src_height; y++)
        {
            uint16_t *dst=(uint16_t *)pdst;

            if (y==0) JPSDR_Sobel_16_SSE2(psrc+2,pdst+2,src_pitch,y,src_height,i0,thresh);
            else
            {
                if (y==src_height-1) JPSDR_Sobel_16_SSE2(psrc,pdst,src_pitch,y,src_height,i0,thresh);
                else JPSDR_Sobel_16_SSE2(psrc,pdst,src_pitch,y,src_height,i,thresh);
            }
            dst[0]=dst[1];
            dst[dst_row_size-1]=dst[dst_row_size-2];
            psrc += src_pitch;
            pdst += dst_pitch;
        }
    }
}
The
Code:
dst[0]=dst[1]; dst[dst_row_size-1]=dst[dst_row_size-2];
__________________
My github. Last edited by jpsdr; 1st June 2021 at 11:57. |
1st June 2021, 13:27 | #1002 | Link | |
Registered User
Join Date: Apr 2010
Location: I have a statue in Hakodate, Japan
Posts: 744
|
Quote:
I need your help. Last edited by GMJCZP; 1st June 2021 at 13:29. |
|
1st June 2021, 16:26 | #1003 | Link | |
HeartlessS Usurer
Join Date: Dec 2009
Location: Over the rainbow
Posts: 10,980
|
@Wonkey,
Avisynth is a bit laxly specified; you can even have a variable with the same name as a function. It's not until it tries to evaluate an expression that it can figure out what it is, and whether the result of that expression is assigned to some variable or used in some way as an argument in another expression. It may not be used for anything at all, eg just plonk a "123456" on a line by itself somewhere, with or without the double quotes.
I doubt whether the Avisynth script language could be properly described in Backus–Naur form [used in Kernighan & Ritchie, "The C Programming Language", at the back of the book to describe Std/ISO C], or in those language describing tools derived from Unix "Lex", and the like.
Making "\" line continuation optional is just making it even more lax than it already is, and is bound to come with a bundle of trip wires; safer to forget any changes at this stage in the life of AVS. I guess Ben implemented the language to the point that it could get the job done, and no further.
Backus–Naur form:- https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form
The C Programming Language:- https://en.wikipedia.org/wiki/The_C_...mming_Language
Lex:- https://en.wikipedia.org/wiki/Lex_(software)
Lexx [The bestest Sci-Fi ever created, weird mix of Canadian-German humour]:- https://en.wikipedia.org/wiki/Lexx
EDIT: Quote:
EDIT: Fixed the Lex link, the trailing ")" of the url is wrongly appended as simple text after the remaining part of the link by the vBulletin url insertion thingy.
__________________
I sometimes post sober. StainlessS@MediaFire ::: AND/OR ::: StainlessS@SendSpace "Some infinities are bigger than other infinities", but how many of them are infinitely bigger ??? Last edited by StainlessS; 1st June 2021 at 17:15. |
|
1st June 2021, 17:01 | #1004 | Link |
Registered User
Join Date: Oct 2002
Location: France
Posts: 2,316
|
OK, I've tested the following change:
Code:
static void Sobel_8(const unsigned char *psrc,unsigned char *pdst,const int32_t src_pitch,
    const int32_t dst_pitch, const int32_t src_height,const int32_t dst_row_size, int32_t thresh)
{
    const int32_t i = (dst_row_size-2 + 3) >> 2;

    if (aWarpSharp_Enable_AVX)
    {
        for (int32_t y=0; y<src_height; y++)
        {
            JPSDR_Sobel_8_AVX(psrc+1,pdst+1,src_pitch,y,src_height,i,thresh);
            pdst[0] = pdst[1];
            pdst[dst_row_size-1] = pdst[dst_row_size-2];
            psrc += src_pitch;
            pdst += dst_pitch;
        }
    }
    else
    {
        for (int32_t y=0; y<src_height; y++)
        {
            JPSDR_Sobel_8_SSE2(psrc+1,pdst+1,src_pitch,y,src_height,i,thresh);
            pdst[0] = pdst[1];
            pdst[dst_row_size-1] = pdst[dst_row_size-2];
            psrc += src_pitch;
            pdst += dst_pitch;
        }
    }
}
Code:
a=AVISource("SP_IVTC.avi",False,"YV12").SetPlanarLegacyAlignment(True)
b=aWarpSharp2(a,chroma=3,threads=1)
c=aWarpSharp2(a,chroma=3,threads=0)
Subtract(b,c).Levels(127, 1, 129, 0, 255)
So can you test whether it still crashes with the previous change, and with the following change:
Code:
static void Sobel_16(const unsigned char *psrc,unsigned char *pdst,const int32_t src_pitch,
    const int32_t dst_pitch, const int32_t src_height,int32_t dst_row_size, int32_t thresh,uint8_t bit_pixel)
{
    const int32_t i = (dst_row_size-4 + 3) >> 2;

    dst_row_size >>= 1;
    thresh <<= (bit_pixel-8);

    if (aWarpSharp_Enable_AVX)
    {
        for (int32_t y=0; y<src_height; y++)
        {
            uint16_t *dst=(uint16_t *)pdst;

            JPSDR_Sobel_16_AVX(psrc+2,pdst+2,src_pitch,y,src_height,i,thresh);
            dst[0]=dst[1];
            dst[dst_row_size-1]=dst[dst_row_size-2];
            psrc += src_pitch;
            pdst += dst_pitch;
        }
    }
    else
    {
        for (int32_t y=0; y<src_height; y++)
        {
            uint16_t *dst=(uint16_t *)pdst;

            JPSDR_Sobel_16_SSE2(psrc+2,pdst+2,src_pitch,y,src_height,i,thresh);
            dst[0]=dst[1];
            dst[dst_row_size-1]=dst[dst_row_size-2];
            psrc += src_pitch;
            pdst += dst_pitch;
        }
    }
}
Code:
aWarpSharp2(a,chroma=3,threads=1)
....
I realised too late; I should have put this on my aWarpsharp thread...
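For readers following the change, here is a minimal scalar sketch of the two ideas in the wrappers above: the round-up count `(n + 3) >> 2`, and copying the nearest interior pixel into the leftmost and rightmost pixel of each row. The names `groups_of_4` and `replicate_row_edges` are hypothetical, made up for this illustration; they are not part of aWarpSharp2, and the real work is done by the JPSDR_Sobel_* asm routines, which are not reproduced here.

```cpp
#include <cstdint>

// Number of 4-byte groups needed to cover `bytes` bytes, rounding up.
// Same arithmetic as "(dst_row_size-2 + 3) >> 2" in the wrapper above.
inline int32_t groups_of_4(int32_t bytes) { return (bytes + 3) >> 2; }

// Hypothetical helper: for every row, copy the nearest interior pixel into
// the leftmost and rightmost pixel. T is uint8_t for 8-bit planes and
// uint16_t for 16-bit planes; dst_pitch is in bytes, width_pixels in pixels.
template <typename T>
void replicate_row_edges(uint8_t *pdst, int32_t dst_pitch,
                         int32_t height, int32_t width_pixels)
{
    for (int32_t y = 0; y < height; y++)
    {
        T *dst = reinterpret_cast<T *>(pdst);

        dst[0] = dst[1];
        dst[width_pixels - 1] = dst[width_pixels - 2];
        pdst += dst_pitch;
    }
}
```

Note that starting the kernel at psrc+1 (or psrc+2 for 16-bit) shifts the store address off its original alignment, which matters if the kernel uses aligned or streaming stores.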
__________________
My github. Last edited by jpsdr; 1st June 2021 at 17:04. |
1st June 2021, 17:39 | #1005 | Link | |
Registered User
Join Date: Jan 2014
Posts: 2,314
|
Quote:
Code:
movntdq XMMWORD ptr[rsi+rdi],xmm2
Unfortunately one has to treat the leftmost and rightmost loads specially when we want to keep the aligned access in the middle. |
|
1st June 2021, 20:03 | #1006 | Link |
Registered User
Join Date: Oct 2002
Location: France
Posts: 2,316
|
Argh... Forgot this. For testing purposes right now, does replacing "movntdq" with "movdqu" make it work?
If yes, I will create a specific asm function with "movdqu" for the first and last line.
Euh.... Why did Sobel_8 work for me on my Windows7 x86 avs2.60...
.... Ah... the picture was grey as expected in VDub, and I probably didn't notice an error message displayed in the bottom information line. With no pop-up crash and the displayed picture grey as expected, I didn't realise... I'll redo the test.
Edit
Indeed, I didn't notice the error message in the bottom line of VDub...
Edit2
Replacing with "movdqu" solved it, but while it seems to produce the same result for the first line pixel, it seems not to for the last line pixel. Can you confirm that with "movdqu" there is also no crash with CUDA?
__________________
My github. Last edited by jpsdr; 1st June 2021 at 20:16. |
1st June 2021, 20:33 | #1007 | Link |
Registered User
Join Date: Jan 2014
Posts: 2,314
|
I'll look into it tomorrow. This bug, however, is not specific to the CUDA-aware build; it was pure luck if it did not cause trouble. movntdq is the streaming (non-temporal) version of the mov mnemonic and requires aligned access. Replacing it with the unaligned variant (movdqu) will surely solve the problem.
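As a rough illustration of the alignment requirement, the sketch below picks between a streaming store and an unaligned store depending on whether the destination is 16-byte aligned. It uses the SSE2 intrinsics `_mm_stream_si128` (movntdq) and `_mm_storeu_si128` (movdqu); the helper names `is_aligned16` and `store16_auto` are hypothetical and are not taken from the aWarpSharp2 sources, which use hand-written asm instead.

```cpp
#include <cstdint>
#include <cstring>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

// True when p is a valid destination for an aligned or streaming 16-byte store.
inline bool is_aligned16(const void *p)
{
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}

// Hypothetical helper: copy 16 bytes from src to dst. Uses the streaming
// store only when dst is 16-byte aligned, otherwise the unaligned store.
// Returns true when the aligned (streaming) path was taken.
inline bool store16_auto(uint8_t *dst, const uint8_t *src)
{
#ifdef SKETCH_HAVE_SSE2
    const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src));

    if (is_aligned16(dst))
    {
        _mm_stream_si128(reinterpret_cast<__m128i *>(dst), v);  // movntdq
        return true;
    }
    _mm_storeu_si128(reinterpret_cast<__m128i *>(dst), v);      // movdqu
    return false;
#else
    // Portable fallback for non-x86 builds: plain copy, same return value.
    std::memcpy(dst, src, 16);
    return is_aligned16(dst);
#endif
}
```

With a row start that is 16-byte aligned, `pdst+1` or `pdst+2` is never aligned, so a kernel that stores with movntdq at that address faults; a per-call check like this one, or a dedicated unaligned variant of the routine, avoids that.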
|
1st June 2021, 22:19 | #1008 | Link |
Registered User
Join Date: Apr 2010
Location: I have a statue in Hakodate, Japan
Posts: 744
|
Friends, I am extremely worried. I made a great effort to buy my "new" card with a dual-core processor and I have not been able to take advantage of its two cores; I feel like I have a Celeron. If someone could shed some light on this I would appreciate it. I am trying to update and improve all the scripts that I have posted, but this situation is out of my hands.
|
2nd June 2021, 11:01 | #1010 | Link | ||
Avisynth language lover
Join Date: Dec 2007
Location: Spain
Posts: 3,431
|
Quote:
However, there are some cases where a newline is required (see this post and this one), so it is not possible to have the parser ignore them altogether. Quote:
|
||
2nd June 2021, 11:35 | #1012 | Link |
Registered User
Join Date: Jan 2014
Posts: 2,314
|
Nekopanda extracted and rewrote some stuff that was needed for his project. Actually, the KTGMC filter is the beginning of what you are searching for.
https://github.com/pinterf/AviSynthC...C/MV.cpp#L5416 |
2nd June 2021, 12:33 | #1014 | Link | |
Registered User
Join Date: Jan 2018
Posts: 2,156
|
Quote:
https://drive.google.com/uc?export=d...Q4x49ZCt8W2jkS |
|
2nd June 2021, 13:15 | #1016 | Link | |||
Registered User
Join Date: Sep 2010
Location: Ukraine, Bohuslav
Posts: 377
|
Apparently it doesn't work.
Quote:
Quote:
Quote:
|
|||
2nd June 2021, 14:50 | #1018 | Link |
Registered User
Join Date: Sep 2010
Location: Ukraine, Bohuslav
Posts: 377
|
I went through some steps of your build manual with minor changes (using CUDA 11.3 and setting CUDA arch 7.5 where it was set to a lower version; I also changed afxres.h to windows.h in nnedi3, since it doesn't build on the latest MSVC).
|
2nd June 2021, 15:13 | #1019 | Link |
Registered User
Join Date: Jan 2014
Posts: 2,314
|
Ehh, that build manual is rather a 'log of my adventures on doing something I've never encountered before'. O.K., in a somewhat polished version, but I was happy that it worked as is for me. Anyway it can be a good start for someone who really wants to be involved in the project. (Probably not me, it requires weeks or months to have an active knowledge on it, though programming CUDA is a very interesting topic).
|
2nd June 2021, 16:10 | #1020 | Link | |
Registered User
Join Date: Jan 2014
Posts: 2,314
|
Quote:
Such filters cannot be called again for a new frame until the previous frame is ready. There is blocking; new requests go into a queue. Prefetch(2) starts threads that prefetch 4 frames in advance. It is possible that the calls in this scenario produce nonlinear frame access. In this case the somewhat slower execution blocks the other frame requests, and we see a negative feedback.
This is not something which is easily debuggable. There is probably a 1/10000 sec delay in timing conditions which results in the first out-of-sequence access to LWLibavVideoSource. The debug build is quick, but the release build gets into this state after some ten frames. When I put a simple line in the source code around the delay (std::cout << frame_number, really not a time-consuming operation) which writes the actual frame number to the standard output, the problem disappears. Pretty much the observer effect: the disturbance of an observed system by the act of observation.
I'd like to understand how it begins and how this chaotic internal state can be healed, but it is not easy at all. |
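To make the blocking and the out-of-sequence access a bit more concrete, here is a small sketch of a serialized source that lets only one request in at a time and counts how often a request is not the frame directly after the previous one. The class `SerializedSource` is hypothetical and only illustrates the idea; it is not how Avisynth+ or LWLibavVideoSource are implemented.

```cpp
#include <mutex>

// Hypothetical model of a serialized source filter: requests are handled
// one at a time, and every request that is not "last frame + 1" is counted
// as a non-linear access (a seek, which is the expensive case).
class SerializedSource
{
public:
    int get_frame(int n)
    {
        // Only one thread may be inside the filter; others wait here.
        std::lock_guard<std::mutex> lock(mutex_);

        if (n != last_ + 1) ++seeks_;
        last_ = n;
        return n;
    }

    int seeks() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return seeks_;
    }

private:
    mutable std::mutex mutex_;
    int last_ = -1;
    int seeks_ = 0;
};
```

Requesting frames 0, 1, 2, 3 in order gives zero seeks, while an order such as 0, 2, 1, 3 (which prefetch threads racing each other could produce) gives three. A counter like this is cheap enough that it may be less likely to disturb the timing than printing every frame number, though that is an assumption and not something tested in this thread.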
|
|
|