16th September 2007, 00:27 | #21
brainless
Join Date: Mar 2003
Location: Germany
Posts: 3,653
@revgen: if you can't use anything other than Paint, just make sure to set the image size to 1x1 pixels via Image -> Attributes before pasting an image. This avoids the white borders!
__________________
Don't forget the 'c'! Don't PM me for technical support, please.
16th September 2007, 00:30 | #22
Registered User
Join Date: Sep 2004
Location: Near LA, California, USA
Posts: 1,545
I'll do it next time.
__________________
Pirate: Now how would you like to die? Would you like to have your head chopped off or be burned at the stake? Curly: Burned at the stake! Moe: Why? Curly: A hot steak is always better than a cold chop.
16th September 2007, 00:48 | #23
Registered User
Join Date: Dec 2003
Location: MO, US
Posts: 999
scharfis, I put up a new nnedi.dll at the same location as before. Can you download it and see if it fixes the problems with SSE? The previous one I put up always used SSE (and then compared the results of each routine to the C version), so opt didn't do anything.
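For illustration, here is a minimal sketch of the kind of cross-check described above: run a candidate routine and the plain-C reference on the same data and count how often they disagree. It is portable C, not the actual nnedi debug build; find_cluster_4lane is a hypothetical stand-in for the SSE routine that keeps four partial sums the way a 4-wide SIMD loop does, and all names here are my own.

```c
#include <float.h>   /* FLT_MAX */
#include <stddef.h>  /* size_t */

#define CLUSTER_LEN 100  /* floats per cluster, as in findCluster_C */

/* Plain-C reference: index of the cluster with the smallest squared distance. */
int find_cluster_ref(const float *input, const float *clusters, int n)
{
    int idx = 0;
    float mdiff = FLT_MAX;
    for (int i = 0; i < n; ++i, clusters += CLUSTER_LEN) {
        float diff = 0.0f;
        for (int j = 0; j < CLUSTER_LEN; ++j)
            diff += (input[j] - clusters[j]) * (input[j] - clusters[j]);
        if (diff < mdiff) { mdiff = diff; idx = i; }
    }
    return idx;
}

/* Hypothetical "optimized" candidate: element j goes to partial sum j % 4,
 * like a 4-wide SIMD loop, so its rounding can differ slightly. */
int find_cluster_4lane(const float *input, const float *clusters, int n)
{
    int idx = 0;
    float mdiff = FLT_MAX;
    for (int i = 0; i < n; ++i, clusters += CLUSTER_LEN) {
        float lane[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
        for (int j = 0; j < CLUSTER_LEN; ++j) {
            float d = input[j] - clusters[j];
            lane[j % 4] += d * d;
        }
        float diff = (lane[0] + lane[2]) + (lane[1] + lane[3]);
        if (diff < mdiff) { mdiff = diff; idx = i; }
    }
    return idx;
}

typedef int (*find_cluster_fn)(const float *, const float *, int);

/* Runs both routines on every input vector and counts disagreements. */
int count_mismatches(find_cluster_fn candidate, find_cluster_fn reference,
                     const float *inputs, int num_inputs,
                     const float *clusters, int n)
{
    int mismatches = 0;
    for (int k = 0; k < num_inputs; ++k) {
        const float *in = inputs + (size_t)k * CLUSTER_LEN;
        if (candidate(in, clusters, n) != reference(in, clusters, n))
            ++mismatches;
    }
    return mismatches;
}
```

Comparing the returned index rather than the raw distance avoids false alarms caused by the different summation order, although genuine near-ties can still legitimately disagree.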
16th September 2007, 00:58 | #24
brainless
Join Date: Mar 2003
Location: Germany
Posts: 3,653
This version behaves like the original one:
- no DebugView output
- opt=1 produces nice output
- opt=2 produces garbage
BTW: I am working with AviSynth 2.58.
16th September 2007, 04:30 | #26
brainless
Join Date: Mar 2003
Location: Germany
Posts: 3,653
Still the same (to quote myself):
Quote:
EDIT: I just tested it in Microsoft Virtual PC on a fresh, virgin-like install of Windows XP. The result was the same:
- opt=1: OK
- opt=2/0: garbage
Is it possible that my CPU is faulty and executes SSE instructions incorrectly? Are there programs that check whether instructions (or instruction sets like SSE) are executed correctly?
Last edited by scharfis_brain; 16th September 2007 at 04:53.
16th September 2007, 07:04 | #27
Registered User
Join Date: Dec 2003
Location: MO, US
Posts: 999
I don't know of any programs that check for correct execution of SSE, but I also haven't looked for one. The only thing that makes the findCluster SSE routine (the only one that doesn't work correctly on your computer) different from the other SSE routines is that it uses the 'comiss' instruction. The rest of it is almost exactly the same as one of the other routines, which works correctly.
Maybe someone else with an Athlon XP can test?
16th September 2007, 10:11 | #29
Registered User
Join Date: Dec 2003
Location: MO, US
Posts: 999
Here are the C and SSE routines; maybe someone can see something I can't:
Code:
```c
#include <float.h>  // FLT_MAX

// Reference C version: returns the index of the cluster (100 floats each)
// with the smallest squared Euclidean distance to 'input'.
int findCluster_C(const float *input, const float *clusters, const int n)
{
    int idx = 0;  // initialized so the result is defined even if no cluster is chosen
    float mdiff = FLT_MAX;
    for (int i=0; i<n; ++i)
    {
        float diff = 0.0f;
        for (int j=0; j<100; ++j)
            diff += (input[j]-clusters[j])*(input[j]-clusters[j]);
        if (diff < mdiff)
        {
            mdiff = diff;
            idx = i;
        }
        clusters += 100;
    }
    return idx;
}

__declspec(align(16)) const float sse_floatmax[4] = { FLT_MAX, FLT_MAX, FLT_MAX, FLT_MAX };

// SSE version (MSVC 32-bit inline asm). movaps requires 'input' and
// 'clusters' to be 16-byte aligned.
int findCluster_SSE(const float *input, const float *clusters, const int n)
{
    int idx = 0;
    __asm
    {
        xor eax,eax
        mov edx,n
        mov esi,clusters
        movaps xmm7,sse_floatmax   ; current minimum distance
    i_loop:
        mov edi,input
        mov ecx,5                  ; 5 passes x 20 floats = 100 floats
        xorps xmm0,xmm0
        xorps xmm1,xmm1
    twenty_loop:
        movaps xmm2,[esi]
        movaps xmm3,[esi+16]
        movaps xmm4,[esi+32]
        movaps xmm5,[esi+48]
        movaps xmm6,[esi+64]
        subps xmm2,[edi]
        subps xmm3,[edi+16]
        subps xmm4,[edi+32]
        subps xmm5,[edi+48]
        subps xmm6,[edi+64]
        mulps xmm2,xmm2
        mulps xmm3,xmm3
        mulps xmm4,xmm4
        mulps xmm5,xmm5
        mulps xmm6,xmm6
        addps xmm1,xmm2
        addps xmm3,xmm4
        addps xmm5,xmm6
        addps xmm0,xmm3
        addps xmm1,xmm5
        add esi,80
        add edi,80
        sub ecx,1
        jnz twenty_loop
        addps xmm0,xmm1            ; four partial sums
        movhlps xmm1,xmm0
        addps xmm0,xmm1            ; two partial sums in the low lanes
        movaps xmm1,xmm0
        psrlq xmm1,32              ; shift each 64-bit half right 32 bits (lane 1 -> lane 0)
        addss xmm0,xmm1            ; total distance in the low lane
        comiss xmm0,xmm7
        jae check_loop
        movss xmm7,xmm0
        mov idx,eax
    check_loop:
        add eax,1
        cmp eax,edx
        jl i_loop
    }
    return idx;
}
```
16th September 2007, 13:35 | #30
Registered User
Join Date: Nov 2001
Posts: 291
@tritical
First of all, thanks for this contribution. At a quick glance (I didn't analyze the code), if I remember correctly, psrlq xmm1,32 is an SSE2 instruction that older SSE-only CPUs don't support. In SSE, all the xmm instructions are floating-point ones. I don't have my papers here, but I'm ALMOST sure about that.
Best regards for this project,
ARDA
16th September 2007, 15:51 | #31
Avisynth Developer
Join Date: Jan 2003
Location: Melbourne, Australia
Posts: 3,167
Yep, psrlq xmm1,32 is an SSE2 instruction.
A convenient reference is distrib/include/SoftWire/InstructionSet.cpp; one of SHUFPS, UNPCKLPS or UNPCKHPS is probably what you want.
Last edited by IanB; 16th September 2007 at 16:15.
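A related safeguard is to check at runtime which instruction sets the CPU reports before choosing a code path. The sketch below assumes x86 CPUID leaf 1, where EDX bit 25 flags SSE and bit 26 flags SSE2 (an Athlon XP, for example, reports SSE but not SSE2). The decoding function is portable; the query function uses GCC/Clang's <cpuid.h> and is compiled only on x86 with those compilers (MSVC would use __cpuid from <intrin.h> instead). The SimdSupport type and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical summary of the SIMD features relevant to this thread. */
typedef struct {
    bool sse;
    bool sse2;
} SimdSupport;

/* Decode the EDX register returned by CPUID leaf 1:
 * bit 25 = SSE, bit 26 = SSE2. */
SimdSupport decode_cpuid1_edx(unsigned int edx)
{
    SimdSupport s;
    s.sse  = (edx >> 25) & 1u;
    s.sse2 = (edx >> 26) & 1u;
    return s;
}

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
/* Query the running CPU (GCC/Clang on x86 only). */
SimdSupport query_simd_support(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        edx = 0;  /* leaf 1 unavailable: report no SSE/SSE2 */
    return decode_cpuid1_edx(edx);
}
#endif
```

A filter could then fall back to the C routine whenever a path needs SSE2 but sse2 is false.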
16th September 2007, 19:17 | #33
Registered User
Join Date: Sep 2004
Location: Near LA, California, USA
Posts: 1,545
That's not too bad at all for Yadif. I'll try it out myself later.
16th September 2007, 19:35 | #34
Registered User
Join Date: Dec 2003
Location: MO, US
Posts: 999
Thank you ARDA and IanB. I replaced psrlq with shufps. The funny thing is I originally added movaps/psrlq to replace pshufd so that it wouldn't require SSE2.
I put up a new version at the same link as before. scharfis or Fizick, could you test it when you have time?
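For comparison, here is one possible SSE1-only way to finish the horizontal sum, written with intrinsics rather than inline asm: movhlps folds the upper two lanes onto the lower two, and shufps (instead of the SSE2 psrlq) brings lane 1 down to lane 0 before the final addss. This is a sketch, not tritical's actual fix; hsum4 is a hypothetical name, and the plain-C fallback only exists so the file also compiles on non-x86 targets.

```c
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>  /* SSE1 intrinsics only */

/* Horizontal sum of four floats using only SSE1 operations. */
float hsum4(const float v[4])
{
    __m128 x  = _mm_loadu_ps(v);                               /* v0 v1 v2 v3 */
    __m128 hi = _mm_movehl_ps(x, x);                           /* movhlps: v2 v3 in low lanes */
    __m128 s  = _mm_add_ps(x, hi);                             /* lane0 = v0+v2, lane1 = v1+v3 */
    __m128 sh = _mm_shuffle_ps(s, s, _MM_SHUFFLE(0, 0, 0, 1)); /* shufps: lane1 -> lane0 */
    return _mm_cvtss_f32(_mm_add_ss(s, sh));                   /* addss: total in lane0 */
}
#else
/* Portable fallback with the same addition order. */
float hsum4(const float v[4])
{
    return (v[0] + v[2]) + (v[1] + v[3]);
}
#endif
```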
16th September 2007, 19:54 | #36
Huh?
Join Date: Sep 2003
Location: Uruguay
Posts: 3,103
Revgen, could you try to include TDeint+NNEDI+TMM in your comparison?
__________________
Read Decomb's readmes and tutorials, the IVTC tutorial and the capture guide in order to learn about combing and how to deal with it.
16th September 2007, 20:33 | #37
Registered User
Join Date: Sep 2004
Location: Near LA, California, USA
Posts: 1,545
Hmm... I didn't know about TMM until you mentioned it. I'll try it out as soon as my other encode is finished.
17th September 2007, 06:06 | #38
Registered User
Join Date: Sep 2004
Location: Near LA, California, USA
Posts: 1,545
Okay, I checked out TDeint+TMM+NNEDI. The good news is that it rivals MVBob (with either EEDI or NNEDI in the script) in terms of quality and stability. The bad news is that it's about as slow as MVBob too, and this is with Threads=2 enabled for NNEDI. It doesn't come close to MCBob, though, regardless of whether or not MCBob is using NNEDI.
[Hint]I wonder if Tritical would be interested in adding an Emask parameter to Yadifmod.[/Hint] It would be nice to see what result we get with Yadif combined with NNEDI and TMM.
Last edited by Revgen; 17th September 2007 at 06:14.
17th September 2007, 07:35 | #39
Registered User
Join Date: Dec 2003
Location: MO, US
Posts: 999
If you were going to use TMM/NNEDI, you would get the same output as with TDeint+TMM+NNEDI; there wouldn't be anything left for yadif to do. It doesn't matter anyway, because yadif doesn't use a motion mask like the one TMM outputs. Yadif doesn't make a straight weave-or-don't-weave decision. It starts with the spatial prediction and then limits that value to be within 'diff' of the weaved prediction (the average of the pixels from the previous and next fields). 'diff' is calculated from both temporal and spatial differences.
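To make that description concrete, here is a simplified per-pixel sketch in C. It follows the outline in the post (clamp the spatial prediction to within 'diff' of the weaved prediction), but it is not yadif's source: the spatial prediction is a plain line average without yadif's edge-directed search, 'diff' uses only temporal differences (the spatial contribution mentioned above is omitted), and the particular temporal terms, the Neighborhood struct and its field names are assumptions for illustration.

```c
#include <stdlib.h>  /* abs */

/* Hypothetical neighbourhood of one missing pixel on line y. */
typedef struct {
    int above, below;            /* current field: lines y-1 and y+1 */
    int prev_mid, next_mid;      /* line y in the previous / next field (weave candidates) */
    int prev_above, prev_below;  /* lines y-1, y+1 two fields back */
    int next_above, next_below;  /* lines y-1, y+1 two fields ahead */
} Neighborhood;

static int max3(int a, int b, int c)
{
    int m = a > b ? a : b;
    return m > c ? m : c;
}

/* Simplified yadif-style interpolation of one pixel. */
int yadif_like_pixel(const Neighborhood *p)
{
    /* Weaved prediction: average of the previous and next fields. */
    int temporal_pred = (p->prev_mid + p->next_mid) / 2;
    /* Spatial prediction: plain average of the lines above and below. */
    int spatial_pred = (p->above + p->below) / 2;

    /* 'diff' from temporal differences only (assumed terms). */
    int diff = max3(abs(p->prev_mid - p->next_mid) / 2,
                    (abs(p->prev_above - p->above) + abs(p->prev_below - p->below)) / 2,
                    (abs(p->next_above - p->above) + abs(p->next_below - p->below)) / 2);

    /* Limit the spatial prediction to within 'diff' of the weaved prediction. */
    if (spatial_pred > temporal_pred + diff)
        spatial_pred = temporal_pred + diff;
    else if (spatial_pred < temporal_pred - diff)
        spatial_pred = temporal_pred - diff;
    return spatial_pred;
}
```

In a static area diff is small, so the output is pulled toward the weave; in a moving area the temporal differences grow and the spatial prediction passes through unchanged.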
There is one obvious improvement that can be made to yadif: sliding the temporal window. Right now it is basically a five-field check that only checks the middle case, so, for example, it will never output the weaved prediction if the center field (the one being turned into a frame) is within two fields (ahead or back) of a scene change. The only downside is the added computational complexity. Making it check all five cases is on my list of things to do.
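The scene-change limitation can be illustrated with a small model. Assuming a single scene change between fields cut-1 and cut, and ignoring the start and end of the clip, the sketch compares the current centered five-field window with trying all five windows that contain the center field. This is a hypothetical model of the idea only, not yadif code, and the function names are my own.

```c
#include <stdbool.h>

/* A 5-field window starting at 'first' covers fields first..first+4.
 * It crosses the cut if it holds fields from both sides of it. */
static bool window_crosses_cut(int first, int cut)
{
    return first < cut && cut <= first + 4;
}

/* Current behaviour as described: only the window centered on field c. */
bool centered_window_clean(int c, int cut)
{
    return !window_crosses_cut(c - 2, cut);
}

/* Sliding window: try all five 5-field windows that contain field c. */
bool any_window_clean(int c, int cut)
{
    for (int first = c - 4; first <= c; ++first)
        if (!window_crosses_cut(first, cut))
            return true;
    return false;
}
```

With cut = 10, centered_window_clean(9, 10) is false while any_window_clean(9, 10) is true: the window covering fields 5 to 9 avoids the cut entirely.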
17th September 2007, 10:39 | #40
Registered User
Join Date: Dec 2002
Location: UK
Posts: 1,673
Thanks for more toys to play with!
What's the difference, algorithmically, between NNEDI and EEDI2? (Apart from EEDI2 wanting the fields, and NNEDI throwing one field of a frame away?) Should I stop using EEDI2 and start using pointresize.NNEDI?
Cheers,
David.