RemoveGrain - Page 2

Audionut · 17th August 2004, 15:48

I was missing a couple of dll's.

Got the same problem with the SSE2 version as Boulder.

kassandro · 17th August 2004, 15:54

Quote:

Originally posted by Boulder
Has the SSE2 version changed in any way or am I better off using the SSE optimised one? I use mainly mode=8.

The SSE2 version has changed in the same way as the SSE version (thus only for mode=2,3,4). Actually I use macros for the registers (for SSE I take the 64 bit mmx registers and for SSE2 I take the 128 bit SSE registers) and for the instructions where SSE and SSE2 differ. Thus all changes affect both versions equally and simultaneously. Unfortunately there must be something wrong with this very economical way of programming.

Quote:

Originally posted by Audionut
In VirtualDubMod, I get an error,

Unable to load "C:\RemoveGrain.dll"
Same goes for the SSE2 version.

You probably have not installed "msvcr70.dll" properly (see the installation section of www.AvsTimer.de.tf for more details). You may also use the statically linked version RemoveGrainS.dll instead. In fact, this is the only reason why I provide it.

kassandro · 17th August 2004, 16:01

Quote:

Originally posted by Audionut
I was missing a couple of dll's.

Got the same problem with the SSE2 version as Boulder.

My response to your problem in the previous posting is obsolete (your posting was almost simultaneous). Thank you for your test. I will buy an SSE2 capable cpu (a 2.66 GHz Prescott Celeron + motherboard with Intel chip set + 512 MB of DDR-RAM is now well below a very affordable 200 Euros) by the end of the year. Then I should be able to track down the problem.

Audionut · 17th August 2004, 16:08

Quote:

Originally posted by kassandro
My response to your problem in the previous posting is obsolete (your posting was almost simultaneous). Thank you for your test. I will buy an SSE2 capable cpu (a 2.66 GHz Prescott Celeron + motherboard with Intel chip set + 512 MB of DDR-RAM is now well below a very affordable 200 Euros) by the end of the year. Then I should be able to track down the problem.

Thank you for this fine filter. I hope you can get the SSE2 version working without too much hassle.

Audionut · 17th August 2004, 16:39

I love this filter.

6000 frame clip (clean PAL DVD), single pass, const. quant 2, with no filters returned 2979 kbps bitrate.
With Undot(), returned 2828 kbps.
With removegrain(), returned 2662 kbps bitrate.

With removegrain() and unfilter(-5,-5) returned 2328 kbps.

Thank you again.

ARDA · 17th August 2004, 20:20

@kassandro

First of all thanks for the great work you're doing.
Second, I want to ask a question about a remark on Undot that you posted.

Quote:

kassandro wrote:
By the very nature of the algorithm border pixels cannot be processed and should therefore be left unchanged. While Undot does just this for the left, the top and the bottom border, it makes a mistake on the right border, where instead of copying the last pixel on the line, it copies the penultimate pixel to the last pixel, which simply doesn't make sense and should be considered a bug.

Could you please be more specific about that bug? Some time ago I made some tests with the Undot sources and don't remember such a thing. Is it in the YV12 or the YUY2 colorspace?

Thanks ARDA

kassandro · 18th August 2004, 11:37

Quote:

Originally posted by ARDA
Could you please be more specific about that bug? Some time ago I made some tests with the Undot sources and don't remember such a thing. Is it in the YV12 or the YUY2 colorspace?

First of all, the Undot bug is not severe. It doesn't cause a crash and concerns only the last pixel on a line. It is therefore hardly noticeable. Nevertheless it is instructive to see how I uncovered the Undot bug. I started with the following script:

Code:

input=MPEG2Source("test.d2v")
Undot_clip=Undot(input)
RemoveGrain_clip=RemoveGrain(input, mode=1)
difference(Undot_clip, removegrain_clip)

The difference filter is taken from my AlignFields plugin. For each frame it reports the SAD (sum of absolute differences) as well as the number of differing pixels to the debugview utility. Each pixel may be counted up to three times (Y, U, V values). Thus for a standard 720x576 clip there may be up to roughly 600000 different pixels per frame.
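To make the two reported numbers concrete, here is a minimal C sketch of such a per-plane comparison. It is not the code of the difference filter from AlignFields, which is not shown in this thread: `PlaneDiff`, `compare_plane` and `yv12_sample_count` are hypothetical names, and the sample count assumes a YV12 clip (full-size luma plus two chroma planes halved in both directions).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Result of comparing two planes of equal size. */
typedef struct {
    unsigned long sad;        /* sum of absolute differences */
    unsigned long different;  /* number of samples that differ */
} PlaneDiff;

/* Compare two planes row by row; pitch is the distance in bytes between
   the starts of consecutive rows (assumed equal for both planes). */
static PlaneDiff compare_plane(const uint8_t *a, const uint8_t *b,
                               int width, int height, ptrdiff_t pitch)
{
    PlaneDiff d = {0, 0};
    assert(width > 0 && height > 0 && pitch >= width);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int diff = abs((int)a[x] - (int)b[x]);
            d.sad += (unsigned long)diff;
            d.different += (diff != 0);
        }
        a += pitch;
        b += pitch;
    }
    return d;
}

/* Number of samples in a YV12 frame: one full-size luma plane plus two
   chroma planes subsampled by 2 horizontally and vertically. */
static unsigned long yv12_sample_count(int width, int height)
{
    assert(width > 0 && height > 0);
    return (unsigned long)width * (unsigned long)height
         + 2UL * (unsigned long)(width / 2) * (unsigned long)(height / 2);
}
```

For a 720x576 YV12 frame `yv12_sample_count` gives 414720 + 2 x 103680 = 622080 samples, which matches the figure of roughly 600000.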
Running the script with Vdubmod I was surprised to get the following output from debugview:

Code:

[1016] [129] total difference = 1, different Pixels 1
[1016] [130] total difference = 3, different Pixels 3
[1016] [131] total difference = 2, different Pixels 2
[1016] [132] total difference = 1, different Pixels 1
[1016] [133] total difference = 2, different Pixels 2
[1016] [134] total difference = 4, different Pixels 4
[1016] [135] total difference = 5, different Pixels 5
[1016] [136] total difference = 4, different Pixels 4
[1016] [137] total difference = 5, different Pixels 5
[1016] [138] total difference = 3, different Pixels 3

Thus there were differences between Undot and RemoveGrain(mode=1) but only very few. The border pixels were the immediate suspects (Undot and RemoveGrain require 8 adjacent pixels, whence border pixels cannot be processed and should therefore be left unchanged). To prove this I ran the following modified script:

Code:

input=MPEG2Source("test.d2v")
Undot_clip=Undot(input)
Undot_clip=crop(Undot_clip, 2,2,-2,-2)
RemoveGrain_clip=RemoveGrain(input, mode=1)
RemoveGrain_clip=crop(RemoveGrain_clip, 2,2,-2,-2)
difference(Undot_clip, removegrain_clip)

and fortunately I now got

Code:

[1016] [204] total difference = 0, different Pixels 0
[1016] [205] total difference = 0, different Pixels 0
[1016] [206] total difference = 0, different Pixels 0
[1016] [207] total difference = 0, different Pixels 0
[1016] [208] total difference = 0, different Pixels 0
[1016] [209] total difference = 0, different Pixels 0
[1016] [210] total difference = 0, different Pixels 0
[1016] [211] total difference = 0, different Pixels 0
[1016] [212] total difference = 0, different Pixels 0

Thus the differences between Undot and RemoveGrain(mode=1) are indeed on the border only. Next I asked the question: Are all border pixels handled differently or only a part? Running the following modified script:

Code:

input=MPEG2Source("test.d2v")
Undot_clip=Undot(input)
Undot_clip=crop(Undot_clip, 0,0,-4,0)
RemoveGrain_clip=RemoveGrain(input, mode=1)
RemoveGrain_clip=crop(RemoveGrain_clip, 0,0,-4,0)
difference(Undot_clip, removegrain_clip)

I still got the zero difference output

Code:

[1016] [251] total difference = 0, different Pixels 0
[1016] [252] total difference = 0, different Pixels 0
[1016] [253] total difference = 0, different Pixels 0
[1016] [254] total difference = 0, different Pixels 0
[1016] [255] total difference = 0, different Pixels 0
[1016] [256] total difference = 0, different Pixels 0
[1016] [257] total difference = 0, different Pixels 0
[1016] [258] total difference = 0, different Pixels 0
[1016] [259] total difference = 0, different Pixels 0
[1016] [260] total difference = 0, different Pixels 0
[1016] [261] total difference = 0, different Pixels 0

Thus only the border pixels on the right side are processed differently. Having located the differences I now turned to the Undot source code and found the following cause:

Code:

// do last qword
lea	esi, [esi+eax-8-BPP]	// point at last qword
movq	mm0, qword ptr[esi+ebx]		// move 1st 4 pixel
movq	qword ptr[edi+eax-8], mm0

The comment "point at last qword" is incorrect. Because of the extra -BPP (-BPP=-1 here), esi does not point at the last qword but one pixel before it. If "-BPP" is removed from the source code, Undot and RemoveGrain(mode=1) should become identical. Right now, because of -BPP, Undot copies the penultimate pixel of a line of the input frame to the last pixel of the same line of the output frame, which may improve compression a tiny little bit but is simply not correct and was not intended either.
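The effect of the extra -BPP can be reproduced with a small C model of the copy. This is only a sketch under assumptions: it models just the load and store of the last eight bytes from the quoted snippet for a single row, `copy_last_qword` and its `shift` parameter are made-up names, and what the real routine does with the other seven bytes is not visible in the snippet.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BPP 1  /* bytes per pixel of a YV12 plane, as stated in the post */

/* Copy the last 8 bytes of a row from src to dst.
   shift models the extra offset in the lea instruction:
   shift = BPP reproduces the quoted code, shift = 0 the proposed fix. */
static void copy_last_qword(uint8_t *dst, const uint8_t *src,
                            int width, int shift)
{
    assert(shift >= 0 && width >= 8 + shift);
    /* source address: row start + width - 8 - shift
       destination:    row start + width - 8          */
    memcpy(dst + width - 8, src + width - 8 - shift, 8);
}
```

With `shift = BPP` the last output byte receives the penultimate input byte; with `shift = 0` it receives the last input byte, which is what leaving the border pixels unchanged requires.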

Boulder · 18th August 2004, 11:44

Heh, that's one nice post about troubleshooting

Maybe you should notify trbarry so that he might fix the bug once he has the time. I haven't seen him in a while though.

Fizick · 18th August 2004, 18:15

Tom Barry can be notified by e-mail (I recently informed him about another filter).

Kassandro, may I use some parts of your RemoveGrain code in my DeGrainMedian code?

ARDA · 18th August 2004, 20:06

@kassandro
Thanks for the long explanation; I think it is very instructive, at least for me.
I've borrowed this function, modified it a little, and by moving the slider in VirtualDub I could see the difference between your plugin and Undot on the right side of the amplified difference; and as you say, it's hardly noticeable.

Code:

 
clip=MPEG2Source("mysource")
v1 =clip.UnDot
v2 =clip.RemoveGrain(mode=1)
# identical pixels come out as mid-grey
sub = v2.subtract(v1)
# stretch a narrow range around mid-grey so small differences become visible
substrong = sub.levels(122,1,132,0,255)
v3 = StackVertical(StackHorizontal(substrong.subtitle("Difference amplified"),
\v1.subtitle("UnDot")),StackHorizontal(sub.subtitle("Difference"),v2.subtitle("RemoveGrain")))
return v3

Thanks again for this great work.

ARDA

kassandro · 18th August 2004, 21:36

@Boulder:
Yes, I haven't seen trbarry here for quite a while. He seems to be busy with other things. I will draw his attention to this slight problem, when he returns.

@ARDA:
Thanks for the nice script. Actually, my input source was not very suitable for the test this morning. It had a black stripe on the right hand side. Thus the penultimate and the last pixel on a line were almost always identical. Only noise gave small differences. After cropping away the black stripe, your script displayed the differences nicely. Actually it shows that the chroma difference affects the last two pixels on the right hand side.
Since trbarry's assembly routine is nearly the same for YUY2, but with BPP=2 instead of BPP=1 for YV12, the same copying mistake is made for YUY2. On the other hand for YUY2, Undot and RemoveGrain(mode=1) cannot be compared, because Undot doesn't process the chroma of YUY2 clips. He could have done it with nearly the same assembly routine and BPP=4, but more attention at the border would have been necessary. Thus Undot and RemoveGrain(mode=1, modeU=0) are nearly identical for YUY2, but for YUY2 Undot is much faster.

kassandro · 20th August 2004, 04:36

Quote:

Originally posted by Fizick
Kassandro, may I use some parts of your RemoveGrain code in my DeGrainMedian code?

Sorry, Fizick, for having overlooked your posting yesterday, which I cannot understand, because it was just between Boulder's and ARDA's last posting.
Of course you can use my code. However, as I mentioned earlier, the programming style is very different from trbarry's. Because I always have SSE2 in mind, I use macros where SSE and SSE2 differ, which makes the code more difficult to understand. Understanding is also hampered by my passion for over-optimising. Though the SSE2 code doesn't work currently, I should be able to fix it easily once I have an SSE2 capable cpu. In the end, we all have to switch to SSE2: the Athlon64 platform, which will replace the current 32 bit platform within 2 years because Intel has adopted it, has no mmx registers anymore in 64 bit mode. Instead it has 16 128 bit SSE registers. It would have been much better to have 8 256 bit registers instead, with which one could easily double dct/idct performance and process 32 pixels instead of 16 simultaneously in various Avisynth filters.

Fizick · 20th August 2004, 19:29

Kassandro,
Thanks for your permission!
I understand your code quite easily. It is very well optimized and structured.
I have already put it into my filter beta (it will be released soon, of course under the GPL). But I removed SSE2.

About SSE2: I think its time has not come yet (it is not compatible with Athlon XP, so SSEMMX is the standard now). I do not think that SSE2 speed can be used in every filter. And for me speed is not the most important thing, but algorithm and quality. But the next generation of filter writers will use SSE4, of course!

BTW, why do you dislike Adobe so much?

kassandro · 29th August 2004, 12:38

SSE2 bug, I got you!

If there are no special restrictions, like in RemoveDirt, where the block size is fixed, then one should be able to produce an SSE and an SSE2 version simultaneously by using some simple macros for the registers and for some load and store instructions. I just learned that there is one important difference, of which I was completely unaware when I made RemoveGrain. For 128 bit memory operands of instructions like paddusb, pminub etc. the memory operand must always be aligned (i.e. the address must be a multiple of 16), while 64 bit operands as used for SSE are allowed to be unaligned. I have to blame Intel for my ignorance, because Intel always speaks only of 128 bit memory operands and not of 128 bit aligned memory operands. Only where Intel discusses exceptions, which I never read because I simply do not expect such exceptions in my programs, can one read that a memory alignment error triggers a read access exception, as reported by Boulder. Fortunately I read the exception stuff at least once, and then everything was obvious to me. Now it is my programming style to load data into a register if it is used more than once (that is a key difference between Undot and RemoveGrain(mode=1)), because only one instruction with a memory operand can be executed at a time (that is the reason why RemoveGrain(mode=1) is about 4 fps faster than Undot). Thus for some modes I use no instructions with a memory operand at all, and these were exactly the modes reported to work by Boulder.
I just uploaded a new SSE test version (it must be called with DRemoveGrain) to the web site. The other parts of the binary archive as well as the source archive have not changed. The SSE2 problem should now be fixed, though I may have overlooked 1 or 2 instructions with a memory operand.
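The alignment rule and the load-first style can be illustrated in portable C. This is a sketch, not RemoveGrain's code: `Vec128`, `is_aligned16`, `load16` and `add_saturate` are hypothetical names; `load16` stands in for an unaligned load into a register, and `add_saturate` models what paddusb computes once both operands are in registers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a 128 bit register holding 16 unsigned bytes. */
typedef struct { uint8_t b[16]; } Vec128;

/* 1 if the address is a multiple of 16, i.e. usable as a 128 bit memory
   operand of instructions like paddusb or pminub. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Models an unaligned load into a register: works for any address. */
static Vec128 load16(const uint8_t *p)
{
    Vec128 v;
    assert(p != NULL);
    memcpy(v.b, p, sizeof v.b);
    return v;
}

/* Models paddusb with two register operands:
   bytewise unsigned addition, clamped at 255. */
static Vec128 add_saturate(Vec128 x, Vec128 y)
{
    Vec128 r;
    for (int i = 0; i < 16; i++) {
        unsigned sum = (unsigned)x.b[i] + (unsigned)y.b[i];
        r.b[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
    return r;
}
```

Even if the start of a row is 16-byte aligned, the neighbours at `row - 1` and `row + 1` that a 3x3 filter needs are not, so in this model they have to go through something like `load16` rather than being used directly as memory operands.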

Boulder · 29th August 2004, 13:00

Great, I'll have a go ASAP!

Boulder · 29th August 2004, 13:21

OK, I tested the SSE2 version. It looks like all the other modes except number 4 work and produce no differences compared to the SSE version, viewed with DebugViewer. If I use mode=4, VirtualDubMod just closes without giving any error messages whatsoever.

Here's another bug as well, I noticed it earlier but forgot to post

If I use modeU=-1 to disable chroma processing, I first get a screen like this:

After scrolling for a few dozen frames in VDubMod, the output turns like this (notice the odd ghosting) :

If I use modeU=0, everything's OK. With mode=-1, I also get weird results. This occurs on both SSE and SSE2 version.

kassandro · 29th August 2004, 13:57

Quote:

Originally posted by Boulder
OK, I tested the SSE2 version. It looks like all the other modes except number 4 work and produce no differences compared to the SSE version, viewed with DebugViewer. If I use mode=4, VirtualDubMod just closes without giving any error messages whatsoever.

I looked again at the source code and the SSE2 bug seems to be corrected. On the other hand, there have been a lot of changes meanwhile and the current SSE2 binary is only a snapshot. In fact, the next version 0.6 will contain a new plugin, which is derived from the same source file, such that both plugins can be maintained simultaneously. Unfortunately, the technique of creating different plugins from the same source code has one drawback: the source code gets a little bit messy.

Quote:

Here's another bug as well, I noticed it earlier but forgot to post. If I use modeU=-1 to disable chroma processing, I first get a screen like this:

This is not a bug. If you want to leave the chroma unchanged, you have to use modeU=0. If you want the chroma to be not processed at all (not even copied), which is faster, you should choose modeU=-1. If modeU=-1, then the chroma becomes random (in your case green and then it changes again). Why modeU=-1? If you have black & white video, you can save time by using modeU=-1 and then simply applying the greyscale command at the end. The chroma is automatically erased if RemoveDirt is used with grey=true.
If you use mode=-1, then the luma becomes random as well. Of course, this doesn't make sense even for b&w material.

Boulder · 29th August 2004, 14:06

Quote:

Originally posted by kassandro

This is not a bug. If you want to leave the chroma unchanged, you have to use modeU=0. If you want the chroma to be not processed at all (not even copied), which is faster, you should choose modeU=-1. If modeU=-1, then the chroma becomes random (in your case green and then it changes again).

Oh, then it was a mistake on my part. I actually thought that using modeU=-1 ended up in a b&w clip, but that has to be done either with RemoveDirt(grey=true) or Greyscale().

When can we expect the v0.6?

kassandro · 29th August 2004, 15:16

Quote:

Originally posted by Boulder
Oh, then it was a mistake on my part. I actually thought that using modeU=-1 ended up in a b&w clip, but that has to be done either with RemoveDirt(grey=true) or Greyscale().

From this design you can see that RemoveGrain was designed as a precleaner for RemoveDirt. Unfortunately the very sophisticated RemoveDirt is becoming more and more obsolete. The new plugin derived from the RemoveGrain source, currently called Repair, leads to a much more compression efficient way of cleaning: first I apply a very simple and very fast temporal clenser (essentially RemoveDirt without any artifact protection), then the Repair plugin compares the clensed frames with the original frames and, instead of removing grain, removes the massive clenser artifacts (unfortunately it also restores medium and big temporal dirt). Finally RemoveGrain is applied to erase some leftovers. While this filter, called RemoveDust, can, like RemoveGrain, never remove medium or big dirt, it is much more efficient on dust than RemoveDirt and much more efficient on grain than RemoveGrain. My first tests show very remarkable compression gains with only slight motion blurring. The static parts have the same softness as with the corresponding RemoveGrain modes.
Fizick has already made a temporal extension of RemoveGrain modes 5-9 in his denoiser plugin, but for mode=9 (trbarry's ST-median) an additional change limitation is necessary to avoid very unpleasant artifacts. While this temporal extension makes a lot of sense, the gain over the purely spatial variants is limited. Like RemoveGrain with mode >=5, it can never remove grain consisting of three equal pixels on a line segment, even if this piece of grain is only on one frame. Thus Fizick can't really exploit the temporal nature of grain either. That will change with RemoveDust.
I will supply various compression comparisons of RemoveDust with RemoveGrain and DeGrainMedian. A comparison with convolution based smoothers doesn't make sense to me.

Quote:

When can we expect the v0.6?

The most difficult work is the documentation and of course a lot of tests are necessary. I hope to release it within one to three weeks.

Boulder · 29th August 2004, 15:26

Quote:

Originally posted by kassandro
From this design you can see that RemoveGrain was designed as a precleaner for RemoveDirt. Unfortunately the very sophisticated RemoveDirt is becoming more and more obsolete. The new plugin derived from the RemoveGrain source, currently called Repair, leads to a much more compression efficient way of cleaning: first I apply a very simple and very fast temporal clenser (essentially RemoveDirt without any artifact protection), then the Repair plugin compares the clensed frames with the original frames and, instead of removing grain, removes the massive clenser artifacts. Finally RemoveGrain is applied to erase some leftovers. While this filter, called RemoveDust, can, like RemoveGrain, never remove medium or big dirt, it is much more efficient on dust than RemoveDirt and much more efficient on grain than RemoveGrain. My first tests show very remarkable compression gains with only slight motion blurring. The static parts have the same softness as with the corresponding RemoveGrain modes.

Sounds interesting, it will once again be one to try for me. I just hope you'll be able to keep one of the strong points of RemoveDirt - the incredible speed

Do you have any estimates on when it might be released? I'll have 2-4 analog TV capture clips to encode weekly so I might find the filter very useful for my purposes.

