23rd May 2009, 22:36 | #1 | Link |
Registered User
Join Date: Aug 2007
Posts: 374
|
aWarpSharp2 – rewrite of aWarpSharp
Current version: 2012.03.28
Previous versions: 2009.06.19 2009.05.24
aWarpSharp by MarcFD is a nice plugin (especially for tasks like halo removal), but it has some bugs and tends to produce green artifacts on the image borders. Other WarpSharp plugins produced worse results for me, so I decided to rewrite the aWarpSharp algorithm with better handling of borders and optimization for modern CPUs.
Besides the complete algorithm, aWarpSharp2, its parts are also available as aSobel, aBlur, aWarp and aWarp4. This way you can do advanced filtering of the edge mask (for example with MDegrain) before passing it to the warp stage to get a more stable result.
Good usage examples:
Code:
aWarp4(Spline36Resize(width*4, height*4, 0.375, 0.375), aSobel().aBlur(), depth=3)
aWarp4(nnedi3_rpow2(rfactor=2).Spline36Resize(width*4, height*4, 0.25, 0.25), aSobel().aBlur(), depth=3)
aWarp4(nnedi3_rpow2(rfactor=2).nnedi3_rpow2(rfactor=2), aSobel().aBlur(), depth=2)
For an explanation of the options and how values map from those used in MarcFD's aWarpSharp, read the included aWarpSharp.txt.
Binary patched Toon-v1.0 to use aWarpSharp2 instead of aWarpSharp: Toon-v1.1
Last edited by SEt; 28th March 2012 at 02:07. |
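One possible reading of the 0.375 and 0.25 offsets in the usage examples, written as a small Python sketch rather than AviSynth. It assumes that the resizer samples on pixel centers, so output pixel j of an s-times upscale sits at source position (j + 0.5)/s - 0.5 + src_left, and that aWarp4 wants output pixel 4*i to coincide with source pixel i. For the second example it further assumes that the nnedi3_rpow2 output already has pixel 2*i on source pixel i. The function names are made up for illustration; aWarpSharp.txt remains the reference.

```python
from fractions import Fraction

def source_position(j, scale, src_left):
    """Source coordinate sampled by output pixel j of a center-aligned resize."""
    return (Fraction(j) + Fraction(1, 2)) / scale - Fraction(1, 2) + Fraction(src_left)

def top_left_offset(scale):
    """src_left that makes output pixel scale*i sample exactly source pixel i."""
    return Fraction(scale - 1, 2 * scale)

# 4x in one step: compare with the 0.375 passed to Spline36Resize(width*4, ...)
print(float(top_left_offset(4)))  # 0.375
# 2x step applied after a 2x upscale: compare with the 0.25 in the second example
print(float(top_left_offset(2)))  # 0.25
```

Under these assumptions the offset is simply (s - 1)/(2s), which is why a single 4x resize and a 2x resize after a 2x upscale need different values.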
23rd May 2009, 23:29 | #2 | Link |
Registered User
Join Date: Dec 2005
Location: Germany
Posts: 1,813
|
wow nice...
Just made a quick test.
awarpsharp(154,2,20) #new
awarpsharp(20,2,0.6) #old (MarcFD)
1. new, 2. old
As you can see, the border is no longer green. The pictures look very similar and the plugin seems to be about 20% faster on my Pentium D. Very good job SEt, I waited so long for a bug-free aWarpSharp.
__________________
AVSRepoGUI // VSRepoGUI - Package Manager for AviSynth // VapourSynth VapourSynth Portable FATPACK || VapourSynth Database |
24th May 2009, 01:17 | #3 | Link |
Fighting spam with a fish
Join Date: Sep 2005
Posts: 2,714
|
You wouldn't happen to have tested the speed now would you, ChaosKing?
I would do it myself, but my rig is currently encoding a Blu-ray and will be doing so for at least the next 17 hours.
__________________
FAQs:Bond's AVC/H.264 FAQ Site:Adubvideo Zsmooth - Cross-platform smoothing for Vapoursynth |
2nd June 2009, 03:19 | #4 | Link |
Guest
Posts: n/a
|
Thanks SEt !!
Tried it as a drop-in for the original aWarpSharp.dll, but MCTDenoise gives errors about it not supporting some of the parameters passed (dropping the original aWarpSharp.dll back in resolves it) ... It seems to be about 20-30% faster, so well done !! Tek |
3rd June 2009, 04:29 | #8 | Link |
x264 developer
Join Date: Sep 2005
Posts: 8,666
|
Code:
movdqu xmm2, [esi-1]
movdqa xmm3, [esi]
movdqu xmm4, [esi+1]
movdqu xmm5, [esi+edx-1]
movdqa xmm6, [esi+edx]
movdqu xmm7, [esi+edx+1]
...
movdqu xmm1, [esi+eax-1]
movdqu xmm3, [esi+eax+1]
Code:
movdqu xmm6, [esi-6]
movdqu xmm0, [esi+6]
pavgb xmm6, xmm0
movdqu xmm5, [esi-5]
movdqu xmm7, [esi+5]
pavgb xmm5, xmm7
movdqu xmm4, [esi-4]
movdqu xmm0, [esi+4]
pavgb xmm4, xmm0
movdqu xmm3, [esi-3]
movdqu xmm7, [esi+3]
pavgb xmm3, xmm7
movdqu xmm2, [esi-2]
movdqu xmm0, [esi+2]
pavgb xmm2, xmm0
movdqu xmm1, [esi-1]
movdqu xmm7, [esi+1]
pavgb xmm1, xmm7
movdqa xmm0, [esi]
Code:
movd eax, xmm2
psrldq xmm2, 4
pinsrw xmm3, [eax+esi], 0
pinsrw xmm4, [eax+edx], 0
movd eax, xmm2
psrldq xmm2, 4
pinsrw xmm3, [eax+esi+1], 1
pinsrw xmm4, [eax+edx+1], 1
movd eax, xmm2
psrldq xmm2, 4
pinsrw xmm3, [eax+esi+2], 2
pinsrw xmm4, [eax+edx+2], 2
movd eax, xmm2
pinsrw xmm3, [eax+esi+3], 3
pinsrw xmm4, [eax+edx+3], 3
movd eax, xmm7
psrldq xmm7, 4
pinsrw xmm3, [eax+esi+4], 4
pinsrw xmm4, [eax+edx+4], 4
movd eax, xmm7
psrldq xmm7, 4
pinsrw xmm3, [eax+esi+5], 5
pinsrw xmm4, [eax+edx+5], 5
movd eax, xmm7
psrldq xmm7, 4
pinsrw xmm3, [eax+esi+6], 6
pinsrw xmm4, [eax+edx+6], 6
movd eax, xmm7
pinsrw xmm3, [eax+esi+7], 7
pinsrw xmm4, [eax+edx+7], 7
mov eax, [esp]
Code:
movq xmm7, qword ptr [edi+ebx-1] // one line above actual position, but it gives 1.4x speedup |
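For readers who do not read SSE2 fluently, here is a rough Python model of what the second listing does with each pair of loads: it fetches the windows at -k and +k bytes around the current position and combines them with pavgb, which is a per-byte rounded average. The helper names and the test row are invented for illustration, and the sketch only models the pairing step, not whatever the plugin does with the averaged registers afterwards.

```python
def pavgb(a, b):
    """Per-byte rounded average, like SSE2 pavgb: (x + y + 1) >> 1 on unsigned bytes."""
    assert len(a) == len(b)
    return bytes((x + y + 1) >> 1 for x, y in zip(a, b))

def symmetric_pair_average(row, center, offset, width=16):
    """Average of the two windows starting at center-offset and center+offset."""
    left = row[center - offset:center - offset + width]
    right = row[center + offset:center + offset + width]
    return pavgb(left, right)

# Hypothetical 64-byte row holding a linear ramp 0..63.
row = bytes(range(64))
# On a ramp, averaging the -6 and +6 windows gives back the center window.
print(symmetric_pair_average(row, 16, 6) == row[16:32])  # True
```

Note the +1 in the average: pavgb always rounds halves up, so chaining several of them biases the result slightly upwards.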
4th June 2009, 16:09 | #9 | Link |
Registered User
Join Date: Mar 2009
Posts: 44
|
Help!! With this newly updated famous plugin I'm getting something like this image.
I used
Code:
aWarpSharp(depth=12,blur=4,thresh=51,chroma=1)
The colours are dancing. With the old plugin I get a normal picture, but with green lines. I used
Code:
aWarpSharp(depth=12,blurlevel=4,thresh=0.2,cm=1)
Edited: I have found that it is due to chroma=1. Basically I don't know what chroma is, because I'm new to video (I just started in March and have learned a lot). For me chroma=2, 3 and also 4 work well; the problem is only with 1. chroma=0 gave me a black-and-white picture, hehe.
Last edited by owais; 4th June 2009 at 16:35. |
6th June 2009, 13:10 | #10 | Link |
Registered User
Join Date: Aug 2007
Posts: 374
|
Dark Shikari, I know not everything is optimally written, but I thought it better to release a working version now than a super-optimized one never. I know that the horizontal blur is made for palignr and will look into this when I have time, but I have no idea how to save kittens or why loading the correct line gives a 1.4x speed drop for the whole function, including those awful pinsrw that should be much more time consuming than just an unaligned load from an additional memory location.
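As a sketch of the palignr idea mentioned here: instead of one unaligned load per tap, two aligned 16-byte loads can be combined with palignr (SSSE3) to produce any of the shifted windows between them. The Python below only emulates the byte shuffling to show that the result equals the unaligned window; the function names and the sample row are invented, and shift counts above 16 (which would shift in zeros) are not modelled.

```python
def palignr(dst, src, imm8):
    """Emulate SSSE3 palignr dst, src, imm8 on 16-byte values.

    The registers are concatenated with dst as the high half, shifted right
    by imm8 bytes, and the low 16 bytes are kept. Index 0 is the byte at the
    lowest address.
    """
    assert len(dst) == 16 and len(src) == 16
    assert 0 <= imm8 <= 16  # larger shifts are outside this sketch
    combined = src + dst
    return combined[imm8:imm8 + 16]

def unaligned_window(row, base, k):
    """Bytes row[base+k : base+k+16] built from two aligned 16-byte loads."""
    lo = row[base:base + 16]        # stands in for movdqa from [base]
    hi = row[base + 16:base + 32]   # stands in for movdqa from [base+16]
    return palignr(hi, lo, k)

row = bytes(range(64))  # hypothetical pixel row
print(unaligned_window(row, 16, 5) == row[21:37])  # True
```

A negative tap such as [esi-6] would use the previous aligned block as the low half with a shift of 10.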
owais, have you read all of aWarpSharp.txt? It explains that cm=1 in the original aWarpSharp corresponds to chroma=4 in mine, and what the chroma values mean. |
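The parameter pairs posted in this thread (thresh 0.6 against 154, and 0.2 against 51) are consistent with the old 0..1 threshold being scaled by 256 and rounded. The Python sketch below encodes only that guess plus the single cm-to-chroma mapping stated above; the function and table names are invented, the scaling rule is an inference from two data points, and aWarpSharp.txt is the authoritative source for the full mapping.

```python
def thresh_old_to_new(thresh_old):
    """Guessed thresh mapping: scale the old 0..1 value by 256 and round.

    Inferred only from the value pairs posted in this thread, not taken
    from aWarpSharp.txt.
    """
    return round(thresh_old * 256)

# The only chroma mapping stated in this thread; other cm values are left
# out on purpose because the thread does not give them.
CM_TO_CHROMA = {1: 4}

print(thresh_old_to_new(0.6), thresh_old_to_new(0.2))  # 154 51
```

With this reading, owais's old call with cm=1 would translate to chroma=4 rather than chroma=1, which matches SEt's answer.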
6th June 2009, 15:39 | #11 | Link | |
x264 developer
Join Date: Sep 2005
Posts: 8,666
|
Quote:
Does that imply you tested it, and found it to be faster? If it's faster, I'm going to be inclined to blame cacheline-split. Test on an AMD chip or Nehalem and watch the penalties melt away. |
|
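To make the cacheline-split suspicion concrete, here is a small Python sketch that counts how many loads in a strided loop straddle a cache line boundary. The 64-byte line size is an assumption (it is the usual value on the CPUs named here), and the addresses are hypothetical; the helper names are made up.

```python
CACHE_LINE = 64  # bytes; assumed line size

def crosses_cache_line(address, size, line=CACHE_LINE):
    """True if the bytes [address, address+size) span two cache lines."""
    return address // line != (address + size - 1) // line

def split_fraction(start, step, size, count, line=CACHE_LINE):
    """Fraction of `count` loads, `step` bytes apart, that split a line."""
    splits = sum(crosses_cache_line(start + i * step, size, line)
                 for i in range(count))
    return splits / count

base = 4096  # hypothetical line-aligned row start
# 8-byte loads advancing by 8 bytes, at offsets -1, 0 and +1 from the row start
print(split_fraction(base - 1, 8, 8, 8))  # 0.125
print(split_fraction(base, 8, 8, 8))      # 0.0
print(split_fraction(base + 1, 8, 8, 8))  # 0.125
```

In this model the -1 and +1 loads split a line equally often, so a split penalty alone would be expected to hit both offsets in the same way.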
6th June 2009, 19:34 | #12 | Link |
Registered User
Join Date: Aug 2007
Posts: 374
|
I'm already on Nehalem and when I change
Code:
movq xmm7, qword ptr [edi+pitch*0-1]
movq xmm4, qword ptr [edi+pitch*0+1]
movq xmm1, qword ptr [edi+pitch*0]
movq xmm2, qword ptr [edi+pitch*2]
Code:
movq xmm7, qword ptr [edi+pitch*1-1]
movq xmm4, qword ptr [edi+pitch*1+1]
movq xmm1, qword ptr [edi+pitch*0]
movq xmm2, qword ptr [edi+pitch*2] |
6th June 2009, 19:37 | #13 | Link | |
x264 developer
Join Date: Sep 2005
Posts: 8,666
|
Quote:
If you're getting such a large slowdown merely by changing that, you should try to figure out why. Performance counters might be useful for analyzing that. |
|
6th June 2009, 19:56 | #14 | Link |
AviSynth plugger
Join Date: Nov 2003
Location: Russia
Posts: 2,182
|
Can I ask why you changed the meaning of the parameters? It is confusing, and not "fully compatible with original aWarpSharp".
If you prefer new parameters, please use new parameter names (or a new name for the plugin).
__________________
My Avisynth plugins are now at http://avisynth.org.ru and mirror at http://avisynth.nl/users/fizick I usually do not provide a technical support in private messages. |
6th June 2009, 22:20 | #15 | Link |
Registered User
Join Date: Aug 2007
Posts: 374
|
Played with performance counters for some time and here is what I found (I removed all pinsrw for the tests as they don't change the situation):
The global slowdown is produced by
Code:
movq xmm7, qword ptr [edi+pitch*1-1]
Code:
movq xmm4, qword ptr [edi+pitch*1+1]
I've tried changing the only writing instruction here from movq to movdq2q or movntq, but that changed nothing.
Fizick, I think the situation is similar to MVTools 1-2. It's fully compatible in terms of available functionality, and the effective ranges of the parameters are supersets of the original ones. I know I should probably change the name to aWarpSharp2, but it looks kind of strange next to aSobel, aBlur, aWarp. In truth it's more like a beta release to me, due to the mentioned wrong offsets in Warp and the saturated multiplication by 6 at the end of Sobel that I don't like at all. |
6th June 2009, 23:40 | #16 | Link | |
Avisynth Developer
Join Date: Jan 2003
Location: Melbourne, Australia
Posts: 3,167
|
Quote:
|
|
7th June 2009, 00:12 | #17 | Link |
Registered User
Join Date: Aug 2007
Posts: 374
|
The code reads 3 lines of one frame while writing one line to another frame in a simple loop. edi is a global counter, increased by 8, that is used as the offset for all sources and the destination. It's not a cache line split problem, as [edi+pitch*1+1] and [edi+pitch*0-1] are ok but [edi+pitch*1-1] is not; also the magnitude of the impact is too big, and on Nehalem such penalties are small. It seems like some kind of cache (address?) conflict, going by the descriptions of the performance counters that produce spikes:
REPL - Counts the number of lines brought into the L1 data cache.
M_REPL - Counts the number of modified lines brought into the L1 data cache.
M_EVICT - Counts the number of modified lines evicted from the L1 data cache due to replacement.
M_SNOOP_EVICT - Counts the number of modified lines evicted from the L1 data cache due to snoop HITM intervention.
But the code linearly reads from one location and linearly writes to another in a simple loop.
EDIT: It indeed seems to be a cache address conflict, as the lower 16 bits of [edi+pitch*1] and [output] are the same, but that doesn't give me any idea how to fix it besides caching [edi+pitch*1-1] from the previous iteration (as both memory locations are what I get from avisynth).
EDIT2: And it seems to be an L2-3 problem, with a scenario something like this: cache lines in L1 are allocated independently, but when the output L1 line is written to L2+, the next reference to [edi+pitch*1-1] is mistaken for an access to the same location because of that single -1 byte, which results in the L1 cache line ping-pong hell seen in the counters.
Last edited by SEt; 7th June 2009 at 00:48. |
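The observation that the lower 16 bits of the source row and the output pointer coincide can be checked mechanically. The Python sketch below compares the low bits of two addresses and lists which source rows collide with a destination pointer; the addresses, the pitch and the function names are all hypothetical, and the 16-bit width is taken from the post rather than from any CPU documentation.

```python
def low_bits_match(addr_a, addr_b, bits):
    """True if the two addresses agree in their lowest `bits` bits."""
    mask = (1 << bits) - 1
    return (addr_a & mask) == (addr_b & mask)

def aliasing_rows(src_base, pitch, rows, dst, bits=16):
    """Row indices r for which src_base + r*pitch shares its low bits with dst."""
    return [r for r in rows
            if low_bits_match(src_base + r * pitch, dst, bits)]

# Hypothetical buffers: source rows 2048 bytes apart, output elsewhere.
src_base = 0x10000000
pitch = 0x800
dst = 0x20000800
print(aliasing_rows(src_base, pitch, [0, 1, 2], dst))  # [1]
```

Run against the real frame pointers, a check like this would show whether only the middle row collides with the output, which is the pattern described in the EDIT above.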
7th June 2009, 11:10 | #19 | Link | |
ffdshow/AviSynth wrangler
Join Date: Feb 2003
Location: Austria
Posts: 2,441
|
Quote:
np: Plastikman - I Don't Know (Closer)
__________________
now playing: [artist] - [track] ([album]) |
|
7th June 2009, 15:32 | #20 | Link |
Avisynth Developer
Join Date: Jan 2003
Location: Melbourne, Australia
Posts: 3,167
|
@Leak <- ,
I asked because movq xmm7, qword ptr [edi+pitch*1-1] does not appear in the code, but movq xmm7, qword ptr [edi+ebx-1] does appear in the code. And when people want help, I like to make sure nothing is clouding the issue. |