13th August 2014, 20:10 | #1 | Link |
Beyond Kawaii
Join Date: Feb 2008
Location: Russia
Posts: 724
|
[VapourSynth] DctFilter
DCT Filter for VapourSynth r3
Source code
Only C routines are implemented so far, so it is relatively slow.

Usage:
dct.Filter(clip clip, float factors[8])

Performs a DCT on 8x8 blocks of the source clip, modifies the coefficients, then performs an IDCT. The modification is:
dct(x, y) = dct(x, y) * factors[x] * factors[y]

This filter does essentially the same thing as Tom Barry's original DctFilter for AviSynth, but does it differently. All calculations are done on floating-point values, and the factors are applied as they are, not rounded, so the accuracy is higher. Padding to mod8 is automatic for every plane, but cropping to non-mod16 values before applying this filter is impractical and shouldn't be done.
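To make the formula above concrete, here is a minimal pure-Python sketch of what happens to a single 8x8 block: forward DCT, scaling of each coefficient by factors[x] * factors[y], inverse DCT. It is not the plugin's code. The orthonormal DCT-II scaling and every function name here (dct_basis, dct_filter_block and so on) are assumptions made for illustration, and the padding and clamping the plugin performs are left out.

```python
import math

N = 8  # block size used by the filter


def dct_basis():
    """Orthonormal DCT-II basis: basis[k][n] = c(k) * cos((2n + 1) * k * pi / 16)."""
    basis = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        basis.append([scale * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                      for n in range(N)])
    return basis


BASIS = dct_basis()


def dct_1d(vec):
    """Forward 1-D DCT of 8 samples."""
    return [sum(BASIS[k][n] * vec[n] for n in range(N)) for k in range(N)]


def idct_1d(coeffs):
    """Inverse 1-D DCT (the transpose of the orthonormal forward transform)."""
    return [sum(BASIS[k][n] * coeffs[k] for k in range(N)) for n in range(N)]


def transpose(block):
    return [list(row) for row in zip(*block)]


def dct_2d(block):
    """2-D DCT done as a row pass followed by a column pass."""
    rows = [dct_1d(r) for r in block]
    return transpose([dct_1d(c) for c in transpose(rows)])


def idct_2d(coeffs):
    """2-D inverse DCT, again as two 1-D passes."""
    rows = [idct_1d(r) for r in coeffs]
    return transpose([idct_1d(c) for c in transpose(rows)])


def dct_filter_block(block, factors):
    """Apply dct(x, y) *= factors[x] * factors[y] to one 8x8 block."""
    coeffs = dct_2d(block)
    scaled = [[coeffs[y][x] * factors[x] * factors[y] for x in range(N)]
              for y in range(N)]
    return idct_2d(scaled)
```

With all eight factors set to 1.0 the block comes back unchanged (up to rounding), and with [1, 0, 0, 0, 0, 0, 0, 0] only the DC coefficient survives, so every pixel becomes the block average. From a script, the signature above suggests a call along the lines of core.dct.Filter(clip, factors=[...]), but I have not verified that against the plugin.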
__________________
...desu! Last edited by Mystery Keeper; 20th September 2016 at 21:35. |
31st May 2015, 16:15 | #5 | Link |
Registered User
Join Date: Aug 2012
Posts: 203
|
After watching Handmade Hero I wanted to port this function to AVX to understand how optimization works [I don't think I'll do SSE].
I changed all the variables from double to float so that a whole stride of 8 values can fit into a 256-bit register; I hope this won't change the behaviour of the function too much. Right now I've almost finished the cdct function.
Edit: did some tests on the fillFactors function; it went down from 134 cycles fully unrolled to 16. Too bad it's only called once XD
Last edited by MonoS; 31st May 2015 at 18:37. |
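For scale: eight 32-bit floats are exactly 256 bits, i.e. one AVX register. A rough way to gauge what the switch from double to float costs in precision is to run the same 1-D DCT twice, once in double and once with every intermediate rounded to single precision. The sketch below does that in pure Python. It is only an estimate under assumptions: to_f32, the sample row and the orthonormal scaling are mine, and the operation order of the real AVX code (or any fused multiply-add) is not modelled.

```python
import math
import struct


def to_f32(value):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def dct_1d(samples, rnd):
    """8-point DCT-II where every intermediate result is passed through rnd()."""
    out = []
    for k in range(8):
        scale = math.sqrt(1.0 / 8) if k == 0 else math.sqrt(2.0 / 8)
        acc = 0.0
        for n in range(8):
            coeff = rnd(scale * math.cos((2 * n + 1) * k * math.pi / 16))
            acc = rnd(acc + rnd(coeff * rnd(float(samples[n]))))
        out.append(acc)
    return out


# Hypothetical row of 8-bit pixel values, chosen only for illustration.
samples = [16, 35, 80, 128, 200, 235, 90, 12]

exact = dct_1d(samples, float)    # double precision throughout
single = dct_1d(samples, to_f32)  # emulated single precision
max_err = max(abs(a - b) for a, b in zip(exact, single))

bits_per_register = 8 * 32  # eight floats fill one 256-bit register
```

For 8-bit input the difference stays far below one pixel level, which is consistent with the hope that behaviour won't change much, but it says nothing about higher bit depths or about repeated filtering.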
31st May 2015, 21:06 | #6 | Link |
Registered User
Join Date: Aug 2012
Posts: 203
|
Ooook, I think I have properly converted all the functions inside croutines.
I had no chance to test it because the DLL produced by my copy of Code::Blocks is not recognized by VS.

Notable changes:
- All intermediate computation is done in float instead of double: less work for the CPU and less work for me
- Added a transposed LUT, so that during the DCT all 8 values can be loaded with a single instruction
- Reworked all the functions [except for fillLUT] to use AVX instructions [for example, IACA said that fillFactors needed 134 cycles to execute fully unrolled and now needs only 16; the row loop is now only 384 cycles fully unrolled, and I can't even imagine how many cycles it required before]

There are probably other places to optimize [clamping perhaps??], but I didn't dig too deep into the code.
Hope someone can test this and/or let me know if I made any mistakes; I repeat, it's my first time doing SIMD optimization.

EDIT: As I expected, there is room for other optimizations [found without using a profiler; I'll use one, BTW]. The old algorithm, from before the row/column split, may be faster now with SIMD and the transposed LUT. If I have some spare time I'll try to implement it, and if I succeed in compiling it I'll run some tests.
Last edited by MonoS; 31st May 2015 at 23:34. |
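A sketch of why a transposed LUT helps, written in plain Python with a list standing in for one 8-lane register. In the usual layout lut[k][n], the eight coefficients that multiply input sample n are spread over eight rows; after transposing, they sit in one contiguous row, so they can be fetched with one vector load and combined with a broadcast of the sample. The post does not say how the plugin's LUT is laid out or consumed, so this is one plausible reading; fill_lut, transpose_lut and both dct_1d_* functions are hypothetical names and not the plugin's fillLUT.

```python
import math

N = 8


def fill_lut():
    """lut[k][n]: weight of input sample n in output frequency k (orthonormal DCT-II)."""
    lut = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        lut.append([scale * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                    for n in range(N)])
    return lut


def transpose_lut(lut):
    """lut_t[n][k]: all eight weights for sample n are now contiguous."""
    return [[lut[k][n] for k in range(N)] for n in range(N)]


def dct_1d_scalar(samples, lut):
    """Reference: one dot product per output coefficient."""
    return [sum(lut[k][n] * samples[n] for n in range(N)) for k in range(N)]


def dct_1d_lanes(samples, lut_t):
    """Lane-wise version: 'acc' plays the role of one 8-lane register."""
    acc = [0.0] * N
    for n in range(N):
        row = lut_t[n]  # 8 contiguous values: a single vector load
        # Broadcast samples[n] to all lanes, multiply by the row, accumulate.
        acc = [a + samples[n] * c for a, c in zip(acc, row)]
    return acc
```

Both functions compute the same eight coefficients; the second one just arranges the work so that each step touches eight neighbouring LUT entries, which is the access pattern a 256-bit load wants.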
20th September 2016, 21:39 | #7 | Link |
Beyond Kawaii
Join Date: Feb 2008
Location: Russia
Posts: 724
|
DCTFilter r3 is here, fixing a stupid bug that caused a memory leak.
__________________
...desu! |