Avisynth+ [Archive] - Page 4

View Full Version : Avisynth+

Pages : 1 2 3 [4] 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112

ultim

16th October 2013, 14:59

I'm at work, but I'll post a fixed build very soon, I'll be home in a few hours. The bug is only in the most recent build from okt.15. and it mistakenly flags TemporalSoften to be unavailable on 32-bits.

zero9999

16th October 2013, 20:33

(removed build)

ultim

16th October 2013, 22:01

And we also get an ICC-build :) Thanks Zero9999!

But for the sake of completeness (or more like coz I've promised) I also updated the release in the first post. It also contains the fix for the typo ofc, and a small number of other things only important for developers (details in the GitHub commit log).

TurboPascal7: I have not forgotten about alignment.h, changes will come next time.

Groucho2004

16th October 2013, 22:12

x86 ICC14 build
That's slower than the VS8 build from ultim. Did you use PGO? Did you read the Intel optimization guides?
Anyway, my experience with Intel compilers is that they get slower with each new version.

zero9999

16th October 2013, 22:57

That's slower than the VS8 build from ultim. Did you use PGO? Did you read the Intel optimization guides?
Anyway, my experience with Intel compilers is that they get slower with each new version.

it was a little faster for the first script i tested with, but using a test script with only the internal functions/filters, it's indeed a little slower. I didn't have much luck with PGO so far, it made the build much slower.

ARDA

17th October 2013, 02:08

@ultim;
First of all thanks for the great effort you are doing.

That's slower than the VS8 build from ultim. Did you use PGO? Did you read the Intel optimization guides?
Anyway, my experience with Intel compilers is that they get slower with each new version.

it was a little faster for the first script i tested with, but using a test script with only the
internal functions/filters, it's indeed a little slower. I didn't have much luck with PGO so far,
it made the build much slower.

I don't know if this is the right thread, I have no intention to pollute it.
But as we are going to have several and different compiled versions of avisynth around, I would like to arrive
to a common benchmark tool and conditions, so we can compare.

@Groucho2004 and zero9999;
Will you please post your benchmark methods with most possible details ?

Related to Gpo or cpu dispatch options in compilers, I had no good experience with avisynth
in that matter. I think the question is that in many in built functions in avisynth mmx, isse
sse2, sse3 and ssse3 inline asm code cannot be touched by the compiler, so just c++ will be
optimized or cpu dispatched which obviously will improve almost nothing for that cases.
Some years ago I tried with intel compiler 10 but never could get any good result, but remember
also (don't know about newer versions) that in some cases when those codes run in a AMD
cpu were derivated to a general not optimized library.

Thanks ARDA

TurboPascal7

17th October 2013, 02:19

There is not much point in compiling avs+ with ICL right now. However it will be "a must" for optimal performance after some code gets ported to intrinsics as VC tends to do a pretty bad job here. I also can confirm that auto-vectorization does not and probably will never help avs core much.

Next, I do not believe that there should be several differently compiled version of avisynth flying around and I would like to ask other people to refrain from posting binaries at least for now as it might just create additional confusion. In the end I believe the best solution would be to provide one binary, built with ICL, as code it generates for intrinsics seems to be faster than VC even on AMD processors. Getting ICL is not hard even if you are not a student.

As for benchmark methods, I think AvsMeter is pretty much the standard tool right now.

ARDA

17th October 2013, 02:35

Next, I do not believe that there should be several differently compiled version of avisynth flying around
I would like to ask other people to refrain from posting binaries at least for now as it might just create
additional confusion. In the end I believe the best solution would be to provide one binary,
built with ICL, as code it generates for intrinsics seems to be faster than VC even on AMD processors.
Getting ICL is not hard even if you are not a student.

I second

...but remember also (don't know about newer versions) that in some cases when those codes run in a AMD
cpu were derivated to a general not optimized library.

I was talking about this http://www.agner.org/optimize/blog/read.php?i=49#49

Thanks

TurboPascal7

17th October 2013, 02:38

This is not relevant here because avisynth doesn't use any Intel libraries and will not rely on automatic vectorization in the future. Better register allocation and instruction ordering that ICL offers for intrinsics is still better even on AMD processors (but probably not optimal).

ARDA

17th October 2013, 02:48

This is not relevant here because avisynth doesn't use any Intel libraries

I don't understand. When you compile with intel compilers, just to mention a few by heart it has it own
memcpy, memset, memmcmp, all strings libraries, etc, etc, etc.
I don't know if that legal question was solved and it is not present in nowadays versions.
I don't want to pollute this thread, so for me this subject is enough discussed.

I'll be waiting Groucho2004 answer about benchmark methods,

Thanks

Groucho2004

17th October 2013, 02:57

in some cases when those codes run in a AMD cpu were derivated to a general not optimized library.
This only applies if you use automatic CPU dispatch optimizations (Qax..). I only use Qx... which "hardcodes" the supported instruction set into the binary. For example, if you use "QxSSE2", you need at least a P4 or a Athlon64.
Also, the maximum optimizations don't always ensure maximum performance. It's a lot of work to optimize a binary (especially with the Intel compiler) but sometimes you get pretty decent gains.

Groucho2004

17th October 2013, 03:04

I'll be waiting Groucho2004 answer about benchmark methods
Here goes:
I used this (http://www.mediafire.com/?8g500sjm9tr575a) script (which I also use for PGO) and AVSMeter.
The results are (i5 2500K @ 4GHz):
Ultim's latest DLL: 29.73 fps
Official Alpha5 DLL: 28.95 fps
My Alpha5 ICL10.1 DLL: 30.02 fps
zero9999's ICC14 DLL: 24.56 fps

I'm aware that the script does not cover all internal functions and I'd be happy about suggestions to improve it.

ARDA

17th October 2013, 03:22

Thanks Goucho2004, I'll work with it and if I have some suggestion I will post here

ARDA

ultim

17th October 2013, 10:10

Theoretically there shouldn't be any difference between the runtime performance of Avs+ and Avs yet. Any current differences are probably due to different compilers. While some of my current changes might improve running time marginally (removed allocations, less exceptions), I have not yet done anything that was specifically aimed at increasing performance.

Edit: The only difference in speed should be initialization/loading time, where I did make conscious improvements.

ARDA

19th October 2013, 02:07

- The "crop" function now defaults to aligned crop. You can still controll alignment using its second
parameter, but if you omit it the default is now for the new frame to be aligned.
This is important for plugin authors so that they can have a stronger alignment guarantee, in the end leading
to faster processing in multiple plugins.

Allow me please to point some issues.
I know it is not an important subject, but just a warning for plugin developers but mainly for users.
Take following code just as an example.

MPEG2Source("D:\TRABAJO32BITS\MALENA\MALENA.d2v")# 720 x 576, use your source

AvsTimer(frames=1000, name="ANYONE",type=3, frequency=2000, total=false, quiet=true)#use your cpu frequency

#crop(8,72,700,432,align=true)
#or
crop(8,72,700,432,align=false)

#BicubicResize(640,352) # or anyother sizes and resizers, and many filters as well
#or # just two cases I've tested.
flipvertical() # as flipvertical that in fact is just a bitblt

AvsTimer(frames=1500 ,name="ANYONE",type=3, frequency=2000, difference=1, total=false)

Which one of crops do you think is faster? At first sight we should say with align=true, but
it isn't; the fastest one is with align=false, between 3% till 20% faster with resizers and more than
40% faster with filpvertical(). Why?
Simple with crop + align=true in fact there are two movements of the whole frame, first a bitblt of align=true
and second to the new created frame by resize function, or just another bitblit with flipvertical.
Obviuos this situation only will happen if following filter in the chain inmediately after crop creates
a new frame. And from that frame onwards we shall have again an aligned buffer.

There are a few filters that could work in place under the appropiate conditions and in that case
an align=true maybe could be better. I've still some doubts, I didn't test.
(one by heart, Isse code of Tweak by dividee, but probably many others)

So I don't say align=false is better, just that final user should be aware how next filter in the
chain after the crop works. Is that possible? Probably not. At least I should add a WARNING
in documentation for users. IF YOU RESIZE IMMEDIATELY AFTER A CROP, ALIGN=FALSE COULD BE FASTER.

As far as plugin developers, I would never trust that the source (last) you are receiving is always
aligned b16 for sse2 or b32 for new AVX. So I would check always, except that the only posibility
to have misligned data were with crop align=false, I don't remember and didn't check.?

I hope this can be useful. ARDA

TurboPascal7

19th October 2013, 02:26

Yes, without a doubt, unaligned crop is faster in some (most) cases. Sometimes significantly so. Should a user know about it? Probably. Does it matter? Not really. You don't use crop often in a single script and even on my Core i7 860 system (stock clocks, 4 years old), aligned crop at 1080p runs over 2000fps. Hardly a noticeable impact for me as my scripts rarely run faster than 1fps in the end.

As for developers - yes, one should always assume that he can get any kind of data. But one also can assume that most of the times data he will receive will be aligned. This means that he can omit unaligned routines completely, dispatching all such cases to plain C implementation.

For example in the core filters recommended dispatch looks like this:
if ((env->GetCPUFlags() & CPUF_SSE2) && IsPtrAligned(srcp, 16)) {
process_sse2(some args);
} else
#ifdef X86_32
if (env->GetCPUFlags() & CPUF_MMX) {
process_mmx(some args);
} else
#endif
process_c(some args);

Groucho2004

19th October 2013, 09:38

Hardly a noticeable impact for me as my scripts rarely run faster than 1fps in the end.
Ouch. You must have terrible sources if they require this kind of filtering.

ultim

19th October 2013, 11:21

As Turbo pointed it out, what is important is that plugin developers can know that in the majority of the cases, they will get aligned data. Of course they should not rely on this, but they should definitely optimize for it. Crop is a relatively cheap filter, little more than a bitblt, and the performance that is lost by copying a frame one extra time should be lower than what you'd loose if you ran a more complex filter in its unoptimized version.

ultim

20th October 2013, 21:27

Here goes:
I used this (http://www.mediafire.com/?8g500sjm9tr575a) script (which I also use for PGO) and AVSMeter.
The results are (i5 2500K @ 4GHz):
Ultim's latest DLL: 29.73 fps
Official Alpha5 DLL: 28.95 fps
My Alpha5 ICL10.1 DLL: 30.02 fps
zero9999's ICC14 DLL: 24.56 fps

I'm aware that the script does not cover all internal functions and I'd be happy about suggestions to improve it.

Out of curiosity, I ran your script through a profiler, and have found that over 40% of time was spent in text antialiasing, more specifically in Antialiaser::GetAlphaRect().

Groucho2004

20th October 2013, 21:49

Out of curiosity, I ran your script through a profiler, and have found that over 40% of time was spent in text antialiasing, more specifically in Antialiaser::GetAlphaRect().
I know, it's far from a good source for profiling. It does however have a bunch of functions but apparently their CPU usage pales in comparison to the text rendering. I guess I should just use "blankclip" as a source.

Mystery Keeper

20th October 2013, 22:03

Text anti-aliasing as separate procedure? Why don't you use Anti-Grain Geometry library and render text already anti-aliased?

ultim

20th October 2013, 23:44

I know, it's far from a good source for profiling. It does however have a bunch of functions but apparently their CPU usage pales in comparison to the text rendering. I guess I should just use "blankclip" as a source.

Basically anything that doesn't dominate over all the other filters this much. Also, maybe text rendering could be kept, only less text/smaller area for text.

Text anti-aliasing as separate procedure? Why don't you use Anti-Grain Geometry library and render text already anti-aliased?
Antialiaser::GetAlphaRect() is just part of the antialiasing. There might be other libraries out there for this purpose (anti-grain, cairo, freetype etc.), but the code for this is not much, it worked just fine, and at least there was no need to link to yet another library. We'll need to link to one when we go cross-platform though, since we're currently relying on GDI for the actual drawing.

qyot27

21st October 2013, 06:43

Just because I've been curious about this since the rewritten demuxer was committed this past March, can the 64-bit build of AviSynth+ (or heck, the old builds of AviSynth64) work correctly* with Win64 builds of FFmpeg that were compiled with --enable-avisynth? I have no 64-bit Windows setup to test with, so I've never bothered to actually check.

*obviously defining 'work correctly' in the context of what's actually available in said 64-bit builds of AviSynth(+). Even if it's just Version() or simple file loading, that's enough.

l33tmeatwad

21st October 2013, 16:57

ultim

22nd October 2013, 09:00

I'm giving Avs+ a Boost (pun). It has libraries that are really useful for Avs, so much that in fact many have been included in the new C++11 standard from there. I guess you could ask, why not just use the corresponding standard headers then? First, not all are in C++11. But much more importantly, even those that are, are not all supported by Visual Studio, or only supported by Visual Studio 2013. And I don't want to limit developers to use only the newest VS that been released literally a week ago. In the far future, when C++11 support has become widespread enough among people, in another 2-3 iterations of VS maybe, I'll be glad to switch from Boost to the standard headers, but until then, Boost will make sure we get the appropriate support on every compiler. Below is a list of libraries that could prove really usefull. Many of them are non-trivial, but Boost has well-tested, time-proven, maintained, and portable implementations of them.

Boost libs that I've set my eyes on: Atomic, Smart Ptr, Filesystem, Thread, Lockfree.
Other Boost libs that can come in handy: Locale, Unordered, Log, Static Assert, and some others.

zerowalker

23rd October 2013, 10:49

Just to make sure, it does not speed up scripting itself, meaning for example QTGMC?

TurboPascal7

23rd October 2013, 10:52

Right now - no. To speed up something like QTGMC one will either need faster plugins or maybe good multithreading. Plus faster plugins.

zerowalker

23rd October 2013, 10:56

Okay then i got things right, but will this take over the Avisynth development?
Cause seem to be quite many branches going on, as well as Vapoursynth, which however is supposed to superseed Avisynth eventually if i understand things correctly.

TurboPascal7

23rd October 2013, 11:10

Vapoursynth is an entirely different world that some people are not completely happy about (including the author of this fork (http://forum.doom9.org/showpost.php?p=1644133&postcount=17) and myself for example).

As for other Avisynth branches - Avxsynth and Avs64 are effectively dead. Avisynth MT is alive but it's not a real "branch" of avs and it does not get much development and it won't "superseed" Avisynth.

As for avs+ taking over the avisynth development - this depends on your understanding of "taking over". No one is going to assassinate IanB and I'm pretty sure he will be maintaining the official avisynth branch in the future. Whether the official branch will continue to be the most widespread version is another question. We'll see.

zerowalker

23rd October 2013, 11:15

I see, had quite some positive thoughts of Vapoursynth as it seems good in itself (Avisynth is old, time to renew so to speak).
But well it´s always more than meets the eye.

Good to know, and as you end up, it´s those 2 i am concerned, IanB and avs+.

Like to have 1 to follow, but guess i will have to wait to see if if one of these will "win" the war, or if it will continue to be 2 independent branches.

Thanks for the info.

ultim

25th October 2013, 09:04

Avisynth Plugin Writing Tips #2: Parallel execution

Avisynth-MT and VapourSynth both support multithreading, and it is being implemented in Avisynth+ too. All of them require the same things from your plugin. Here is a list of what you as a plugin author can do to support execution on multiple threads. If you are the author of any Avisynth plugin, please update your filter according to these rules if needed. Doing so will make sure your plugin can execute seamlessly when multithreaded. Furthermore, following these rules will not only guarantee correct execution in multihtreaded environments, it will also provide optimal mulithreaded performance.

Short list for those on the run:
- Unless you have the slowest filter in the world, don't start threads in your plugin.
- Never use global or static variables. In addition, your filter class should only have read-only members which are initialized during construction.
- Don't reuse the IScriptEnvionment pointer between method executions.

And again, the same points with a bit of more explanation:
- In general, do not slice up your frame and start multiple threads on your own. Threading has its own performance overhead, and it is only worth doing it manually if your filter takes a lot of time to execute. And even if your filter is extremely slow (like fft3dfilter), you should try to optimize its single-threaded performance (by using SIMD instructions or choosing a more efficient algorithm) rather then manually threading it. Optimize for single-threaded performance, and as long as you follow the other rules below, you will get automatic and correct multithreading from Avisynth.
- Do not cache frames yourself. You might think it is efficient because you won't have to request/compute them in the next frame, but you are wrong, and there are several reasons why. First, if you write your own cache, you will have to introduce a global state to your filter, which means you will have to take care of synchronization between multiple threads too, which is not easy to do efficiently with caches. Second, keeping copies of past frames also means there will always be multiple references to them, thus Avisynth cannot pass them as write pointers to other filters, and will have to do an extra copy of it more often. And last but not least, Avisynth has a very extensive caching mechanism, and if you request the same frame multiple times (even when you need it for different frame requests), chances are you will get it for free anyway, so your own caching is just pure overhead.
- As a general extension to the previous rule, try not to keep any state between frames. In the optimal case your filter class should only have read-only members which are initialized during construction. Surely this is not always possible with every algorithm, but most times it is, and this is what you should strive for.
- As stated before, for best multithreading always try to implement algorithms which require no state between frames. Whenever this is violated, be sure to group reads and writes to the state (do not spread them), and guard them in critical sections (as few and as short as possible). For example, it is a good practice to copy all your writable class variables (in a critical section) at the start of each frame into local stack variables, compute the whole frame outside of the critical section (updating the local variables that captured the global state as needed), then write them back together at the end of your frame in another critical section. Do not request an automatic lock around your whole filter from Avisynth, because it will serialize your filter's execution.
- If you have class variables that must be writable in every frame, you will also have to keep in mind that Avisynth does not guarantee that frames will be processed in their natural order. Just a reminder.
- Only store variables in classes and in method stacks. Per-frame heap allocations should be avoided, because they can act as implicit synchronization points between threads. And most importantly, never store anything in static variables or in the (global) namespace scope. Read the previous sentence a few more times.
- Do not store the IScriptEnvironment pointer anywhere yourself (except locally on the stack), and never reuse those pointers outside of the methods where they were supplied to you. Not even between different executions of the same method! There is a reason why you get that pointer separately for each method, which is that it may be different every time, especially in multithreaded scenarios. If you reuse it, the consequences will be different between every implementation, but you can get anything from race conditions to program crashes.

Robert Martens

25th October 2013, 17:50

Do not store the IScriptEnvironment pointer anywhere yourself (except locally on the stack), and never reuse those pointers outside of the methods where they were supplied to you. Not even between different executions of the same method! There is a reason why you get that pointer separately for each method, which is that it may be different every time, especially in multithreaded scenarios. If you reuse it, the consequences will be different between every implementation, but you can get anything from race conditions to program crashes.

I have to own up to being a dummy here and having done that, thinking the rule only applied to storing it as a global variable. I had made the--admittedly careless--assumption that storing the environment pointer as a non-static class variable would be safe since each instance of the host would have its own instance of my filter class. I take it I'm supposed to understand that multiple instances of a host application might try to access a single instance of my filter?

Only the slightest bit more consideration put into my problem and I've eliminated what I thought was the need to store the pointer, so it's no big deal for me to bring my code into compliance (luckily I work so slowly I haven't released a version with this problem included), but for the sake of curiosity I thought I'd ask.

Myrsloik

25th October 2013, 19:40

I have to own up to being a dummy here and having done that, thinking the rule only applied to storing it as a global variable. I had made the--admittedly careless--assumption that storing the environment pointer as a non-static class variable would be safe since each instance of the host would have its own instance of my filter class. I take it I'm supposed to understand that multiple instances of a host application might try to access a single instance of my filter?

Only the slightest bit more consideration put into my problem and I've eliminated what I thought was the need to store the pointer, so it's no big deal for me to bring my code into compliance (luckily I work so slowly I haven't released a version with this problem included), but for the sake of curiosity I thought I'd ask.

Always pass the environment pointer as an argument to your functions. It's a simple rule. Because Avisynth isn't multithreaded you'll never have two threads accessing your filter at once. However it's still a very bad practice. (not sure exactly what the MT side does to get around it)

Note that this issue affects everyone trying to use Avisynth plugins. For example RemoveGrain commits the biggest sin of all. It stores the environment pointer as a global variable when AvisynthPluginInit2 is called. This makes VapourSynth, which creates fake Avisynth environments only when needed, crash. A lot. Since the environment pointer will be different between plugin instances and registering functions.

I'd like to add another filter writing tip myself.

ONLY call AddFunction in AvisynthPluginInit2. Do not throw errors. DON'T DO ANYTHING ELSE THERE. Do initialization when one of your functions is called the first time.

It's the biggest reason for autoloading "breaking" in avisynth and a pile of other issues.

ultim

25th October 2013, 20:34

I take it I'm supposed to understand that multiple instances of a host application might try to access a single instance of my filter?

Almost. Only one host application will use your filter instance, but that single host application (the Avisynth library to be precise) can use different IScriptEnvironment objects depending on which thread your filter instance is running on. Avisynth-MT, for example, uses special IScriptEnvironments instances to implement thread-local storage (TLS), and I'm implementing something similar in Avs+ too. Apparently, VapourSynth makes use of fake IScriptEnvironments objects too. The thing about TLS objects are, well, that they are local to the thread, so if the given multithreading implementation (for example Avs-MT mode 1) creates just one filter instance for all threads, your filter will get different IScriptEnvironment pointers based on the thread it is running on. Because you cannot know in your filter what threading implementation of Avisynth is in use, it is best not to store the environment pointer between calls at all, and only then will you be safe in every case.

Because Avisynth isn't multithreaded you'll never have two threads accessing your filter at once.
Nonononono, forget what Myrsloik said here. This statement can be very misleading, because it is only true for some versions of Avisynth (like classic non-threaded version). The point of this discussion is to let plugins prepare for multithreaded versions (and/or multithreading modes). An important point of my tips is that if you can, you SHOULD assume that multiple threads will be accessing your filter at once, as it really can happen, even with existing implementations.

... in AvisynthPluginInit2. Do not throw errors.
A little bit off-topic (as far as multithreading issues go), but for best practices about exceptions, look at this tip (http://forum.doom9.org/showthread.php?p=1647262#post1647262).

Robert Martens

25th October 2013, 20:53

Thanks for the detail, that clears everything up! I've already eliminated the broken code and pushed the fix, now I just need to sort out one spot in my test executable where I'm catching an AvisynthError to grab its error message and I should be done with your list of Don'ts for the time being.

Stereodude

26th October 2013, 05:40

While I don't have anything technical to add, I would like to thank ultim and others for their efforts to modernize AVIsynth by dragging it kicking and screaming into this decade.

ajp_anton

29th October 2013, 17:45

If there's room for feature requests, can we skip some arguments when calling functions like this:
some_function(a,b,,d,e)
if I don't want to change the default value for the third argument, and don't want to explicitly use the names of the 4th and 5th arguments.

Gavino

29th October 2013, 20:21

If there's room for feature requests, can we skip some arguments when calling functions like this:
some_function(a,b,,d,e)
You can get that effect now (more or less) by using (from v2.60):
some_function(a,b,Undefined(),d,e)
or if you prefer
global _ = Undefined()
...
some_function(a,b,_,d,e)

I don't like the idea of allowing some_function(a,b,,d,e) because a user could easily put an extra comma by mistake and not realise he wasn't getting what he expected.

l33tmeatwad

30th October 2013, 14:56

Came across something interesting when testing things....if you have fft3dfilter (version 2.1.1) in a script twice and the bw & bh settings in one is 3x or more than the other then there is a green and red bar at the top. This odd glitch only happens in AviSynth+ and does not show up when using AviSynth 2.6 a5. Figured this was worth mentioning, I haven't tested this on another machine yet.

cwk

30th October 2013, 17:10

I also get the green bar when using the fft3dfilter, although I see it with just a single placement of the tool within a script. I am working around it for the time being by merging chroma back in at the end. I'll admit this means you can't use fft to filter your chroma.

TurboPascal7

30th October 2013, 21:15

Please post the minimal script you can reproduce the issue with (I couldn't). Also, what colorspace and what resolution you're working with?

cwk

30th October 2013, 22:43

I can see it with the following script

avisource("1_fields.avi")
converttoyv12()
fft3dfilter()

The source file is huffyuv, 688 x 476. I also see the lines if I resize to 704x480 or 720x480. fps is 59.940, if that matters.

TurboPascal7

30th October 2013, 22:57

Reproduced, thank you.
----
Okay, it's a bug in env->BitBlt.

fft3d calls it in CoverbufToPlanarPlane

for (h=0; h<dst_height; h++)
{
env->BitBlt(dstp, dst_pitch, coverbuf1, coverpitch, dst_width, 1); // copy pure frame size only
dstp += dst_pitch;
coverbuf1 += coverpitch;
}
That in BitBlt, passing the height == 1 condition, goes to
#ifdef X86_32
if (GetCPUFlags() & CPUF_INTEGER_SSE)
memcpy_amd(dstp, srcp, src_pitch*height); // SSE
else
#endif
memcpy(dstp, srcp, src_pitch*height); // fallback
}
which in any case ends up broken because you actually have to copy row_size bytes instead of src_pitch bytes in this case.

So either we could remove the height == 1 check, replace src_pitch with row_size (they're equal when height != 1) or fix fft3d to call BitBlt once for the whole frame instead of every line.

Original Avisynth actually does it right:
if (height == 1 || (src_pitch == dst_pitch && dst_pitch == row_size)) {
memcpy_amd(dstp, srcp, row_size*height);
} else {
asm_BitBlt_ISSE(dstp,dst_pitch,srcp,src_pitch,row_size,height);
}

ultim

31st October 2013, 23:47

Okay, it's a bug in env->BitBlt.

You might be right. Actually I'm greatly annoyed by Avisynth's BitBlt, IMHO it is a messed up implementation, but when I "fixed" it last time, it broke some plugins doing interleaved copy of non-interleaved frames. Even today I still don't know if the ability to do such copies is intentional, or is it an accident that plugins started to rely upon. Anyway, I changed "row_size*height" to "src_pitch*height" because I thought it is much more logical, thinking they are the same anyway because of the condition row_size==dst_pitch==src_pitch that guards it. Here is where I made the error: I accidently overlooked that this last condition need not be true if height==1. Fix in next release. And thx for looking into this for me.

edit: added clarification of what plugins got broken.

Guest

1st November 2013, 00:16

TurboPascal7

1st November 2013, 00:28

"row_size==dst_pitch==src_pitch"

Are you sure about that? I was under the impression that a Crop can change that. It's always been a common error with Avisynth plugins to assume that they are the same.
This is irrelevant. It's simply a condition (https://github.com/pylorak/avisynth/blob/master/avs_core/core/bitblt.cpp#L269). If it isn't true - the code will go the other way, that is all.

Guest

1st November 2013, 00:33

Thanks for the lesson on coding but as a condition "row_size==dst_pitch==src_pitch" is nonsense.

ultim

1st November 2013, 00:37

"row_size==dst_pitch==src_pitch"

Are you sure about that? I was under the impression that a Crop can change that. It's always been a common error with Avisynth plugins to assume that they are the same.

Of course that equality needs not be true, not just because of crop, also because of alignment. But the piece of code in question here is guarded by an "if" to be able to take a more optimal path. It does not actually require that row_size==dst_pitch==src_pitch, if that is unmet, it will take a slower path.

Edit: I do find it a nonsense that the code is therefor optimized for the minority of the cases, which is (only) part of the reason I hate the BitBlt implementation. Fixing that though is not trivial due to existing plugins relying on the current implementation.

TurboPascal7

1st November 2013, 00:40

neuron2, no problems, don't hesitate to ask. By the way, this condition is also present in the original avs core (https://github.com/jeeb/avisynth/blob/master/src/core/avisynth.cpp#L1904). So you're a bit late with the nonsense part.

Guest

1st November 2013, 00:42

LOL. I don't actually need any help or guidance from you in SW design or coding. The stated condition was nonsense in the post that I responded to. Let it go, friend, or take it to PM.