Doom9's Forum - View Single Post

TurboPascal7 · 24th November 2013, 21:53

A few notes on the porting effort, asm and future plans.

1) Myrsloik did not lie - a lot of asm in the core was quite terrible. We actually had to remove the HorizontalReduceBy2 YUY2 ISSE implementation because it was slower than the C code. There were some quite good MMX routines though. SSE2 was awful everywhere except in the resizers, which were good.

2) As mentioned in the first pull request, the general rule was "not slower than original on Nehalem+ CPUs". We did not test on any older CPUs. Expect performance to get a bit worse on Pentiums. I'm not sure about some memory-bound filters on Core 2, so please report if you experience a noticeable performance drop in the core filters on Core 2 level CPUs. We will not be spending a lot of time optimizing for pre-Nehalem CPUs though.

3) All filters now have C versions so you can run them on super ancient CPUs. It will also help non-x86 platform support.

4) All filters now have SSE2 versions. For example, this makes TemporalSoften up to two times faster. Some also got SSSE3 and SSE4.1 optimizations. You can find which ones in the commit messages of the pull requests.

5) There are some behavior changes: TemporalSoften mode 1 is removed and the mode parameter is simply ignored. The Blur MMX parameter and the Tweak SSE parameter are also ignored.

6) MMX optimization routines are dropped if there is a faster ISSE version. This affects only a few filters and some extremely old CPUs.

7) Code from FTurn is now integrated into the core (with some additional optimizations and new RGB32 routines), making the plugin obsolete.

8) We did not port the MMX code of any audio filters. We won't do this any time soon, so feel free to contribute.

9) Resizers are implemented as VerticalResizer().Transpose().VerticalResizer().Transpose() instead of two separate routines for vertical and horizontal resizing. Some rounding differences are possible, although not noticeable. This improves performance in most test cases and simplifies implementation quite a bit.
The YUY2 resizer is also implemented as ConvertToYV16().Resize().ConvertToYUY2(). This does not affect performance in any way on the CPUs we were working on. The conversion is lossless and extremely fast.
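The resizer layout from point 9 can be sketched roughly as follows. This is a minimal Python illustration, not the AviSynth+ code: `resize_vertical` uses plain linear interpolation as a stand-in for the real resampling kernels, and all function names are made up for the example. It only shows that one row-oriented routine plus a transpose can cover both directions.

```python
def transpose(img):
    """Swap rows and columns of a 2-D list of samples."""
    return [list(col) for col in zip(*img)]


def resize_vertical(img, new_h):
    """Toy vertical resizer: linear interpolation between neighbouring rows.

    Stand-in for a real kernel-based resizer (assumption for this sketch);
    only the row-wise structure matters here.
    """
    old_h = len(img)
    if new_h == 1 or old_h == 1:
        # Degenerate case: nothing to interpolate between, repeat the first row.
        return [list(img[0]) for _ in range(new_h)]
    out = []
    for y in range(new_h):
        pos = y * (old_h - 1) / (new_h - 1)  # position in source rows
        top = int(pos)
        bottom = min(top + 1, old_h - 1)
        frac = pos - top
        out.append([(1 - frac) * a + frac * b
                    for a, b in zip(img[top], img[bottom])])
    return out


def resize(img, new_w, new_h):
    """Vertical pass, transpose, vertical pass, transpose."""
    tmp = resize_vertical(img, new_h)
    tmp = transpose(tmp)               # columns become rows
    tmp = resize_vertical(tmp, new_w)  # this pass changes the original width
    return transpose(tmp)
```

Calling `resize(img, new_w, new_h)` on a list of rows returns `new_h` rows of `new_w` samples each. The horizontal pass never needs its own routine, which is where the simplification comes from.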

Speaking of YUY2: support for this color format will be dropped in all external filters we port. We don't have unlimited time or will to work on something that no one uses. You can always process YUY2 with planar filters by converting to YV16 and back.
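The YUY2 to YV16 round trip can be lossless because both formats hold the same 8-bit 4:2:2 samples and differ only in layout: YUY2 packs them as Y0 U Y1 V, while YV16 keeps Y, U and V in separate planes. Below is a minimal sketch of that repacking for a single row. The function names are hypothetical and the code is not taken from the core.

```python
def yuy2_to_planar(row):
    """Split one packed YUY2 row (Y0 U Y1 V ...) into Y, U and V rows."""
    if len(row) % 4:
        raise ValueError("YUY2 row length must be a multiple of 4 bytes")
    # Luma sits at every even byte, U at bytes 1, 5, 9..., V at 3, 7, 11...
    return row[0::2], row[1::4], row[3::4]


def planar_to_yuy2(y, u, v):
    """Interleave Y, U and V rows back into one packed YUY2 row."""
    if len(y) != 2 * len(u) or len(u) != len(v):
        raise ValueError("expected 4:2:2 plane sizes")
    out = bytearray(2 * len(y))
    out[0::2] = y
    out[1::4] = u
    out[3::4] = v
    return bytes(out)
```

No sample value is changed in either direction, only its position, so converting there and back reproduces the input bytes exactly.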

Now, what's next? Our part of the team will be slowly improving useful external filters to make them work with MT and other platforms in the future. Feel free to help, but please write about it beforehand so we don't end up porting the same filter twice.

Also, I'd like to hear if any of the authors are willing to update their plugins themselves. This includes future MT registration, inline asm removal and, later, support for other platforms.
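One way to picture the layout described in points 3, 4 and 6 (a plain version that always works, plus optimized versions chosen when the CPU supports them) is a small selection table. The sketch below is only an illustration under stated assumptions: the flag names, the table and the functions are hypothetical, and the "optimized" function is a placeholder that simply calls the plain one.

```python
def average_plain(a, b):
    """Portable fallback: rounded per-sample average, one sample at a time."""
    return [(x + y + 1) // 2 for x, y in zip(a, b)]


def average_sse2_placeholder(a, b):
    """Placeholder for an optimized path; it must return identical results."""
    return average_plain(a, b)


# Hypothetical table, best candidate first. The empty flag set marks the
# plain version, which every CPU can run.
AVERAGE_CANDIDATES = [
    ({"sse2"}, average_sse2_placeholder),
    (set(), average_plain),
]


def pick_implementation(candidates, cpu_flags):
    """Return the first function whose required flags are all present."""
    available = set(cpu_flags)
    for required, func in candidates:
        if required <= available:
            return func
    raise RuntimeError("no usable implementation in the table")
```

With the plain entry last in the table, a CPU that reports none of the listed flags still gets a working filter, and a non-x86 build can simply ship a table with that single entry.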

EDIT: A note for ICL users. In our tests, ICL14 tends to generate slower code for some filters, e.g. resizers. You might want to check it before publishing any dlls (which I still recommend you NOT to do). I don't know if the same applies to older versions, and I don't know what the oldest version you can compile avs+ with is. For VS it's vc100 right now (2010).

Post #300. Last edited by TurboPascal7; 25th November 2013 at 00:06.