Avisynth+ - Page 210

ajp_anton · 7th August 2018, 12:10

When going down in bit depth, dither=-1 means round, right? Or is it floor? Because I see a tendency of everything shifting ever so slightly "down".

If you do the following on a YV24 source (repeat a few times to magnify the effect)

Code:

convertbits(16)
converttorgb(matrix="rec709")
converttoyuv444(matrix="rec601")
convertbits(8)
convertbits(16)
converttorgb(matrix="rec601")
converttoyuv444(matrix="rec709")
convertbits(8)

, interleave it with the original, and histogram("levels"), you should see that on average, Y,U and V values are all going down. If dither=-1 rounds the values, shouldn't the average stay roughly the same?

edit:
dither_bits:
- Has no effect if dither=-1 (off).
- Must be an even number from 2 to bits, inclusive.
- In addition, must be >= (clip.BitsPerComponent-8).
I don't understand why any of these restrictions exist, except maybe for implementation reasons. Not that I need this functionality, just wondering.
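To illustrate the round-versus-floor question above, here is a minimal Python sketch. It is not AviSynth+ code, the function names are made up for the illustration, and it makes no claim about what dither=-1 actually does internally. It reduces every possible 16-bit value to 8 bits, once by truncation and once by rounding, and reports the average error in 8-bit steps.

```python
# Illustration only: models a 16-bit -> 8-bit reduction, not ConvertBits itself.

def truncate_to_8(v16):
    """Drop the low 8 bits (floor)."""
    return v16 >> 8

def round_to_8(v16):
    """Add half an 8-bit step before dropping the low bits, clamped to 255."""
    return min((v16 + 128) >> 8, 255)

def mean_error(reduce, values):
    """Average of (reduced value - exact value), measured in 8-bit steps."""
    values = list(values)
    total = sum(reduce(v) * 256 - v for v in values)
    return total / (256 * len(values))

all_16bit = range(65536)
print("truncate:", mean_error(truncate_to_8, all_16bit))
print("round:   ", mean_error(round_to_8, all_16bit))
```

Over a uniform spread of inputs, truncation loses about half an 8-bit step per conversion while rounding loses essentially nothing, so a steady downward drift across repeated round trips is what floor-like behaviour would look like.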

mkver · 7th August 2018, 17:53

Quote:

Originally Posted by qyot27

There was a capi-related commit fairly recently messing with the dllexport/dllimport stuff, but there was no description of what it was supposed to fix.

The reason seems to be a difference between GCC and MSVC regarding dllimport: If I undo pinterf's latest change to capi.h, GCC complains if the definition (in AVSInpaint.c) isn't also declared as dllimport. And apparently MSVC wants only the declaration to have the dllimport attribute, otherwise you'd get an error.

Quote:

Originally Posted by qyot27

It did indeed used to compile just fine with MinGW-w64 regardless of 32-bit* and 64-bit,

You mean, it worked before pinterf's commit? Because if I use this, then the linker doesn't find a lot of symbols, because several functions are not declared as stdcall. Your new version meanwhile works fine.

qyot27 · 8th August 2018, 03:15

Quote:

Originally Posted by mkver

You mean, it worked before pinterf's commit? Because if I use this, then the linker doesn't find a lot of symbols, because several functions are not declared as stdcall. Your new version meanwhile works fine.

I mean that the last time I really checked (a few months ago - April, maybe? Possibly earlier), I could cross-compile AviSynth+ with MinGW-w64/GCC as either 32-bit or 64-bit. I can't speak for plugins, since the only one I really have anything to do with is the FFMS2 C-plugin (the capi.h there merely checks for _WIN32, not MSVC; pretty sure that's an emergency, basal version of the fix I did with the AVSC_WIN32_GCC32 define). I hadn't yet tried to see if that change breaks FFMS2, but I generally had/have a feeling that it might.

Taking a cursory look at AVSInpaint.c, though...ack. That thing needs a cleanup. I can't tell if it's even using the C interface correctly (comparing, as I mentioned above, to the FFMS2 C plugin; that might be a special case, though, and I've not taken any time thus far to see if I can get AssRender building with GCC to verify with it).

manolito · 8th August 2018, 14:44

Quote:

Originally Posted by manolito

Now I am just curious how big the speed sacrifice using this non-SSE2 version vs. the standard version is. I still assume that the various filter plugins will be the bottleneck and not AviSynth itself...

So far nobody wanted to bite, so I did a couple of benchmarks myself...

Not statistically representative, but I tried to keep it "real world" as much as I could. Interesting results, and I also have a few questions.

Test platform:
Lenovo T530, Core i5-3230M Ivy Bridge with 8 GB RAM. Pretty much middle class today (of course only entry level for Doom9 members). The CPU has 2 physical cores plus 2 virtual (Hyperthreading) cores.

AviSynth versions (32-bit only):
1: AVS 2.61 Alpha VC6 Build
2: AVS+ r2728 pinterf
3: AVS+ r2741 qyot27 Non-SSE2

Source file:
Downloaded HD clip @ 29.97 progressive.

I used 3 different scripts: the first 2 are common everyday scripts, while the third one uses DCT=1 with MVTools2 and is way too slow for everyday use.

Script #1:
ConvertToYV12()
DegrainMedian(mode=2)
LSFMod()
Spline36Resize(720,576)
ChangeFPS(25)

Script #2:
ConvertToYV12()
Spline36Resize(720,576)
mx_fps(25) # a modded version of FrameRateConverter by MysteryX

Script #3:
Same as above except using
mx_fps(25, dct=1)

For AVS+ I tested both "Prefetch(2)" and "Prefetch(4)" at the end of the scripts. I also used "SetMTMode.avsi" which is linked at the AVS+ WIKI page.

And here comes my first question:
Do I copy this "SetMTMode.avsi" into the "plugins+" folder or into the "plugins" folder? I tried both and did not notice any difference, so I put it in the "plugins+" folder.

I deliberately did not just measure the results of the scripts in AVSMeter because I wanted to find the overall conversion speeds. My encoder was FFmpeg which uses all 4 cores by default.

Results:

Code:

AVS 2.61 Alpha:
Script #1:    30 fps
Script #2:    15 fps
Script #3:    1.0 fps

Code:

AVS+ r2728 pinterf:
Script #1:    31 fps (no difference between Prefetch(2) and Prefetch(4))
Script #2:    24 fps (Prefetch(2)) and 28 fps (Prefetch(4))
Script #3:    1.6 fps (Prefetch(2)) and 1.5 fps (Prefetch(4))

Code:

AVS+ r2741 qyot27 Non-SSE2
Script #1:    31 fps (no difference between Prefetch(2) and Prefetch(4))
Script #2:    24 fps (Prefetch(2)) and 28 fps (Prefetch(4))
Script #3:    1.6 fps (for both Prefetch(2) and Prefetch(4))

Conclusion:
1. The difference between standard AVS and AVS+ is very obvious, mainly when a complex script like mx_fps (which uses MVTools2 and MaskTools2) gets used.

2. There is almost no difference between the official pinterf version of AVS+ and the Non-SSE2 version by qyot27. On one occasion the qyot27 version is even slightly faster.

Which leads me to my second question:
Could it be that the qyot27 version does use the SSE2 capability of the CPU if the CPU supports it? If not then I would say that using only SIMD versions up to MMX and SSE does not necessarily slow down the conversions, at least not with my tests.

Any thoughts?

Cheers
manolito

qyot27 · 8th August 2018, 17:00

Quote:

Originally Posted by manolito

Could it be that the qyot27 version does use the SSE2 capability of the CPU if the CPU supports it? If not then I would say that using only SIMD versions up to MMX and SSE does not necessarily slow down the conversions, at least not with my tests.

I said almost exactly that when talking about how the /arch flag works.

Essentially, it works like this. Intrinsics or hand-written assembly code use SIMD instructions directly in discrete versions of a particular function (layer_sse4, layer_sse2, layer_avx, etc.). These are then included in a runtime CPU detection dispatcher which allows the program to select the appropriate one based on what the CPU supports. This always gets compiled, and the functions for the different paths are there no matter what. There are SSSE3 and SSE4 functions for some filters, there are AVX/AVX2 versions for others, all based on what can give the greatest boost/anyone bothered writing.

Most compilers, however, have the ability to optimize the plain C versions during the build process so that the final binary can emit SIMD instructions at any time. MSVC controls this through the /arch: parameter, GCC does it through -march or -m[SIMD] flags, used together or alone (-mtune only tunes code generation for a given CPU and does not by itself enable extra instructions). These flags make it so that said CPU or SIMD is *required*, because the compiler will use those instructions even in the parts not covered by the intrinsics or assembly.

So take Mask for example. In AviSynth+, this has multiple versions of the function:

Code:

mask_sse2
mask_core_mmx (not sure if this is actually just a dependency of the mask_mmx function below)
mask_mmx
mask_c

mask_sse2 and mask_mmx are written with intrinsics. They will always be compiled to SSE2 and MMX code, respectively. The dispatcher chooses the appropriate one based on what the CPU supports. If there isn't any support for SSE2, it uses MMX. If it supports neither, it uses the plain C version.

When MSVC has /arch:IA32 set, mask_c (the entire program's plain C code, actually) will be built without any SIMD auto-optimization. Left at its default, though, mask_c (and all the rest of the plain C code) will be optimized by MSVC itself to use SSE2 instructions when it gets run. Fine for CPUs that have SSE2 already, not fine for ones that don't. This auto-optimization-by-compiler is generally not as thorough or fine-tuned as you'd get from either intrinsics or hand-written assembly, which is why those are still needed to get significant boosts in speed, but for functions which haven't yet had intrinsics or asm written for them, the auto-optimization is the best you can do.
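As a rough picture of the dispatcher described above, here is a toy Python model. Only the mask_sse2, mask_mmx and mask_c names come from the post; the flag strings and the select_mask() helper are hypothetical, and the real dispatcher lives in AviSynth+'s C++ code and may be structured differently.

```python
# Toy model of runtime CPU dispatch. The flag strings and the
# select_mask() helper are hypothetical; only the mask_* names come
# from the post above.

def mask_sse2(frame):
    return "mask_sse2"   # stand-in for the SSE2 intrinsics path

def mask_mmx(frame):
    return "mask_mmx"    # stand-in for the MMX intrinsics path

def mask_c(frame):
    return "mask_c"      # stand-in for the plain C path

def select_mask(cpu_flags):
    """Pick the best implementation the CPU reports support for."""
    if "sse2" in cpu_flags:
        return mask_sse2
    if "mmx" in cpu_flags:
        return mask_mmx
    return mask_c

for flags in ({"mmx", "sse", "sse2"}, {"mmx", "sse"}, set()):
    print(sorted(flags), "->", select_mask(flags).__name__)
```

In this model, /arch corresponds to how mask_c and the rest of the plain C code get compiled, not to which branch the dispatcher takes. That is why a build made with /arch:SSE still reaches the SSE2 path on a CPU that has SSE2.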

manolito · 8th August 2018, 22:00

Thanks, I think I finally got it...

Quote:

Originally Posted by qyot27

Most compilers, however, have the ability to optimize the plain C versions during the build process so that the final binary can emit SIMD instructions at any time.

So the difference is only for plain C code which gets either converted to SIMD instructions (if the CPU supports it) or not. There should be no performance hit whatsoever using your version vs. using pinterf's version.

So why doesn't pinterf implement your CPU branching routine?

Cheers
manolito

qyot27 · 8th August 2018, 23:04

Quote:

Originally Posted by manolito

So the difference is only for plain C code which gets either converted to SIMD instructions (if the CPU supports it) or not.

No. Think of it like a plain vanilla/yellow cake. You want chocolate with the cake. The SIMD instructions that come from the dedicated intrinsics functions are like chocolate frosting - it's on top, makes it taste better, but if you got a slice and didn't want any of the frosting, you could scrape it off. This is the option that's 'it gets used if the CPU supports it'.

/arch and optimizing even the C parts with SIMD is making the cake itself a marble, or straight-up chocolate, cake - if you want to avoid the chocolate, you can't. It's in there, whether you like it or not. Whether the CPU supports it or not (and if the CPU doesn't support it, it crashes with an Illegal instruction error).

I did absolutely nothing to branch AviSynth+'s CPU support. The only difference between the typical builds pinterf has been providing and the one I posted a little bit ago is that I switched /arch back to SSE before building it, the way it was on the original AviSynth+ repo before ultim went on hiatus again (as you can see, it was last updated in August 2016, which is why pinterf's repo is the current development hub everyone points to now):
https://github.com/AviSynth/AviSynth...eLists.txt#L48

vs.

https://github.com/pinterf/AviSynthP...eLists.txt#L74

All I did in my working branch (https://github.com/qyot27/AviSynthPl...eLists.txt#L74) was make it so that when I go to build AviSynth+, I don't have to open CMakeLists.txt in Notepad2-mod and change it back to SSE. Instead, I can now pass -DCPU_ARCH=SSE (or -DCPU_ARCH=IA32, -DCPU_ARCH=AVX, or -DCPU_ARCH=AVX2) to the CMake command line, like any other configuration option and avoid having to open source files in text editors first.

As for why it hasn't shown up outside of my branch, A) I just whipped up that patch earlier this week or last week, and B) I've not opened a pull request for the changes on that branch yet.

Groucho2004 · 8th August 2018, 23:48

Quote:

Originally Posted by qyot27

/arch and optimizing even the C parts with SIMD is making the cake itself a marble, or straight-up chocolate, cake

Mmmmh, marble cake...

manolito · 9th August 2018, 00:29

Quote:

Originally Posted by qyot27

/arch and optimizing even the C parts with SIMD is making the cake itself a marble, or straight-up chocolate, cake - if you want to avoid the chocolate, you can't. It's in there, whether you like it or not. Whether the CPU supports it or not (and if the CPU doesn't support it, it crashes with an Illegal instruction error).

So your build makes sure the cake itself does not become a marble cake (avoiding a crash when the CPU does not support it). By switching back /arch to SSE you avoid optimizing C parts with SIMD.

But then why the hell is your build just as fast or even a little faster on my test computer which certainly does have SSE2?

qyot27 · 9th August 2018, 01:45

Quote:

Originally Posted by manolito

So your build makes sure the cake itself does not become a marble cake (avoiding a crash when the CPU does not support it). By switching back /arch to SSE you avoid optimizing C parts with SIMD.

Roughly. It avoids using SSE2 SIMD. If you tried using that build on something older than a Pentium-III, it would crash. The only way to fully disable it is by using /arch:IA32, but how many people with a Pentium-II, Pentium Pro, or i486/i386 are going to be running at least Windows XP just to be able to run that AviSynth.dll? Much less actually be using it for anything other than academic 'because I can' points?

The point is that /arch specifies the minimum instruction set the CPU supports, and because of that, it allows the compiler to use SIMD at or below that minimum setting when optimizing the C parts of the code during the build process.

I mean, I could probably throw up a build with all the intrinsics disabled so you'd be forced to use the C versions and see directly how well MSVC optimizes stuff. I think it's just disabling a couple of defines, but I'm not sure.

Quote:

But then why the hell is your build just as fast or even a little faster on my test computer which certainly does have SSE2?

MSVC might optimize for MMX/SSE a bit better in spots for a 32-bit build compared to SSE2, but largely it would be because on an Ivy Bridge, you wouldn't be using the C versions of anything much/at all (for either pinterf's build or mine). It might in some non-filter areas that linger in the background, possibly. If I had to guess (based on pinterf's comment in CMakeLists.txt), it is the high bit depth stuff where you would see the biggest difference between the two builds. I'm not sure how much of it has intrinsics now, so there may be a higher proportion of it that has to rely on the compiler doing the optimization on plain C code.

manolito · 9th August 2018, 02:22

Alright, this answers most of my questions, thanks...

All the high bitdepth and high colors stuff is not for me (I am just too old for all this UHD / HDR / 8K stuff, my viewing device is a 4:3 CRT TV set with natural colors I so far have never seen on an LCD, and I also refuse to converge computer (for working) and TV (for entertainment) stuff).
So all I am interested in for AVS+ is the speed gain from MT.

Thanks again
manolito

jpsdr · 9th August 2018, 08:38

There were plasma screens (Pioneer Kuro) which were very good for color, and SED/FED died before being born, but now there is OLED, which I think will provide good color for CRT people, as I was. Like you, I've never liked LCD, but my Pioneer Kuro plasma gave me satisfaction, and the day I have to replace it (because it will happen, the later the better I hope), I think OLED will satisfy me.

pinterf · 9th August 2018, 10:49

Quote:

Originally Posted by qyot27

If I had to guess (based on pinterf's comment in CMakeLists.txt), it is the high bit depth stuff where you would see the biggest difference between the two builds. I'm not sure how much of it has intrinsics now, so there may be a higher proportion of it that has to rely on the compiler doing the optimization on plain C code.

Yeah, as I wrote, most of the 10+ bits stuff was in pure C; nowadays most of it is optimized with SIMD intrinsics. Probably 32-bit can go back to the /sse option, because those who really need speed (in general and especially for the 10+ bit depth option) are already using the x64 toolchain, I guess.

Atak_Snajpera · 9th August 2018, 12:07

I'm just curious why Prefetch with value equal to number of physical cores is faster than with number of logical processors?

It does not matter if it is Xeon 8C/16T or Ryzen 8C/16T. Result is always the same.

Script

Code:

#VideoSource
LoadPlugin("C:\Users\Dave\Documents\Delphi_Projects\RipBot264\_Compiled\Tools\AviSynth plugins\ffms\ffms_latest\x64\ffms2.dll")
video=FFVideoSource("C:\Temp\RipBot264temp\job1\video.mkv",cachefile = "C:\Temp\RipBot264temp\job1\video.mkv.ffindex")
#Deinterlace

#Resize
LoadPlugin("C:\Users\Dave\Documents\Delphi_Projects\RipBot264\_Compiled\Tools\AviSynth plugins\Plugins_JPSDR\Plugins_JPSDR.dll")
video=Spline36ResizeMT(video,1920,1080,SetAffinity=false).Sharpen(0.2)

#Tonemap
Loadplugin("C:\Users\Dave\Documents\Delphi_Projects\RipBot264\_Compiled\Tools\AviSynth plugins\avsresize\avsresize.dll")
Loadplugin("C:\Users\Dave\Documents\Delphi_Projects\RipBot264\_Compiled\Tools\AviSynth plugins\DGTonemap\x64\DGTonemap.dll")
video=z_ConvertFormat(video,pixel_type="RGBPS",colorspace_op="2020ncl:st2084:2020:l=>rgb:linear:2020:l", dither_type="none").DGHable
video=z_ConvertFormat(video,pixel_type="YV12",colorspace_op="rgb:linear:2020:l=>709:709:709:l",dither_type="ordered")

#Prefetch
video=Prefetch(video,X)

#Return
return video

Prefetch(16)

Code:

AVSMeter 2.8.1 (x64) - Copyright (c) 2012-2018, Groucho2004
AviSynth+ 0.1 (r2728, MT, x86_64) (0.1.0.0)

Number of frames:                 7935
Length (hh:mm:ss.ms):     00:02:12.382
Frame width:                      1920
Frame height:                     1080
Framerate:                      59.940 (60000/1001)
Colorspace:                       YV12

Frames processed:               7935 (0 - 7934)
FPS (min | max | average):      0.147 | 1000000 | 30.50
Memory usage (phys | virt):     2099 | 2124 MiB
Thread count:                   65
CPU usage (average):            53%

Time (elapsed):                 00:04:20.122

Prefetch(8)

Code:

AVSMeter 2.8.1 (x64) - Copyright (c) 2012-2018, Groucho2004
AviSynth+ 0.1 (r2728, MT, x86_64) (0.1.0.0)

Number of frames:                 7935
Length (hh:mm:ss.ms):     00:02:12.382
Frame width:                      1920
Frame height:                     1080
Framerate:                      59.940 (60000/1001)
Colorspace:                       YV12

Frames processed:               7935 (0 - 7934)
FPS (min | max | average):      0.339 | 944590 | 33.17
Memory usage (phys | virt):     1591 | 1616 MiB
Thread count:                   57
CPU usage (average):            53%

Time (elapsed):                 00:03:59.189

The same happens with MDegrain2 or QMTC.
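As a quick sanity check on the two reports above, the average fps can be recomputed from the frame count and the elapsed time. This is just arithmetic on the posted numbers; the helper names are made up for the example.

```python
# Recompute average fps from the two AVSMeter reports above.

def to_seconds(hms):
    """Convert 'hh:mm:ss.ms' to seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

def average_fps(frames, elapsed):
    return frames / to_seconds(elapsed)

fps_16 = average_fps(7935, "00:04:20.122")  # Prefetch(16) run
fps_8 = average_fps(7935, "00:03:59.189")   # Prefetch(8) run

print(f"Prefetch(16): {fps_16:.2f} fps")
print(f"Prefetch(8):  {fps_8:.2f} fps")
print(f"Prefetch(8) is {100 * (fps_8 / fps_16 - 1):.1f}% faster")
```

The recomputed averages agree with the reported 30.50 and 33.17 fps, and the Prefetch(8) run comes out a little under 9% faster.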

Myrsloik · 9th August 2018, 13:49

This is a general answer that applies to most things.

You can easily saturate the total memory bandwidth with fewer than the logical number of threads. Especially (A)VS, which processes full frames instead of tiles/lines, quickly reaches that level. And once you have more than the physical number of cores as threads you have reduced cache too... which means each thread is even more likely to have to access and wait even more for RAM.

It's possible that the default number of threads should be something like max(physical cores, min(logical threads, 8)) for x86.

manolito · 9th August 2018, 16:40

Quote:

Originally Posted by Myrsloik

It's possible that the default number of threads should be something like max(physical cores, min(logical threads, 8)) for x86.

Not true for my Core i5-3230M Ivy Bridge with 8 GB RAM. According to this formula I should use Prefetch(2), but my tests (latest AVS+ 32-bit) showed that in most cases Prefetch(4) is significantly faster.

Myrsloik · 9th August 2018, 16:42

Quote:

Originally Posted by manolito

Not true for my Core i5-3230M Ivy Bridge with 8 GB RAM. According to this formula I should use Prefetch(2), but my tests showed that in most cases Prefetch(4) is significantly faster.

My formula gives 4. I don't see the problem here. The whole thing was pulled out of my butt so no guarantees that it's optimal.

9th August 2018, 16:54

Quote:

Originally Posted by manolito

Not true for my Core i5-3230M Ivy Bridge with 8 GB RAM. According to this formula I should use Prefetch(2), but my tests (latest AVS+ 32-bit) showed that in most cases Prefetch(4) is significantly faster.

I don't think you applied the formula correctly.

The formula in your case would work through the following steps:

max(2 physical cores, min(4 logical threads, 8 threads))

Min of 4 and 8 is 4.

max(2 physical cores, 4 logical threads)

Max between 2 and 4 would be 4. So, his back-of-the-envelope formula gave you exactly what you claim was the faster number.
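The same steps can be written as a one-line function. This is a direct transcription of the suggested rule of thumb, not anything AviSynth+ does by default, and the function name is made up for the example.

```python
# Direct transcription of the suggested rule of thumb; the function
# name is made up for this example.

def default_threads(physical_cores, logical_threads):
    return max(physical_cores, min(logical_threads, 8))

print(default_threads(2, 4))    # 2C/4T, e.g. the Core i5-3230M
print(default_threads(8, 16))   # 8C/16T
print(default_threads(6, 12))   # 6C/12T
```

For 2C/4T this gives 4, for 8C/16T it gives 8, and for 6C/12T it also gives 8, so the cap only starts to matter once the logical thread count exceeds 8.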

Atak_Snajpera · 9th August 2018, 17:19

That formula would return 8 for 8700k/Ryzen 2600 instead of 6. I have disabled 2 cores on my Xeon E5-2690 in BIOS and again test showed that prefetch equal to number of cores is better.

Prefetch(8)

Code:

AVSMeter 2.8.1 (x64) - Copyright (c) 2012-2018, Groucho2004
AviSynth+ 0.1 (r2728, MT, x86_64) (0.1.0.0)

Number of frames:                 7935
Length (hh:mm:ss.ms):     00:02:12.382
Frame width:                      1920
Frame height:                     1080
Framerate:                      59.940 (60000/1001)
Colorspace:                       YV12

Frames processed:               7935 (0 - 7934)
FPS (min | max | average):      0.131 | 944593 | 22.98
Memory usage (phys | virt):     1422 | 1446 MiB
Thread count:                   45
CPU usage (average):            60%

Time (elapsed):                 00:05:45.265

Prefetch(6)

Code:

AVSMeter 2.8.1 (x64) - Copyright (c) 2012-2018, Groucho2004
AviSynth+ 0.1 (r2728, MT, x86_64) (0.1.0.0)

Number of frames:                 7935
Length (hh:mm:ss.ms):     00:02:12.382
Frame width:                      1920
Frame height:                     1080
Framerate:                      59.940 (60000/1001)
Colorspace:                       YV12

Frames processed:               7935 (0 - 7934)
FPS (min | max | average):      0.132 | 708445 | 25.73
Memory usage (phys | virt):     1281 | 1305 MiB
Thread count:                   43
CPU usage (average):            60%

Time (elapsed):                 00:05:08.385

Atak_Snajpera · 9th August 2018, 17:51

Another test but this time disabled 6 cores leaving only 2C/4T.
Prefetch(4)

Code:

AVSMeter 2.8.1 (x64) - Copyright (c) 2012-2018, Groucho2004
AviSynth+ 0.1 (r2728, MT, x86_64) (0.1.0.0)

Number of frames:                 7935
Length (hh:mm:ss.ms):     00:02:12.382
Frame width:                      1920
Frame height:                     1080
Framerate:                      59.940 (60000/1001)
Colorspace:                       YV12

Frames processed:               7935 (0 - 7934)
FPS (min | max | average):      0.092 | 944596 | 10.70
Memory usage (phys | virt):     784 | 808 MiB
Thread count:                   17
CPU usage (average):            89%

Time (elapsed):                 00:12:21.689

Prefetch(2)

Code:

AVSMeter 2.8.1 (x64) - Copyright (c) 2012-2018, Groucho2004
AviSynth+ 0.1 (r2728, MT, x86_64) (0.1.0.0)

Number of frames:                 7935
Length (hh:mm:ss.ms):     00:02:12.382
Frame width:                      1920
Frame height:                     1080
Framerate:                      59.940 (60000/1001)
Colorspace:                       YV12

Frames processed:               7935 (0 - 7934)
FPS (min | max | average):      3.012 | 314865 | 14.37
Memory usage (phys | virt):     620 | 644 MiB
Thread count:                   15
CPU usage (average):            85%

Time (elapsed):                 00:09:12.192

I've noticed that a script with a Prefetch value higher than the number of physical cores has a tendency to choke from time to time. (see min. fps)
