9th August 2018, 11:47   #6273
FranceBB
Broadcast Encoder
 
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,904
I asked it years ago, but I'm gonna ask it again:

Any chance of seeing assembly optimisations for Main10 on x86 anytime soon?

I know that x64 is what pretty much everyone uses nowadays, but it would be useful to have hand-written assembly optimisations on x86 as well, not just for 8-bit but also for Main10, 'cause it would speed things up a lot.

Tests performed with x265 2.8+58-d17bc77 x86 on the following system:

CPU: Intel i7-6700HQ 4C/8T 3.20GHz
RAM: 16 GB (2x8 GB) DDR4
OS: Windows XP Professional x86 with PAE (unlocked HAL) + Microsoft Extended Support
OS: Windows 7 Professional x64

Clip encoded: 4K UHD 10bit 4:2:0 23.976fps source.
Common settings: --preset medium --level 5.0 --tune fastdecode --ref 2 --rc-lookahead 3 -b 2 --profile main10 --bitrate 25000 --deblock -4:-4 --no-open-gop --min-keyint 1 --keyint 24 --repeat-headers --rd 3

1) x265 Main10 plain C++ (GCC 8.2 Optimisation disabled) Win XP x86 = 0.15fps
2) x265 Main10 plain C++ (GCC 8.2 Optimisation SSE4.2) Win XP x86 = 0.44fps
3) x265 Main10 SSE4.2 asm (GCC 8.2 Optimisation SSE4.2) Win 7 x64 = 1.88fps
4) x265 Main10 AVX2 asm (GCC 8.2 Optimisation AVX2) Win 7 x64 = 2.60fps

As you can see from the results, GCC manages to speed up the plain C++ code by optimising it for SSE4.2 automatically, but it's nowhere near as fast as the hand-written assembly from the x265 developers, which is more than 4 times faster (1.88fps vs 0.44fps). Unfortunately, that assembly is available for x64 only. I'm well aware that hand-written SSE4.2 assembly on x86 wouldn't give the same speed boost as it does on x64 because of the architectural differences (32-bit code, for instance, only has 8 XMM registers instead of 16), but it would definitely improve performance over plain C++, which is all we have for Main10 on x86 right now.
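
The ratios can be checked by simply dividing the measured frame rates. Below is a small Python sketch over the four numbers reported in the tests; the labels are just shorthand for the four builds, not official x265 build names.

```python
# Encoding speeds (fps) as reported in tests 1-4 above.
results = {
    "C, no opt (x86)": 0.15,
    "C, SSE4.2 auto (x86)": 0.44,
    "SSE4.2 asm (x64)": 1.88,
    "AVX2 asm (x64)": 2.60,
}

def speedup(fast: str, slow: str) -> float:
    """Ratio of two measured frame rates (higher means faster)."""
    return results[fast] / results[slow]

# Pairs compared: compiler optimisation, hand-written asm vs compiler, AVX2 vs SSE4.2.
pairs = [
    ("C, SSE4.2 auto (x86)", "C, no opt (x86)"),
    ("SSE4.2 asm (x64)", "C, SSE4.2 auto (x86)"),
    ("AVX2 asm (x64)", "C, SSE4.2 auto (x86)"),
    ("AVX2 asm (x64)", "SSE4.2 asm (x64)"),
]

for fast, slow in pairs:
    print(f"{fast} vs {slow}: {speedup(fast, slow):.2f}x")
```

This prints 2.93x, 4.27x, 5.91x and 1.38x respectively. Keep in mind that tests 3 and 4 ran on Win 7 x64, so those ratios mix the gain from the assembly with the gain from 64-bit mode itself.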
I would post benchmarks of x265 compiled with Visual Studio 2017 as well, but unfortunately I didn't manage to compile the multilib (8/10/12-bit) version for Win32 with Visual Studio 2017. I did manage to compile the 8-bit version, but that's not really useful here.

So... do you think assembly optimisations on x86 will be introduced for Main10 too anytime soon?

Thank you in advance.

Last edited by FranceBB; 9th August 2018 at 11:56.