Old 6th August 2018, 18:16   #6261  |  Link
Dclose
Registered User
 
Join Date: Aug 2014
Posts: 50
Quote:
Originally Posted by K.i.N.G View Post
I really like x265 but I seem to be unable to get rid of linear smearing/stretching artifacts when there are fast moving objects in a scene.
Is there a specific parameter targeted at improving this, without increasing the bit rate in other areas (those are fine)?

My settings are:
--crf 17 --preset veryslow --profile main10 --level-idc 5 --output-depth 10 --psy-rdoq 4 --aq-mode 3 --qg-size 64 --qcomp 0.7 --subme 5 --master-display "G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(40000000,50)" --colorprim bt2020 --colormatrix bt2020nc --transfer smpte2084 --max-cll "457,179" --hdr --hdr-opt --deblock -1:-1 --no-sao --no-strong-intra-smoothing

example:
1) imo, if you care about things that move (and picture quality in general), you have to use subpixel motion estimation at --subme 7. 5 is good, and is as low as I ever set it, even on files I'm trying to finish fast, but 5 is easily visually inferior to 7 imo. 7 of course takes longer to encode, though.

2) You have qg-size 64. I almost never encode 4k lately, but for 1080, my coding, quant, and tree unit settings are a low of 8 and high of 32, and Max Intra/Inter are maxed. 64 didn't look as sharp, and didn't have any noticeable advantages, even during a fairly recent test I did of them.

3) AQ mode. Auto-variance (mode 2) was too inconsistent in quality for me. The main thing is that faces tend to lack quality, and faces tend to be the main place on the screen to look at. With mode 3 (auto-variance with a bias toward dark scenes), the file sizes were too inconsistent for me. That mode tended to throw a lot of bitrate at the file and too often made the sizes huge. I use normal mode now, for consistency of video quality and file size. I haven't retested the others in a year or so, so maybe they have improved.

4) With later releases of x265, I stopped messing with the q-comp type of settings. I did a big test on them again a couple months ago and found the default settings are very good.

5) -1/-1 is a lot of deblocking. I use -5/-4 even on encodes most people would probably consider very low bitrate. I'm usually around crf 21-24 though, not 17, so maybe deblocking has less effect at 17 anyway. At crf 17, I would think some obvious setting is wrong somewhere for it to not look great.
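To make points 1, 3 and 4 concrete, here is a minimal sketch of how they could be applied to a command line like the quoted one. The helper names (`parse_flags`, `apply_suggestions`, `to_cmdline`) are made up for illustration, the example string is a shortened version of the quoted settings, and reading "normal mode" as `--aq-mode 1` is an assumption. It only edits the argument string and does not call x265.

```python
import shlex

def parse_flags(cmdline):
    """Split an x265-style argument string into an ordered {flag: value} dict.

    A flag followed by a token that does not start with "--" is treated as
    taking that token as its value (so "--deblock -1:-1" is kept together).
    """
    tokens = shlex.split(cmdline)
    flags = {}
    i = 0
    while i < len(tokens):
        name = tokens[i]
        if i + 1 < len(tokens) and not tokens[i + 1].startswith("--"):
            flags[name] = tokens[i + 1]
            i += 2
        else:
            flags[name] = None  # boolean switch such as --no-sao
            i += 1
    return flags

def apply_suggestions(flags):
    """Return a copy with the suggestions from points 1, 3 and 4 applied."""
    updated = dict(flags)
    updated["--subme"] = "7"      # point 1: higher subpel refinement
    updated["--aq-mode"] = "1"    # point 3: assumed meaning of "normal mode"
    updated.pop("--qcomp", None)  # point 4: fall back to the encoder default
    return updated

def to_cmdline(flags):
    """Rebuild an argument string from the {flag: value} dict."""
    parts = []
    for name, value in flags.items():
        parts.append(name)
        if value is not None:
            parts.append(shlex.quote(value))
    return " ".join(parts)

# Shortened version of the quoted settings, used only as sample input.
original = ("--crf 17 --preset veryslow --psy-rdoq 4 --aq-mode 3 --qg-size 64 "
            "--qcomp 0.7 --subme 5 --deblock -1:-1 --no-sao")
suggested = apply_suggestions(parse_flags(original))
print(to_cmdline(suggested))
```

Everything not mentioned in those three points is passed through unchanged, so the result can be compared against the original encode one change at a time.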
Old 7th August 2018, 15:39   #6262  |  Link
Asmodian
Registered User
 
Join Date: Feb 2002
Location: San Jose, California
Posts: 4,406
Deblocking strength scales with the amount of compression; at lower CRF values, deblocking is automatically weaker.
__________________
madVR options explained
Old 8th August 2018, 00:34   #6263  |  Link
brumsky
Registered User
 
Join Date: Jun 2016
Posts: 116
GUIs?

I'm having issues with Ripbot264 and Staxrip when trying to encode a 4k video. Do they support UHD? They work fine for 1080p content....

If not, what should I use?

thanks,
Brumsky

Last edited by brumsky; 8th August 2018 at 00:37.
Old 8th August 2018, 05:41   #6264  |  Link
user1085
Registered User
 
Join Date: Apr 2018
Posts: 22
Ripbot works for me for 4k out of the box
Quote:
Originally Posted by brumsky View Post
I'm having issues with Ripbot264 and Staxrip when trying to encode a 4k video. Do they support UHD? They work fine for 1080p content....

If not, what should I use?

thanks,
Brumsky
Old 8th August 2018, 08:33   #6265  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,753
Always the same mistake: The resolution alone is not the relevant attribute of a video. There is still a wide variety of possible container and content formats which could be used to store video with such a resolution. Use MediaInfo to tell us relevant technical attributes.

And if you have issues, tell us about the nature of these issues as verbosely as necessary. A minimum requirement is quoting an error message letter by letter, if there is any, possibly even providing a log file. "I'm having issues" is not a sufficient description.

If the conversion crashes for downloaded moviez, you are left at your own peril.

In any case, it would be off-topic in a thread related to the x265 encoder; the reason for such issues is usually the decoding rather than the encoding. You could have created a separate thread instead, because it happens with more than one converter application.
__________________

New German Gleitz board
MediaFire: x264 | x265 | VPx | AOM | Xvid

Last edited by LigH; 8th August 2018 at 08:38.
Old 8th August 2018, 11:51   #6266  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,753
MSYS2 recently updated MinGW64 with GCC 8.2.0; due to some internal compiler errors, MinGW32 will stay with GCC 7.3.0, though, until these issues are solved.

x265 2.8+58-d17bc7714ed2 (Win32-GCC730 & Win64-GCC820)
__________________

New German Gleitz board
MediaFire: x264 | x265 | VPx | AOM | Xvid
Old 8th August 2018, 16:15   #6267  |  Link
brumsky
Registered User
 
Join Date: Jun 2016
Posts: 116
@user1085

Thanks, I wanted to make sure it was supported out of the box.

@LigH

LigH, I certainly agree that I did not provide enough information for proper troubleshooting. I know my question would be borderline, at best, for this thread. I just wanted to confirm that those applications support UHD out of the box. Now that I know they do, I can properly troubleshoot the issue. I didn't want to waste a ton of time if it wasn't supported to begin with.

It is during decode, mostly ffms2.dll, while it is being indexed. I say mostly as I have had another error not related to ffms2.dll.


Thanks for the quick response and sorry for the "I'm having issues" post.
Old 8th August 2018, 17:59   #6268  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,753
@brumsky:

Indexing already requires scanning the whole source video. If that already fails, there is a chance that your source has a "hole" ...
__________________

New German Gleitz board
MediaFire: x264 | x265 | VPx | AOM | Xvid
Old 9th August 2018, 11:47   #6269  |  Link
FranceBB
Broadcast Encoder
 
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,883
I asked it years ago, but I'm gonna ask it again:

Any chance to see assembly optimisations for Main10 on x86 anytime soon in the future?

I know that x64 is what pretty much everyone uses nowadays, but it would be useful to have manual assembly optimisation in x86 as well, not just for 8bit, but also for Main10, 'cause it would speed things up a lot.

Test performed with x265 2.8+58-d17bc77 x86 using the following system:

CPU: Intel i7 6700HQ 4c/8th 3.20GHz
RAM: 16 GB (8x2) DDR4
OS: Windows XP Professional x86 with PAE (unlocked HAL) + Microsoft Extended Support
OS: Windows 7 Professional x64

Clip encoded: 4K UHD 10bit 4:2:0 23.976fps source.
Common settings: --preset medium --level 5.0 --tune fastdecode --ref 2 --rc-lookahead 3 -b 2 --profile main10 --bitrate 25000 --deblock -4:-4 --no-open-gop --min-keyint 1 --keyint 24 --repeat-headers --rd 3

1) x265 Main10 plain C++ (GCC 8.2 Optimisation disabled) Win XP x86 = 0.15fps
2) x265 Main10 plain C++ (GCC 8.2 Optimisation SSE4.2) Win XP x86 = 0.44fps
3) x265 Main10 SSE4.2 asm (GCC 8.2 Optimisation SSE4.2) Win 7 x64 = 1.88fps
4) x265 Main10 AVX2 asm (GCC 8.2 Optimisation AVX2) Win 7 x64 = 2.60fps

As you can see from the results, GCC manages to speed up the code by optimising plain C++ code to SSE4.2 automatically, but it's not nearly as fast as the manual assembly optimisation written by x265 developers, which is more than 4 times faster, but unfortunately it's available for x64 only. I'm well aware that implementing manual SSE4.2 assembly optimisation in x86 wouldn't give the same speed boost as it does in x64 due to the different architectures, but it would definitely improve performance over plain C++ (which is all we have for Main10 in x86 right now).
I would post benchmarks of x265 compiled with Visual Studio 2017 as well, but unfortunately I didn't manage to compile the multilib (8/10/12bit) versions for Win32 with Visual Studio 2017. I did manage to compile the 8bit version, though, but that's not really useful.
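For reference, the "more than 4 times faster" figure can be rechecked from the four fps numbers above. This is just a quick sketch of the arithmetic, using the auto-vectorised x86 build as the baseline. The dictionary labels are shorthand for the four test cases, and the x64 runs were on a different OS, so the ratios are only indicative.

```python
# fps figures copied from the four test results listed in the post
results_fps = {
    "C++ no optimisation (x86)": 0.15,
    "C++ SSE4.2 auto-vectorised (x86)": 0.44,
    "SSE4.2 asm (x64)": 1.88,
    "AVX2 asm (x64)": 2.60,
}

# Baseline: the fastest build that is currently possible for Main10 on x86
baseline = results_fps["C++ SSE4.2 auto-vectorised (x86)"]

# Speed of each build relative to the baseline
speedups = {name: fps / baseline for name, fps in results_fps.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```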

So... do you think assembly optimisations on x86 will be introduced for Main10 too anytime soon?

Thank you in advance.

Last edited by FranceBB; 9th August 2018 at 11:56.
Old 9th August 2018, 12:18   #6270  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Quote:
Originally Posted by FranceBB View Post
I'm well aware that implementing manual SSE4.2 assembly optimisation in x86 wouldn't give the same speed boost as it does in x64 due to the different architectures...

So... do you think assembly optimisations on x86 will be introduced for Main10 too anytime soon?
Nice post, but I think you have already given yourself the answer.

x64 doubles the number of registers, which makes assembly optimizations not only faster but also a lot easier for a developer to implement.

I don't think that in 2018 it's some kind of priority to optimize for x86.

The percentage of x86-only OSes and CPUs is close to 0.

Of course, nothing stops you from asking.
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1)
HEVC decoding benchmarks
H.264 DXVA Benchmarks for all
Old 9th August 2018, 12:49   #6271  |  Link
sneaker_ger
Registered User
 
Join Date: Dec 2002
Posts: 5,565
Quote:
Originally Posted by FranceBB View Post
So... do you think assembly optimisations on x86 will be introduced for Main10 too anytime soon?
You mean "re-introduce", because in the past those existed but the developers deliberately removed them. Not because it wasn't faster, but because they wanted to spend their dev time on other things.

So get an old version, new PC/OS or find someone who still develops it. I believe Ma had some branch for it but I don't know how old/recent it is.
Old 9th August 2018, 14:47   #6272  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,753
And again the same answer: Very doubtful. The developers already decided to abandon this part of x265, for these reasons:
  • twice the effort to write assembly routines with fewer and smaller CPU registers in 32 bit CPU mode
  • half the available RAM, because 10 bit precision per color channel needs 16 bits of storage instead of 8 bits, and the limitation to 2 GB (or 4 GB for LAA processes) does not even allow encoding of FullHD (not to mention UHD)
_
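A rough sketch of the second point, assuming 4:2:0 content and counting only the raw picture planes. The function name `frame_bytes` is made up for illustration. The encoder keeps far more than raw frames in memory (reference pictures, lookahead, padding, analysis data), so the frame counts this produces are an upper bound on what fits, not a prediction of actual x265 memory use.

```python
def frame_bytes(width, height, bit_depth):
    """Bytes for the raw planes of one 4:2:0 frame.

    Samples deeper than 8 bits are assumed to be stored in 16-bit words.
    """
    bytes_per_sample = 1 if bit_depth <= 8 else 2
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)  # two quarter-size planes
    return (luma + chroma) * bytes_per_sample

GIB = 1024 ** 3

for label, (w, h) in {"FullHD": (1920, 1080), "UHD": (3840, 2160)}.items():
    for depth in (8, 10):
        size = frame_bytes(w, h, depth)
        for limit_gib in (2, 4):
            fits = (limit_gib * GIB) // size
            print(f"{label} {depth} bit: {size} bytes/frame, "
                  f"at most {fits} raw frames in {limit_gib} GiB")
```

The point it illustrates is only the doubling: every buffer that holds samples costs twice as much at 10 bit as at 8 bit, inside the same fixed 32 bit address space.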

Damn, something delayed my reply remarkably. I thought I posted it right after the question...
__________________

New German Gleitz board
MediaFire: x264 | x265 | VPx | AOM | Xvid

Last edited by LigH; 9th August 2018 at 14:49.
Old 10th August 2018, 21:42   #6273  |  Link
Asmodian
Registered User
 
Join Date: Feb 2002
Location: San Jose, California
Posts: 4,406
Quote:
Originally Posted by LigH View Post
and the limitation to 2 GB (or 4 GB for LAA processes) does not even allow encoding of FullHD (not to mention UHD)
Interesting point, I had not thought of the memory footprint.

And LAA only applies to 32 bit on 64 bit systems, so the extra work optimizing 32 bit x265 for 10 bit does seem like a poor use of talent.
__________________
madVR options explained
Old 13th August 2018, 13:19   #6274  |  Link
Barough
Registered User
 
Join Date: Feb 2007
Location: Sweden
Posts: 480
x265 v2.8+59-b44d5f0e42f8 (32-bit GCC 7.3.0 / 64-bit GCC 8.2.0 8/10/12bit Multilib Windows Binaries)

Code:
https://bitbucket.org/multicoreware/x265/commits/branch/default
Old 13th August 2018, 22:26   #6275  |  Link
Ma
Registered User
 
Join Date: Feb 2015
Posts: 326
@Magik Mark

I've looked at commits 2.8+48 (b0d31e2) and 2.8+49 (5d34bbf). In version +49 there is a potentially dangerous change from one (atomic) 32-bit operation to two 16-bit operations. I've reverted these changes -- you can test if patched version 2.8+58 hangs or not (patch file inside)
http://www.msystem.waw.pl/x265/x265-...vs2017-AVX2.7z
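To illustrate the general kind of hazard meant here (this is not the x265 code, and whether it is the actual cause of the hang is not established), a small deterministic sketch: if a 32-bit value is updated as two separate 16-bit stores, another thread reading between the two stores can see a value that is neither the old nor the new one. The function name `split_store_states` and the low-half-first order are assumptions made for the example.

```python
def split_store_states(old, new):
    """List the 32-bit values a concurrent reader could observe while `new`
    replaces `old` via two 16-bit stores (low half first, then high half)."""
    after_low_store = (old & 0xFFFF0000) | (new & 0x0000FFFF)
    return [old, after_low_store, new]

# A counter stepping from 0x0001FFFF to 0x00020000 crosses the 16-bit boundary,
# so both halves change and the intermediate state is visibly wrong.
states = split_store_states(0x0001FFFF, 0x00020000)
for value in states:
    print(f"0x{value:08X}")
```

With a single atomic 32-bit store only the first and last values in that list are observable, which is why splitting the operation can matter for code that waits on such a value.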
Old 14th August 2018, 08:19   #6276  |  Link
Magik Mark
Registered User
 
Join Date: Dec 2014
Posts: 666
Same problem ma
__________________
Asus ProArt Z790 - 13th Gen Intel i9 - RTX 3080 - DDR5 64GB Predator - LG OLED C9 - Yamaha A3030 - Windows 11 x64 - PotPlayerr - Lav - MadVR
Old 14th August 2018, 08:37   #6277  |  Link
Ma
Registered User
 
Join Date: Feb 2015
Posts: 326
Quote:
Originally Posted by Magik Mark View Post
Same problem ma
Thanks for info!

Did you check ver. 2.8+48 (from test.7z in post #6260)?
Old 16th August 2018, 19:44   #6278  |  Link
Atak_Snajpera
RipBot264 author
 
Join Date: May 2006
Location: Poland
Posts: 7,806
Ryzen Threadripper 2990WX uses 4 NUMA nodes and I would like to check if running 4 instances with manually adjusted --numa-pools could improve performance.
Can somebody verify if those are correct switches?

Code:
Instance 1 = --numa-pools "+,-,-,-" 
Instance 2 = --numa-pools "-,+,-,-"
Instance 3 = --numa-pools "-,-,+,-"
Instance 4 = --numa-pools "-,-,-,+"
Without any adjustments 5 instances give this

2990wx@3.4GHz(all core turbo) is only 20% faster than 1950@3.4GHz
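The four strings above follow a simple one-node-per-instance pattern, which can be generated for any node count. This is a minimal sketch with a made-up helper name (`numa_pool_strings`). It relies on the reading of the x265 `--pools` / `--numa-pools` documentation in which "+" means "all cores on this node" and "-" means "none", and it assumes the OS really exposes the CPU as that many NUMA nodes to x265. It says nothing about whether the isolation actually improves speed.

```python
def numa_pool_strings(node_count):
    """Return one --numa-pools value per instance, each pinned to one node."""
    pools = []
    for active in range(node_count):
        # "+" for the node this instance should use, "-" for all others
        pools.append(",".join("+" if node == active else "-"
                              for node in range(node_count)))
    return pools

for instance, pool in enumerate(numa_pool_strings(4), start=1):
    print(f'Instance {instance} = --numa-pools "{pool}"')
```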

Last edited by Atak_Snajpera; 16th August 2018 at 19:51.
Old 16th August 2018, 23:14   #6279  |  Link
Sagittaire
Testeur de codecs
 
Join Date: May 2003
Location: France
Posts: 2,484
Quote:
Originally Posted by Atak_Snajpera View Post
Ryzen Threadripper 2990wx uses 4 NUMA nodes and I would like to check if running 4 instances with manually adjusted --numa-pools could improve performance.
Can somebody verify if those are correct switches?

Code:
Instance 1 = --numa-pools "+,-,-,-" 
Instance 2 = --numa-pools "-,+,-,-"
Instance 3 = --numa-pools "-,-,+,-"
Instance 4 = --numa-pools "-,-,-,+"
Without any adjustments 5 instances give this

2990wx@3.4GHz(all core turbo) is only 20% faster than 1950@3.4GHz
Well, 5 instances is just too few for a 1080p source ...

32C/64T across 5 instances means more than 6C/12T for each 1080p instance. Unfortunately, x265 has threading problems at 8 threads (and more) for a 1080p source.

If you really want to saturate a 64-thread CPU, you must use at least 8 instances for a 1080p source or at least 2 instances for a 2160p source. And perhaps 8x 1080p instances will saturate the RAM, given the particular CCX interconnect (even with quad-channel DDR4).
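The instance counts follow from simple division. A small sketch, taking the roughly 8 useful threads per 1080p instance from the post as given, and inferring 32 per 2160p instance from the "at least 2 instances" figure (that second number is an assumption, not something measured here). The function name `min_instances` is made up for the example.

```python
import math

def min_instances(total_threads, useful_threads_per_instance):
    """Smallest number of encoder instances needed to keep all threads busy."""
    return math.ceil(total_threads / useful_threads_per_instance)

total_threads = 64  # 32C/64T

# Threads available to each instance when only 5 are running
threads_each_with_5 = total_threads / 5

print(f"5 instances: {threads_each_with_5} threads each")
print(f"1080p: {min_instances(total_threads, 8)} instances")
print(f"2160p: {min_instances(total_threads, 32)} instances")
```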
__________________
Le Sagittaire ... ;-)

1- Ateme AVC or x264
2- VP7 or RV10 only for anime
3- XviD, DivX or WMV9

Last edited by Sagittaire; 16th August 2018 at 23:23.
Old 17th August 2018, 10:17   #6280  |  Link
Atak_Snajpera
RipBot264 author
 
Join Date: May 2006
Location: Poland
Posts: 7,806
No it is not too low. Dual socket (2 NUMA) Intel Xeon E5-4660 v3 (56 threads total) still scales much better than single socket (4 NUMA) 2990WX.
It would probably scale even better if I set numa pools manually.

According to x265 documentation ( https://x265.readthedocs.io/en/default/threading.html )
Quote:
If you are running multiple encoders on a system with multiple NUMA nodes, it is recommended to isolate each of them to a single node in order to avoid the NUMA overhead of remote memory access.
Can somebody verify that I'm setting numa pools correctly in my previous post?

Last edited by Atak_Snajpera; 17th August 2018 at 10:22.