View Full Version : x265 HEVC Encoder
LigH
1st August 2018, 09:30
Please compare with my latest 2.8+57 (https://forum.doom9.org/showthread.php?p=1847763#post1847763) and previous 2.8+47 (https://forum.doom9.org/showthread.php?p=1846673#post1846673) (GCC 7.3.0, generic) and Barough's 2.8+56 (https://forum.doom9.org/showthread.php?p=1847447#post1847447).
And please try to execute the encodes directly in a console window, to avoid mixing up error messages from x265 with error messages from StaxRip.
Motenai Yoda
1st August 2018, 20:35
A higher qcomp will allow distributing more bits to high-motion scenes, thus making them look better.
A lower qcomp will limit the bits distributed to high-motion scenes, so they can be assigned to low-motion scenes, where you notice a quality loss more.
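To put rough numbers on the trade-off described above, here is a minimal sketch. It assumes the x264-style rate-control relation, in which a frame's quantizer scale is proportional to complexity ** (1 - qcomp); the complexity values (16 for a high-motion frame, 1 for a low-motion one) are invented for illustration and are not measurements.

```python
# Sketch: how qcomp shapes the quantizer scale across scenes.
# Assumption: qscale is proportional to complexity ** (1 - qcomp),
# as in x264-style rate control; the complexity values are invented.

def relative_qscale(complexity: float, qcomp: float) -> float:
    """Return the unnormalised quantizer scale for one frame."""
    return complexity ** (1.0 - qcomp)


def qscale_ratio(high_motion: float, low_motion: float, qcomp: float) -> float:
    """How much coarser a high-motion frame is quantized than a low-motion one."""
    return relative_qscale(high_motion, qcomp) / relative_qscale(low_motion, qcomp)


if __name__ == "__main__":
    for qcomp in (0.5, 0.6, 0.7, 1.0):
        ratio = qscale_ratio(high_motion=16.0, low_motion=1.0, qcomp=qcomp)
        print(f"qcomp={qcomp:.1f}: high-motion qscale is {ratio:.2f}x the low-motion one")
```

Under that assumption, a ratio closer to 1 means the high-motion frame is quantized almost as finely as the low-motion one, which costs it more bits; at qcomp 1.0 the ratio is exactly 1.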
Magik Mark
1st August 2018, 23:41
Hi Ligh!
Thanks for helping out. First of all, I'm not familiar with the "console window". I encode exclusively using StaxRip. I have noticed that builds 47 and below exhibit no problem at all. This made me think that the problem lies with new or modified switches. Unfortunately, the author of StaxRip has called it quits, so no help from him at all.
LigH
2nd August 2018, 07:49
Yes, stax76 is not so active anymore. To use x265 in a more frequently updated GUI, which also allows more customization, I'd suggest MeGUI.
It will also be important to know your CPU. Does it support AVX2 at all? And it is interesting to know when x265 crashes. Immediately after starting an encoding job, or after calculating for a while? The log in MeGUI is more verbose here. It would report all of its output, including the CPU ID part.
I'm getting this error since v2.8 + 49 in Staxrip:
I've tried to reproduce the problem (but no hangs):
ffmpeg -i ../original.mkv -v warning -f yuv4mpegpipe - | x265-49 --y4m - --bitrate 1500 --pass 1 --ssim-rd --aq-mode 3 --pools 28 NUL
y4m [info]: 1920x1080 fps 24000/1001 i420p8 sar 1:1 unknown frame count
raw [info]: output file: NUL
x265 [info]: HEVC encoder version 2.8+49-5d34bbf671f7
x265 [info]: build info [Windows][MSVC 1900][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x265 [info]: Main 10 profile, Level-4 (Main tier)
x265 [info]: Thread pool created using 28 threads
x265 [info]: Slices : 1
x265 [info]: frame threads / pool features : 4 / wpp(17 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge : hex / 57 / 2 / 2
x265 [info]: Keyframe min / max / scenecut / bias: 23 / 250 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt : 20 / 4 / 2
x265 [info]: b-pyramid / weightp / weightb : 1 / 1 / 0
x265 [info]: References / ref-limit cu / depth : 3 / on / on
x265 [info]: AQ: mode / str / qg-size / cu-tree : 3 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress : ABR-1500 kbps / 0.60
x265 [info]: tools: rd=3 ssim-rd rskip signhide tmvp strong-intra-smoothing
x265 [info]: tools: lslices=6 deblock sao stats-write
x265 [info]: frame I: 11, Avg QP:24.83 kb/s: 18638.87
x265 [info]: frame P: 349, Avg QP:26.64 kb/s: 4358.80
x265 [info]: frame B: 1173, Avg QP:32.98 kb/s: 431.42
x265 [info]: Weighted P-Frames: Y:20.3% UV:9.7%
x265 [info]: consecutive B-frames: 3.9% 1.1% 12.5% 30.3% 52.2%
encoded 1533 frames in 64.16s (23.89 fps), 1456.17 kb/s, Avg QP:31.48
If I understood you correctly: 2.8+47 works, 2.8+49 hangs at the beginning; this might be a problem with x265 initialization. There are no new switches from 2.8+47 to 2.8+49.
Could you confirm that 2.8+47 works and 2.8+49 hangs, and what happens with version 2.8+48? Please download only the 10-bit VS2015 AVX2 builds (2.8+47, 2.8+48 and 2.8+49) and report back which work and which do not: www.msystem.waw.pl/x265/test.7z
katzenjoghurt
5th August 2018, 15:26
Just replaced x265 in Staxrip with LigH's 2.8+57 version.
No error yet.
(Current encoding seems to be a bit sluggish but need to observe further - it could very well just be movie related)
I'm using StaxRip 1.7.0.6 from https://github.com/stax76/staxrip/blob/master/changelog.md.
Magik Mark
6th August 2018, 04:02
Ma & LigH,
Found the error in 2pass encoding:
Multipass analysis refinement along with multipass rate control
Multipass refinement of qp based on distortion data
If these two are deactivated everything is ok
Maybe the syntax has changed?
Thanks for the info!
It looks like a bug in x265.
You could/should find the exact commit that hangs, for example 2.8+47 works/2.8+48 hangs or 2.8+48 works/2.8+49 hangs. Also find the x265 command line in the StaxRip log and copy it into this thread.
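The search for the first hanging build can be done as a bisection, so only a few test encodes are needed. A minimal sketch follows; `hangs` is a hypothetical callback standing in for "run one test encode with this build and see whether it hangs", and the threshold used in the demo is invented.

```python
# Sketch: binary search for the first x265 build that hangs.
# Assumption: `hangs` is a hypothetical callback that runs one test
# encode with the given build number and reports whether it hung.
from typing import Callable


def first_bad_build(good: int, bad: int, hangs: Callable[[int], bool]) -> int:
    """Return the lowest build in (good, bad] for which hangs() is True.

    Requires that `good` is known to work, `bad` is known to hang, and
    every build after the first bad one also hangs.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if hangs(mid):
            bad = mid
        else:
            good = mid
    return bad


if __name__ == "__main__":
    # Pretend the regression arrived in build +49 (illustration only).
    print(first_bad_build(47, 57, lambda build: build >= 49))
```

With a known-good +47 and a known-bad +57, this needs at most four test encodes instead of ten.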
Magik Mark
6th August 2018, 09:38
All builds above 47 hang.
iAvoe
6th August 2018, 11:11
@x265_Project it seems that you would be the only one who can answer my question, since it's becoming complex and I believe the answer can be very long... please have a look(In HEVC, only QP=4 is truly lossless quantization... what about 0~3?): forum.doom9.org/showthread.php?t=175638
Dclose
6th August 2018, 18:16
I really like x265 but I seem to be unable to get rid of linear smearing/stretching artifacts when there are fast moving objects in a scene.
Is there a specific parameter targeted at improving this, without increasing the bit rate in other areas (those are fine)?
My settings are:
--crf 17 --preset veryslow --profile main10 --level-idc 5 --output-depth 10 --psy-rdoq 4 --aq-mode 3 --qg-size 64 --qcomp 0.7 --subme 5 --master-display "G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(40000000,50)" --colorprim bt2020 --colormatrix bt2020nc --transfer smpte2084 --max-cll "457,179" --hdr --hdr-opt --deblock -1:-1 --no-sao --no-strong-intra-smoothing
example:
https://i.imgur.com/Nm3l7L2.png
1) imo, if you care about things that move (and picture quality in general), you have to use subpixel motion refinement --subme 7. 5 is good, and is as low as I ever set that even on files I'm trying to finish fast, but 5 is easily visually inferior to 7 imo. 7 of course takes longer to encode though.
2) You have qg-size 64. I almost never encode 4k lately, but for 1080, my coding, quant, and tree unit settings are a low of 8 and high of 32, and Max Intra/Inter are maxed. 64 didn't look as sharp, and didn't have any noticeable advantages, even during a fairly recent test I did of them.
3) AQ mode. "auto" (mode 2?) was too inconsistent in quality for me. A main thing is faces tend to lack quality. And faces tend to be the main place on the screen to look at. If mode 3 is the "experimental/dark area" mode, the file sizes were too inconsistent for me. That mode tended to throw a lot of bitrate at the file and too often made the sizes huge. I use normal mode now, for consistency of video quality and filesize. I haven't retested the others in a year or so, so maybe they have improved.
4) With later releases of x265, I stopped messing with the q-comp type of settings. I did a big test on them again a couple months ago and found the default settings are very good.
5) -1/-1 is a lot of deblocking. I use -5/-4 even on encodes most people would probably consider very low bitrate. I'm usually around crf 21-24 though, not 17, so maybe deblocking has less effect at 17 anyway. At crf 17, I would think some obvious setting is wrong somewhere for it to not look great.
Asmodian
7th August 2018, 15:39
Deblocking strength scales with the amount of compression, at lower CRF values deblocking is automatically weaker.
brumsky
8th August 2018, 00:34
I'm having issues with Ripbot264 and Staxrip when trying to encode a 4k video. Do they support UHD? They work fine for 1080p content....
If not, what should I use?
thanks,
Brumsky
user1085
8th August 2018, 05:41
Ripbot works for me for 4k out of the box.
LigH
8th August 2018, 08:33
Always the same mistake: The resolution alone is not the relevant attribute of a video. There is still a wide variety of possible container and content formats which could be used to store video with such a resolution. Use MediaInfo to tell us relevant technical attributes.
And if you have issues, tell us about the nature of these issues as verbose as necessary. A minimum requirement is quoting an error message letter by letter, if there is any, possibly even providing a log file. "I'm having issues" is not a sufficient description.
If the conversion crashes for downloaded moviez, you are left at your own peril.
In any case, it would be off-topic in a thread related to the x265 encoder, the reason for issues is usually rather the decoding than the encoding. You may have created a separate thread instead because it happens for more than one converter application.
LigH
8th August 2018, 11:51
MSYS2 recently updated MinGW64 with GCC 8.2.0; due to some internal compiler errors, MinGW32 will stay with GCC 7.3.0, though, until these issues are solved.
x265 2.8+58-d17bc7714ed2 (Win32-GCC730 & Win64-GCC820) (https://www.mediafire.com/file/k2qjvytar5odqjd/x265_2.8%2B58-d17bc7714ed2%28Win32-GCC730_Win64-GCC820%29.7z)
brumsky
8th August 2018, 16:15
@user1085
Thanks, I wanted to make sure it was supported out of the box.
@LigH
LigH, I certainly agree that I did not provide enough information for proper troubleshooting. I know my question would be borderline, at best, for this thread. I just wanted to confirm that those applications support UHD out of the box. Now that I know they do, I can properly troubleshoot the issue. I didn't want to waste a ton of time if it wasn't supported to begin with.
It is during decode, mostly ffms2.dll, while it is being indexed. I say mostly as I have had another error not related to ffms2.dll.
Thanks for the quick response and sorry for the "I'm having issues" post. :)
LigH
8th August 2018, 17:59
@brumsky:
Indexing already requires scanning the whole source video. If that already fails, there is a chance that your source has a "hole" ...
FranceBB
9th August 2018, 11:47
I asked it years ago, but I'm gonna ask it again:
Any chance to see assembly optimisations for Main10 on x86 anytime soon in the future?
I know that x64 is what pretty much everyone uses nowadays, but it would be useful to have manual assembly optimisation in x86 as well, not just for 8bit, but also for Main10, 'cause it would speed things up a lot.
Test performed with x265 2.8+58-d17bc77 x86 using the following system:
CPU: Intel i7 6700HQ 4c/8th 3.20GHz
RAM: 16 GB (8x2) DDR4
OS: Windows XP Professional x86 with PAE (unlocked HAL) + Microsoft Extended Support
OS: Windows 7 Professional x64
Clip encoded: 4K UHD 10bit 4:2:0 23.976fps source.
Common settings: --preset medium --level 5.0 --tune fastdecode --ref 2 --rc-lookahead 3 -b 2 --profile main10 --bitrate 25000 --deblock -4:-4 --no-open-gop --min-keyint 1 --keyint 24 --repeat-headers --rd 3
1) x265 Main10 plain C++ (GCC 8.2 Optimisation disabled) Win XP x86 = 0.15fps
2) x265 Main10 plain C++ (GCC 8.2 Optimisation SSE4.2) Win XP x86 = 0.44fps
3) x265 Main10 SSE4.2 asm (GCC 8.2 Optimisation SSE4.2) Win 7 x64 = 1.88fps
4) x265 Main10 AVX2 asm (GCC 8.2 Optimisation AVX2) Win 7 x64 = 2.60fps
As you can see from the results, GCC manages to speed up the code by optimising plain C++ code to SSE4.2 automatically, but it's not nearly as fast as the manual assembly optimisation written by x265 developers, which is more than 4 times faster, but unfortunately it's available for x64 only. I'm well aware that implementing manual SSE4.2 assembly optimisation in x86 wouldn't give the same speed boost as it does in x64 due to the different architectures, but it would definitely improve performance over plain C++ (which is all we have for Main10 in x86 right now).
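For reference, the ratios behind the "more than 4 times faster" statement can be recomputed from the four frame rates listed above. The labels in this sketch are shorthand for the four test configurations; nothing here is newly measured.

```python
# Frame rates reported in the benchmark list above (fps).
results = {
    "C++, no optimisation, x86": 0.15,
    "C++, GCC SSE4.2 optimisation, x86": 0.44,
    "SSE4.2 asm, x64": 1.88,
    "AVX2 asm, x64": 2.60,
}


def speedup(fast: float, slow: float) -> float:
    """Ratio of two frame rates."""
    return fast / slow


if __name__ == "__main__":
    baseline = results["C++, GCC SSE4.2 optimisation, x86"]
    for name, fps in results.items():
        print(f"{name}: {fps:.2f} fps, {speedup(fps, baseline):.2f}x vs. x86 SSE4.2 C++")
```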
I would post benchmarks of x265 compiled with Visual Studio 2017 as well, but unfortunately I didn't manage to compile the multilib. (8/10/12bit) versions for Win32 with Visual Studio 2017. I did manage to compile the 8bit version, though, but that's not really useful.
So... do you think assembly optimisations on x86 will be introduced for Main10 too anytime soon?
Thank you in advance.
NikosD
9th August 2018, 12:18
I'm well aware that implementing manual SSE4.2 assembly optimisation in x86 wouldn't give the same speed boost as it does in x64 due to the different architectures...
So... do you think assembly optimisations on x86 will be introduced for Main10 too anytime soon?
Nice post, but I think you have already given yourself the answer.
x64 doubles the number of registers and is a lot easier, not only faster, for a developer to implement assembly optimizations.
I don't think that in 2018 it's some kind of priority to optimize for x86.
The percentage of x86-only OSes and CPUs is close to 0.
Of course, nothing stops you from asking.
sneaker_ger
9th August 2018, 12:49
So... do you think assembly optimisations on x86 will be introduced for Main10 too anytime soon?
You mean "re-introduce". Because in the past those existed but the developers deliberately removed them. Not because it wasn't faster but because they wanted to spend their dev time on other things.
So get an old version, new PC/OS or find someone who still develops it. I believe Ma had some branch for it but I don't know how old/recent it is.
LigH
9th August 2018, 14:47
And again the same answer: Very doubtful. The developers already decided to abandon this part of x265, because of reasons:
twice the efforts to make assembly routines with fewer and smaller CPU registers in 32 bit CPU mode
half the available RAM, because 10 bit precision per color channel needs 16 bit of RAM instead of 8 bit for storage, and the limitation to 2 GB (or 4 GB for LAA processes) does not even allow encoding of FullHD (not to mention UHD)
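The storage side of the second point can be put into numbers with a small sketch. It only counts one plain 4:2:0 frame buffer at 1 or 2 bytes per sample; how many such buffers an encoder actually keeps (references, lookahead, per-thread data) is not modelled here, so it says nothing by itself about where the 2 GB limit is hit.

```python
# Sketch: raw size of one 4:2:0 picture at 8-bit vs. 16-bit sample storage.
# Assumption: this counts a single plain frame buffer only; a real encoder
# also holds reference frames, lookahead frames and per-thread working data.

def frame_bytes(width: int, height: int, bytes_per_sample: int) -> int:
    """Bytes for one 4:2:0 frame: full-size luma plus two quarter-size chroma planes."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return (luma + chroma) * bytes_per_sample


if __name__ == "__main__":
    for label, (w, h) in {"FullHD": (1920, 1080), "UHD": (3840, 2160)}.items():
        for bytes_per_sample in (1, 2):
            mib = frame_bytes(w, h, bytes_per_sample) / 2**20
            print(f"{label}, {bytes_per_sample} byte(s) per sample: {mib:.1f} MiB per frame")
```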
_
Damn, something delayed my reply remarkably. I thought I posted it right after the question...
Asmodian
10th August 2018, 21:42
and the limitation to 2 GB (or 4 GB for LAA processes) does not even allow encoding of FullHD (not to mention UHD)
Interesting point, I had not thought of the memory footprint.
And LAA only applies to 32 bit on 64 bit systems, so the extra work optimizing 32 bit x265 for 10 bit does seem like a poor use of talent.
Barough
13th August 2018, 13:19
x265 v2.8+59-b44d5f0e42f8 (http://www.mediafire.com/file/pg6ady7j79vd7r9/) (32-bit GCC 7.3.0 / 64-bit GCC 8.2.0 8/10/12bit Multilib Windows Binaries)
https://bitbucket.org/multicoreware/x265/commits/branch/default
Ma
13th August 2018, 22:26
@Magik Mark
I've looked at commits 2.8+48 (b0d31e2) and 2.8+49 (5d34bbf). In version +49 there is a potentially dangerous change from one (atomic) 32-bit operation to two 16-bit operations. I've reverted these changes -- you can test whether the patched version 2.8+58 hangs or not (patch file inside)
www.msystem.waw.pl/x265/x265-2.8+58-patched_vs2017-AVX2.7z
Magik Mark
14th August 2018, 08:19
Same problem ma
Ma
14th August 2018, 08:37
Thanks for info!
Did you check ver. 2.8+48 (from test.7z in post #6260 (https://forum.doom9.org/showthread.php?p=1847951#post1847951))?
Atak_Snajpera
16th August 2018, 19:44
Ryzen Threadripper 2990wx uses 4 NUMA nodes and I would like to check if running 4 instances with manually adjusted --numa-pools could improve performance.
Can somebody verify if those are correct switches?
Instance 1 = --numa-pools "+,-,-,-"
Instance 2 = --numa-pools "-,+,-,-"
Instance 3 = --numa-pools "-,-,+,-"
Instance 4 = --numa-pools "-,-,-,+"
Without any adjustments 5 instances give this
https://p.xfastest.com/~sinchen/GIGABYTE-X399-AORUS-XTREME/GIGABYTE-X399-AORUS-XTREME-66.jpg
2990wx@3.4GHz(all core turbo) is only 20% faster than 1950@3.4GHz
Sagittaire
16th August 2018, 23:14
well, 5 instances are simply too few for a 1080p source ...:eek:
32C/64T for 5 instances at 1080p is more than 6C/12T for each 1080p instance. Unfortunately, x265 has threading problems at 8 threads (and more) for a 1080p source.
If you really want to saturate a 64-thread CPU, you must use at least 8 instances for a 1080p source or at least 2 instances for a 2160p source. And perhaps 8x 1080p instances will saturate the RAM with this particular CCX interconnect (even with quad-channel DDR4).
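The arithmetic behind these instance counts, as a small sketch. It simply divides the logical threads evenly across instances, which is an idealisation; the scheduler does not guarantee such a split.

```python
# Sketch: logical threads available per encoder instance on a 32C/64T CPU.
# Assumption: threads are shared evenly between instances (idealised).

def threads_per_instance(total_threads: int, instances: int) -> float:
    """Even share of logical threads for each parallel encoder instance."""
    return total_threads / instances


if __name__ == "__main__":
    for instances in (2, 5, 8):
        share = threads_per_instance(64, instances)
        print(f"{instances} instances: {share:.1f} threads each")
```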
Atak_Snajpera
17th August 2018, 10:17
No it is not too low. Dual socket (2 NUMA) Intel Xeon E5-4660 v3 (56 threads total) still scales much better than single socket (4 NUMA) 2990WX.
It would probably scale even better if I set numa pools manually.
According to x265 documentation ( https://x265.readthedocs.io/en/default/threading.html )
If you are running multiple encoders on a system with multiple NUMA nodes, it is recommended to isolate each of them to a single node in order to avoid the NUMA overhead of remote memory access.
Can somebody verify that I'm setting numa pools correctly in my previous post?
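For what it is worth, the per-node isolation from the documentation quote can be generated mechanically. The sketch below assumes the pool-string convention from the x265 threading documentation, where "+" selects all cores of a node and "-" selects none; whether this actually improves throughput on a 2990WX is untested here.

```python
# Sketch: build one --numa-pools string per encoder instance so that each
# instance uses all cores of exactly one NUMA node and none of the others.
# Assumption: "+" selects every core on a node and "-" selects none,
# as described in the x265 threading documentation.
from typing import List


def pool_strings(node_count: int) -> List[str]:
    """Return one pool string per node, each enabling only that node."""
    return [
        ",".join("+" if node == target else "-" for node in range(node_count))
        for target in range(node_count)
    ]


if __name__ == "__main__":
    for instance, pools in enumerate(pool_strings(4), start=1):
        print(f'Instance {instance} = --numa-pools "{pools}"')
```

For four nodes this yields the same four strings as listed in the earlier post.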
zub35
17th August 2018, 11:58
x264 has a good optimization option
--tune film [--deblock -1:-1 --psy-rd <unset>:0.15]
why not have the same for x265 ?
--tune film [--no-sao --no-strong-intra-smoothing --psy-rd 4]
RieGo
17th August 2018, 14:09
afaik a film preset is on the todo list... probably may take a while till they/we figure out all sane parameters.
now... why does everyone think it's a good idea to switch off Sample Adaptive Offset in-loop filter? i read about it and it sounds like a nice feature to improve efficiency - no matter what kind of video content is encoded.
i understand that there was supposedly a little problem in the early stages of x265 with sao integration. but is this still a thing or is everybody just blindly turning off sao?
microchip8
17th August 2018, 14:36
SAO still blurs too much, so many people disable it if they want to retain as much detail as possible. However, at very low bitrates, where other artifacts are more visible/present, the blur of SAO produces "better looking" images than an encode without it.
RieGo
17th August 2018, 16:51
thanks.
i did some visual comparisons lately but wasn't able to detect any kind of differences at high bitrate - i didn't look at still images, only at video scenes.
at very low bitrate (300kbit/s) there was a lot of quality differences with different parameters, but I didn't look at no-sao...
so probably i'm just a bad quality judge. :D
benwaggoner
17th August 2018, 17:58
SAO should do less as QP goes down, so what you see is how it should work.
NikosD
18th August 2018, 16:56
Can somebody verify if those are correct switches?
Instance 1 = --numa-pools "+,-,-,-"
Instance 2 = --numa-pools "-,+,-,-"
Instance 3 = --numa-pools "-,-,+,-"
Instance 4 = --numa-pools "-,-,-,+"
Can somebody verify than I'm setting numa pools correctly in my previous post?
Please, don't expect answers regarding AMD optimizations in this thread.
They are all Intel fans or worse fanboys.
Even the developers.
LigH
18th August 2018, 17:28
In general, generalizations are wrong. If I could afford a new PC, I would buy a Ryzen. But I could still not buy the insight in its NUMA structure.
FranceBB
18th August 2018, 20:32
If you want really saturate 64 thread CPU, you must use at least 2 instance for 2160p source.
Not just 64-thread CPUs: at work I have two Intel Xeon E5-2660V4 14c/28th for a total of 28c/56th, and I still can't saturate both CPUs with 2160p 10bit HDR10 content encoded with --preset medium and Blu-ray compatible specs.
They are all Intel fans or worse fanboys.
Some consumers are moving to AMD, but the majority of businesses are using Intel Xeon CPUs (my company included), so that's what they ask for optimizations.
They are simply following the market needs, nothing more.
Barough
20th August 2018, 16:15
x265 v2.8+66-88ee12651e30 (http://www.mediafire.com/file/087abao10la3nsd/) (32 & 64-bit 8/10/12bit Multilib Windows Binaries)
https://bitbucket.org/multicoreware/x265/commits/branch/default
benwaggoner
21st August 2018, 00:49
Not just 64 thread CPU, at work I have two Intel Xeon E5-2660V4 14c/28th for a total of 28c/56th and I can't still saturate both CPUs with a 2160p 10bit HDR10 content encoded with preset --medium and bluray compatible specs.
That's not surprising. Something like --preset slower would probably be better, but there's only so much threading that can be usefully done in a single instance, and Blu-ray restrictions reduce even that (b-frames can encode in parallel, but BD only allows 2 consecutive). Increasing -F will help, but high values can cause rate control issues.
jlpsvk
21st August 2018, 02:35
Not just 64 thread CPU, at work I have two Intel Xeon E5-2660V4 14c/28th for a total of 28c/56th and I can't still saturate both CPUs with a 2160p 10bit HDR10 content encoded with preset --medium and bluray compatible specs.
could you post your uhd bd compatible command line?
excellentswordfight
21st August 2018, 15:22
afaik a film preset is on the todo list... probably may take a while till they/we figure out all sane parameters.
now... why does everyone think it's a good idea to switch off Sample Adaptive Offset in-loop filter? i read about it and it sounds like a nice feature to improve efficiency - no matter what kind of video content is encoded.
i understand that there was supposely a little problem in the early stages of x265 with sao integration. but is this still a thing or is everybody just blindly turning off sao?
Using --no-sao for a tune film is imo valid. In my experience no-sao does improve fine detail a lot with almost no negative effects for general "film" content with lower crf values. Preset slow together with no-sao is imo enough for detail retention nowadays. Not sure what setting does it, but I find preset medium to be way softer than preset slow (imo there should only be a bitrate difference between them when doing a CRF encode, but it doesn't work like that I guess).
I have found sao to be useful for both animation and low bitrate content though (as expected).
Not just 64 thread CPU, at work I have two Intel Xeon E5-2660V4 14c/28th for a total of 28c/56th and I can't still saturate both CPUs with a 2160p 10bit HDR10 content encoded with preset --medium and bluray compatible specs.
To add to this, I see around 70-80% utilization on dual Xeon E5-2680 v3 (48t) systems for 2160p content using preset slow. Imo that is a very reasonable amount of multithread performance. For 1080p I wouldn't bother with anything more than 8-12C. Start using chunk-encoding if better multithread utilization is needed.
But I still think Atak question is valid, does 2990wx need any NUMA tweaking to perform correctly?
RieGo
21st August 2018, 16:27
thanks for your opinion :)
i will do some more visual tests with high and low bitrates and only film content. maybe i can finally understand all your motivation to turn off sao.
update: wow.
so I did a quick test @6000/1000/100 kbit/s with and without sao.
so you are 100% right. no-sao looks just much sharper and retains more details. even I can see it...
not just true on high bitrate but also on medium/low bitrate. but I can understand why it would make kinda sense to have a smooth low bitrate encoding.
so basically i'm sorry for not believing, need to change my presets now. lol
benwaggoner
21st August 2018, 21:51
Can you share the bitrates and/or command lines you were using?
FranceBB
22nd August 2018, 03:01
That's not surprising. Something like --preset slower would probably be better, but there's only so much threading that can be usefully done in a single instance, and Blu-ray restrictions reduce even that (b-frames can encode in parallel, but BD only allows 2 consecutive). Increasing -F will help, but high values can cause rate control issues.
Yes... With slower I might get something more but still, it's still acceptable.
I see around 70-80% utilization on dual Xeon E5-2680 v3 (48t) systems for 2160p content using preset slow. Imo that is a very reasonable ammount of multithread performance.
Yes, it kinda is.
could you post your uhd bd compatible command line?
Sure.
UHD HDR10 BD50:
x265.exe --y4m - --dither --preset medium --level 5.1 --tune fastdecode --no-high-tier --ref 4 --profile main10 --bitrate 75000 --deblock -1:-1 --hdr-opt --hrd --min-luma 64 --max-luma 940 --chromaloc 2 --range limited --videoformat component --colorprim bt2020 --transfer smpte2084 --colormatrix bt2020nc --master-display "G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,0.0050)" --max-cll 1000,400 --overscan show --no-open-gop --min-keyint 1 --keyint 24 --repeat-headers --rd 3 --vbv-maxrate 75000 --vbv-bufsize 75000 --asm=avx2 --wpp -o "H:\raw_video.hevc"
I know that some of you might be screaming "Ah!!" to that "brutal" clipping, but the uncompressed 16bit stream that x265 is gonna encode has already been brought in Tv Range with a proper LUT that tries to avoid to clip too much, so that's just for "safety reasons", especially 'cause otherwise QC refuses it, even if there's just a single scene in the video that is out of range. No, they don't stare at the video-scope all the time; such a process is automated by a machine that checks the file 1:1 and reports details about luma, chroma, whether there are freeze-frames, blocking of whatever type and so on. Sometimes it fails and it is spot-checked by a human, but still, they refuse the content if it's out of range.
Still, using --preset slow might help a bit, but I would have to specify parameters myself, especially 'cause I'm not using --uhd-bd and I would end up limiting myself anyway.
Increasing --ref from 4 to 6 might also help.
As to the 75Mbit/s, the specs require the bitrate to stay below 82Mbit/s, but 75Mbit/s plus a bit of oscillation up and down and audio tracks is gonna be fine.
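The budget described above works out as follows. This sketch only uses the three figures from this post (75000 kbit/s for --bitrate and --vbv-maxrate, 75000 kbit for --vbv-bufsize, and the 82 Mbit/s ceiling); the helper names are made up for illustration.

```python
# Sketch: headroom and VBV buffer duration for the settings quoted above.
VIDEO_KBPS = 75_000        # --bitrate and --vbv-maxrate from the command line
VBV_BUFSIZE_KBIT = 75_000  # --vbv-bufsize from the command line
SPEC_LIMIT_KBPS = 82_000   # ceiling mentioned in the post


def headroom_kbps(limit_kbps: int, video_kbps: int) -> int:
    """Bitrate left for audio tracks and rate oscillation."""
    return limit_kbps - video_kbps


def buffer_seconds(bufsize_kbit: int, maxrate_kbps: int) -> float:
    """How long the VBV buffer takes to fill at the maximum rate."""
    return bufsize_kbit / maxrate_kbps


if __name__ == "__main__":
    print(f"headroom: {headroom_kbps(SPEC_LIMIT_KBPS, VIDEO_KBPS)} kbit/s")
    print(f"VBV buffer: {buffer_seconds(VBV_BUFSIZE_KBIT, VIDEO_KBPS):.1f} s at max rate")
```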
RieGo
22nd August 2018, 09:30
Can you share the bitrates and/or command lines you were using?
nothing fancy, just a simple "--pass x --bitrate 6000 --preset slow --pmode [--no-sao]"
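Spelled out, a test like this amounts to a pair of two-pass command lines per bitrate, one with and one without --no-sao. The sketch below only builds and prints them; the `x265` executable name, the `input.y4m` source and the output file names are hypothetical placeholders, not the files actually used.

```python
# Sketch: build the two-pass command lines for a sao / no-sao comparison.
# Assumptions: the "x265" executable name, the source file and the output
# file names are hypothetical placeholders; nothing is executed here.
from typing import List


def two_pass_commands(source: str, bitrate_kbps: int, no_sao: bool) -> List[List[str]]:
    """Return the pass-1 and pass-2 argument lists for one test variant."""
    tag = "nosao" if no_sao else "sao"
    commands = []
    for pass_number in (1, 2):
        cmd = [
            "x265", "--input", source,
            "--pass", str(pass_number),
            "--bitrate", str(bitrate_kbps),
            "--preset", "slow",
            "--pmode",
        ]
        if no_sao:
            cmd.append("--no-sao")
        cmd += ["-o", f"{tag}_{bitrate_kbps}.hevc"]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    for bitrate in (6000, 1000, 100):
        for no_sao in (False, True):
            for cmd in two_pass_commands("input.y4m", bitrate, no_sao):
                print(" ".join(cmd))
```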
LigH
22nd August 2018, 09:49
Was --pmode useful in your case? It does not cause a speedup in general, it depends on the circumstances, I read...
RieGo
22nd August 2018, 14:53
i *think* it improves my cpu saturation. but only on 1080p or lower using 24 threads. wasn't able to get a constant saturation without it.
with 4k content everything is fine even without pmode.
i didn't really make any extensive speed tests though.
benwaggoner
22nd August 2018, 21:55
Pmode can easily increase CPU utilization AND reduce encoding speed if you don’t have a whole lot of unused cores when running without it. I’ve seen it speed up encoding 400x224 on a 32 logical core system, but never 1080p or above. But I’ve not tried on anything with >36 logical cores.
Pmode can also theoretically increase quality a bit, since a lot of its parallel work is stuff that would normally have gotten skipped due to early exit. Occasionally it’ll find something better than what was found before the early exit. I’ve never seen it really make a material difference compared to veryslow or placebo.
RieGo
22nd August 2018, 22:32
yes my feeling was that it might be slower with pmode, but as i said i never actually did speed tests, just looked at cpu usage lol. my bad...
maybe it's a good idea for me to just remove it.
but going to slower is not an option (for me) - slow -> slower almost increases encoding time 100%