About threading and quality in HEVC
gamebox
26th March 2016, 21:01
After my first encoding test with X265, I'm curious to try it some more (possibly use it regularly too), but have some questions.
First - does choice of threading during X265 encoding influence decoding of the said video? That is - if I encode my movie on a dual-core CPU, does that mean it will tend to use 2 cores in decoding too (that is - split most of the decoding "effort" to the same number of cores again, regardless of a CPU used then)?
Second question is - what can I do to get rid of unwanted "smoothness" of an X265 video? I encode using the Slower preset, targeting bitrates of about 2Mbps for 720p. The image quality is fairly good, but the encode seems more "smoothed out" than X264 videos I create at 3-4Mbps. For X264 I use a deblock setting of -1/-1, as it is enough to get rid of most of the blocking and ringing at the quantizer levels of 20-22 I usually target. What should I set in X265 to get comparable visual results, or is the "smoothness" I see a problem of a very young codec in active development?
Thanks in advance guys :)
First - does choice of threading during X265 encoding influence decoding of the said video?
No. The number of logical cores determines the default number of frame threads, and that affects encoding quality. You can avoid this effect by setting the number of frame threads manually with the '-F' option; more frame threads = less quality.
Second question is - what can I do to get rid of unwanted "smoothness" of an X265 video?
You can use '--deblock -1' in x265 too. You should also consider encoding at a higher resolution, which improves detail retention. Unwanted "smoothness" is an open problem in x265; maybe it will be solved in the future, maybe not.
gamebox
27th March 2016, 10:38
Ah, okay, thanks. :)
I wondered if the number of threads used during encoding influences decoder behavior to some extent too. Meaning - if a video is created using 4 threads, its macroblocks, motion vectors, etc. will be largely "split" into 4 semi-separate "chunks" of interdependent data that would decode most naturally on a 4-core CPU again. It still seems natural to me that the same video encoded with a different number of threads each time would decode differently too, as it would have different "relations" between the video data it contains.
I need to look more carefully for a deblock option in my software, I didn't notice it at first. I used MediaCoder, as the choice of software for X265 encoding on an old WinXP is rather limited. Still, this is more of a "curiosity test" for now - my CPU is only dual-core and impractically slow for the very high-quality encoding I want - I consider a quad-core the minimum for regular encoding. Also, the encoder crashes rather often - especially when I pause the encoding process and try to resume it after many hours.
I wondered if the number of threads used during encoding influences decoder behavior to some extent too. Meaning - if a video is created using 4 threads, its macroblocks, motion vectors, etc. will be largely "split" into 4 semi-separate "chunks" of interdependent data that would decode most naturally on a 4-core CPU again.
The output format is the same and the decoding process is the same. There are 5 thread-related options that affect the output file:
--frame-threads (http://x265.readthedocs.org/en/default/cli.html?highlight=--wpp#cmdoption--frame-threads)
--wpp (http://x265.readthedocs.org/en/default/cli.html?highlight=--wpp#cmdoption--wpp)
--lookahead-slices (http://x265.readthedocs.org/en/default/cli.html?highlight=--wpp#cmdoption--lookahead-slices)
and default disabled:
--pmode (http://x265.readthedocs.org/en/default/cli.html?highlight=--wpp#cmdoption--pmode)
--pme (http://x265.readthedocs.org/en/default/cli.html?highlight=--wpp#cmdoption--pme)
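The five options listed above can be pinned explicitly so that an encode does not depend on what the encoder picks for a given core count. The following is only a minimal sketch: the helper name `pinned_thread_args` and the executable name `x265` are assumptions made for illustration, while the flags themselves are the ones named in this post and in the linked documentation.

```python
def pinned_thread_args(infile, outfile, frame_threads=1, wpp=False, exe="x265"):
    """Build an x265 argument list with every thread-related option set explicitly.

    `exe` is an assumed executable name; adjust it to the local build.
    """
    args = [
        exe,
        "--frame-threads", str(frame_threads),  # long form of -F
        "--lookahead-slices", "0",              # turn lookahead slices off
        "--wpp" if wpp else "--no-wpp",         # wavefront parallel processing
        "--no-pmode",                           # disabled by default, stated explicitly
        "--no-pme",                             # disabled by default, stated explicitly
    ]
    # Positional input and output, as in the command lines shown in this thread.
    args += [infile, outfile]
    return args


if __name__ == "__main__":
    print(" ".join(pinned_thread_args("input.y4m", "output.hevc")))
```

The returned list can be handed to `subprocess.run` as-is, which avoids shell quoting problems with file names.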
I tried to produce exactly the same output file on a system with 64 logical cores and on a system with 2 logical cores, and it's not easy. Even when I turn off 'lookahead-slices' and manually set 'frame-threads' to 1, the output files were slightly different:
$ x265-64 -F1 --lookahead-slices 0 720p50_parkrun_ter.y4m w-64-1.hevc
y4m [info]: 1280x720 fps 50/1 i420p8 sar 1:1 frames 0 - 503 of 504
raw [info]: output file: w-64-1.hevc
x265 [info]: HEVC encoder version 1.9+106-c8ec86965e54
x265 [info]: build info [Windows][GCC 5.3.0][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
x265 [info]: Main 10 profile, Level-4 (Main tier)
x265 [info]: Thread pool created using 64 threads
x265 [info]: frame threads / pool features : 1 / wpp(12 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge : hex / 57 / 2 / 2
x265 [info]: Keyframe min / max / scenecut : 25 / 250 / 40
x265 [info]: Lookahead / bframes / badapt : 20 / 4 / 2
x265 [info]: b-pyramid / weightp / weightb : 1 / 1 / 0
x265 [info]: References / ref-limit cu / depth : 3 / 1 / 1
x265 [info]: AQ: mode / str / qg-size / cu-tree : 1 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress : CRF-28.0 / 0.60
x265 [info]: tools: rd=3 psy-rd=2.00 signhide tmvp strong-intra-smoothing
x265 [info]: tools: deblock sao
x265 [info]: frame I: 4, Avg QP:30.27 kb/s: 24449.00
x265 [info]: frame P: 122, Avg QP:34.20 kb/s: 13925.98
x265 [info]: frame B: 378, Avg QP:39.05 kb/s: 682.67
x265 [info]: Weighted P-Frames: Y:0.0% UV:0.0%
x265 [info]: consecutive B-frames: 3.2% 3.2% 8.7% 60.3% 24.6%
encoded 504 frames in 29.68s (16.98 fps), 4077.01 kb/s, Avg QP:37.81
$ x265-2 -F1 --lookahead-slices 0 720p50_parkrun_ter.y4m w-2-1.hevc
y4m [info]: 1280x720 fps 50/1 i420p8 sar 1:1 frames 0 - 503 of 504
raw [info]: output file: w-2-1.hevc
x265 [info]: HEVC encoder version 1.9+106-c8ec86965e54
x265 [info]: build info [Windows][GCC 5.3.0][64 bit] 10bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX
x265 [info]: Main 10 profile, Level-4 (Main tier)
x265 [info]: Thread pool created using 2 threads
x265 [info]: frame threads / pool features : 1 / wpp(12 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge : hex / 57 / 2 / 2
x265 [info]: Keyframe min / max / scenecut : 25 / 250 / 40
x265 [info]: Lookahead / bframes / badapt : 20 / 4 / 2
x265 [info]: b-pyramid / weightp / weightb : 1 / 1 / 0
x265 [info]: References / ref-limit cu / depth : 3 / 1 / 1
x265 [info]: AQ: mode / str / qg-size / cu-tree : 1 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress : CRF-28.0 / 0.60
x265 [info]: tools: rd=3 psy-rd=2.00 signhide tmvp strong-intra-smoothing
x265 [info]: tools: deblock sao
x265 [info]: frame I: 4, Avg QP:30.27 kb/s: 24449.00
x265 [info]: frame P: 122, Avg QP:34.20 kb/s: 13925.10
x265 [info]: frame B: 378, Avg QP:39.05 kb/s: 683.04
x265 [info]: Weighted P-Frames: Y:0.0% UV:0.0%
x265 [info]: consecutive B-frames: 3.2% 3.2% 8.7% 60.3% 24.6%
encoded 504 frames in 50.11s (10.06 fps), 4077.08 kb/s, Avg QP:37.81
If I additionally add the '--no-wpp' option, the output is bit-identical.
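One way to check a claim like "bit-identical" is to compare digests of the two output files. This is a minimal sketch using only the Python standard library; the function names are made up for illustration and this is not necessarily how the comparison above was done.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def bit_identical(path_a, path_b):
    """True if both files contain exactly the same bytes (compared via SHA-256)."""
    return sha256_of(path_a) == sha256_of(path_b)
```

For the two encodes above, the call would be `bit_identical("w-64-1.hevc", "w-2-1.hevc")`.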
MeteorRain
28th March 2016, 00:09
WPP has a rather small impact on quality, so we usually leave it enabled.
I usually set frame-threads to 1 and leave the other options disabled.
WPP by itself can make use of multiple threads, and should saturate most desktop-level processors when encoding and decoding.
Check WPP animation here (http://www.parabolaresearch.com/blog/2013-12-01-hevc-wavefront-animation.html).
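To get a rough feel for how many threads WPP can keep busy on one frame, here is a simplified model, not the encoder's real scheduler. It assumes every CTU takes one time step, unlimited threads, 64x64 CTUs, and the usual wavefront rule that a CTU can start once its left neighbour and the CTU above and to the right are done. The function name `wpp_schedule` is made up for this sketch.

```python
import math
from collections import Counter


def wpp_schedule(width, height, ctu=64):
    """Return (rows, cols, steps, peak) for an idealised WPP pass over one frame.

    steps: time steps until the last CTU finishes, with one step per CTU.
    peak:  largest number of CTUs processed in the same time step.
    """
    cols = math.ceil(width / ctu)
    rows = math.ceil(height / ctu)
    finish = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            left = finish[r][c - 1] if c > 0 else 0
            # The last column has no above-right neighbour, so use the CTU directly above.
            above_right = finish[r - 1][min(c + 1, cols - 1)] if r > 0 else 0
            finish[r][c] = max(left, above_right) + 1
    steps = finish[-1][-1]
    peak = max(Counter(t for row in finish for t in row).values())
    return rows, cols, steps, peak


if __name__ == "__main__":
    print(wpp_schedule(1280, 720))  # (12, 20, 42, 10)
```

For 1280x720 this gives 12 CTU rows, matching the "wpp(12 rows)" line in the log above, and at most 10 CTUs in flight at once under these assumptions. Real CTUs vary in cost, so actual utilization will differ.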
gamebox
3rd April 2016, 01:10
Thanks for the explanation guys. :)
I asked this because I currently use a dual-core system and plan to get a quad-core. I was afraid that videos encoded using 4 instead of 2 cores would somehow have more "fragmented" data that would naturally tend to "split" the decoding effort into more parallel threads and tasks, making such videos (slightly) harder to decode on a dual-core system than those encoded on a dual-core too. If I understand you right, my logic here was false.
benwaggoner
3rd April 2016, 17:59
Thanks for the explanation guys. :)
I asked this because I currently use a dual-core system and plan to get a quad-core. I was afraid that videos encoded using 4 instead of 2 cores would somehow have more "fragmented" data that would naturally tend to "split" the decoding effort into more parallel threads and tasks, making such videos (slightly) harder to decode on a dual-core system than those encoded on a dual-core too. If I understand you right, my logic here was false.
Even with four cores you can still use -F 1 and saturate at least a 720p encode, and probably lower. Higher --preset settings and --pmode can be good at eating up any leftover CPU. I regularly encode on systems with 16 physical / 32 logical cores, and can get >>50% CPU utilization with -F 1 and --preset slower doing UHD encodes.
Motenai Yoda
4th April 2016, 00:22
@MeteorRain and @Ben how large would you estimate the efficiency/quality loss from using frame-threads or lookahead-slices to be?
(for a consumer CPU like the 5820K or 5960X, with 1080p and 4K footage)
MeteorRain
4th April 2016, 15:44
@MeteorRain and @Ben how large would you estimate the efficiency/quality loss from using frame-threads or lookahead-slices to be?
(for a consumer CPU like the 5820K or 5960X, with 1080p and 4K footage)
Sorry, I don't have much of an idea. But for a consumer-grade CPU, you can usually max out the usage without using either of them.