MonoS
29th December 2023, 18:35
Hi, I have a system with a Ryzen 9 7950X. A few days ago I started a 4K encode with the following settings:
vspipe "test.vpy" - -c y4m | x265 --crf 17 --preset veryslow --master-display "G(8500,39850)B(6550,2300)R(35400,14600)WP(15635,16450)L(10000000,1)"
--hme --hme-range 16,32,48 --hme-search dia,umh,star --deblock -1:1 --sao --cbqpoffs -1 --crqpoffs -1 --min-keyint 1 --keyint 1440 --rskip 0
--no-early-skip --rd-refine --aq-mode 4 --colormatrix 9 --transfer 16 --colorprim 9 --selective-sao 2 --sao-non-deblock --limit-sao --subme 5
--qg-size 8 --tu-intra-depth 4 --tu-inter-depth 4 --rc-lookahead 60 --y4m --hdr10 --hdr10-opt --psy-rd 2 --psy-rdoq 4 --aq-strength 0.6 --asm avx512
--no-rect --no-amp - "test.hevc"
y4m [info]: 3840x2076 fps 24000/1001 i420p10 frames 0 - 277022 of 277023
x265 [info]: Using preset veryslow & tune none
raw [info]: output file: output.hevc
x265 [info]: HEVC encoder version 3.5+97-ga456c6e73+3-g87155154d
x265 [info]: build info [Windows][clang 14.0.4][64 bit] Kyouko 10bit+8bit+12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2 AVX512
x265 [warning]: Turning on repeat-headers for HDR compatibility
x265 [info]: Main 10 profile, Level-5 (Main tier)
x265 [info]: Thread pool created using 32 threads
x265 [info]: Slices : 1
x265 [info]: frame threads / pool features : 6 / wpp(33 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 4 inter / 4 intra
x265 [info]: HME L0,1,2 / range / subpel / merge : dia, umh, star / 48 / 5 / 5
x265 [info]: Keyframe min / max / scenecut / bias : 1 / 1440 / 40 / 5.00
x265 [info]: Cb/Cr QP Offset : -1 / -1
x265 [info]: Lookahead / bframes / badapt : 60 / 8 / 2
x265 [info]: b-pyramid / weightp / weightb : 1 / 1 / 1
x265 [info]: References / ref-limit cu / depth : 5 / off / off
x265 [info]: AQ: mode / str / qg-size / cu-tree : 4 / 0.6 / 8 / 1
x265 [info]: Rate Control / qCompress : CRF-17.0 / 0.60
x265 [info]: tools: rd=6 psy-rd=2.00 rdoq=2 psy-rdoq=4.00 rd-refine signhide
x265 [info]: tools: tmvp b-intra strong-intra-smoothing deblock(tC=-1:B=1)
x265 [info]: tools: sao-non-deblock selective-sao
CPU utilization reaches 100% only some of the time; for almost a quarter of the encode it sits in a lower utilization range.
https://i.ibb.co/HKZspRv/overral.png (https://ibb.co/HKZspRv)
https://i.ibb.co/MCqBKvv/per-core.png (https://ibb.co/MCqBKvv)
I've tried a lot of different settings. In x265 CPU utilization issue.zip (https://www.mediafire.com/file/10zxvkdj1gv2jj3/x265+CPU+utilization+issue.zip/file) you can find the following files:
x265_bench.CSV and ffmpeg_x265_bench.CSV: raw CSVs generated by HWInfo during the encodes, with a 1 s refresh interval
prove.txt: information about the different tests I've made, with start and end times (to cross-reference with the HWInfo CSVs), the command line used, the speed, and the resulting encode statistics
x265_bench_finale.ods: worksheet with the cleaned data from the CSVs on the first sheet; the other sheets hold the statistics for some of the tests. Cell C11 is the average CPU utilization during the encode, and columns C and D of the graph are, respectively, the "Core Usage average" and the "5s average of the core usage average"
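For anyone who wants to reproduce the two graph columns without opening the worksheet, here is a minimal sketch of the same calculation. It assumes one sample per second and a usage column named "Core Usage (avg) [%]"; that column name, the function name and the sample data are illustrative guesses, so adjust them to whatever the actual CSV header says.

```python
import csv
import io
from collections import deque

def usage_stats(csv_text, column="Core Usage (avg) [%]", window=5):
    """Return (overall mean, trailing rolling means) for one usage column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Skip rows where the column is missing or empty.
    samples = [float(row[column]) for row in reader if row.get(column)]
    recent = deque(maxlen=window)  # holds the last `window` samples
    rolling = []
    for value in samples:
        recent.append(value)
        rolling.append(sum(recent) / len(recent))
    overall = sum(samples) / len(samples) if samples else 0.0
    return overall, rolling

# Made-up sample data, only to show the shape of the input.
SAMPLE = (
    "Time,Core Usage (avg) [%]\n"
    "18:00:01,100\n18:00:02,100\n18:00:03,50\n"
    "18:00:04,50\n18:00:05,100\n18:00:06,100\n"
)

if __name__ == "__main__":
    overall, rolling = usage_stats(SAMPLE)
    print(f"average usage: {overall:.1f}%")
    print("5 s rolling average:", [round(v, 1) for v in rolling])
```

With a 1 s refresh interval, a window of 5 samples corresponds to the 5 s average; the first four values are averaged over fewer samples because the window is not yet full.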
For the input I'm using a simple VS script just for indexing and cropping. By itself it runs at 160 fps, so I'm sure it is not the bottleneck; just to be sure, you'll also find a test with FFMPEG for the input pipe.
What could be the issue? Is there a particular setting that hinders parallelization, or does x265 have some problem parallelizing across that many threads?
I've noticed that veryslow by itself is capable of saturating my setup, with a 96% average usage, but when disabling rect and amp, usage drops to 82%, like in all the other tests I've calculated.
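As a side note on the numbers in the log: the "wpp(33 rows)" figure appears to be simply the number of 64-pixel CTU rows in a 2076-pixel-high frame. A small sketch of that arithmetic, assuming one wavefront row per CTU row (the function name is only illustrative):

```python
def ctu_rows(frame_height, ctu_size=64):
    """Number of CTU rows needed to cover the frame height (ceiling division)."""
    return -(-frame_height // ctu_size)

if __name__ == "__main__":
    rows = ctu_rows(2076)  # frame height from the y4m info line
    print(rows)            # 33, matching "wpp(33 rows)" in the log
```

As far as I understand WPP, each row has to wait for the row above it to get ahead, so not all 33 rows of a frame can be busy at the same time.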