
View Full Version : x265 HEVC Encoder



Z2697
17th January 2025, 04:05
Benchmarking with different builds, I've noticed that Patman86's x265-4.1+79+12-81640d428 ICC standard/AVX/AVX2 builds are identical except for 1-2 bytes.

It seems a bit strange to me and I've opened an issue on GitHub.

Please be aware of that in the meantime.

You should compare with both set to no-info, and do a proper byte-for-byte comparison instead of just judging from the size.

Never mind, so you were talking about the executable; I thought you were talking about the encoded HEVC stream.
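
For anyone who does want to compare two builds or two bitstreams byte for byte rather than by size, a minimal Python sketch follows. It is only an illustration: the function name `diff_offsets` is made up here, and it assumes both files fit in memory.

```python
from pathlib import Path


def diff_offsets(path_a, path_b, limit=16):
    """Return (same_size, offsets) for two files.

    offsets holds up to `limit` byte positions at which the files differ,
    compared over their common length.
    """
    a = Path(path_a).read_bytes()
    b = Path(path_b).read_bytes()
    offsets = []
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            offsets.append(i)
            if len(offsets) >= limit:
                break
    return len(a) == len(b), offsets
```

Calling it on two executables (or two .265 files) returns whether the sizes match plus the first few differing offsets, which is usually enough to tell a build timestamp or version string apart from a real code difference.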

Z2697
17th January 2025, 04:23
Hmm. Perhaps higher QP makes for fewer early exits, so doing more in parallel helps? Can you share the command line? I imagine stuff like TU size could have a different impact.


--preset slow
--rd 6
--ctu 32
--no-rect
--no-sao
--no-strong-intra-smoothing
--no-open-gop
--b-intra
--weightb
--aq-mode 1
--aq-strength 0.8
--qcomp 0.7
--pbratio 1.2
--bframes 3
--cbqpoffs -2
--crqpoffs -2
--deblock -3,-3
--rc-lookahead 80

benwaggoner
17th January 2025, 19:51
Huh. What are you trying to encode/optimize for with those settings? And this is with 1080p --crf, right?

Nothing pops out as impacting performance in particular. However, as CABAC is single threaded per frame, just reducing the bitrate itself will improve performance on many systems.

Z2697
17th January 2025, 20:43
Huh. What are you trying to encode/optimize for with those settings? And this is with 1080p --crf, right?

Nothing pops out as impacting performance in particular. However, as CABAC is single threaded per frame, just reducing the bitrate itself will improve performance on many systems.

I don't usually optimize x265 parameters for each source based on its "style" or characteristics, unless the image quality of the source is very distinctive.
These are just... "generic" settings.
I choose my encoding parameters based on performance (FPS & RD), and the "dark biased AQ" based on... what I see fit. I frankensteined AQ 1 and dark-bias into a new mode in my mod build, yes :o
(AQ is not really about RD performance, of course. FPS also doesn't seem to be affected much by AQ, except for the edge-based modes.)

However, the major portion of the things I encode is anime. When I do non-test encoding I use slower parameters (e.g. hme).

And yes, the test is 1080p CRF.

Sagittaire
19th January 2025, 16:30
It's likely that Zen 5 is not much of an advance in x265 encoding compared to other workloads.

TPU uses x265 with preset slow at 4K resolution. It fully saturates my 5900X and I'm guessing it fully saturates a 9900X as well. Yet, the 9900X is only 25% faster than the 5900X while the 9950X is 27% faster than the 5950X in the TPU benchmark, and it sounds about right.

Power consumption is on a different level though. The 9900X consumes around 170W fully loaded while an M4 Pro consumes less than 50W.
Unfortunately x86 is years behind in this regard, and also in single-core performance.

No, x265 at 4K with the slow preset doesn't saturate 32 threads, not by far. And saturating all threads means 100% CPU load at the power limit for the entire encoding time.

If the TPU codec benchmark saturated the whole CPU, the difference between the 5900X, 5950X, 9900X and 9950X would be equivalent to the one in the Blender benchmark, for example. And that's not the case.

OvejaNegra
31st January 2025, 02:36
Hi all!
It's been a long time since I used x265. I had to change the MeGUI version, and of course I updated x265 to the latest version.

I jumped from version 3.x to 4.1.

Since I still have some old habits, I want to know if things have changed so I can update my presets. I have some presets with options that I tweaked for encoding speed vs. quality using advice from other users.

As I said, it's been some time since I used x265, and maybe some old habits must change too.

Sorry in advance for my English; it is not my native language.

Here are my questions:

1 - I'm using rd 4 because rd 6 gave me some artifacts on sharp edges. I remember someone saying that 4 was safe for real-life content and 6 was better for animation (no artifacts with animation).

2 - I still use --aq-mode 1 --aq-strength xx because the other modes were a little inconsistent for me. Any changes? (for real life and animation)

3 - I'm using --max-merge 2; other values gave me problems with edges in dark scenes (worse on animation).

4 - Is --rskip 2 --rskip-edge-threshold X still recommended?

5 - Is --limit-refs 3 with more refs (5) still advised vs. --limit-refs 0 with fewer refs (taking into account encoding time and the benefits of the reference frames)?

6 - I'm using --ctu 32 because of a bug with the default value (on 4K content, if I remember correctly). Is it fixed? Should I use 64 for everything, or just for 4K content (and 32 for the rest)?

7 - I'm using --bframes 4 for everything. Should I use 2 for live action and 4 for animation?

8 - I'm always using --tu-intra-depth 3 --tu-inter-depth 3 (live action and animation). Is that OK? Should I use it only if I'm using rect and amp?

9 - Is --limit-tu X advised with --tu-intra-depth 3 --tu-inter-depth 3?

10 - If I have time, I use --rect for real-life content and --rect --amp for animation. Is that OK?

11 - Is --limit-modes still advised with --rect --amp if speed is required (vs. not using rect and amp at all)?

12 - I'm using --rc-lookahead 250, or as big as I can (for ABR and CRF). Is that still advised?

13 - I always use --weightp --weightb --b-intra.

14 - I use --psy-rd 2.0 --psy-rdoq 0.0 --rdoq-level 0 if it looks OK, and --psy-rd 2.0 --psy-rdoq 2.0 --rdoq-level 2 if I need to retain more detail. Anything new on that?

15 - Has anyone tested --tskip with animated content? Does it really help? Should I use --tskip-fast?

16 - For real-life content, I don't like or use SAO, but I remember it being useful for animation at low bitrates. Any change on that?

17 - What's the use of --limit-sao and --sao-non-deblock? (More speed? Some kind of intelligent mode? A less strong effect?)

18 - Is --me 4 (star) advised? (for real life and animation)

19 - I use --subme 3 for everything. Should I use another mode? Should I use a different mode for animation?

20 - Any other advice? Something new that I should use / not use?

Thanks!!

Z2697
31st January 2025, 10:49
The problems you mentioned in 1, 3 and 6 feel unrealistic; can you elaborate?
While the "core encoding functions" haven't been changed for years, and you will pretty much be getting the exact same results (I have compared 3.5 to 4.1), I have some advice... or just my 2 cents.

5. limit-refs makes very little difference (~1% speed and ~0.05% BD-rate). ref isn't that big of a deal for regular content either.

6. ctu (64 or 32) isn't very impactful on quality. Many other things are actually 32x32 max in HEVC.

7. Just don't use too many B-frames; they will slow down the encoding with b-adapt and have no real benefit.

8. tu-depth is not "tied" to rect and amp.

9. limit-tu is a good trade-off, but maybe lowering the depths to 2 is a better trade-off.

10. They are OK and will improve quality a little bit. With tu-depths already increased, it'll be a tiny little bit.

15. tskip only works on 4x4 intra TUs. It's not gonna magically make anything look better. The target situation of its design is esoteric, and it does not even work very well.

18. me 4 is not star.

OvejaNegra
31st January 2025, 13:04
It's a little hard for me to find the information. As I said, I saw the artifacts, did some research, changed the values, and it was OK (for me):


https://forum.doom9.org/showthread.php?p=1892787#post1892787


"I don't go higher than max-merge 3. I've seen 4 and 5 produce worse results than 3 during fast motion in animated content. I don't know why."


Increasing max merge, gave something like edge ghosting (remember old LCDs?) on dark edges on dark background if i remember correctly


https://forum.doom9.org/showthread.php?p=1963393#post1963393

"CTU 64 will make a mess out of noisy flat backgrounds compared to CTU 32 (qg-size 32 in both cases since x265 doesn't use 64 by default for CTU 64)"


https://forum.doom9.org/showthread.php?p=1963328#post1963328


https://forum.doom9.org/showthread.php?p=1963352#post1963352



5 - Should i stay with limit-refs 3? (faster)

7 - 4 looked like a balanced default to me; maybe 2 is better for real-life content, but that's just speculation.

9 - Like: --tu-intra-depth 2 --tu-inter-depth 2 --limit-tu 0 instead of --tu-intra-depth 3 --tu-inter-depth 3 --limit-tu 4?

10 - So, for general content and animation, is --tu-intra-depth 2 --tu-inter-depth 2 a better investment, forgetting about rect and amp?

15 - I remember members commenting on the possible benefits of tskip for animation; maybe someone has some actual experience with it.

18 - Sorry: --me 3

thanks!

Z2697
31st January 2025, 18:56
Personally speaking, I think they are experiencing a version of "pigeonhole principle" where some parts look better and some parts look worse and they are biased to their initial hypothesis.

5. it's only 1% faster. but more performance is always good when the downside is negligible right? I wouldn't say you should or should not because it's about trade-off, I think there's no definitive answer.

7. I don't know, I honestly can't tell the difference.

9. sometimes the limit-tu will decrease quality more than "less depth", but we are talking about maybe 1% difference in bd-rate here, and I have biases like everyone.

P.S. in my tests I saw: with tu-depths = 1, the inter frames will not use any 4x4 TUs (I don't know whether or not this is a bug in x265), intra frames still use 4x4 TUs.
starting from tu-depths = 2, inter frames will start to use 4x4 TUs, this is where the big (relatively) differences come from. larger values makes little difference.

10. If you think about it, rect and amp are like a "cheap version" of quadtree split, right? So when you allow more quadtree splits, they become much less effective at improving compression while having roughly the same computational cost. (Less effective, but still effective, yes. It's just harder to justify the computational cost; you may get a better result spending those computational resources on some other parameter, for example HME.)

P.S. This is an oversimplification; TUs are not equal to PUs. (TU = Transform Unit, PU = Prediction Unit)

15. 4x4 intra TUs are rare. The odds of them being "chosen to skip" are beyond rare. I haven't seen it improve anything in my tests, and I hypothesize it will not have any significant effect on regular content.
(Most video sources you can get are already "transformed", so what will "transform skip" save you from? For truly lossless cases, just go check how many 4x4 intra TUs there are and how many of them are transform skipped, with tools like YUView (https://github.com/IENT/YUView).)

I mean, don't get me wrong, tskip works well in its designed use case: sharp, detailed, high-contrast and mostly static content, like screen recordings with mostly text on a simple / pure color background. I just don't think it's worth enabling for most regular content.

P.S. By "rare" I mean on average over all frames; if you specifically look at intra frames only, of course they are not that rare.

18. UMH and STAR are close in performance, both speed and quality. A cheap alternative is HEX with a large merange (like, hundreds; for HEX that won't slow things down much). If you want something better than STAR 57 in many presets, go with HME.
(--hme --hme-search star,umh,star)
(star in the second level will make encoding very slow for some mysterious reason.)
(If you want something fast, use --hme-search hex,hex,hex, which is still better than traditional ME.)
(You can adjust each level's range as well, but the default is good enough.)

(HME in combination with lookahead-slices will produce non-deterministic results. If you want encodes with the exact same settings to output identical bitstreams, use --lookahead-slices 0 with HME.)
(There are other things that will make an encode non-deterministic; the most common example is VBV.)
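
The HME options mentioned in point 18 can be assembled programmatically, which helps when scripting test encodes. This is a small sketch, not a recommended preset: the helper name `hme_args` is hypothetical, and it only emits the three options named in the post (`--hme`, `--hme-search`, `--lookahead-slices`).

```python
def hme_args(levels=("star", "umh", "star"), deterministic=True):
    """Build x265 CLI arguments for HME as described in the post above."""
    if len(levels) != 3:
        raise ValueError("--hme-search takes one search method per level (3 levels)")
    args = ["--hme", "--hme-search", ",".join(levels)]
    if deterministic:
        # Per the post, HME combined with lookahead slices gives
        # non-deterministic output, so lookahead slices are turned off.
        args += ["--lookahead-slices", "0"]
    return args
```

For the faster variant from the post, `hme_args(("hex", "hex", "hex"))` yields the corresponding argument list, which can be appended to the rest of an x265 command line.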

OvejaNegra
1st February 2025, 10:36
Thanks!

I'll do some test and i'll what can i improve.

:thanks:

Z2697
9th February 2025, 07:12
When using more than 1 slice, x265 creates intra blocks in prediction frames out of nowhere.
Let's do a simple test:
ffmpeg -lavfi color=gray:s=hd1080:d=10,noise=allf=u:alls=50 -x265-params slices=2 2sli.265
With the given command line, x265 gets to encode a completely static video. Static noise is applied to add variance to the image; it does not have to be noise, nor does it have to be completely static, you just need some "adequately" high-frequency information. A flat color image does not trigger the bug.
The resulting file is absurdly larger than the single-slice result.
Viewing the resulting file with YUView reveals that the encode has a lot of intra blocks near slice borders.
https://images2.imgbox.com/d2/ab/taiMdVlu_o.jpg

Setting frame-threads=1 seems to mitigate this issue.

The false intra mode in otherwise inter-predictable blocks makes the compression ratio a lot worse. If you are planning or forced to use more slices (e.g. UHD Blu-ray compatible encoding), make sure to use frame-threads=1; slices will provide parallelism similar to frame-threads.
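
To repeat this test across several slice counts, the ffmpeg call can be wrapped in a small script. The sketch below reuses the filter string and x265 parameters from the post; the function names are made up for illustration, `-y` (overwrite output) is an addition, and running `encoded_size` assumes an ffmpeg build with libx265 on the PATH.

```python
import os
import subprocess


def noise_encode_cmd(out_path, slices, frame_threads=None):
    """ffmpeg command from the post, with slices (and frame-threads) as parameters."""
    params = [f"slices={slices}"]
    if frame_threads is not None:
        params.append(f"frame-threads={frame_threads}")
    return [
        "ffmpeg", "-y",
        "-lavfi", "color=gray:s=hd1080:d=10,noise=allf=u:alls=50",
        "-x265-params", ":".join(params),
        out_path,
    ]


def encoded_size(out_path, slices, frame_threads=None):
    """Run the encode and return the output size in bytes."""
    subprocess.run(noise_encode_cmd(out_path, slices, frame_threads), check=True)
    return os.path.getsize(out_path)
```

Comparing `encoded_size("1sli.265", 1)`, `encoded_size("2sli.265", 2)` and `encoded_size("2sli_ft1.265", 2, frame_threads=1)` should show whether the size blow-up and the frame-threads=1 mitigation reproduce on a given build.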

LigH
17th February 2025, 15:13
You should certainly report this in the x265 bug tracker. Can't guarantee that devs read here.

Z2697
17th February 2025, 17:33
That's very right... Email sent.

cubicibo
18th February 2025, 11:18
IIRC slices (or tiles) are independent and cannot reference each other's coding units. If you encode high frequency content, your file size will double with two slices, triple with three, and so on. No surprise here. But you are right that going from one to two threads should not make the situation significantly worse. I am more concerned by the presence of intras at the bottom of each slice... are the additional threads in each slice set up to the farthest point from the others and made independent too (cannot reference CU from the same slice made by the other thread)?

rwill
18th February 2025, 17:38
The false intra mode in otherwise inter-predictable blocks makes the compression ratio a lot worse. If you are planning or forced to use more slices (e.g. UHD Blu-ray compatible encoding), make sure to use frame-threads=1; slices will provide parallelism similar to frame-threads.

They seem to limit vertical MV on current picture slice boundaries. If someone can point out the need in the H.265 standard please do lol...

Z2697
18th February 2025, 20:50
They seem to limit vertical MV on current picture slice boundaries. If someone can point out the need in the H.265 standard please do lol...

The problem is not the "natural" limitations that come with slices, the fact that this issue even happens in completely static videos means it has nothing to do with those limitations. The blocks don't even need to be motion compensated. They are not moved, they don't need to have MV.

rwill
19th February 2025, 03:38
The problem is not the "natural" limitations that come with slices, the fact that this issue even happens in completely static videos means it has nothing to do with those limitations. The blocks don't even need to be motion compensated. They are not moved, they don't need to have MV.

Of course they need a MV, the 0,0 one for no movement. In x265 there is code like this in frameencoder.cpp:


// Initialize restrict on MV range in slices
tld.analysis.m_sliceMinY = -(int32_t)(rowInSlice * m_param->maxCUSize * 4) + 3 * 4;
tld.analysis.m_sliceMaxY = (int32_t)((endRowInSlicePlus1 - 1 - row) * (m_param->maxCUSize * 4) - 4 * 4);


and subsequent in search.cpp:

mvmin.y = X265_MAX(mvmin.y, m_sliceMinY);
mvmax.y = X265_MIN(mvmax.y, m_sliceMaxY);


So they are actively blocking MV.Y from being 0 for LCUs at slice borders. This means that a CU cannot rest with no movement but has to predict from within the slice. But within the slice there is not the same noise, so it's most likely encoded as intra. This seems to come from their design decision to run slices in parallel when they should just run frames in parallel... but I might be wrong on that one.

*edit*
And to be honest I do not know of any "natural" limitations coming with slices related to motion vectors.
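
The effect of the two frameencoder.cpp lines quoted above can be checked with plain arithmetic. The following Python sketch transcribes only those two expressions (nothing else from x265); the function names are made up, and `rows_in_slice - 1 - row_in_slice` stands in for `endRowInSlicePlus1 - 1 - row`, i.e. the number of CTU rows below the current one inside the slice. It is meant for slices with at least two CTU rows.

```python
def slice_mv_y_range(row_in_slice, rows_in_slice, ctu_size=64):
    """Vertical MV limits (quarter-pel units) for one CTU row of a slice.

    Mirrors the arithmetic of the quoted lines:
      m_sliceMinY = -(rowInSlice * maxCUSize * 4) + 3 * 4
      m_sliceMaxY = (rows below in slice) * (maxCUSize * 4) - 4 * 4
    """
    rows_below = rows_in_slice - 1 - row_in_slice
    min_y = -(row_in_slice * ctu_size * 4) + 3 * 4
    max_y = rows_below * ctu_size * 4 - 4 * 4
    return min_y, max_y


def zero_mv_allowed(row_in_slice, rows_in_slice, ctu_size=64):
    """True if MV.y == 0 lies inside the permitted range."""
    min_y, max_y = slice_mv_y_range(row_in_slice, rows_in_slice, ctu_size)
    return min_y <= 0 <= max_y
```

For a slice of 9 CTU rows, the first row gets a minimum of +12 and the last row a maximum of -16, so a zero vertical MV falls outside the range in exactly the CTU rows touching a slice border, while rows in the middle of the slice still allow it.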

Z2697
19th February 2025, 05:58
Well... the "single row slice" does not have unnecessary intra blocks

(How to encode "single row slice" high resolution video when the default max nal units per frame is 16? (16*64=1024))
(Who asked? (https://github.com/Mr-Z-2697/x265-experimental/commit/d5546ab87fe376dc00d733564db7a1df85ee740d))

rwill
19th February 2025, 06:34
Well... the "single row slice" does not have unnecessary intra blocks

Yes, because there is also code like this:


// Handle single row slice
if (tld.analysis.m_sliceMaxY < tld.analysis.m_sliceMinY)
tld.analysis.m_sliceMaxY = tld.analysis.m_sliceMinY = 0;


Does not change the multi row slice mechanics though. So, just use single row slices then. Problem solved.

Z2697
19th February 2025, 06:45
This might work

tld.analysis.m_sliceMinY = x265_min(0, -(int32_t)(rowInSlice * m_param->maxCUSize * 4) + 3 * 4);
tld.analysis.m_sliceMaxY = x265_max(0, (int32_t)((endRowInSlicePlus1 - 1 - row) * (m_param->maxCUSize * 4) - 4 * 4));


On second thought, let's actually work out why there's 3 * 4 and 4 * 4 in the expressions.

Introduced in a8d7d3d (https://bitbucket.org/multicoreware/x265_git/commits/a8d7d3d37455c2bb944bca0311a2c269d5cee703) as 2*4 and 3*4, later changed to 3*4 and 4*4 in eaeeab5 (https://bitbucket.org/multicoreware/x265_git/commits/eaeeab570b66d943e2b1723776565bcee479b3ae)... but I don't understand why.
I tried to remove these 2 constants and the encoding seems pretty normal, maybe other structures changed in a way that these don't matter anymore?

rwill
19th February 2025, 08:38
Second thought, let actually follow why there's 3 * 4 and 4 * 4 in the expressions.


Looks like interpolation filter extents to me.

jpsdr
26th February 2025, 14:04
Hi.
For the GCC compile options, is the 7975WX a znver4 or a znver5?
According to the technical information I've found I would say znver4, but I want to be sure.

RARY
6th March 2025, 06:28
https://forum.doom9.org/showthread.php?p=2013895#post2013895

13 - I always use --weightp --weightb --b-intra

It may not be efficient for all videos.
weightp/weightb is beneficial for sequences with lighting changes (e.g., fade effects), as it reduces bitrate while maintaining quality.

For static lighting conditions, the impact is minimal regardless of content. There may also be a slight FPS reduction.


6 - i'm using --ctu 32 because a bug with default value (on 4k content if i remember correctly) is it fixed? Should i use
64 for everything or just for 4k content (and 32 for the rest).

I couldn't find any issues like that. It would help us look into it if you could post the complete CLI combination.


1. I'm using rd 4 because rd 6 gave me some artifacts on sharp edges, i remember someone saying that 4 was safe for real life
content and 6 was better for animation (no artifacts with animation)


Source                               | Command | Bitrate  | Global PSNR | SSIM (dB) | Elapsed Time | FPS   | VMAF
---------------------------------------------------------------------------------------------------------------------
anime
Big_Buck_Bunny_1080P.yuv             | rd 4 28 | 1343.25  | 42.162      | 15.135    | 9.19         | 27.19 | 89.480158
Big_Buck_Bunny_1080P.yuv             | rd 6 28 | 1421.08  | 42.288      | 15.254    | 18.39        | 13.60 | 90.415959
sita_1920x1080_30.yuv                | rd 4 28 | 3153.66  | 41.489      | 18.519    | 35.10        | 28.49 | 97.693194
sita_1920x1080_30.yuv                | rd 6 28 | 3016.33  | 42.224      | 19.724    | 73.57        | 13.59 | 98.23908

real
crowd_run_1080p50.yuv                | rd 4 28 | 15702.50 | 32.190      | 9.313     | 24.44        | 20.46 | 83.335517
crowd_run_1080p50.yuv                | rd 6 28 | 16542.31 | 32.521      | 9.585     | 126.74       | 3.95  | 85.747593
Island_1920x1080_420p_8bit_24fps.y4m | rd 4 28 | 3808.73  | 39.609      | 11.053    | 111.47       | 21.72 | 7.735233
Island_1920x1080_420p_8bit_24fps.y4m | rd 6 28 | 3997.55  | 39.677      | 11.051    | 236.79       | 10.22 | 7.68822


In my tests, VMAF increased slightly from rd 4 to rd 6 for both real-life and anime content, and I did not notice any artifacts. If you can specify the conditions or settings where you observed artifacts, I can check further.
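
Because the bitrate changes between the two runs as well, it helps to put the rd 4 / rd 6 numbers side by side as relative changes. The sketch below only re-uses figures copied from the table above (bitrate, FPS, VMAF); the dictionary keys and the function name are made up for illustration.

```python
# (bitrate, FPS, VMAF) for rd 4 and rd 6, copied from the table above.
RESULTS = {
    "Big_Buck_Bunny": ((1343.25, 27.19, 89.480158), (1421.08, 13.60, 90.415959)),
    "sita":           ((3153.66, 28.49, 97.693194), (3016.33, 13.59, 98.23908)),
    "crowd_run":      ((15702.50, 20.46, 83.335517), (16542.31, 3.95, 85.747593)),
    "Island":         ((3808.73, 21.72, 7.735233), (3997.55, 10.22, 7.68822)),
}


def rd6_vs_rd4(rd4, rd6):
    """Return (bitrate change in %, slowdown factor rd4 FPS / rd6 FPS, VMAF delta)."""
    bitrate_pct = (rd6[0] - rd4[0]) / rd4[0] * 100.0
    slowdown = rd4[1] / rd6[1]
    vmaf_delta = rd6[2] - rd4[2]
    return bitrate_pct, slowdown, vmaf_delta


for name, (rd4, rd6) in RESULTS.items():
    pct, slow, dv = rd6_vs_rd4(rd4, rd6)
    print(f"{name}: bitrate {pct:+.1f}%, {slow:.1f}x slower, VMAF {dv:+.3f}")
```

Read this way, the posted numbers are not uniform: sita gets a lower bitrate at rd 6, the Island VMAF is marginally lower at rd 6, and the other clips gain VMAF while also spending more bits, so the VMAF gains are not measured at equal bitrate.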




11 - --limit-modes is still advised to use with --rect --amp if speed is required (vs not using rect amp at all)


Using --limit-modes with --rect --amp achieves similar quality (PSNR, SSIM) and bitrate compared to using --rect --amp alone, but with a marginal FPS improvement.

If speed is a concern, --no-limit-modes --no-rect --no-amp may be the choice, as it almost doubles the encoding speed compared to using --limit-modes --rect --amp, at some cost.






Video | Bitrate (kb/s) | FPS | PSNR (dB) | SSIM (dB) | Time (s)
------------------------------------------------------------------------------------------------------

Netflix_Tango (4K, 60fps) - rect, amp | 6867.63 | 4.78 | 43.288 | 15.935 | 61.51
Netflix_Tango (4K, 60fps) - limit-modes | 6856.90 | 4.70 | 43.284 | 15.934 | 62.61
Netflix_Tango (4K, 60fps) - no-limit-modes | 6925.84 | 7.18 | 43.231 | 15.886 | 40.95



3 - I'm using --max-merge 2, other modes gave me problem with edges on dark scenes (worst on animation).

This may not be the case for any content type (anime, real, dark scenes, etc.). Based on my experiments with x265, the VMAF score increases linearly as the `--max-merge` value increases from 1 to 5.


Command          | Bitrate | Global PSNR | SSIM (dB) | Elapsed Time | FPS    | VMAF
-------------------------------------------------------------------------------------------------------
dark_high-contrast/firebelly-torches-dv.yuv
--max-merge 2 28 | 465.87  | 40.058      | 11.767    | 15.31        | 65.33  | 86.66916
--max-merge 3 28 | 463.38  | 40.073      | 11.779    | 15.45        | 64.74  | 86.698442
--max-merge 4 28 | 461.17  | 40.094      | 11.792    | 15.46        | 64.67  | 86.721314
--max-merge 5 28 | 462.03  | 40.101      | 11.799    | 15.92        | 62.80  | 86.712682

dark-scene_high-contrast/gothism-dv.yuv
--max-merge 2 28 | 79.11   | 42.959      | 13.031    | 6.34         | 127.87 | 87.432972
--max-merge 3 28 | 78.72   | 42.977      | 13.044    | 6.30         | 128.70 | 87.478732
--max-merge 4 28 | 78.97   | 42.990      | 13.051    | 6.47         | 125.42 | 87.392775
--max-merge 5 28 | 79.18   | 42.986      | 13.054    | 6.30         | 128.80 | 87.386751

dark-scene_high-contrast/forest_jester-dv.yuv
--max-merge 2 28 | 1419.55 | 35.167      | 10.827    | 5.42         | 54.25  | 92.135595
--max-merge 3 28 | 1415.42 | 35.205      | 10.892    | 5.34         | 55.09  | 92.286436
--max-merge 4 28 | 1410.23 | 35.229      | 10.932    | 5.43         | 54.14  | 92.392379
--max-merge 5 28 | 1409.85 | 35.236      | 10.940    | 5.56         | 52.86  | 92.430227

Boulder
6th March 2025, 07:01
PSNR and SSIM are not very reliable metrics when it comes to perceptual quality. I'd rather use SSIMU2 which is a little closer, it is far from perfect like all metrics basically, but it works quite well when comparing parameters.

RARY
6th March 2025, 12:24
PSNR and SSIM are not very reliable metrics when it comes to perceptual quality. I'd rather use SSIMU2 which is a little closer, it is far from perfect like all metrics basically, but it works quite well when comparing parameters.

But VMAF scores will generally reflect these degradations (visible compression artifacts, banding, or blurring) better than PSNR or SSIM, and no artifacts were observed with the given settings.

Boulder
6th March 2025, 12:48
But VMAF scores will generally reflect these degradations (visible compression artifacts, banding, or blurring) better than PSNR or SSIM, and no artifacts were observed with the given settings.

VMAF has plenty of flaws itself.

excellentswordfight
6th March 2025, 16:03
Hi.
Is, for the GCC compile options, the 7975WX a znver4 or znver5 ?
According the technical informations i've found i would say znver4, but i want to be sure.
znver4, as 7975WX has zen 4 cores.

Z2697
6th March 2025, 17:45
VMAF has plenty of flaws itself.

That's absolutely true, we should be very careful with VMAF scores, just as with any other metrics.
But with a carefully controlled test, the VMAF scores have a pretty high reference value.

jpsdr
6th March 2025, 19:38
@excellentswordfight
Ok, thanks.

benwaggoner
6th March 2025, 23:29
That's absolutely true, we should be very careful with VMAF scores, just as with any other metrics.
But with a carefully controlled test, the VMAF scores have a pretty high reference value.
I find p1204.3, which combines full reference and bitstream analysis, to have the highest subjective correlation of currently available metrics. It includes chroma, even. VMAF is luma only, which causes a big blind spot, particularly with HDR content.

Of course, like VMAF and any other machine learning based metric, the quality of the results is highly dependent on training on the right ground truth data, and scores using different ML models will be different.

Stereodude
7th March 2025, 15:36
Is anyone making AVIsynth aware/capable builds of x265 anymore, or is there a trick to encode an .avs file without piping with the latest x265 builds Barough is posting?

jpsdr
7th March 2025, 19:23
@Stereodude
You can find what you want on my github.

Stereodude
7th March 2025, 21:55
@Stereodude
You can find what you want on my github.
Thanks, I see I have lots of choices. :sly:

Barough
8th March 2025, 00:03
x265 v4.1+110-0e0eee5
Built on March 07, 2025, GCC 14.2.0
Win32/64 / 8bit+10bit+12bit

https://bitbucket.org/multicoreware/x265_git/commits/branch/master

DL :
https://www.mediafire.com/file/aiuh2h9m2cafyhu

jpsdr
8th March 2025, 08:55
Thanks, I see I have lots of choices. :sly:
It can vary between users, but for me LLVM is the fastest, and as I have a Broadwell, I use the Broadwell build. I don't have my big AMD yet, so I can't tell about Zen 4, but within 2 months...
But the speed difference is no bigger than 2~3%.
Also, one day someone said that the GCC build produces better results than the LLVM build...
I can't understand how that could even be possible :confused:
In my case I'm using LLVM and have never noticed anything.

Boulder
8th March 2025, 09:13
With LLVM and Zen, at least with SVT-AV1 you need to be careful with the "march" setting. Znver3 seems to be broken and it bloats the executable and is slower than znver2 for a Zen3 CPU. I've also noticed that using march=native is not the best option for some reason.

Z2697
8th March 2025, 17:42
No matter what compiler and flags are used, the performance of x265 will not change a lot, unless you are using no-asm.
x265 has tons of assembly optimizations that are not going to be affected by the compiler.
(Fun fact: however, if you disable too many x86 flags in the compiler options, the resulting x265 binary is broken; it will produce giant bitstreams regardless of the rate control settings.)

But Clang does have somewhat broken Ryzen support and generic AVX-512 optimization in one or more recent version(s); I don't know if it's fixed in the latest release or not.
I shall repeat that this relates to the optimization of the "plain C/C++" code; it's not going to affect x265's performance in assembly-optimized routines.
(Take https://github.com/L4cache/hmp3/releases/tag/5.2.4 as an example: its asm optimization code is outdated and those builds were relying entirely on compiler optimization, and the clang znver4 performance is strangely poor.)

jpsdr
9th March 2025, 10:19
I'll have my rig in around 2 months; I'll run tests then. Thanks for this information.

chenm001
10th March 2025, 06:24
This might work

tld.analysis.m_sliceMinY = x265_min(0, -(int32_t)(rowInSlice * m_param->maxCUSize * 4) + 3 * 4);
tld.analysis.m_sliceMaxY = x265_max(0, (int32_t)((endRowInSlicePlus1 - 1 - row) * (m_param->maxCUSize * 4) - 4 * 4));


Second thought, let actually follow why there's 3 * 4 and 4 * 4 in the expressions.

Introduced in a8d7d3d (https://bitbucket.org/multicoreware/x265_git/commits/a8d7d3d37455c2bb944bca0311a2c269d5cee703) as 2*4 and 3*4, later changed to 3*4 and 4*4 in eaeeab5 (https://bitbucket.org/multicoreware/x265_git/commits/eaeeab570b66d943e2b1723776565bcee479b3ae)... but I don't understand why.
I tried to remove these 2 constants and the encoding seems pretty normal, maybe other structures changed in a way that these don't matter anymore?

Please allow me to explain
It is QPEL, so it needs *4; the interpolation filter is 8-tap, so (-3, 4). This is the reason for 3*4 and 4*4.
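
As a quick sanity check of that explanation, the two constants can be derived from the filter length and the MV precision. This is just the arithmetic spelled out in Python, under the usual assumption that an even-length filter reaches taps/2 - 1 samples to one side of the interpolated position and taps/2 to the other; the function name is made up.

```python
def qpel_margins(taps=8, mv_precision=4):
    """Rows an interpolation filter reaches above/below a sample, in MV units.

    With 8 taps that is 3 rows on one side and 4 on the other; motion vectors
    are stored in 1/mv_precision pixel units (quarter-pel -> 4).
    """
    above = (taps // 2 - 1) * mv_precision
    below = (taps // 2) * mv_precision
    return above, below
```

With the defaults this gives (12, 16), i.e. the 3 * 4 and 4 * 4 in the slice MV limit expressions.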

Z2697
10th March 2025, 11:38
Please allow me to explain
It is QPEL, so need *4, the interpolate is 8-taps, so (-3, 4), this is reason 3*4 and 4*4

How important is this? Will it break some encoding if not applied, or is it just for deterministic output?

If it's the latter, I'm afraid that it's not working (maybe due to some later changes). In tests I ran, encoding options that were deterministic became non-deterministic after I added the slices option.
The option that can make it deterministic is frame-threads=1, and the MV limit isn't even applied when frame-threads==1.

Edit: https://files.catbox.moe/f3ompg.7z here's a build of that exact commit (eaeeab5), let's test how it was back then ;)

Update:
I just ran a quick test of the abovementioned old version against the latest git version, and the old version was indeed "more consistent".
Some inconsistencies still occur with some sources, not listed in the hash results below. So it never "actually worked".
The same "unnecessary intra blocks" problem was present in the old version.
So this deterministic fix has lost its effect along the way. Maybe we can remove it for better compression and, for deterministic output, default frame-threads to 1 when slices > 1?
Like this: https://github.com/Mr-Z-2697/x265-experimental/tree/E-2025-02-19
SHA-1
98a2601eb9d1a5539c8af941497bf3ae80fd02ef *new_sli1_enc1.265
98a2601eb9d1a5539c8af941497bf3ae80fd02ef *new_sli1_enc2.265
7cebdd053631750cfe4999708da60aac1612f00b *new_sli2_enc1.265
40757f3f60097d1dc54513a59061bbe5d7022549 *new_sli2_enc2.265
bbc466300d7e310d716ff8ac2ca9b93b07fcca17 *old_sli1_enc1.265
bbc466300d7e310d716ff8ac2ca9b93b07fcca17 *old_sli1_enc2.265
e5ed62a2c42e804954242bdf870081990942bff5 *old_sli2_enc1.265
e5ed62a2c42e804954242bdf870081990942bff5 *old_sli2_enc2.265
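
For anyone repeating this kind of determinism check without a hashing tool, a byte-for-byte comparison of two encodes answers the same question. A minimal sketch, assuming C++14; `sameBytes` is a hypothetical helper name, not part of x265:

```cpp
#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: true only when both files open and contain identical bytes.
bool sameBytes(const std::string& pathA, const std::string& pathB)
{
    std::ifstream a(pathA, std::ios::binary);
    std::ifstream b(pathB, std::ios::binary);
    if (!a || !b)
        return false; // a missing or unreadable file counts as a mismatch

    std::istreambuf_iterator<char> itA(a), itB(b), end;
    // The four-iterator overload (C++14) also fails when the lengths differ.
    return std::equal(itA, end, itB, end);
}
```

Calling `sameBytes("new_sli2_enc1.265", "new_sli2_enc2.265")` on the two files from the list above would return false, since their hashes differ.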

rwill
10th March 2025, 17:39
How important is this? Will it break some encoding if not applied, or is it just for the deterministic output?

If it's the latter, I'm afraid it's not working (maybe due to some later changes): in tests I ran, encoding options that were deterministic became non-deterministic after I added the slices option.
The option that can make it deterministic is frame-threads=1, and the MV limit isn't even applied when frame-threads==1.

Well, I don't know anything about the x265 threading model and its implications, but you can check for encoder/decoder mismatches with x265's --recon or --hash 1. You then need a decoder that writes out unmodified raw .yuv or supports picture hashing.

benwaggoner
10th March 2025, 18:39
Well I don't know anything about the x265 threading model...
For the curious: https://x265.readthedocs.io/en/master/threading.html

benwaggoner
10th March 2025, 18:42
No matter what compiler and flags are used, the performance of x265 will not change a lot, unless you are using no-asm.
There were some tests here some years back showing profile-driven optimizations also helped a bit for the workload profiled. But as you say, the more asm, the less compiler optimizations matter.

rwill
10th March 2025, 20:56
For the curious: https://x265.readthedocs.io/en/master/threading.html

Sadly this page does not mention slices at all...

Hellboy.
10th March 2025, 23:48
For the curious: https://x265.readthedocs.io/en/master/threading.html

I have no idea how x265 works, but reading this it seems that
--frame-threads=1 is better ?

Z2697
11th March 2025, 12:03
I have no idea how x265 works, but reading this it seems that
--frame-threads=1 is better ?

Yes, you can also disable thread pools for even better quality!
(and watch the fps go down, but let's not talk about that for now)

Disable frame-threads and you get 2% lower BD-rate.
Disable thread pools and you get 0.5% or less improvement (on top of the previous 2%).

benwaggoner
11th March 2025, 22:29
I have no idea how x265 works, but reading this it seems that
--frame-threads=1 is better ?
Yes. It used to make a big difference. The gain became smaller over time; I've not tested how much for a couple of years now.

If you're encoding multiple streams on the same hardware simultaneously, you get better net throughput by turning off frame threading and, if the encodes/cores ratio is high enough, reducing threading in general.

RARY
14th March 2025, 02:48
PSNR and SSIM are not very reliable metrics when it comes to perceptual quality. I'd rather use SSIMU2 which is a little closer, it is far from perfect like all metrics basically, but it works quite well when comparing parameters.

Any reference implementations for SSIMU2 ?

Boulder
14th March 2025, 05:38
Any reference implementations for SSIMU2 ?

I don't know if there's any for Avisynth, but https://github.com/Line-fr/Vship is a good and fast one for Vapoursynth.

StvG
14th March 2025, 19:10
I don't know if there's any for Avisynth, but https://github.com/Line-fr/Vship is a good and fast one for Vapoursynth.

https://github.com/Asd-g/AviSynthPlus-ssimulacra (not GPU).
https://github.com/Asd-g/AviSynthPlus-Butteraugli (not GPU).