View Full Version : x265 HEVC Encoder



vpupkind
20th October 2022, 15:49
...on consumer hardware.
Things are of course different in perfectly cooled down server rooms.

Correct. Temperatures skyrocket and make the CPU throttle, with clocks dropping to the point where the speed gain is nullified and it's no longer worthwhile.
This of course doesn't apply to server rooms, where temperature and humidity are perfectly controlled and CPUs can keep a high enough clock under load, as happens with my encodes.



I actually ran this on a bunch of 18-core Skylake and Cascade Lake Gold server CPUs. The problem is the internal CPU throttling. We locked it in the P0 state (no throttling overall), but use of AVX-512 still automagically reduced the whole CPU's frequency. Cascade Lake reduced the impact to a subset of cores physically close to the one using AVX-512.

Ritsuka
20th October 2022, 16:56
So, I finally picked up a 16" Apple MacBook Pro with a M1 Pro processor today.

What's the best way to get a well-optimized x265 binary to run on this hardware?

Unfortunately I think you will have to compile it yourself, and maybe apply a couple of patches from https://github.com/HandBrake/HandBrake/tree/master/contrib/x265

qyot27
20th October 2022, 17:32
So, I finally picked up a 16" Apple MacBook Pro with a M1 Pro processor today.

What's the best way to get a well-optimized x265 binary to run on this hardware?
I would have said 'MacPorts or Homebrew', except that it seems neither of them applies the NEON acceleration patch for x265 that Apple submitted to HandBrake. And I don't know if the contents of said patch would otherwise be superseded by whatever NEON stuff has been committed upstream (if any has). I've not tried the patch on my M1 Mac Mini, so I can't say what the difference in performance is (I also don't remember if I bothered building x265 there, either).

There is this build script that handles several things - including the Apple patch - for building FFmpeg: https://github.com/Vargol/ffmpeg-apple-arm64-build. As an aside, the 'avisynth' branch on that repo confuses me, because it's up-to-date with the master branch and has no additional changes, and the master branch doesn't have it enabled, even though that's fully possible.

I actually ran this on a bunch of 18-core Skylake and Cascade Lake Gold server CPUs. The problem is the internal CPU throttling. We locked it in the P0 state (no throttling overall), but use of AVX-512 still automagically reduced the whole CPU's frequency. Cascade Lake reduced the impact to a subset of cores physically close to the one using AVX-512.
To expand on this,
https://en.wikipedia.org/wiki/AVX-512#Performance
https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#Downclocking

AVX downclocking was present as actual modes in several generations, based on the width of the executed instructions. To wit, GCC and Clang prefer a vector width of 256 when using AVX-512, which would largely sidestep the issue. From the snippets I've read on the topic, this also seems to be the way Zen4 implements AVX-512 in hardware.

Skylake had three levels, Ice Lake had only two. But as of Rocket Lake, those explicit downclocking modes are gone. AVX-512 will not downclock on modern generations just because 512-wide vectors get used, but only because doing so may or may not hit standard thermal or power limits, same as any other intensive process.

benwaggoner
20th October 2022, 19:43
I would have said 'MacPorts or Homebrew', except that it seems neither of them applies the NEON acceleration patch for x265 that Apple submitted to HandBrake. And I don't know if the contents of said patch would otherwise be superseded by whatever NEON stuff has been committed upstream (if any has). I've not tried the patch on my M1 Mac Mini, so I can't say what the difference in performance is (I also don't remember if I bothered building x265 there, either).

There is this build script that handles several things - including the Apple patch - for building FFmpeg: https://github.com/Vargol/ffmpeg-apple-arm64-build. As an aside, the 'avisynth' branch on that repo confuses me, because it's up-to-date with the master branch and has no additional changes, and the master branch doesn't have it enabled, even though that's fully possible.
And that worked just great on the first try! A lot easier than using autobuildsuite on Windows 10.

I'd still like to get a separate x265 binary not in ffmpeg so I can use identical syntax across platforms, but this is certainly enough for perf testing.

Ritsuka
20th October 2022, 20:12
x265 master branch already contains the Apple intrinsics patches, plus a lot of additional Neon optimizations provided by Amazon. No need to look for weird forks or branches. But there are still a couple of patches in the HandBrake repository that will make it run better.

LeXXuz
21st October 2022, 09:25
Pardon my ignorance, but why does AVX512 produce so much more heat than AVX2 mode in x265? :confused:

benwaggoner
22nd October 2022, 00:17
Wow, x265 got more commits nine hours ago than it got the rest of 2022!

https://bitbucket.org/multicoreware/x265_git/commits/

benwaggoner
22nd October 2022, 00:28
This is an interesting new command-line option!

--[no-]mctf Enable GOP based temporal filter.

LigH
22nd October 2022, 15:53
Wow, x265 got more commits nine hours ago than it got the rest of 2022!

https://bitbucket.org/multicoreware/x265_git/commits/

I hope that fixed some of the issues I had to complain about. The last set of commits before that severely destroyed compilation or multilib linking in MSYS2 with GCC 12.2.
_

No, there is no fix yet; MABS does not build x265 anymore.

quietvoid
22nd October 2022, 16:54
The new SBRC patch forgot to free memory used from edge detection buffers, so it probably leaks when using --sbrc without AQ mode 4.
It's essentially the same as a 2 year old patch for auto-AQ varying by frame average brightness and edge density.

jpsdr
23rd October 2022, 19:09
@quietvoid
Can you point out where, and in which file?

I've created an x265 repo on my GitHub with a branch containing the avs support patches.
No build yet, but there will be one soon. A few months ago I succeeded in building with VS, so, if nothing bad has happened, I should still be able to... :D

quietvoid
23rd October 2022, 19:51
@quietvoid
Can you point out where, and in which file?


The allocations here https://github.com/jpsdr/x265/blob/x265_mod/source/common/frame.cpp#L134
Have a different condition compared to the freeing here https://github.com/jpsdr/x265/blob/x265_mod/source/common/frame.cpp#L345

It's missing param->rc.frameSegment to free.
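
One way to keep the two conditions from drifting apart is to route both through a single predicate. The following is only a minimal sketch of that idea, not the actual frame.cpp code: `Params`, `FrameBuffers`, `edgePic` and `needsEdgeBuffers` are hypothetical names, and the `frameSegment` field merely stands in for `param->rc.frameSegment` mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the encoder parameters; not x265's real struct.
struct Params {
    int  aqMode;        // 4 is the edge-aware AQ mode discussed in this thread
    bool frameSegment;  // stands in for param->rc.frameSegment (assumption)
};

// Single predicate shared by allocation and release.
inline bool needsEdgeBuffers(const Params& p) {
    return p.aqMode == 4 || p.frameSegment;
}

struct FrameBuffers {
    uint8_t* edgePic = nullptr;

    // Allocate the edge buffer only when some feature needs it.
    void create(const Params& p, std::size_t size) {
        assert(size > 0);
        if (needsEdgeBuffers(p))
            edgePic = new uint8_t[size]();
    }

    // Release under exactly the same condition that was used to allocate.
    void destroy(const Params& p) {
        if (needsEdgeBuffers(p)) {
            delete[] edgePic;
            edgePic = nullptr;
        }
    }
};
```

Since `delete[]` on a null pointer is a no-op, releasing unconditionally would be an even simpler way to rule out this kind of leak.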

Also, you changed the upstream behaviour for sbrc without renaming the edge AQ fallback mode: https://github.com/jpsdr/x265/blob/x265_mod/source/encoder/slicetype.cpp#L1469
It's supposed to be AQ 1, not 5.

jpsdr
23rd October 2022, 20:30
OK, one of the patches added a constant with the same name; I must admit that I was confused when trying to resolve the conflict when I "fetched and rebased".
Pushed a commit to my branch. VS build failed... :(
I have to test more...

ShortKatz
24th October 2022, 06:58
This is an interesting new command-line option!

--[no-]mctf Enable GOP based temporal filter.

I've also seen this. Could somebody point out to me which use cases this might be interesting for? When should it be used?

ShortKatz
24th October 2022, 07:00
It's essentially the same as a 2 year old patch for auto-AQ varying by frame average brightness and edge density.

I still use these auto-AQ patches with all of my own builds of HandBrake. What do you mean by "varying by frame average brightness and edge density"? Are those patches incomplete then?

LigH
24th October 2022, 10:06
The auto-AQ feature has different modes, and the patch is for one specific mode.

quietvoid
24th October 2022, 14:20
I still use these auto-AQ patches with all of my own builds of HandBrake. What do you mean by "varying by frame average brightness and edge density"? Are those patches incomplete then?

They just renamed the option to `--sbrc`, as far as I can tell the behaviour should be the same in both old and new patches.
So nothing is really incomplete. Instead of specifying both --aq-mode 5 and --auto-aq, you just do --sbrc now.

Now it analyses the brightness and edge density for every frame, and switches the AQ according to thresholds (bright frame = mode 2, otherwise 3. high edge density = mode 4, otherwise mode 1).
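
As a rough illustration of the two threshold rules just described, here is a minimal sketch. It is not the upstream implementation: the function names, the `>=` comparisons and the threshold parameters are assumptions, and how the two rules are combined into one final per-frame mode is not stated in this post, so they are kept as separate functions. The enum only mirrors the AQ mode numbers used above.

```cpp
#include <cassert>

// AQ mode numbers as used in this thread (local names, for illustration only).
enum AqMode {
    AQ_VARIANCE             = 1,
    AQ_AUTO_VARIANCE        = 2,
    AQ_AUTO_VARIANCE_BIASED = 3,
    AQ_EDGE                 = 4
};

// Brightness rule: bright frame -> mode 2, otherwise mode 3.
// brightThreshold is a hypothetical tuning value.
inline AqMode pickByBrightness(double avgLuma, double brightThreshold) {
    assert(brightThreshold >= 0.0);
    return avgLuma >= brightThreshold ? AQ_AUTO_VARIANCE
                                      : AQ_AUTO_VARIANCE_BIASED;
}

// Edge rule: high edge density -> mode 4, otherwise mode 1.
// edgeThreshold is a hypothetical tuning value.
inline AqMode pickByEdgeDensity(double edgeDensity, double edgeThreshold) {
    assert(edgeThreshold >= 0.0);
    return edgeDensity >= edgeThreshold ? AQ_EDGE : AQ_VARIANCE;
}
```

Because each rule is a hard threshold, a measurement hovering right at the threshold can flip the chosen mode from one frame to the next.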

Boulder
24th October 2022, 14:51
Are there any proper tests with the old patch? For any higher bitrate encode, I've always found mode 1 best but an auto switching mode could be ideal. Though HDR encodes are quite dark compared to SDR, which will make things harder.

ShortKatz
24th October 2022, 17:49
They just renamed the option to `--sbrc`, as far as I can tell the behaviour should be the same in both old and new patches.
So nothing is really incomplete. Instead of specifying both --aq-mode 5 and --auto-aq, you just do --sbrc now.

Now it analyses the brightness and edge density for every frame, and switches the AQ according to thresholds (bright frame = mode 2, otherwise 3. high edge density = mode 4, otherwise mode 1).

OK. Now I understand. They implemented a variation of the auto-aq patch as --sbrc. That's fine, then I don't need to apply the auto-aq patches anymore.

But I don't understand why this is "1". Shouldn't it be "5"? It was "5" in the old auto-aq patch.
https://bitbucket.org/multicoreware/x265_git/src/8584bc7bd99262b8bd926476c866840fe0d9428a/source/x265.h#lines-584

LeXXuz
24th October 2022, 18:28
They just renamed the option to `--sbrc`, as far as I can tell the behaviour should be the same in both old and new patches.
So nothing is really incomplete. Instead of specifying both --aq-mode 5 and --auto-aq, you just do --sbrc now.

Now it analyses the brightness and edge density for every frame, and switches the AQ according to thresholds (bright frame = mode 2, otherwise 3. high edge density = mode 4, otherwise mode 1).

Does anyone have an actual compiled build with that patch for me to try? I'd like to test that feature on some problematic files.

quietvoid
24th October 2022, 18:30
OK. Now I understand. They implemented a variation of the auto-aq patch as --sbrc. That's fine, then I don't need to apply the auto-aq patches anymore.

But I don't understand why this is "1". Shouldn't it be "5"? It was "5" in the old auto-aq patch.
https://bitbucket.org/multicoreware/x265_git/src/8584bc7bd99262b8bd926476c866840fe0d9428a/source/x265.h#lines-584

It was a separate mode in the old patch, now it falls back to AQ mode 1 when the edge density is too low.
There's no more mode 4 + dark bias, as mode 5 was. At least in the upstream code; other forks still have mode 5.

quietvoid
24th October 2022, 18:37
OK, one of the patches added a constant with the same name; I must admit that I was confused when trying to resolve the conflict when I "fetched and rebased".
Pushed a commit to my branch. VS build failed... :(
I have to test more...

Ah, I just looked at your commit and it should be m_param->rc.frameSegment instead.

jpsdr
24th October 2022, 20:15
Ah.... I've been too fast... ;)
Changed, thanks.

:thanks:

benwaggoner
24th October 2022, 21:07
It was a separate mode in the old patch, now it falls back to AQ mode 1 when the edge density is too low.
There's no more mode 4 + dark bias, as mode 5 was. At least in the upstream code; other forks still have mode 5.
Dark bias should really be its own independent parameter. Ideally a strength parameter, not just on/off.

Probably different for SDR and HDR content as well. In HDR it should tweak --hdr10-opt tendency to remove bits from dark regions.

benwaggoner
24th October 2022, 21:13
Are there any proper tests with the old patch? For any higher bitrate encode, I've always found mode 1 best but an auto switching mode could be ideal. Though HDR encodes are quite dark compared to SDR, which will make things harder.
"Dark" in the sense that the median code value is a lot further from median nits? HDR obviously can get way brighter than SDR (PQ can go up to 10K nits, versus a nominal 100 nits of SDR).

Psychovisual optimizations based around luminance aren't very fungible between SDR and HDR for that reason. --aq-mode 4 wastes bits like crazy with HDR, for example, as visible steps are much, much closer together near dark than with SDR.
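
To make the difference in scale concrete, the PQ curve can be evaluated directly. The sketch below uses the constants published for the SMPTE ST 2084 EOTF; the function name `pqEotf` and the normalized 0..1 input convention are choices made for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// SMPTE ST 2084 (PQ) EOTF: maps a normalized signal value in [0, 1]
// to absolute display luminance in nits (cd/m^2).
inline double pqEotf(double e) {
    assert(e >= 0.0 && e <= 1.0);
    const double m1 = 2610.0 / 16384.0;
    const double m2 = 2523.0 / 4096.0 * 128.0;
    const double c1 = 3424.0 / 4096.0;
    const double c2 = 2413.0 / 4096.0 * 32.0;
    const double c3 = 2392.0 / 4096.0 * 32.0;

    const double ep  = std::pow(e, 1.0 / m2);
    const double num = std::max(ep - c1, 0.0);
    const double den = c2 - c3 * ep;
    return 10000.0 * std::pow(num / den, 1.0 / m1);
}
```

A full-scale signal (`pqEotf(1.0)`) reaches 10,000 nits, while a signal value of about 0.508 lands at roughly 100 nits, so the nominal SDR range occupies only about the lower half of the PQ code range. Comparing `pqEotf((c + 1) / 1023.0) - pqEotf(c / 1023.0)` for small and large 10-bit codes `c` shows how much finer the steps are near black.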

quietvoid
24th October 2022, 21:35
Dark bias should really be its own independent parameter. Ideally a strength parameter, not just on/off.
Well, good thing adjustable bias strength is already a thing in the x265 forks.
It was added along with aq-mode 5, and it's just a multiplier on top of the AQ strength.

ShortKatz
24th October 2022, 22:04
It was a separate mode in the old patch, now it falls back to AQ mode 1 when the edge density is too low.
There's no more mode 4 + dark bias, as mode 5 was. At least in the upstream code; other forks still have mode 5.
Thanks for pointing this out. I found this which adds back mode 5.
https://gist.github.com/noizuy/fbd25590a4cd2ca3a1c635d3a01f59cf
I will give this a try.

LigH
24th October 2022, 22:55
Does anyone have an actual compiled build with that patch for me to try? I'd like to test that feature on some problematic files.

I can't build multilib executables anymore since the last 14 patches on Friday, Oct. 21 :rolleyes:

Boulder
25th October 2022, 06:31
"Dark" in the sense that the median code value is a lot further from median nits?

Based on what I expect the encoder to "see" when it analyzes the stream. Somehow I believe that it uses the raw flat image you see if you open the script in VDub2 without any grading. So the material appears darker and more flat than it actually is upon playback on an HDR capable display.

benwaggoner
25th October 2022, 16:27
Based on what I expect the encoder to "see" when it analyzes the stream. Somehow I believe that it uses the raw flat image you see if you open the script in VDub2 without any grading. So the material appears darker and more flat than it actually is upon playback on an HDR capable display.
Gotcha.

It's not like SDR is more "real" pixels than HDR, it's just been the primary EOTF (Electro-optical transfer function) for the first decades of digital video. But gamma is a psychovisual optimization every bit as much as the PQ curve, just with simpler math that isn't optimal for a broader dynamic range.

It's really hard to work with HDR without a 10-bit HDR display and HDR-enabled tools, because what you see is not what you get. Encoders that are tuned for SDR aren't optimal for HDR, and vice versa. Optimal SDR needs lower QPs near black as code value steps are much more visible at the low end than the high end in SDR. HDR is much more perceptually uniform, and benefits from different optimizations like --hdr10-opt which would make near-dark look really bad if applied to SDR.

Boulder
25th October 2022, 18:03
Gotcha.

It's not like SDR is more "real" pixels than HDR, it's just been the primary EOTF (Electro-optical transfer function) for the first decades of digital video. But gamma is a psychovisual optimization every bit as much as the PQ curve, just with simpler math that isn't optimal for a broader dynamic range.

It's really hard to work with HDR without a 10-bit HDR display and HDR-enabled tools, because what you see is not what you get. Encoders that are tuned for SDR aren't optimal for HDR, and vice versa. Optimal SDR needs lower QPs near black as code value steps are much more visible at the low end than the high end in SDR. HDR is much more perceptually uniform, and benefits from different optimizations like --hdr10-opt which would make near-dark look really bad if applied to SDR.

Yeah, it's sometimes difficult to grasp the idea that the PC monitor can fool you big time with HDR encodes..

My thinking was more or less around the fact that for example aq-mode 3 has not been recommended for HDR encodes because it will start allocating bits into all the wrong places. Now if this new automatic method uses all modes as it seems necessary, things could go quite wrong. A movie like Blade Runner 2049 might be a good test since it has both ends of the spectrum.

benwaggoner
25th October 2022, 21:45
My thinking was more or less around the fact that for example aq-mode 3 has not been recommended for HDR encodes because it will start allocating bits into all the wrong places. Now if this new automatic method uses all modes as it seems necessary, things could go quite wrong. A movie like Blade Runner 2049 might be a good test since it has both ends of the spectrum.
Gotcha.

Lots of algorithms assume perceptual uniformity across code values, so the difference between 16 and 17 is as important as between 234 and 235. And while that's not true for SDR, it is true for PQ, so I've seen non-SDR optimized algorithms that work better with PQ than SDR. However, if an algorithm has a low-luma bias for SDR, it will work quite a bit worse with HDR.

Hopefully the algorithm either ignores dark bias, or is adaptive based on the color space being encoded.

In the early years of HDR encoding, we fundamentally encoded everything as if it were SDR, avoiding any SDR-optimized tuning, and it worked okay. Later we got stuff like --hdr10-opt which specifically optimizes for HDR.

Barough
26th October 2022, 08:13
x265 v3.5+57
Built on October 26, 2022, GCC 12.2.0

https://bitbucket.org/multicoreware/x265_git/commits/branch/master


DL :
https://www.mediafire.com/file/9ohiddzw2yftl7w

LigH
26th October 2022, 10:47
MABS included a patch which should be committed soon in the x265 repo.

Boulder
26th October 2022, 12:52
Later we got stuff like --hdr10-opt which specifically optimizes for HDR.

I'm afraid this optimization is just adjusting QP offsets; see for example this small snippet of code. From what I gather, any luma-level-related aq-mode would be doing similar things on top of or before that, which cannot be good practice.

if (param->bHDR10Opt)
{
    uint32_t sum = lumaSumCu(curFrame, blockX, blockY, param->rc.qgSize);
    uint32_t lumaAvg = sum / (loopIncr * loopIncr);
    if (lumaAvg < 301)
        qp_adj += 3;
    else if (lumaAvg >= 301 && lumaAvg < 367)
        qp_adj += 2;
    else if (lumaAvg >= 367 && lumaAvg < 434)
        qp_adj += 1;
    else if (lumaAvg >= 501 && lumaAvg < 567)
        qp_adj -= 1;
    else if (lumaAvg >= 567 && lumaAvg < 634)
        qp_adj -= 2;
    else if (lumaAvg >= 634 && lumaAvg < 701)
        qp_adj -= 3;
    else if (lumaAvg >= 701 && lumaAvg < 767)
        qp_adj -= 4;
    else if (lumaAvg >= 767 && lumaAvg < 834)
        qp_adj -= 5;
    else if (lumaAvg >= 834)
        qp_adj -= 6;
}
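
To see the resulting offsets in isolation, the ladder above can be restated as a small pure function. This is only a sketch for experimenting with the numbers: `hdr10OptQpDelta` is a hypothetical helper name, not something in x265, and it assumes the block's average luma has already been computed (the thresholds suggest 10-bit code values, but that is an inference).

```cpp
#include <cassert>
#include <cstdint>

// Returns the amount the snippet above would add to qp_adj
// for a block with the given average luma.
inline int hdr10OptQpDelta(uint32_t lumaAvg) {
    if (lumaAvg < 301) return 3;
    if (lumaAvg < 367) return 2;
    if (lumaAvg < 434) return 1;
    if (lumaAvg < 501) return 0;   // 434..500: the original has no branch here
    if (lumaAvg < 567) return -1;
    if (lumaAvg < 634) return -2;
    if (lumaAvg < 701) return -3;
    if (lumaAvg < 767) return -4;
    if (lumaAvg < 834) return -5;
    return -6;
}
```

Dark blocks get a higher QP (fewer bits) and bright blocks a lower QP (more bits), with a neutral band from 434 to 500 where nothing is changed. Any luma-dependent AQ mode applied alongside it would stack on top of these offsets.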

jpsdr
26th October 2022, 13:42
Built a release of the current version with Patman mods.

https://github.com/jpsdr/x265/releases

Stereodude
26th October 2022, 16:28
Built a release of the current version with Patman mods.

https://github.com/jpsdr/x265/releases
Thanks for the builds. Sorry if this is a dumb question, but what's the difference between the standard build and the AVX2 builds? The standard one shows it has AVX2 enabled.

jpsdr
26th October 2022, 17:02
Standard is built with no specific arch instruction set (for the C code part), so it will run on "any" CPU. The AVX2 build is built specifying the AVX2 arch instruction set (for the C code part), so it can't run on a CPU without at least AVX2.

LeXXuz
26th October 2022, 19:22
Built a release of the current version with Patman mods.

https://github.com/jpsdr/x265/releases

Thanks a lot for this. :thanks:

benwaggoner
26th October 2022, 20:00
I'm afraid this optimization is just adjusting QP offsets; see for example this small snippet of code. From what I gather, any luma-level-related aq-mode would be doing similar things on top of or before that, which cannot be good practice.
Yes, it is exactly doing QP offsets for psychovisual improvements. IIRC, it's based on an IEEE paper from the mid 2010s. It's a pretty basic implementation with room for further enhancements, certainly. It can wind up raising QPs in the darks a little too much.

jpsdr
27th October 2022, 16:56
As I've said, I have some concerns about a possible flickering issue with sbrc. One possible "fix" would be some kind of "fuzzy" logic, but that's not always possible; another, easier to implement, is to add a hysteresis.

In my mod branch, I've made a patch that allows a hysteresis in the sbrc decision. The limit/range of the hysteresis is for now totally "out of the blue"; I have no idea what to use (relative like I've done, static...).
No build made yet. Your thoughts about it: https://github.com/jpsdr/x265/commit/14ec6023f9e564f516687c1b0220b89f8e78f255

Also, your thoughts about another possible sbrc tweak: add (another...) switch to allow X265_AQ_EDGE : X265_AQ_EDGE_BIASED instead of X265_AQ_EDGE : X265_AQ_EDGE_BIASED_SBRC.

Waiting for your feedback... :D
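
For readers unfamiliar with the idea, a hysteresis on a threshold decision can be sketched as below. This is not the code from the linked commit: the class name `HysteresisSwitch`, the relative band and its parameters are assumptions made purely for illustration.

```cpp
#include <cassert>

// Two-state switch with a relative hysteresis band around a threshold.
// The state only flips when the value leaves the band on the far side,
// so values hovering near the threshold do not toggle it every frame.
class HysteresisSwitch {
public:
    HysteresisSwitch(double threshold, double relativeBand, bool initialHigh = false)
        : m_low(threshold * (1.0 - relativeBand)),
          m_high(threshold * (1.0 + relativeBand)),
          m_state(initialHigh) {
        assert(threshold > 0.0 && relativeBand >= 0.0 && relativeBand < 1.0);
    }

    // Feed one per-frame measurement; returns the (possibly unchanged) state.
    bool update(double value) {
        if (value > m_high)
            m_state = true;
        else if (value < m_low)
            m_state = false;
        return m_state;
    }

private:
    double m_low;
    double m_high;
    bool   m_state;
};
```

With a threshold of 100 and a band of 0.1, a sequence such as 99, 105, 111, 95, 89 yields false, false, true, true, false: the state changes only once the value clears 110 or drops below 90. A static (absolute) band would work the same way with `threshold - band` and `threshold + band`.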

jpsdr
27th October 2022, 17:04
Just a thought!!!
Can sbrc be ported back to x264??? I don't think this feature exists there.
If I'm so into it, it's because I find the idea (with some minor adjustments :D) really interesting!

ShortKatz
27th October 2022, 17:05
Is there something special about --mcstf? I'm getting a crash if I try to use it with my custom-built HandBrake.

Edit: OK, all clear. That is just a bug in the current code.
This patch fixes the crash: https://mailman.videolan.org/pipermail/x265-devel/2022-October/013521.html

Edit2: I think I will never use --mcstf for whole movies. This option increases the encoding time 7 times for me.

benwaggoner
27th October 2022, 22:57
Just a thought!!!
Can sbrc be ported back to x264??? I don't think this feature exists there.
If I'm so into it, it's because I find the idea (with some minor adjustments :D) really interesting!
There is a TON of great x265 features that should be back ported to x264. I know MCW contributed a lot of patches from x265 back, but not a lot made it into main. I would so love to have CSV logging, for example. And I know that code was contributed to x265 at least seven years ago.

LigH
27th October 2022, 23:18
New upload: x265 3.5+57-4d2890fe4 (https://www.mediafire.com/file/0wegkfl8rhshflu/x265_3.5+57-4d2890fe4.7z/file)

[Windows][GCC 12.2.0][32/32XP/64 bit] 8bit+10bit+12bit

News since v3.5+39:
--[no-]sbrc Enables the segment based rate control, using its scene statistics. Default disabled

--[no-]mcstf Enable GOP based temporal filter. Default 0

LeXXuz
28th October 2022, 08:38
As I've said, I have some concerns about a possible flickering issue with sbrc. One possible "fix" would be some kind of "fuzzy" logic, but that's not always possible; another, easier to implement, is to add a hysteresis.


I haven't noticed anything so far. What kind of flickering would that be, and where should I look out for it?

jpsdr
28th October 2022, 15:44
Possible situation...
If the calculated value is at the threshold value, check here (https://forum.doom9.org/showthread.php?p=1977277#post1977277).

jpsdr
28th October 2022, 18:11
I've also added the option to use AQ-MODE 5 instead of AQ-MODE 1 in sbrc, and... I've made a build; it's on my GitHub.
So you can play with both experimental options i've added... ;)

LeXXuz
29th October 2022, 09:20
I've also added the option to use AQ-MODE 5 instead of AQ-MODE 1 in sbrc, and... I've made a build; it's on my GitHub.
So you can play with both experimental options i've added... ;)

M'kay, sbrc was worth a shot. I've seen that flickering now. Unbearable.
Not sure if aq-mode 5 + sbrc would change anything on that.

I guess I will test a little more with just aq-mode 4 and 5 now. aq-mode 3 does slightly improve faint details in very dark scenes where aq-mode 2 fails, even with generous CRF values of 18 and below. The opening scene of Thor (2011) is one of those rare cases where x265 doesn't spend enough bits with aq-mode 1/2 and the result looks awful. Even CRF 16 with the Very Slow preset +rskip and -sao didn't change anything there.

But I think aq-mode 3 just wastes too many bits in general compared to the other aq-modes. Even clips with no dark scenes at all get much bigger than with mode 2, which I don't quite understand, to be honest.

jpsdr
29th October 2022, 15:19
Are you sure the flickering is because of sbrc? I was just talking about a rare, theoretically possible situation that sbrc may produce...
But if it is because of that, the hysteresis option may help; that's the case I made it for.