View Full Version : x265 HEVC Encoder
LunaRabbit
16th December 2024, 13:17
The parameter should still be --aq-auto x. Nothing has changed there, so the same parameters will apply (10 for SDR, 6 for HDR if you want to enable all the tweaks it has).
I must have cloned the wrong repo or something, as I couldn't get --aq-auto to work no matter what I did. I'll try again later today, thanks.
Do you happen to know anything about the --frame-rc stuff I asked about, and whether it can be controlled from a qpfile? I'm having the same issue as the person above. The release notes are very sparse about how these new features work. I'll see if I can find an archive of the mailing list in the meantime.
Boulder
16th December 2024, 13:24
I must have cloned the wrong repo or something, as I couldn't get --aq-auto to work no matter what I did. I'll try again later today, thanks.
Do you happen to know anything about the --frame-rc stuff I asked about, and whether it can be controlled from a qpfile? I'm having the same issue as the person above. The release notes are very sparse about how these new features work. I'll see if I can find an archive of the mailing list in the meantime.
jpsdr's repo default is 'master' but the mods are in the other branch 'x265_mod'.
Unfortunately I don't know anything about the --frame-rc parameter. Nothing in the online CLI docs either, which is not unexpected as documentation seems to be irrelevant :(
LunaRabbit
16th December 2024, 13:47
jpsdr's repo default is 'master' but the mods are in the other branch 'x265_mod'.
Unfortunately I don't know anything about the --frame-rc parameter. Nothing in the online CLI docs either, which is not unexpected as documentation seems to be irrelevant :(
Hey thanks for your help again. I was probably just being dumb and probably built the wrong source directory on my local system. I'm sure I'll figure it out when I have more time. I checked again and my last build doesn't have --aq-auto at all. So it's either an unmodded version (I think it's patman's) or I messed up and used the official repo.
If anyone is curious, I tried applying the .diff you posted a few pages back against v4.1 from master. As expected, so much has changed that it can't apply automatically anymore. But I plan to go through the source code later today to see if I can get it working. Here is the output from patch:
patching file README.md
patching file build/MSYS_jpsdr/Build_Win32.sh
patching file build/MSYS_jpsdr/Build_Win32_AVX.sh
patching file build/MSYS_jpsdr/Build_Win32_AVX2.sh
patching file build/MSYS_jpsdr/Build_Win32_Broadwell.sh
patching file build/MSYS_jpsdr/Build_Win32_x86.sh
patching file build/MSYS_jpsdr/Build_mcf_10_Broadwell.sh
patching file build/VS2019_LLVM/Command_Line.txt
patching file build/VS2019_LLVM/Create_Solutions_x64.bat
patching file build/VS2019_LLVM/Create_Solutions_x86.bat
patching file doc/reST/cli.rst
patching file source/CMakeLists.txt
patching file source/abrEncApp.cpp
Hunk #2 NOT MERGED at 1173-1178.
patching file source/avisynth/avisynth.h
patching file source/avisynth/avisynth_c.h
patching file source/avisynth/avs/alignment.h
patching file source/avisynth/avs/arch.h
patching file source/avisynth/avs/capi.h
patching file source/avisynth/avs/config.h
patching file source/avisynth/avs/cpuid.h
patching file source/avisynth/avs/filesystem.h
patching file source/avisynth/avs/minmax.h
patching file source/avisynth/avs/posix.h
patching file source/avisynth/avs/types.h
patching file source/avisynth/avs/version.h
patching file source/avisynth/avs/win.h
patching file source/cmake/Version.cmake
patching file source/common/CMakeLists.txt
patching file source/common/common.cpp
patching file source/common/common.h
patching file source/common/event.cpp
patching file source/common/event.h
patching file source/common/frame.cpp
patching file source/common/frame.h
patching file source/common/lowres.cpp
patching file source/common/lowres.h
patching file source/common/param.cpp
Hunk #1 merged at 154-155.
Hunk #18 NOT MERGED at 1378-1392.
Hunk #27 NOT MERGED at 2449-2455.
Hunk #28 NOT MERGED at 2478-2488.
Hunk #29 NOT MERGED at 2530-2544,2546-2553.
Hunk #30 merged at 2607.
Hunk #31 NOT MERGED at 2638-2646.
Hunk #35 merged at 2907-2908.
Hunk #37 merged at 3003-3005.
patching file source/common/param.h
patching file source/common/threading.h
patching file source/common/x86/asm-primitives.cpp
patching file source/common/x86/loopfilter.asm
Hunk #1 already applied at 3870.
patching file source/encoder/api.cpp
Hunk #1 merged at 120.
patching file source/encoder/encoder.cpp
Hunk #1 merged at 292, NOT MERGED at 298-302.
Hunk #4 NOT MERGED at 2169-2184,2186-2197, merged at 2199-2202, merged at 2205, merged at 2210.
Hunk #5 NOT MERGED at 2536-2542.
Hunk #7 merged at 3748,3750-3751.
Hunk #11 NOT MERGED at 4618-4625, merged at 4627.
patching file source/encoder/frameencoder.cpp
Hunk #1 merged at 467.
patching file source/encoder/ratecontrol.cpp
patching file source/encoder/ratecontrol.h
patching file source/encoder/rdcost.h
patching file source/encoder/search.cpp
patching file source/encoder/slicetype.cpp
patching file source/encoder/slicetype.h
patching file source/input/avs.cpp
patching file source/input/avs.h
patching file source/input/input.cpp
Hunk #2 NOT MERGED at 41-53.
patching file source/input/input.h
patching file source/input/vpy.cpp
patching file source/input/vpy.h
patching file source/input/y4m.cpp
Hunk #1 NOT MERGED at 43-48.
patching file source/input/y4m.h
Hunk #1 NOT MERGED at 58-62.
patching file source/input/yuv.h
patching file source/output/gop.h
patching file source/output/gop_engine.hpp
patching file source/output/output.cpp
patching file source/vapoursynth/VSConstants4.h
patching file source/vapoursynth/VSHelper4.h
patching file source/vapoursynth/VSScript4.h
patching file source/vapoursynth/VapourSynth4.h
patching file source/x265.h
patching file source/x265cli.cpp
Hunk #4 NOT MERGED at 106-122.
Hunk #6 merged at 163-165.
Hunk #7 NOT MERGED at 306-328, merged at 330, merged at 332-336.
Hunk #9 merged at 565-572, NOT MERGED at 574-601, NOT MERGED at 604-611.
Hunk #11 NOT MERGED at 978-989.
Hunk #12 merged at 1112.
Hunk #13 NOT MERGED at 1201-1207.
patching file source/x265cli.h
Hunk #5 merged at 450.
Hunk #6 NOT MERGED at 505-511.
patching file source/x265res.manifest.in
So there aren't many lines that changed. I'm sure I can manually patch in what patch couldn't do automatically.
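For anyone facing a similar log, here is a rough Python sketch that counts, per file, the hunks with at least one NOT MERGED region, so you know where the manual work is. It only assumes the log format shown above ("patching file ..." followed by "Hunk #N ..." lines); the function name and the sample text are made up for illustration.

```python
import re


def summarize_patch_log(log_text):
    """Map each file to the number of hunks that have a NOT MERGED region."""
    failures = {}
    current_file = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        if line.startswith("patching file "):
            current_file = line[len("patching file "):]
        elif current_file and re.match(r"Hunk #\d+ ", line) and "NOT MERGED" in line:
            # A hunk can be partly merged; it still needs manual attention.
            failures[current_file] = failures.get(current_file, 0) + 1
    return failures


# Short excerpt in the same format as the log above (illustrative only).
SAMPLE_LOG = """patching file source/abrEncApp.cpp
Hunk #2 NOT MERGED at 1173-1178.
patching file source/common/param.cpp
Hunk #1 merged at 154-155.
Hunk #18 NOT MERGED at 1378-1392.
Hunk #27 NOT MERGED at 2449-2455.
patching file source/encoder/api.cpp
Hunk #1 merged at 120.
"""

if __name__ == "__main__":
    for path, count in summarize_patch_log(SAMPLE_LOG).items():
        print(f"{count} hunk(s) to fix by hand in {path}")
```

Files whose hunks all merged are left out of the result on purpose, so the output is just the to-do list.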
Boulder
16th December 2024, 13:57
Hey thanks for your help again. I was probably just being dumb and probably built the wrong source directory on my local system. I'm sure I'll figure it out when I have more time. I checked again and my last build doesn't have --aq-auto at all. So it's either an unmodded version (I think it's patman's) or I messed up and used the official repo.
If you are on Windows, jpsdr does offer pre-compiled binaries in GitHub.
Fixing the patch file can be a tedious job, but it just takes time to sit down and work through the errors one by one. The original changes are from so long ago that the codebase has changed quite a lot since then.
Z2697
16th December 2024, 13:58
Here're 2 test results of "cutree-strength" I just ran.
I leave them as links because they are long images.
Please ignore the speed, I ran them in VM.
https://files.catbox.moe/4qugbj.png
https://files.catbox.moe/1tqlou.png
The curves from "corresponding" qcomp and "cutree-strength" values are almost completely aligned with each other. That may not be easy to see in CRF results since the final bitrate differs quite a lot, so I did a 2-pass test.
(qcomp 0.65 corresponds to cutree-strength 1.75 and qcomp 0.7 corresponds to cutree-strength 1.5)
Also keep in mind this is not a valid quality comparison across different "qcomp and cutree-strength pairs"; the metrics usually "perform" poorly when it comes to "bits re-distribution". (My layman's understanding is: aq is bits re-distribution within a frame, cutree/mbtree is bits re-distribution across frames.)
Update:
I ran a CRF test with more data points but only 2 sets of data. That makes the interpolated curve more accurate and the "alignment" easier to see.
And switched to a test build I did back in the day.
The points are from CRF 12 to 32.
https://files.catbox.moe/e5jpea.png
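The two pairs quoted above (0.65 with 1.75, 0.7 with 1.5) both fit the relation strength = 5 * (1 - qcomp). As far as I recall that is also how stock x264/x265 derive the tree strength from qcomp, but treat that as an assumption and check the source of your build. A small Python sketch of the mapping, with made-up function names:

```python
def cutree_strength_from_qcomp(qcomp):
    """Assumed relation: strength = 5 * (1 - qcomp). Fits both pairs in the post."""
    return 5.0 * (1.0 - qcomp)


def qcomp_from_cutree_strength(strength):
    """Inverse of the assumed relation above."""
    return 1.0 - strength / 5.0


if __name__ == "__main__":
    for qcomp in (0.60, 0.65, 0.70, 0.80):
        print(f"qcomp {qcomp:.2f} <-> cutree-strength {cutree_strength_from_qcomp(qcomp):.2f}")
```

Under that assumption the default qcomp of 0.6 lands on a strength of 2.0, and qcomp 0.8 on 1.0.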
LunaRabbit
16th December 2024, 14:23
If you are on Windows, jpsdr does offer pre-compiled binaries in GitHub.
Yes I'm aware. I just prefer to build from source even for the Windows machine I still have kicking around. I'm used to it since I normally live in Emacs. :)
Whenever I get it working I'll update the .diff and maybe build some bins for anyone that wants to try it. Just trying to leave things nicer for the next person. I do appreciate having a diff to work from; it's really helpful.
Here're 2 test results of "cutree-strength" I just ran.
I leave them as links because they are long images.
Please ignore the speed, I ran them in VM.
https://files.catbox.moe/4qugbj.png
https://files.catbox.moe/1tqlou.png
The curves from "corresponding" qcomp and "cutree-strength" values are almost completely aligned with each other. That may not be easy to see in CRF results since the final bitrate differs quite a lot, so I did a 2-pass test.
(qcomp 0.65 corresponds to cutree-strength 1.75 and qcomp 0.7 corresponds to cutree-strength 1.5)
Also keep in mind this is not a valid quality comparison across different "qcomp and cutree-strength pairs"; the metrics usually "perform" poorly when it comes to "bits re-distribution". (My layman's understanding is: aq is bits re-distribution within a frame, cutree/mbtree is bits re-distribution across frames.)
Interesting, thanks for testing it. Whenever I get it working I'll run some tests and see how it compares to my older settings with qcomp 0.7 and 0.8. Metrics are nice, but all I really care about is whether there is any gain that I can see in the material. I have a source I'm very familiar with that should serve as a good test for seeing if cutree-strength makes any difference, as I need to redo the whole thing anyway due to some recent improvements in the filters it requires.
Z2697
16th December 2024, 14:25
frame-rc enables rate control mode (CRF, ABR or CQP) to be reconfigured per-frame, I think.
It enables the control, but how to actually use control... zones? qpfile? api calls? IDK.
Z2697
16th December 2024, 14:27
Yes I'm aware. I just prefer to build from source even for the Windows machine I still have kicking around. I'm used to it since I normally live in Emacs. :)
Whenever I get it working I'll update the .diff and maybe build some bins for anyone that wants to try it. Just trying to leave things nicer for the next person. I do appreciate having a diff to work from; it's really helpful.
Interesting, thanks for testing it. Whenever I get it working I'll run some tests and see how it compares to my older settings with qcomp 0.7 and 0.8. Metrics are nice, but all I really care about is whether there is any gain that I can see in the material. I have a source I'm very familiar with that should serve as a good test for seeing if cutree-strength makes any difference, as I need to redo the whole thing anyway due to some recent improvements in the filters it requires.
But why not build it directly from the mod branch? Why "extract" the patch and apply to master branch then build? Seems unnecessary.
As for the metric I think it works OK to evaluate the similarity of the "corresponding qcomp and cutree-strength pairs", it's just not valid when comparing different pairs.
I used my eyes as well, of course. Just the summary report is easier to post.
LunaRabbit
16th December 2024, 14:56
frame-rc enables rate control mode (CRF, ABR or CQP) to be reconfigured per-frame, I think.
It enables the control, but how to actually use control... zones? qpfile? api calls? IDK.
That's what I'm wondering as well: whether it can be enabled from the qpfile and just work. The docs do not mention anything about what kind of syntax is expected.
But why not build it directly from the mod branch? Why "extract" the patch and apply to master branch then build? Seems unnecessary.
As far as I'm aware there are no mod branches that have the cu-tree modifications working on v4.1 of x265. If you know of one please share.
I do appreciate the testing and providing the helpful chart. Didn't mean to imply otherwise. :)
Z2697
16th December 2024, 18:10
That's what I'm wondering as well: whether it can be enabled from the qpfile and just work. The docs do not mention anything about what kind of syntax is expected.
As far as I'm aware there are no mod branches that have the cu-tree modifications working on v4.1 of x265. If you know of one please share.
I do appreciate the testing and providing the helpful chart. Didn't mean to imply otherwise. :)
I think it would be easier to cherry-pick the related commits of the cutree-strength mod into jpsdr's x265_mod branch, since there is far less code behind it.
jpsdr
16th December 2024, 20:00
About mbtree and x264.
Disabling mbtree is recommended when you target Blu-ray, according to mp3dom's results and tests. But I think it's the only case (or, of course, cases using very similar encode parameters). The issue seems to be the small (1s) keyint value, which doesn't work well with mbtree. With the standard keyint value [250], mbtree does a proper job (still according to mp3dom).
GeoffreyA
16th December 2024, 22:11
About mbtree and x264.
Disabling mbtree is recommended when you target Blu-ray, according to mp3dom's results and tests. But I think it's the only case (or, of course, cases using very similar encode parameters). The issue seems to be the small (1s) keyint value, which doesn't work well with mbtree. With the standard keyint value [250], mbtree does a proper job (still according to mp3dom).
And DS's paper: https://archive.org/download/x264_mbtree/x264_mbtree.pdf
Z2697
25th December 2024, 05:57
Hmm, something doesn't feel right.
source/encoder/slicetype.cpp
void Lookahead::cuTree(Lowres **frames, int numframes, bool bIntra)
{
...
double totalDuration = 0.0;
for (int j = 0; j <= numframes; j++)
totalDuration += (double)m_param->fpsDenom / m_param->fpsNum;
double averageDuration = totalDuration / (numframes + 1);
I'm "investigating" (f* around and find out) the "inconsistency in bitstreams encoded by binaries built with different compiler target flags", which reportedly is associated with cutree.
Me, being an incompetent wannabe programmer, also kind of narrowed things down (or, rather, "confirmed") to cutree after, what, like 4 days? And I am now moderately sure the quoted code (in combination with some compiler / ISA optimization) caused the inconsistency.
(only averageDuration is used in the code that follows)
As x265 itself does not have variable framerate awareness, this block of code is unnecessary, I think. In theory, all this code just gets us back to "(double)m_param->fpsDenom / m_param->fpsNum", if not for the weirdness of FP math, you know, the .1+.2 != .3 and non-associativity stuff.
double averageDuration = (double) m_param->fpsDenom / m_param->fpsNum;
After replacing the loop with just this line, things seemed to be consistent. (But not consistent with the "pre-this-modification" version.)
Alternatively you can put "#pragma GCC novec" or another compiler's equivalent right before the for loop. (This retains consistency with the "pre-this-modification" "nocona" (default -march of GCC) / "SSE3" version.) But this is compiler specific.
I personally think the former "solution" is more elegant.
The "inconsistency between bitstreams encoded by binaries produced by different compilers" still exists after this modification. (e.g. GCC vs Clang)
Some suspicious executables (10bit only)
https://files.catbox.moe/nhla9t.7z
Am I missing something? Please help me. plzzzzzzzz
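To illustrate the FP point, here is a small Python sketch that mirrors the quoted loop and compares it with the single division. It is not x265 code, and Python always adds sequentially, so it cannot reproduce the reordering a vectorizing compiler may do. It only shows that the repeated sum is rounded at every step, so its result depends on how the additions are grouped, while the single division does not. Function names and the 24000/1001 frame rate are just examples.

```python
import math


def average_duration_loop(fps_num, fps_denom, numframes):
    """Mirror of the quoted loop: add the same frame duration numframes + 1 times."""
    total_duration = 0.0
    for _ in range(numframes + 1):
        total_duration += fps_denom / fps_num
    return total_duration / (numframes + 1)


def average_duration_direct(fps_num, fps_denom):
    """The one-line replacement: a single division, no accumulation."""
    return fps_denom / fps_num


def count_mismatches(fps_num, fps_denom, max_frames):
    """Count values of numframes where the loop is not bit-identical to the division."""
    direct = average_duration_direct(fps_num, fps_denom)
    return sum(
        1
        for numframes in range(max_frames + 1)
        if average_duration_loop(fps_num, fps_denom, numframes) != direct
    )


if __name__ == "__main__":
    # Grouping changes the rounded result: this comparison prints False.
    print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))
    # How often the sequential loop differs from the plain division.
    print(count_mismatches(24000, 1001, 250))
```

Any difference is tiny (last-bit level), but anything downstream that rounds or thresholds on it can still make two builds take different decisions.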
rwill
25th December 2024, 07:26
One way to proceed would be to make a small repro case test application and then objdump -S that, checking what the differences in the generated code are.
GeoffreyA
25th December 2024, 07:48
Hmm, something doesn't feel right.
source/encoder/slicetype.cpp
void Lookahead::cuTree(Lowres **frames, int numframes, bool bIntra)
{
...
double totalDuration = 0.0;
for (int j = 0; j <= numframes; j++)
totalDuration += (double)m_param->fpsDenom / m_param->fpsNum;
double averageDuration = totalDuration / (numframes + 1);
I'm "investigating" (f* around and find out) the "inconsistency in bitstreams encoded by binaries built with different compiler target flags", which reportedly is associated with cutree.
Me, being an incompetent wannabe programmer, also kind of narrowed things down (or, rather, "confirmed") to cutree after, what, like 4 days? And I am now moderately sure the quoted code (in combination with some compiler / ISA optimization) caused the inconsistency.
(only averageDuration is used in the code that follows)
As x265 itself does not have variable framerate awareness, this block of code is unnecessary, I think. In theory, all this code just gets us back to "(double)m_param->fpsDenom / m_param->fpsNum", if not for the weirdness of FP math, you know, the .1+.2 != .3 and non-associativity stuff.
double averageDuration = (double) m_param->fpsDenom / m_param->fpsNum;
After replacing the loop with just this line, things seemed to be consistent. (But not consistent with the "pre-this-modification" version.)
The "inconsistency between bitstreams encoded by binaries produced by different compilers" still exists after this modification. (e.g. GCC vs Clang)
Am I missing something? Please help me. plzzzzzzzz
You're right because all that loop does is multiply (Denom / Numer) by (numframes + 1), using repeated addition, and those values do not change within the loop. I wonder if there is some reason for such a superfluous piece of code. Perhaps some compiler issue back in the day, or the person was half asleep? :)
Z2697
25th December 2024, 08:35
You're right because all that loop does is multiply (Denom / Numer) by (numframes + 1), using repeated addition, and those values do not change within the loop. I wonder if there is some reason for such a superfluous piece of code. Perhaps some compiler issue back in the day, or the person was half asleep? :)
It looks like "they were planning on VFR support" to me, but eventually that didn't happen.
GeoffreyA
25th December 2024, 10:20
It looks like "they were planning on VFR support" to me, but eventually that didn't happen.
Looks like it.
higher
29th December 2024, 23:27
Yeah, past a certain resolution and frame threads, more cores will start winning over better cores. M4 Max likely wins for 1080p or lower, and possibly 4K if using only 1 frame thread.
Unfortunately the M4 Max is only in MacBook Pro, which is a whole lot more expensive and bigger for headless work. Mac Mini tops out with the M4 Pro currently.
The Mac version of HandBrake now includes x265 4.1, so I thought I'd make a little comparison between my desktop 5900X and a 16" MBP with M4 Pro (on battery).
The 1 minute sample was cut from a UHD Blu-ray (36th Precinct) and was encoded in 4K using the same version of HandBrake with identical settings (preset slow) on both platforms.
M4 Pro: 4m 20s
5900X: 5m 25s
Quite impressive.
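To put the two times above in relative terms, a quick Python sketch (helper names are made up):

```python
def to_seconds(minutes, seconds):
    """Convert a 'Xm Ys' encode time to seconds."""
    return 60 * minutes + seconds


def percent_faster(slow_seconds, fast_seconds):
    """Throughput gain of the faster machine over the slower one, in percent."""
    return (slow_seconds / fast_seconds - 1.0) * 100.0


def percent_less_time(slow_seconds, fast_seconds):
    """Reduction in wall-clock time, in percent."""
    return (1.0 - fast_seconds / slow_seconds) * 100.0


if __name__ == "__main__":
    m4_pro = to_seconds(4, 20)      # 260 s
    ryzen_5900x = to_seconds(5, 25)  # 325 s
    print(f"M4 Pro is {percent_faster(ryzen_5900x, m4_pro):.0f}% faster")
    print(f"M4 Pro needs {percent_less_time(ryzen_5900x, m4_pro):.0f}% less time")
```

So on this sample the M4 Pro is about 25% faster, or equivalently needs about 20% less time. The two figures are easy to mix up.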
Sagittaire
29th December 2024, 23:48
The Mac version of HandBrake now includes x265 4.1, so I thought I'd make a little comparison between my desktop 5900X and a 16" MBP with M4 Pro (on battery).
The 1 minute sample was cut from a UHD Blu-ray (36th Precinct) and was encoded in 4K using the same version of HandBrake with identical settings (preset slow) on both platforms.
M4 Pro: 4m 20s
5900X: 5m 25s
Quite impressive.
Well, not really, simply because the 9950X is more than 2x (at least) as powerful as the 5900X. A 9950X at stock will certainly take something like ~2m 30s to encode this source with x265.
Ritsuka
30th December 2024, 08:56
Of course, but the 9950X has a 170W TDP, and the M4 Pro is what, 40 W at max, with 6 fewer performance cores than the 9950X.
And it seems all the latest arm64 optimizations are again stuck on the x265-devel mailing list.
higher
30th December 2024, 17:03
Well, not really, simply because the 9950X is more than 2x (at least) as powerful as the 5900X. A 9950X at stock will certainly take something like ~2m 30s to encode this source with x265.
The 9950X is only 50% faster than the 5900X at 4K resolution. I guess an M4 Max could almost match a 9950X while consuming a lot less power.
Sagittaire
30th December 2024, 23:12
The 9950X is only 50% faster than the 5900X at 4K resolution. I guess an M4 Max could almost match a 9950X while consuming a lot less power.
The TechPowerUp benchmark, like many others from non-codec specialists, does not test codecs correctly. If you seriously want to benchmark a codec, you don't use a GUI like HandBrake, and you use a codec profile able to properly saturate a 16C/32T CPU.
I created a codec benchmark to do that, and the 9950X at stock has 74% more performance than the 5950X for x265. I haven't tested the 5900X, but the 5950X theoretically has 20% more performance than the 5900X. In a correct H.265 benchmark (all CPU threads saturated) the 9950X will deliver about 110% more performance than the 5900X.
When TechPowerUp uses benchmarks that properly saturate the CPU, like Cinebench, Blender or Stockfish, you get the correct CPU power:
Stockfish:
5900X: 14.52 Mips
9950X: 30.78 Mips (+111%)
Blender:
5900X: 114.9 s
9950X: 56 s (+105%)
V-Ray:
5900X: 21538
9950X: 48899 (+127%)
https://tpucdn.com/review/amd-ryzen-9-9950x/images/vray.png
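The percentages above can be re-derived from the quoted scores. A short Python sketch (helper names are made up; the numbers are the ones in this post):

```python
def gain_higher_is_better(base_score, new_score):
    """Percent gain when a bigger score is better (Stockfish, V-Ray)."""
    return (new_score / base_score - 1.0) * 100.0


def gain_lower_is_better(base_time, new_time):
    """Percent gain when a smaller time is better (Blender render time)."""
    return (base_time / new_time - 1.0) * 100.0


def compound_gain(*gains_percent):
    """Chain relative gains, e.g. 9950X over 5950X, then 5950X over 5900X."""
    factor = 1.0
    for gain in gains_percent:
        factor *= 1.0 + gain / 100.0
    return (factor - 1.0) * 100.0


if __name__ == "__main__":
    print(f"Stockfish: +{gain_higher_is_better(14.52, 30.78):.1f}%")
    print(f"Blender:   +{gain_lower_is_better(114.9, 56):.1f}%")
    print(f"V-Ray:     +{gain_higher_is_better(21538, 48899):.1f}%")
    print(f"x265 (74% then 20%): +{compound_gain(74, 20):.1f}%")
```

Stockfish comes out just under +112% (quoted as +111%, so presumably truncated), Blender at about +105% and V-Ray at about +127%. Chaining +74% and +20% gives +108.8%, which is where the "about 110%" estimate for x265 comes from.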
ShortKatz
31st December 2024, 20:49
The Mac version of HandBrake now includes x265 4.1, so I thought I'd make a little comparison between my desktop 5900X and a 16" MBP with M4 Pro (on battery).
The 1 minute sample was cut from a UHD Blu-ray (36th Precinct) and was encoded in 4K using the same version of HandBrake with identical settings (preset slow) on both platforms.
M4 Pro: 4m 20s
5900X: 5m 25s
Quite impressive.
I would be quite interested how much faster my M4 Max would be compared to the M4 Pro.
Barough
31st December 2024, 23:26
x265 v4.1+62-441e1e4
Built on December 31 2024, GCC 14.2.0
Win32/64 / 8bit+10bit+12bit
https://bitbucket.org/multicoreware/x265_git/commits/branch/master
DL :
https://www.mediafire.com/file/6nh9e7dfb72b3pi
excellentswordfight
2nd January 2025, 14:07
The TechPowerUp benchmark, like many others from non-codec specialists, does not test codecs correctly. If you seriously want to benchmark a codec, you don't use a GUI like HandBrake, and you use a codec profile able to properly saturate a 16C/32T CPU.
I created a codec benchmark to do that, and the 9950X at stock has 74% more performance than the 5950X for x265. I haven't tested the 5900X, but the 5950X theoretically has 20% more performance than the 5900X. In a correct H.265 benchmark (all CPU threads saturated) the 9950X will deliver about 110% more performance than the 5900X.
When TechPowerUp uses benchmarks that properly saturate the CPU, like Cinebench, Blender or Stockfish, you get the correct CPU power:
Stockfish:
5900X: 14.52 Mips
9950X: 30.78 Mips (+111%)
Blender:
5900X: 114.9 s
9950X: 56 s (+105%)
V-Ray:
5900X: 21538
9950X: 48899 (+127%)
These are two different methodologies; this is not a case of a "correct" and a "not correct" way of doing it. Single-instance encoding is still a thing, and actually the most common case for most users, so benchmarking a single instance is still very much relevant.
Most software does not have perfect parallelization scaling, and in most cases where it does, e.g. 3D rendering, simulations etc., those loads usually gain more from being calculated on GPUs anyway. I do think it makes perfect sense to test both cases here, because you can just run two encodes at the same time even if you don't want to start doing chunk encoding to get "more" out of your CPU. But it's not like we have a history of running multiple parallel instances of a piece of software in benchmarks because we don't find the thread scaling good enough and the results don't show the full "potential" of the CPU; that argument can be made for most of them (audio encoding, compression, compiling etc.).
Sagittaire
2nd January 2025, 20:46
These are two different methodologies; this is not a case of a "correct" and a "not correct" way of doing it. Single-instance encoding is still a thing, and actually the most common case for most users, so benchmarking a single instance is still very much relevant.
Most software does not have perfect parallelization scaling, and in most cases where it does, e.g. 3D rendering, simulations etc., those loads usually gain more from being calculated on GPUs anyway. I do think it makes perfect sense to test both cases here, because you can just run two encodes at the same time even if you don't want to start doing chunk encoding to get "more" out of your CPU. But it's not like we have a history of running multiple parallel instances of a piece of software in benchmarks because we don't find the thread scaling good enough and the results don't show the full "potential" of the CPU; that argument can be made for most of them (audio encoding, compression, compiling etc.).
Yes, but the time to encode WAV to MP3 with LAME is not really a problem.
Encoding a video source can take several hours. And multipart encoding or an ABR ladder to saturate the CPU are simply well-known techniques in the professional world.
For example, the ABR ladder is a full option included directly in the x265 codec.
Multi-session encoding is an option too, directly in HandBrake.
Why buy a $600 CPU to do the fastest possible encoding in AOM AV1, when you can do it 4 times faster with a $200 CPU using the right encoding technique?
Z2697
3rd January 2025, 10:07
Yes, but the time to encode WAV to MP3 with LAME is not really a problem.
Encoding a video source can take several hours. And multipart encoding or an ABR ladder to saturate the CPU are simply well-known techniques in the professional world.
For example, the ABR ladder is a full option included directly in the x265 codec.
Multi-session encoding is an option too, directly in HandBrake.
Why buy a $600 CPU to do the fastest possible encoding in AOM AV1, when you can do it 4 times faster with a $200 CPU using the right encoding technique?
In fact you can buy a whole "lowest end" Mac mini M4 for the price of "just the CPU" with a 9950X. (Of course the performance is far apart.)
Since the M4 Pro only comes with severely overpriced memory, if you are planning to use only the CPU to do "work" (but why), it's just not worth it.
Or maybe it's the other way around, and the basic model of the Mac mini M4 is underpriced? You know, like the razor and blades model?
I don't know, I don't own a mac.
(wait a minute, m4 pro has 2 variants? and they are very different errrr
it's a great cpu but apple is just confusing)
It's a great chip, but it doesn't come as just a chip, and I just don't want to buy it this way. (And the "large scale" customers likely don't want to either.)
It seems like even an M4 in the basic Mac mini has more transistors than a 9950X (although with integrated memory and GPU), and with the best process node at the time, I'm not surprised that it's performant and efficient, and that the best model (M4 Max 12P+4E) can come close to the 9950X with less power draw.
Physics works, how surprising.
I have to say this is very off topic now.
Z2697
3rd January 2025, 17:12
Hell yeah let's just error out if input resolution exceeds 8192x4320
Barough
3rd January 2025, 22:25
x265 v4.1+78-5223ea7
Built on January 03 2025, GCC 14.2.0
Win32/64 / 8bit+10bit+12bit
https://bitbucket.org/multicoreware/x265_git/commits/branch/master
DL :
https://www.mediafire.com/file/86pd5zd03csrk67
tormento
12th January 2025, 17:50
Finally I have a working build with --frame-dup working and I'd like to play with it a bit, as I mostly encode anime.
What value of --dup-threshold should be OK? The default is 70, but that doesn't tell me much.
Is there a way to calculate the "difference" between two frames in a way similar to what x265 does?
FranceBB
12th January 2025, 20:11
Is there a way to calculate the "difference" between two frames in a way similar to what x265 does?
Well, in Avisynth I'd say:
YDifferenceFromPrevious()
UDifferenceFromPrevious()
VDifferenceFromPrevious()
and the corresponding
YDifferenceToNext()
UDifferenceToNext()
VDifferenceToNext()
tormento
12th January 2025, 20:14
Well, in Avisynth I'd say:
YDifferenceFromPrevious()
UDifferenceFromPrevious()
VDifferenceFromPrevious()
and the corresponding
YDifferenceToNext()
UDifferenceToNext()
VDifferenceToNext()
As far as I’ve read, it uses some sort of PSNR.
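If it really is PSNR-based, you can get a feel for the numbers with a rough Python sketch like the one below. It computes plain PSNR between two frames given as flat sequences of 8-bit luma samples. How x265 measures similarity internally, and how that maps onto the 1-99 --dup-threshold scale, is not something this sketch claims to reproduce; the threshold_db parameter and the function names are hypothetical.

```python
import math


def psnr(frame_a, frame_b, max_value=255):
    """PSNR in dB between two equally sized frames (flat sequences of samples)."""
    if len(frame_a) != len(frame_b) or len(frame_a) == 0:
        raise ValueError("frames must be non-empty and have the same size")
    squared_error = sum((a - b) ** 2 for a, b in zip(frame_a, frame_b))
    if squared_error == 0:
        return math.inf  # bit-identical frames
    mean_squared_error = squared_error / len(frame_a)
    return 10.0 * math.log10(max_value ** 2 / mean_squared_error)


def looks_duplicate(frame_a, frame_b, threshold_db):
    """Hypothetical check: treat frames as duplicates at or above threshold_db."""
    return psnr(frame_a, frame_b) >= threshold_db


if __name__ == "__main__":
    previous = [16, 16, 128, 128, 235, 235]
    current = [16, 17, 128, 128, 235, 234]
    print(f"PSNR: {psnr(previous, current):.2f} dB")
    print("duplicate at 50 dB?", looks_duplicate(previous, current, 50.0))
```

Running this over consecutive frames of a short clip and looking at the distribution should show whether dup and non-dup frames separate cleanly for your source.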
jpsdr
13th January 2025, 18:40
Finally I have a working build with --frame-dup working
You said that both mine and Patman's build didn't work.
Have you made any changes to the code to make it work?
tormento
13th January 2025, 18:54
Have you made any changes to the code to make it work?
I am talking about the latest Patman's build. I don't know what changes have been done.
benwaggoner
14th January 2025, 18:21
Finally I have a working build with --frame-dup working and I'd like to play with it a bit, as I mostly encode anime.
What value of --dup-threshold should be OK? The default is 70, but that doesn't tell me much.
Is there a way to calculate the "difference" between two frames in a way similar to what x265 does?
How did you get a working --frame-dup? Do you know what the fix was?
Since the setting hadn't been working until this, we've not had much experience with it. Experimentally, I'd play around with different values and look at how frames get classified in a bitstream analyzer (or just comparing the log file).
What you want to see is frames that are duplicated in the source are mostly set as duplicated frames in the bitstream, and that frames that aren't duplicates in the source are distinct frames in the output.
Anime tends to have more duplicate frames than not, and the difference between dup and no-dup frames is pretty distinct, so it's really the best case for this feature, and you can probably use a lot more aggressive settings than for other classes of content.
jpsdr
14th January 2025, 20:24
I took a look at Patman's latest commits and didn't notice anything specific to a frame-dup fix. If it's fixed, it seems to be a side effect of something else (unless I missed something).
Z2697
15th January 2025, 05:28
How did you get a working --frame-dup? Do you know what the fix was?
Since the setting hadn't been working until this, we've not had much experience with it. Experimentally, I'd play around with different values and look at how frames get classified in a bitstream analyzer (or just comparing the log file).
What you want to see is frames that are duplicated in the source are mostly set as duplicated frames in the bitstream, and that frames that aren't duplicates in the source are distinct frames in the output.
Anime tends to have more duplicate frames than not, and the difference between dup and no-dup frames is pretty distinct, so it's really the best case for this feature, and you can probably use a lot more aggressive settings than for other classes of content.
"frame-dup" has been working all along, it's that it's not working as you'd expect from the name.
There're detailed explanations a few pages back in this thread.
I think the claims by tormento about it "not working" or "now working" are untrustworthy.
What he actually means is that x265 somehow crashed on his computer when frame-dup was enabled, and is now not crashing, without any change to the code regarding the frame-dup feature. It's not that the feature itself is broken.
No offense. I mean that an observation that changed without any related intervention is not trustworthy. Not the guy (hopefully).
BTW, was there an x264 feature (or x264 mod) that does a similar thing? I think there was but I don't trust my memory.
tormento
15th January 2025, 08:54
Counter-order, guys.
It turns out --frame-dup is not working for me anymore. It's now crashing as it did with the previous build.
I am trying to reproduce the conditions under which it worked without errors, i.e. resolution and parameters.
Stay tuned but I fear it will be a long path, as it was a random test and I need to recall the conditions when it worked.
cubicibo
15th January 2025, 11:05
To give a better understanding of what this feature does: it removes frames based on PSNR thresholding and signals picture timing SEIs to keep the correct... picture timing... yeah... which no commonly available decoder can recognize.
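As a rough illustration of what "PSNR thresholding" means in the description above, the sketch below flags a frame as a duplicate when its PSNR against the last kept frame reaches a threshold. This is not x265's code: the function names, the flat lists of 8-bit samples, the comparison against the last kept frame and the 50 dB threshold are all assumptions made for the example.

```python
import math

def psnr(a, b, peak=255):
    """PSNR in dB between two equally sized lists of 8-bit samples."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and the same size")
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return math.inf  # bit-identical frames
    return 10 * math.log10(peak * peak / mse)

def flag_duplicates(frames, threshold_db=50.0):
    """Return one bool per frame: True if it would be dropped as a dupe.

    Each frame is compared with the last frame that was kept (an assumption),
    so a slow drift cannot hide behind a chain of near-identical neighbours.
    """
    flags = []
    last_kept = None
    for frame in frames:
        is_dup = last_kept is not None and psnr(frame, last_kept) >= threshold_db
        flags.append(is_dup)
        if not is_dup:
            last_kept = frame
    return flags

base = [100] * 64
noisy = [100] * 63 + [101]   # a single sample off by one
changed = [120] * 64         # a genuinely different frame
print(flag_duplicates([base, base, noisy, changed]))
# [False, True, True, False]
```

Lowering the threshold makes the detection more aggressive, which is the knob the "anime can take more aggressive settings" remark refers to.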
I don't question the looseness of the logic to detect dupe frames, but a decoder that does not implement Pic Timing SEI then relies solely on the decoded frame PTS, and the stream becomes VFR, with strictly identical output to the end user.
If a decoder does not support VFR, it should (shall?) support Pic Timing SEI. The decoded frame PTS timeline should appropriately reflect this. Gaps in the timeline should be filled by using the appropriate pic_struct entry to deliver CFR.
Even if it works as it should, the PSNR thresholding is not ideal to begin with, and the bits saved by removing near-identical frames are, well, did you know picture timing SEIs cost bits?
It's not necessarily about bits, but better utilization of B and P frames. Also I am pretty sure 2K or 4K frame dupes are more costly than a few bytes.
EDIT: I dived into the x265 codebase, and I think it does not set the frame PTS appropriately :mad:
higher
15th January 2025, 14:01
TechPowerUp's benchmark, like those of many other non-codec specialists, does not test codecs correctly. If you seriously want to benchmark a codec, you don't use a GUI like HandBrake, and you use a codec profile that can properly saturate a 16C/32T CPU.
I created a codec benchmark to do that, and a 9950X at stock has 74% more performance than a 5950X for x265. I haven't tested the 5900X, but the 5950X theoretically has 20% more performance than the 5900X. In a correct H.265 benchmark (all CPU threads saturated), the 9950X will deliver 110% more performance than the 5900X.
When TechPowerUp uses benchmarks that saturate the CPU correctly, like Cinebench, Blender or Stockfish, you can evaluate the CPU's real performance.
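The 110% figure in the quote comes from chaining the two stated gains multiplicatively rather than adding them. A quick check of that arithmetic (the function name is mine; the inputs are the quoted 74% and 20%):

```python
def compound_gain(*gains_percent):
    """Combine successive 'X% faster' steps multiplicatively."""
    factor = 1.0
    for gain in gains_percent:
        factor *= 1 + gain / 100
    return (factor - 1) * 100

# 9950X over 5950X: +74% (measured by the poster)
# 5950X over 5900X: +20% (the poster's theoretical figure)
print(round(compound_gain(74, 20), 1))  # 108.8, i.e. roughly 110%
```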
It's likely that Zen 5 is not much of an advance in x265 encoding compared to other workloads.
TPU uses x265 with preset slow at 4K resolution. It fully saturates my 5900X and I'm guessing it fully saturates a 9900X as well. Yet the 9900X is only 25% faster than the 5900X, while the 9950X is 27% faster than the 5950X in the TPU benchmark, and that sounds about right.
Power consumption is on a different level though. The 9900X consumes around 170W fully loaded while an M4 Pro consumes less than 50W.
Unfortunately x86 is years behind in this regard, and also in single-core performance.
Z2697
15th January 2025, 18:53
I don't question the looseness of the logic to detect dupe frames, but a decoder that does not implement Pic Timing SEI then relies solely on the decoded frame PTS, and the stream becomes VFR, with strictly identical output to the end user.
If a decoder does not support VFR, it should (shall?) support Pic Timing SEI. The decoded frame PTS timeline should appropriately reflect this. Gaps in the timeline should be filled by using the appropriate pic_struct entry to deliver CFR.
It's not necessarily about bits, but better utilization of B and P frames. Also I am pretty sure 2K or 4K frame dupes are more costly than a few bytes.
EDIT: I dived into the x265 codebase, and I think it does not set the frame PTS appropriately :mad:
Does a raw HEVC bitstream support VFR at all?
And as I already mentioned, x265 with the container output mod or FFmpeg's libx265 can get around that with the VFR "hack".
As for the bits, x265 now signals that timing SEI for every frame, whether it's duped (removed) or not.
It results in, I think, around 2.5 kbps for 24 fps video. How many duped frames need to be removed to compensate for that (average out), and how to decide the threshold... well, I don't even bother.
But yeah, that will outweigh the signaling overhead very soon.
Worst case it's just a few kbps, not really a big deal at all.
But since he is not seeing incorrect timing, the probability of that worst case happening is high.
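To put the overhead estimate above into per-frame terms, here is a back-of-the-envelope sketch. It assumes 1 kbps = 1000 bit/s and takes the roughly 2.5 kbps figure at face value; the 2000 bits saved per removed frame is a made-up number purely for illustration, and the function names are mine.

```python
def sei_bits_per_frame(overhead_kbps, fps):
    """Average signaling cost per coded frame, in bits."""
    return overhead_kbps * 1000 / fps

def break_even_dup_fraction(overhead_kbps, fps, avg_bits_saved_per_dup):
    """Fraction of frames that must be removed for savings to cover the SEI cost."""
    return sei_bits_per_frame(overhead_kbps, fps) / avg_bits_saved_per_dup

per_frame = sei_bits_per_frame(2.5, 24)
print(f"{per_frame:.1f} bits (~{per_frame / 8:.0f} bytes) per frame")
# 104.2 bits (~13 bytes) per frame

# Hypothetical: each removed frame would otherwise have cost 2000 bits.
print(f"{break_even_dup_fraction(2.5, 24, 2000):.1%} of frames")
# 5.2% of frames
```

The more a removed frame would have cost, the smaller the fraction of dupes needed to break even, which is why the overhead is outweighed quickly on content with many repeated frames.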
cubicibo
15th January 2025, 19:46
VFR can be signaled either for the entire stream or at the CVS level. But specifying the actual picture output-presentation delay is overly complicated here.
Anyway, it does not matter for the current problem. pic_struct should be used with CFR. But frame entry time in decoder must be adapted with respect to the last pic struct instruction. I can't find any such code in x265, so VBV conformance must be way off.
Z2697
15th January 2025, 20:26
VFR can be signaled either for the entire stream or at the CVS level. But specifying the actual picture output-presentation delay is overly complicated here.
Anyway, it does not matter for the current problem. pic_struct should be used with CFR. But frame entry time in decoder must be adapted with respect to the last pic struct instruction. I can't find any such code in x265, so VBV conformance must be way off.
The timing SEI we were talking about contains and utilizes pic_struct, unless you are referring to a different thing.
It shouldn't be necessary for "normal" CFR, only for "de-duped" CFR.
tormento
15th January 2025, 20:29
Well, it turns out I've found something about --frame-dup crashing on my PC.
The very same video can be encoded at
1920×1080
1600×900
1280×720
960×540
but crashes miserably with
1440×810
1500×844
with error:
Video encoding returned exit code: -1073741819 (0xC0000005)
Any idea?
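Two small checks that may help narrow this down. The exit code is the signed form of 0xC0000005, which on Windows is STATUS_ACCESS_VIOLATION, so the process touched memory it should not have. The second part tabulates how well each tested dimension is aligned to a power of two. Whether alignment is actually the trigger is only a guess on my part, and the helper names are mine.

```python
def to_ntstatus(exit_code):
    """Reinterpret a signed 32-bit process exit code as unsigned hex."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"

def largest_pow2_divisor(n, limit=64):
    """Largest power of two (up to limit) that divides n."""
    p = 1
    while p * 2 <= limit and n % (p * 2) == 0:
        p *= 2
    return p

print(to_ntstatus(-1073741819))  # 0xC0000005

tested = {
    "ok": [(1920, 1080), (1600, 900), (1280, 720), (960, 540)],
    "crash": [(1440, 810), (1500, 844)],
}
for result, sizes in tested.items():
    for w, h in sizes:
        print(f"{result:5} {w}x{h}: width mod-{largest_pow2_divisor(w)}, "
              f"height mod-{largest_pow2_divisor(h)}")
```

In this small sample every working width is at least mod-32 and every working height at least mod-4, while the two crashing sizes each have one dimension below that (height 810 is only mod-2, width 1500 only mod-4). Six data points don't prove anything, but testing a few more sizes on either side of those boundaries would confirm or rule it out.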
cubicibo
15th January 2025, 22:11
The timing SEI we were talking about contains and utilizes pic_struct, unless you are referring to a different thing.
It shouldn't be necessary for "normal" CFR, only for "de-duped" CFR.
We're talking about the same thing, and I am telling you that x265 does not seem to make use of that field in the ratecontrol code. Frames following duplicated ones aren't shifted in time appropriately, so the buffer has less time to refill and hence fewer bits are allocated to those frames. More problematically, the computed HRD fields are probably wrong.
Since there's a copy-paste error in the CLI documentation for --pic-struct (it claims to be needed for HLG), I will assume they never tested the feature or verified it was working correctly.
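The refill argument can be illustrated with a deliberately simplified leaky-bucket model. This is neither x265's ratecontrol nor the HRD arithmetic from the spec: the function, the bitrate, the buffer size and the starting fullness are all hypothetical, and the only point is how much budget is lost when a frame shown for several periods is treated as if it lasted one.

```python
def refill(fullness, bitrate, buffer_size, frame_period, periods_shown=1):
    """Buffer fullness in bits after filling for periods_shown frame periods.

    The buffer fills at a constant bitrate and is capped at buffer_size.
    """
    return min(buffer_size, fullness + bitrate * frame_period * periods_shown)

bitrate = 5_000_000       # bits per second (hypothetical)
buffer_size = 10_000_000  # bits (hypothetical)
period = 1 / 24           # seconds per frame at 24 fps
fullness = 4_000_000      # bits in the buffer after the previous frame

# A frame signalled as tripled stays on screen for three periods.
ignoring = refill(fullness, bitrate, buffer_size, period, periods_shown=1)
honouring = refill(fullness, bitrate, buffer_size, period, periods_shown=3)

print(f"budget lost by ignoring the repeat: {honouring - ignoring:.0f} bits")
```

With these made-up numbers the next frame is planned against a buffer that looks about two frame periods' worth of bits emptier than it really would be.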
benwaggoner
16th January 2025, 00:29
It's likely that Zen 5 is not much of an advance in x265 encoding compared to other workloads.
It has full native AVX512 support, which could make using that flag improve performance more, and in more scenarios.
benwaggoner
16th January 2025, 00:30
VFR can be signaled either for the entire stream or at the CVS level. But specifying the actual picture output-presentation delay is overly complicated here.
Anyway, it does not matter for the current problem. pic_struct should be used with CFR. But frame entry time in decoder must be adapted with respect to the last pic struct instruction. I can't find any such code in x265, so VBV conformance must be way off.
Back in the VC-1 days we handled this by having the frame in the bitstream, containing just the frame_repeat tag. Didn't require any VFR, time shifts, etcetera.
Z2697
16th January 2025, 04:46
It has full native AVX512 support, which could make using that flag improve performance more, and in more scenarios.
If my test is anything to be believed, it's around 5% to 8% "free performance" (more or less), depending on the target quality somehow.
(5% with CRF-14, 6.3% with CRF-18, 6.6% with CRF-22 and 8% with CRF-26, the 4 quality targets I often use to draw RD curve (there's no difference in the RD curve in this case, of course))
tormento
16th January 2025, 16:09
Benchmarking with different builds, I've noticed that Patman86's x265-4.1+79+12-81640d428 ICC standard/AVX/AVX2 builds are identical except for 1-2 bytes.
It seems a bit strange to me, and I've opened an issue on GitHub.
Please be aware of that in the mean time.
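For anyone who wants to repeat the comparison, a minimal byte-level diff is enough to see how many bytes differ between two builds and where. The file names in the comment are placeholders, not the real names of the builds.

```python
from pathlib import Path

def differing_offsets(a: bytes, b: bytes):
    """Offsets at which two byte strings differ, over their common length."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

def compare_files(path_a, path_b):
    """Print a short summary of how two files differ and return the offsets."""
    a = Path(path_a).read_bytes()
    b = Path(path_b).read_bytes()
    offsets = differing_offsets(a, b)
    print(f"sizes: {len(a)} vs {len(b)} bytes, {len(offsets)} differing byte(s)")
    for off in offsets[:16]:
        print(f"  offset 0x{off:08X}: {a[off]:02X} vs {b[off]:02X}")
    return offsets

# Tiny in-memory demonstration. For real builds, call for example
# compare_files("x265-std.exe", "x265-avx2.exe")  # hypothetical file names
print(differing_offsets(b"x265 build A", b"x265 build B"))  # [11]
```

If the only differences sit in the executable's header (timestamp or checksum fields, for instance), that would suggest the instruction-set options never reached the compiler, but that is speculation until someone looks at the offsets.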
benwaggoner
16th January 2025, 17:53
If my test is anything to be believed, it's around 5% to 8% "free performance" (more or less), depending on the target quality somehow.
(5% with CRF-14, 6.3% with CRF-18, 6.6% with CRF-22 and 8% with CRF-26, the 4 quality targets I often use to draw RD curve (there's no difference in the RD curve in this case, of course))
Hmm. Perhaps higher QP makes for fewer early exits, so doing more in parallel helps? Can you share the command line? I imagine stuff like TU size could have a different impact.