4th February 2026, 19:05   #9941
Patman
Registered User
 
Join Date: Jan 2015
Posts: 288
I found the discussion about the --rd-refine parameter of x265 and its effects here quite interesting and took a look at the code myself. I have provided three test bins here that you are welcome to try out. Maybe there is a version that works quite well with your parameters.
__________________
Tools for StaxRip | Github

Last edited by Patman; 4th February 2026 at 19:10.
4th February 2026, 21:20   #9942
Z2697
Registered User
 
Join Date: Aug 2024
Location: Between my two ears
Posts: 1,003
Quote:
Originally Posted by Patman View Post
I found the discussion about the --rd-refine parameter of x265 and its effects here quite interesting and took a look at the code myself. I have provided three test bins here that you are welcome to try out. Maybe there is a version that works quite well with your parameters.
test1 and test2 produce the same result (still with glitches), and test3 produces a result as if rd-refine were not enabled.
5th February 2026, 14:57   #9943
Patman
Registered User
 
Join Date: Jan 2015
Posts: 288
Quote:
Originally Posted by Z2697 View Post
test1 and test2 produce the same result (still with glitches), and test3 produces a result as if rd-refine were not enabled.
Thank you for your feedback. I have uploaded new files, perhaps one of them will be (partially) successful. Same link as before...
5th February 2026, 15:55   #9944
Z2697
Registered User
 
Join Date: Aug 2024
Location: Between my two ears
Posts: 1,003
Quote:
Originally Posted by Patman View Post
Thank you for your feedback. I have uploaded new files, perhaps one of them will be (partially) successful. Same link as before...
What did you change with the overhaul?
5th February 2026, 16:01   #9945
Patman
Registered User
 
Join Date: Jan 2015
Posts: 288
Quote:
Originally Posted by Z2697 View Post
What did you change with the overhaul?
Code:
 void Analysis::qprdRefine(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp, int32_t lqp)
 {
-    uint32_t depth = cuGeom.depth;
-    ModeDepth& md = m_modeDepth[depth];
-    md.bestMode = NULL;
+    uint32_t  depth = cuGeom.depth;
+    ModeDepth &md   = m_modeDepth[depth];
+    md.bestMode     = NULL;
 
     bool bDecidedDepth = parentCTU.m_cuDepth[cuGeom.absPartIdx] == depth;
 
-    int bestCUQP = qp;
-    int lambdaQP = lqp;
+    int bestCUQP  = qp;
+    int lambdaQP  = lqp;
+
     bool doQPRefine = (bDecidedDepth && depth <= m_slice->m_pps->maxCuDQPDepth) || (!bDecidedDepth && depth == m_slice->m_pps->maxCuDQPDepth);
+
     if (m_param->analysisLoadReuseLevel >= 7)
         doQPRefine = false;
-    if (doQPRefine)
+
+    if (!doQPRefine)
+    {
+        recodeCU(parentCTU, cuGeom, bestCUQP, lambdaQP);
+        md.bestMode->cu.copyToPic(depth);
+        md.bestMode->reconYuv.copyToPicYuv(*m_frame->m_reconPic[0], parentCTU.m_cuAddr, cuGeom.absPartIdx);
+        return;
+    }
+
+    if (!md.bestMode || md.bestMode->cu.isSkipped(0) || cuGeom.log2CUSize <= 2)
     {
-        uint64_t bestCUCost, origCUCost, cuCost, cuPrevCost;
+        recodeCU(parentCTU, cuGeom, bestCUQP, lambdaQP);
+        md.bestMode->cu.copyToPic(depth);
+        md.bestMode->reconYuv.copyToPicYuv(*m_frame->m_reconPic[0], parentCTU.m_cuAddr, cuGeom.absPartIdx);
+        return;
+    }
 
-        int cuIdx = (cuGeom.childOffset - 1) / 3;
-        bestCUCost = origCUCost = cacheCost[cuIdx];
+    uint64_t bestCUCost, origCUCost, cuCost, cuPrevCost;
+
+    int cuIdx = (cuGeom.childOffset - 1) / 3;
+    bestCUCost = origCUCost = cacheCost[cuIdx];
 
-        int direction = m_param->bOptCUDeltaQP ? 1 : 2;
+    int direction = m_param->bOptCUDeltaQP ? 1 : 1;
 
-        for (int dir = direction; dir >= -direction; dir -= (direction * 2))
+    int maxDeltaQP = 2;
+
+    if ((m_limitTU & X265_TU_LIMIT_NEIGH) && cuGeom.log2CUSize >= 4)
+        maxDeltaQP = 1;
+
+    bool strongPsy = (m_param->psyRd > 0.5);
+
+    if (strongPsy && maxDeltaQP > 1)
+        maxDeltaQP = 1;
+
+    int minQP = X265_MAX(m_param->rc.qpMin, qp - maxDeltaQP);
+    int maxQP = X265_MIN(QP_MAX_SPEC,       qp + maxDeltaQP);
+
+    int prevQP = (int)parentCTU.m_meanQP;
+    int dirToPrev = (prevQP > qp) ? +1 : (prevQP < qp ? -1 : 0);
+
+    for (int dirSign = 0; dirSign < 2; dirSign++)
+    {
+        int dir = (dirSign == 0) ? direction : -direction;
+
+        if (m_param->bOptCUDeltaQP && dir != 1)
+            continue;
+
+        if (m_param->bOptCUDeltaQP && dir == 1 && (qp + 3) >= prevQP)
+            break;
+
+        int baseThresh = (maxDeltaQP <= 1) ? 0 : 1;
+        int threshold = baseThresh;
+        if (dirToPrev != 0 && dir == dirToPrev && maxDeltaQP > 1)
+            threshold++;
+
+        int failure  = 0;
+        cuPrevCost   = origCUCost;
+
+        int modCUQP = qp + dir;
+        while (modCUQP >= minQP && modCUQP <= maxQP)
         {
-            if (m_param->bOptCUDeltaQP && ((dir != 1) || ((qp + 3) >= (int32_t)parentCTU.m_meanQP)))
+            if (m_param->bOptCUDeltaQP && modCUQP > prevQP)
                 break;
 
-            int threshold = 1;
-            int failure = 0;
-            cuPrevCost = origCUCost;
+            recodeCU(parentCTU, cuGeom, modCUQP, qp);
+            cuCost = md.bestMode->rdCost;
 
-            int modCUQP = qp + dir;
-            while (modCUQP >= m_param->rc.qpMin && modCUQP <= QP_MAX_SPEC)
-            {
-                if (m_param->bOptCUDeltaQP && modCUQP > (int32_t)parentCTU.m_meanQP)
-                    break;
-
-                recodeCU(parentCTU, cuGeom, modCUQP, qp);
-                cuCost = md.bestMode->rdCost;
+            COPY2_IF_LT(bestCUCost, cuCost, bestCUQP, modCUQP);
 
-                COPY2_IF_LT(bestCUCost, cuCost, bestCUQP, modCUQP);
-                if (cuCost < cuPrevCost)
-                    failure = 0;
-                else if (cuCost > cuPrevCost)
-                    failure++;
+            if (cuCost < cuPrevCost)
+                failure = 0;
+            else
+                failure++;
 
-                if (failure > threshold)
-                    break;
+            if (failure > threshold)
+                break;
 
-                cuPrevCost = cuCost;
-                modCUQP += dir;
-            }
+            cuPrevCost = cuCost;
+            modCUQP   += dir;
         }
-        lambdaQP = bestCUQP;
     }
 
-    recodeCU(parentCTU, cuGeom, bestCUQP, lambdaQP);
+    lambdaQP = bestCUQP;
+    setLambdaFromQP(parentCTU, lambdaQP);
 
-    /* Copy best data to encData CTU and recon */
+    recodeCU(parentCTU, cuGeom, bestCUQP, lambdaQP);
     md.bestMode->cu.copyToPic(depth);
     md.bestMode->reconYuv.copyToPicYuv(*m_frame->m_reconPic[0], parentCTU.m_cuAddr, cuGeom.absPartIdx);
 }
5th February 2026, 20:33   #9946
Z2697
Registered User
 
Join Date: Aug 2024
Location: Between my two ears
Posts: 1,003
The most significant change is that the QP adjustment range is limited to plus or minus 1 or 2, right?
Technically that would work, but then there's one less reason to use rd-refine.
With too narrow a range, the effect is reduced to virtually nothing; with too wide a range, the potential for glitches is still present.

None of the rd-refine variants, original or patched, improves BD-rate in my test, though (except the one with fewer skips, but it's even slower).
Modifying QP on a per-block basis without knowing basically anything else is not a good thing, it seems.

Limit-tu is not the problem, nor the fix.
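
To make the range trade-off concrete, here is a minimal sketch of a clamped, directional QP search with an early-exit failure counter, loosely modelled on the loop in the diff above. It is not x265 code: `refineQP` and `CostFn` are hypothetical names, and the cost callback merely stands in for "recode the CU at this QP and read back its RD cost".

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <initializer_list>

// Hypothetical stand-in for "recodeCU() then read md.bestMode->rdCost".
using CostFn = std::function<uint64_t(int)>;

// Search QPs in [qp - maxDelta, qp + maxDelta], clamped to [qpMin, qpMax],
// walking outward in each direction and giving up after more than
// `threshold` consecutive steps that fail to lower the cost.
int refineQP(int qp, int maxDelta, int qpMin, int qpMax, int threshold, const CostFn& cost)
{
    assert(maxDelta >= 0 && qpMin <= qpMax);

    const int lo = std::max(qpMin, qp - maxDelta);
    const int hi = std::min(qpMax, qp + maxDelta);

    const uint64_t origCost = cost(qp);
    uint64_t bestCost = origCost;
    int bestQP = qp;

    for (int dir : {+1, -1})
    {
        int failure = 0;
        uint64_t prevCost = origCost;

        for (int q = qp + dir; q >= lo && q <= hi; q += dir)
        {
            const uint64_t c = cost(q);
            if (c < bestCost)
            {
                bestCost = c;
                bestQP = q;
            }

            // Reset the counter on improvement, otherwise count a failure.
            failure = (c < prevCost) ? 0 : failure + 1;
            if (failure > threshold)
                break;

            prevCost = c;
        }
    }
    return bestQP;
}
```

With a cost curve whose minimum lies three steps above the base QP, a ±2 window can only get as far as base + 2, while a wider window reaches the minimum. That is the trade-off described above: a narrow window limits both the gain and the damage.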
6th February 2026, 09:39   #9947
hellgauss
Registered User
 
Join Date: Sep 2002
Location: Italy
Posts: 191
Test results for the second set of @Patman's builds.

Script used. Test A (30 encodes)

Quote:
set dir=patman
set bf=6
set subme=3

for %%b in (1,2,3) do (
for %%r in (rd-refine,no-rd-refine) do (
for %%j in (0,1,2,3,4) do (

rem b,r,j = build, rd-refine, limit-tu
rem build 1 = m_limit_tu
rem build 2 = overhaul
rem build 3 = overhaul + intrardrefine

ffmpeg.exe -loglevel level+error -i "Vinland-Saga_S1E16-rdr-scene.mkv" -map 0:v -vf "removegrain=1:2:2,removegrain=0:2:2,format=yuv420p10le" -sws_flags accurate_rnd -strict -1 -f yuv4mpegpipe - | %dir%\x265_%%b.exe --y4m -D 10 --input-res 1920x1080 --input-depth 10 --fps 24000/1001 --preset slower --crf 17.2 --%%r --limit-tu %%j --deblock=-1:-1 --no-sao --no-strong-intra-smoothing --rc-lookahead 99 --bframes %bf% --rd 6 --subme %subme% --vbv-maxrate 20000 --vbv-bufsize 24924 --frame-threads 1 --pools none --no-wpp --level-idc 51 --high-tier --colorprim 1 --colormatrix 1 --transfer 1 --range limited --input - --output output.hevc

mkvmerge.exe --deterministic 1 -o "Vinland-Saga_S1E16_ [build_%%b][%%r][ltu_%%j].mkv" -B -T -M output.hevc

)))
Test B (6 encodes, same as A but with preset medium and rd=6)
Quote:
set dir=patman
set bf=6
set subme=3

for %%b in (1,2,3) do (
for %%r in (rd-refine,no-rd-refine) do (

rem b,r = build, rd-refine
rem build 1 = m_limit_tu
rem build 2 = overhaul
rem build 3 = overhaul + intrardrefine

ffmpeg.exe -loglevel level+error -i "Vinland-Saga_S1E16-rdr-scene.mkv" -map 0:v -vf "removegrain=1:2:2,removegrain=0:2:2,format=yuv420p10le" -sws_flags accurate_rnd -strict -1 -f yuv4mpegpipe - | %dir%\x265_%%b.exe --y4m -D 10 --input-res 1920x1080 --input-depth 10 --fps 24000/1001 --crf 17.2 --%%r --rd 6 --deblock=-1:-1 --no-sao --no-strong-intra-smoothing --rc-lookahead 99 --bframes %bf% --rd 6 --subme %subme% --vbv-maxrate 20000 --vbv-bufsize 24924 --frame-threads 1 --pools none --no-wpp --level-idc 51 --high-tier --colorprim 1 --colormatrix 1 --transfer 1 --range limited --input - --output output.hevc

mkvmerge.exe --deterministic 1 -o "Vinland-Saga_S1E16_ [build_%%b][%%r].mkv" -B -T -M output.hevc

))
In both tests, builds 2 and 3 behave the same: they give bit-identical output. With no-rd-refine, the output does not depend on the build.
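
For anyone who wants to repeat the bit-identical check, a byte-for-byte comparison is enough. A minimal sketch, assuming the raw `.hevc` streams (or the deterministic `.mkv` files) are compared; `filesIdentical` is a hypothetical helper name, not part of any tool used above.

```cpp
#include <algorithm>
#include <fstream>
#include <iterator>
#include <string>

// Returns true only if both files can be opened and contain exactly
// the same bytes in the same order.
bool filesIdentical(const std::string& pathA, const std::string& pathB)
{
    std::ifstream a(pathA, std::ios::binary);
    std::ifstream b(pathB, std::ios::binary);
    if (!a || !b)
        return false;

    std::istreambuf_iterator<char> itA(a), itB(b), end;

    // The four-iterator std::equal (C++14) also reports a mismatch
    // when one file is a prefix of the other.
    return std::equal(itA, end, itB, end);
}
```

A stock tool such as `fc /b` on Windows does the same job; the point is only that "same output" here means identical bytes, not similar file sizes.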

--------------

Artifacts:
Evident in test A, build 1, with rd-refine and limit-tu=0 or 1; less evident with limit-tu=2. No artifacts (or I did not notice them) in the other tests.
15th February 2026, 19:53   #9948
Z2697
Registered User
 
Join Date: Aug 2024
Location: Between my two ears
Posts: 1,003
Is x265's SAO the same as HM's? This question was asked somewhere that I can't recall now.

I'd say it's based on HM but worse.
The structure does bear a resemblance, but the core logic is different.
HM uses source pixels to calculate the real cost.
x265 uses reconstructed pixels to estimate the cost: SAO NOP = 0 distortion + lambda * signal bits, and other SAO modes actually have "negative" distortion + lambda * some more bits.
So basically x265's SAO loses the "in-loop" attribute, and becomes some awkward game of balancing between 2 "random" things.
This could be an optimization choice... but at the cost of obliterating quality, is it really worth it? Is it even effective? Because it still loops over all modes.
Perhaps I'll try bringing the real cost back and see how things turn out.

Last edited by Z2697; 15th February 2026 at 20:05.
24th February 2026, 20:01   #9949
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 5,147
Quote:
Originally Posted by Z2697 View Post
Is x265's SAO the same as HM's? This question was asked somewhere that I can't recall now.

I'd say it's based on HM but worse.
The structure does bear a resemblance, but the core logic is different.
HM uses source pixels to calculate the real cost.
x265 uses reconstructed pixels to estimate the cost: SAO NOP = 0 distortion + lambda * signal bits, and other SAO modes actually have "negative" distortion + lambda * some more bits.
So basically x265's SAO loses the "in-loop" attribute, and becomes some awkward game of balancing between 2 "random" things.
This could be an optimization choice... but at the cost of obliterating quality, is it really worth it? Is it even effective? Because it still loops over all modes.
Perhaps I'll try bringing the real cost back and see how things turn out.
Yeah, given that the source and reconstructed frames should both be in L3 cache at that point, I wouldn't think the overhead would be that much. It might make sense as a quality/perf flag. Like:

--sao-mode-full (for proposed behavior)
--sao-mode-fast (for existing behavior)
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
26th February 2026, 06:54   #9950
Z2697
Registered User
 
Join Date: Aug 2024
Location: Between my two ears
Posts: 1,003
Uh, I was wrong, the recon (pre-SAO) to source distortion is stored in a variable during initialization.
So I guess it's on par with HM, which I haven't really used.
This precalculated distortion won't tell how the post-SAO distortion compares to the pre-SAO distortion, but HM's and x265's SAO look similar if I squint, so I count them as doing the same thing.
HM passes the source YUV to the init function in the "main structure", while x265 accesses the source YUV directly in the init function; that's where I stumbled.
Which means both are not good?

But this SAO-specific distortion calculation does result in negative values A LOT; they even seem to be the majority.
I'm still under the impression that this is what causes it to be so bad.

I don't know what I'm talking about at this point, really.
Quit.

Last edited by Z2697; 26th February 2026 at 07:33.
27th February 2026, 20:23   #9951
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 5,147
Quote:
Originally Posted by Z2697 View Post
Uh, I was wrong, the recon (pre-SAO) to source distortion is stored in a variable during initialization.
So I guess it's on par with HM, which I haven't really used.
This precalculated distortion won't tell how the post-SAO distortion compares to the pre-SAO distortion, but HM's and x265's SAO look similar if I squint, so I count them as doing the same thing.
HM passes the source YUV to the init function in the "main structure", while x265 accesses the source YUV directly in the init function; that's where I stumbled.
Which means both are not good?
My recollection from, sheesh, 7-8 years ago is that x265's SAO is exactly HM's algorithmically. And that MCW hadn't been able to reproduce the SAO errors people were describing at the time, so didn't know how to address it.

Quote:
But this SAO-specific distortion calculation does result in negative values A LOT; they even seem to be the majority.
I'm still under the impression that this is what causes it to be so bad.
That I can't speak about. As long as it's all using signed values and not clamping at zero, I don't know that negative would be a problem. Negatives are used all over in encoding, frequency transform on up.
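
One way to see why negative values would be normal here: if the SAO distortion term is a delta relative to leaving the samples untouched, then any offset that helps must come out negative. A minimal sketch, assuming the usual closed form N*h*h - 2*h*E for N samples, offset h, and E = sum of (source - recon); the function names are hypothetical and sample clipping is ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Brute force: sum of squared errors against the source after adding
// `offset` to every reconstructed sample (no clipping, for simplicity).
int64_t sseWithOffset(const std::vector<int>& src, const std::vector<int>& rec, int offset)
{
    int64_t sum = 0;
    for (std::size_t i = 0; i < src.size(); i++)
    {
        const int64_t d = src[i] - (rec[i] + offset);
        sum += d * d;
    }
    return sum;
}

// Closed form of sseWithOffset(offset) - sseWithOffset(0):
//   sum((d - h)^2) - sum(d^2) = N*h^2 - 2*h*sum(d),  with d = src - rec.
// `count` is N and `diffSum` is sum(d); both are gathered from the source
// and the pre-SAO reconstruction.
int64_t saoDistDelta(int64_t count, int64_t diffSum, int64_t offset)
{
    return count * offset * offset - 2 * offset * diffSum;
}
```

For example, with differences {2, 1, 2, 1} (N = 4, E = 6), an offset of 1 gives 4 - 12 = -8, matching the drop in SSE from 10 to 2. A negative delta just means "better than no SAO", so signed arithmetic is all that is required.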
28th February 2026, 18:44   #9952
Z2697
Registered User
 
Join Date: Aug 2024
Location: Between my two ears
Posts: 1,003
Because the "baseline" is always positive.
I could be wrong (again) though.

Determine offset:
Code:
int64_t bestCost = calcSaoRdoCost(0, 1, lambda);
Determine SAO type:
Code:
    // RDO SAO_NA
    m_entropyCoder.load(m_rdContexts.temp);
    m_entropyCoder.resetBits();
    m_entropyCoder.codeSaoType(0);
    int64_t costPartBest = calcSaoRdoCost(0, m_entropyCoder.getNumberOfWrittenBits(), lambda[0]);
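
Both snippets start from a "no SAO" baseline whose distortion argument is 0, so the baseline cost is just lambda times the signalling bits and never goes below zero. A minimal sketch of how a signed candidate cost competes against that baseline, assuming cost = distortion delta + lambda * bits. `rdCost` and `pickSao` are hypothetical names, and the real `calcSaoRdoCost` may scale lambda differently.

```cpp
#include <cstdint>

// Simplified RD cost: signed distortion delta plus rate weighted by lambda.
int64_t rdCost(int64_t distDelta, uint32_t bits, int64_t lambda)
{
    return distDelta + static_cast<int64_t>(bits) * lambda;
}

// Returns true when the SAO candidate beats the "SAO off" baseline.
// The baseline has a distortion delta of 0 by definition, so its cost is
// non-negative; the candidate's cost may be negative when it removes
// more distortion than its extra bits cost.
bool pickSao(int64_t candDistDelta, uint32_t candBits, uint32_t offBits, int64_t lambda)
{
    const int64_t offCost  = rdCost(0, offBits, lambda);
    const int64_t candCost = rdCost(candDistDelta, candBits, lambda);
    return candCost < offCost;
}
```

With a candidate that saves 8 units of distortion for 5 bits against a 1-bit baseline, the candidate wins at lambda = 1 (-3 vs 1) and loses at lambda = 4 (12 vs 4). The comparison works as long as everything stays signed.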
All times are GMT +1. The time now is 19:04.

