#9941 | Link |
|
Registered User
Join Date: Jan 2015
Posts: 288
|
I found the discussion about the --rd-refine parameter of x265 and its effects here quite interesting and took a look at the code myself. I have provided three test bins here that you are welcome to try out. Maybe there is a version that works quite well with your parameters.
Last edited by Patman; 4th February 2026 at 19:10. |
|
|
|
|
|
#9942 | Link | |
|
Registered User
Join Date: Aug 2024
Location: Between my two ears
Posts: 1,003
|
Quote:
|
|
|
|
|
|
|
#9945 | Link |
|
Registered User
Join Date: Jan 2015
Posts: 288
|
Code:
void Analysis::qprdRefine(const CUData& parentCTU, const CUGeom& cuGeom, int32_t qp, int32_t lqp)
{
- uint32_t depth = cuGeom.depth;
- ModeDepth& md = m_modeDepth[depth];
- md.bestMode = NULL;
+ uint32_t depth = cuGeom.depth;
+ ModeDepth &md = m_modeDepth[depth];
+ md.bestMode = NULL;
bool bDecidedDepth = parentCTU.m_cuDepth[cuGeom.absPartIdx] == depth;
- int bestCUQP = qp;
- int lambdaQP = lqp;
+ int bestCUQP = qp;
+ int lambdaQP = lqp;
+
bool doQPRefine = (bDecidedDepth && depth <= m_slice->m_pps->maxCuDQPDepth) || (!bDecidedDepth && depth == m_slice->m_pps->maxCuDQPDepth);
+
if (m_param->analysisLoadReuseLevel >= 7)
doQPRefine = false;
- if (doQPRefine)
+
+ if (!doQPRefine)
+ {
+ recodeCU(parentCTU, cuGeom, bestCUQP, lambdaQP);
+ md.bestMode->cu.copyToPic(depth);
+ md.bestMode->reconYuv.copyToPicYuv(*m_frame->m_reconPic[0], parentCTU.m_cuAddr, cuGeom.absPartIdx);
+ return;
+ }
+
+ if (!md.bestMode || md.bestMode->cu.isSkipped(0) || cuGeom.log2CUSize <= 2)
{
- uint64_t bestCUCost, origCUCost, cuCost, cuPrevCost;
+ recodeCU(parentCTU, cuGeom, bestCUQP, lambdaQP);
+ md.bestMode->cu.copyToPic(depth);
+ md.bestMode->reconYuv.copyToPicYuv(*m_frame->m_reconPic[0], parentCTU.m_cuAddr, cuGeom.absPartIdx);
+ return;
+ }
- int cuIdx = (cuGeom.childOffset - 1) / 3;
- bestCUCost = origCUCost = cacheCost[cuIdx];
+ uint64_t bestCUCost, origCUCost, cuCost, cuPrevCost;
+
+ int cuIdx = (cuGeom.childOffset - 1) / 3;
+ bestCUCost = origCUCost = cacheCost[cuIdx];
- int direction = m_param->bOptCUDeltaQP ? 1 : 2;
+ int direction = m_param->bOptCUDeltaQP ? 1 : 1;
- for (int dir = direction; dir >= -direction; dir -= (direction * 2))
+ int maxDeltaQP = 2;
+
+ if ((m_limitTU & X265_TU_LIMIT_NEIGH) && cuGeom.log2CUSize >= 4)
+ maxDeltaQP = 1;
+
+ bool strongPsy = (m_param->psyRd > 0.5);
+
+ if (strongPsy && maxDeltaQP > 1)
+ maxDeltaQP = 1;
+
+ int minQP = X265_MAX(m_param->rc.qpMin, qp - maxDeltaQP);
+ int maxQP = X265_MIN(QP_MAX_SPEC, qp + maxDeltaQP);
+
+ int prevQP = (int)parentCTU.m_meanQP;
+ int dirToPrev = (prevQP > qp) ? +1 : (prevQP < qp ? -1 : 0);
+
+ for (int dirSign = 0; dirSign < 2; dirSign++)
+ {
+ int dir = (dirSign == 0) ? direction : -direction;
+
+ if (m_param->bOptCUDeltaQP && dir != 1)
+ continue;
+
+ if (m_param->bOptCUDeltaQP && dir == 1 && (qp + 3) >= prevQP)
+ break;
+
+ int baseThresh = (maxDeltaQP <= 1) ? 0 : 1;
+ int threshold = baseThresh;
+ if (dirToPrev != 0 && dir == dirToPrev && maxDeltaQP > 1)
+ threshold++;
+
+ int failure = 0;
+ cuPrevCost = origCUCost;
+
+ int modCUQP = qp + dir;
+ while (modCUQP >= minQP && modCUQP <= maxQP)
{
- if (m_param->bOptCUDeltaQP && ((dir != 1) || ((qp + 3) >= (int32_t)parentCTU.m_meanQP)))
+ if (m_param->bOptCUDeltaQP && modCUQP > prevQP)
break;
- int threshold = 1;
- int failure = 0;
- cuPrevCost = origCUCost;
+ recodeCU(parentCTU, cuGeom, modCUQP, qp);
+ cuCost = md.bestMode->rdCost;
- int modCUQP = qp + dir;
- while (modCUQP >= m_param->rc.qpMin && modCUQP <= QP_MAX_SPEC)
- {
- if (m_param->bOptCUDeltaQP && modCUQP > (int32_t)parentCTU.m_meanQP)
- break;
-
- recodeCU(parentCTU, cuGeom, modCUQP, qp);
- cuCost = md.bestMode->rdCost;
+ COPY2_IF_LT(bestCUCost, cuCost, bestCUQP, modCUQP);
- COPY2_IF_LT(bestCUCost, cuCost, bestCUQP, modCUQP);
- if (cuCost < cuPrevCost)
- failure = 0;
- else if (cuCost > cuPrevCost)
- failure++;
+ if (cuCost < cuPrevCost)
+ failure = 0;
+ else
+ failure++;
- if (failure > threshold)
- break;
+ if (failure > threshold)
+ break;
- cuPrevCost = cuCost;
- modCUQP += dir;
- }
+ cuPrevCost = cuCost;
+ modCUQP += dir;
}
- lambdaQP = bestCUQP;
}
- recodeCU(parentCTU, cuGeom, bestCUQP, lambdaQP);
+ lambdaQP = bestCUQP;
+ setLambdaFromQP(parentCTU, lambdaQP);
- /* Copy best data to encData CTU and recon */
+ recodeCU(parentCTU, cuGeom, bestCUQP, lambdaQP);
md.bestMode->cu.copyToPic(depth);
md.bestMode->reconYuv.copyToPicYuv(*m_frame->m_reconPic[0], parentCTU.m_cuAddr, cuGeom.absPartIdx);
}
|
|
|
|
|
|
#9946 | Link |
|
Registered User
Join Date: Aug 2024
Location: Between my two ears
Posts: 1,003
|
The most significant change is that the QP adjustment range is limited to plus or minus 1 or 2, right?
Technically that would work, but then there's one less reason to use rd-refine. With too narrow a range, the effect is reduced to virtually nothing; with too wide a range, the potential for glitches is still present. Neither rd-refine variant, original or patched, improves BD-rate in my tests, though (except the one with less skip, but it's even slower). Modifying QP on a per-block basis without knowing basically anything else is not a good thing, it seems. Limit-tu is neither the problem nor the fix.
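To make the range discussion concrete, here is a minimal standalone sketch of a bounded QP search with an early-exit failure counter, loosely modelled on the loop in the diff above. It is not x265 code: `refineQp`, `RefineResult`, `CostFn` and `toyCost` are hypothetical names, and `costAt` merely stands in for whatever recodeCU() would report as the RD cost at a given QP.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <initializer_list>

// Hypothetical stand-in for "recode the CU at this QP and return its RD cost".
using CostFn = std::function<uint64_t(int)>;

struct RefineResult
{
    int      qp;   // best QP found
    uint64_t cost; // RD cost at that QP
};

// Try QPs in [qp - maxDelta, qp + maxDelta], stepping away from qp in each
// direction, and stop a direction after more than `threshold` consecutive
// steps that did not lower the cost compared with the previous step.
RefineResult refineQp(int qp, int maxDelta, int qpMin, int qpMax,
                      int threshold, uint64_t origCost, const CostFn& costAt)
{
    RefineResult best{qp, origCost};
    const int lo = std::max(qpMin, qp - maxDelta);
    const int hi = std::min(qpMax, qp + maxDelta);

    for (int dir : {+1, -1})
    {
        int failures = 0;
        uint64_t prevCost = origCost;
        for (int q = qp + dir; q >= lo && q <= hi; q += dir)
        {
            const uint64_t cost = costAt(q);
            if (cost < best.cost)
                best = {q, cost};

            if (cost < prevCost)
                failures = 0;
            else
                failures++;

            if (failures > threshold)
                break;
            prevCost = cost;
        }
    }
    return best;
}

// Toy convex cost curve with its minimum at QP 31 (purely illustrative).
uint64_t toyCost(int qp)
{
    const int d = qp - 31;
    return 1000u + 10u * static_cast<uint64_t>(d * d);
}
```

With `maxDelta = 2`, a start QP of 26 can only reach 28 on this toy curve even though the minimum is at 31, which is the "too narrow a range" trade-off in a nutshell.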
|
|
|
|
|
#9947 | Link | ||
|
Registered User
Join Date: Sep 2002
Location: Italy
Posts: 191
|
Test results for second set of @Patman builds.
Script used. Test A (30 encodes) Quote:
Quote:
--------------
Artifacts: evident in test A, build 1, with rd-refine and limit-tu=0 or 1; less evident with limit-tu=2. No artifacts (or none that I noticed) in the other tests.
||
|
|
|
|
|
#9948 | Link |
|
Registered User
Join Date: Aug 2024
Location: Between my two ears
Posts: 1,003
|
Is x265's SAO the same as HM's? This question was asked somewhere that I can't recall now.
I'd say it's based on HM but worse. The structure does bear a resemblance, but the core logic is different. HM uses source pixels to calculate the real cost. x265 uses reconstructed pixels to estimate the cost: SAO NOP = 0 distortion + lambda * signal bits, while the other SAO modes actually have "negative" distortion + lambda * some more bits. So basically x265's SAO loses the "in-loop" attribute and becomes an awkward game of balancing two "random" things. This could be an optimization choice... but at the cost of obliterating quality, is it really worth it? Is it even effective? It still loops over all modes. Perhaps I'll try bringing the real cost back and see how things turn out.
Last edited by Z2697; 15th February 2026 at 20:05.
|
|
|
|
|
#9949 | Link | |
|
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 5,147
|
Quote:
--sao-mode-full (for proposed behavior) --sao-mode-fast (for existing behavior) |
|
|
|
|
|
|
#9950 | Link |
|
Registered User
Join Date: Aug 2024
Location: Between my two ears
Posts: 1,003
|
Uh, I was wrong: the recon (pre-SAO) to source distortion is stored in a variable during initialization.
So I guess it's on par with HM, which I haven't really used. This precalculated distortion won't tell how the post-SAO distortion compares to the pre-SAO distortion, but HM's and x265's SAO look similar if I squint, so I count them as doing the same thing. HM passes the source YUV to the init function in the "main structure", while x265 accesses the source YUV directly in the init function; that's where I stumbled. Which means both are not good? But this SAO-specific distortion calculation does produce a negative value A LOT, perhaps even in the majority of cases. I'm still under the impression that this is what causes it to be so bad. I don't know what I'm talking about at this point, really. Quit.
Last edited by Z2697; 26th February 2026 at 07:33.
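For what it's worth, a negative value is what one would expect if the quantity is the change in squared error relative to the un-offset reconstruction, rather than an absolute distortion. The sketch below assumes exactly that reading and ignores clipping and bit-depth shifts; `SaoStats`, `gatherStats`, `deltaDistortion` and `sse` are hypothetical names, not the actual HM or x265 functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-class statistics: how many samples fall in the class, and the summed
// (source - recon) difference over those samples.
struct SaoStats
{
    int64_t count;
    int64_t diffSum;
};

SaoStats gatherStats(const std::vector<int>& src, const std::vector<int>& recon)
{
    SaoStats s{0, 0};
    for (std::size_t i = 0; i < src.size(); i++)
    {
        s.count++;
        s.diffSum += src[i] - recon[i];
    }
    return s;
}

// Change in SSE when `offset` is added to every recon sample in the class:
//   sum((src - (recon + h))^2) - sum((src - recon)^2) = count*h*h - 2*h*diffSum
// It is 0 for h = 0 and negative whenever the offset moves recon toward src.
int64_t deltaDistortion(const SaoStats& s, int offset)
{
    return s.count * offset * offset - 2 * offset * s.diffSum;
}

// Brute-force SSE after applying `offset`, used only to cross-check the formula.
int64_t sse(const std::vector<int>& src, const std::vector<int>& recon, int offset)
{
    int64_t total = 0;
    for (std::size_t i = 0; i < src.size(); i++)
    {
        const int64_t e = src[i] - (recon[i] + offset);
        total += e * e;
    }
    return total;
}
```

For source samples {10, 12, 11, 13} and recon {8, 10, 10, 11}, an offset of 2 gives a delta of -12, matching the brute-force SSE dropping from 13 to 1. Under this reading, a mostly negative value only says that the candidate offsets usually help.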
|
|
|
|
|
#9951 | Link | ||
|
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 5,147
|
Quote:
Quote:
|
||
|
|
|
|
|
#9952 | Link |
|
Registered User
Join Date: Aug 2024
Location: Between my two ears
Posts: 1,003
|
Because the "baseline" is always positive.
I could be wrong (again) though.
Determine offset:
Code:
int64_t bestCost = calcSaoRdoCost(0, 1, lambda);
Code:
// RDO SAO_NA
m_entropyCoder.load(m_rdContexts.temp);
m_entropyCoder.resetBits();
m_entropyCoder.codeSaoType(0);
int64_t costPartBest = calcSaoRdoCost(0, m_entropyCoder.getNumberOfWrittenBits(), lambda[0]);
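The comparison against that positive baseline can be sketched in isolation as follows. This is a simplified illustration, not the x265 implementation: `toyRdCost`, `SaoCandidate` and `pickSaoType` are hypothetical names, and the cost is assumed to be plain distortion + lambda * bits with no fixed-point scaling.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One SAO candidate: a type id, its distortion change relative to "SAO off",
// and the bits needed to signal it.
struct SaoCandidate
{
    int      type;
    int64_t  deltaDist;
    uint32_t bits;
};

// Simplified RD cost (assumption: no rounding or shift applied to lambda).
inline int64_t toyRdCost(int64_t dist, uint32_t bits, int64_t lambda)
{
    return dist + static_cast<int64_t>(bits) * lambda;
}

// Start from "SAO off" (type 0): zero distortion change, only signalling bits,
// so the baseline cost is always positive. A candidate wins only if its
// distortion reduction outweighs its extra bits.
int pickSaoType(const std::vector<SaoCandidate>& cands, uint32_t offBits, int64_t lambda)
{
    int bestType = 0;
    int64_t bestCost = toyRdCost(0, offBits, lambda);
    for (const SaoCandidate& c : cands)
    {
        const int64_t cost = toyRdCost(c.deltaDist, c.bits, lambda);
        if (cost < bestCost)
        {
            bestCost = cost;
            bestType = c.type;
        }
    }
    return bestType;
}
```

With lambda = 10 and 1 bit for "off", the baseline costs 10. A candidate with a delta of -12 and 5 bits costs 38 and loses, while one with a delta of -100 and 6 bits costs -40 and wins.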
|
|
|
|
|
|