x264 - CBR/Fixed GOP and random low quality frames


Wallabee
23rd June 2025, 02:10
Hello,

I've been testing some encodes with x264 (through FFmpeg), and I have the constraint that they need a fixed GOP of two seconds and must also be CBR (or at least have peak bitrate constrained to around 3000 kbit/s).

I started by doing a purely subjective comparison between a single pass and two pass encode, and even doing A/B frame comparisons showed very little benefit doing two pass. Especially with the additional encoding time, it wasn't going to be worth it. And with other posts I've found also mentioning that two pass and CBR is almost never worth it, I was just going to stick with single pass.

I then checked the VMAF for the encodes and saw something interesting. The following VMAF graph shows Big Buck Bunny encoded with the following settings:

Resolution: 1280x720
Preset: Medium
Bitrate: 3000 kbit/s
vbv-maxrate: 3000 kbit/s
vbv-bufsize: 6000 kbit
keyint: 48 (BBB is 24FPS)
noscenecut
single pass

(I shrunk the VMAF graphs to avoid overly large embeds)

https://i.imgur.com/klNW1QX.png

My eye was immediately drawn to the obvious dips where frame quality dipped really low, and so I looked at one of those problem areas in the video, and noticed very low quality frames for around half a second. Here is a single frame in one of these dips:

https://i.imgur.com/xNOwQdN.png

Very noticeable bad quality on the chinchilla character.
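
For anyone who wants to locate such dips without eyeballing a graph, here is a minimal sketch. It assumes you already have one VMAF score per frame in a plain list (how you export them depends on your VMAF tool). The function name, the threshold, and the sample scores are all made up for illustration.

```python
from typing import List, Tuple


def find_dips(scores: List[float], threshold: float, min_frames: int) -> List[Tuple[int, int]]:
    """Return (first_frame, last_frame) runs where every score is below threshold.

    Runs shorter than min_frames are ignored, so an isolated single-frame
    drop is not reported as a sustained dip.
    """
    dips = []
    start = None
    for i, score in enumerate(scores):
        if score < threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_frames:
                dips.append((start, i - 1))
            start = None
    # Close a run that extends to the final frame.
    if start is not None and len(scores) - start >= min_frames:
        dips.append((start, len(scores) - 1))
    return dips


# Made-up scores for illustration only, not taken from the encodes in this thread.
example = [95, 94, 60, 55, 58, 93, 92, 50, 94, 61, 59]
print(find_dips(example, threshold=70, min_frames=2))
```

At 24 fps, a min_frames of 12 would correspond to the "around half a second" dips described above.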

I then tried the same encoding settings above, except changed the preset to Very Slow to see if that would help, and the resulting VMAF graph was the following:

https://i.imgur.com/mUQBPnU.png

While the average VMAF score was higher, some of the dips got even worse than the medium preset! Here is the exact same frame as above, but on the very slow preset. Notice it's even worse quality than the medium preset.

https://i.imgur.com/EpRA2JB.png

At this point, I decided to just try the two pass (same settings, very slow preset), and here is the resulting VMAF:

https://i.imgur.com/LGcYaa6.png

Poof! The really bad frames disappear. And the same frame screencap:

https://i.imgur.com/D7kJHRk.png

Much better! (The medium preset two-pass had similar results, with the really bad frames fixed, but I'll omit those images for brevity)

I was curious now to see what would happen if I took out noscenecut and let it insert extra I-frames, and then ran just a single pass, and for the most part the really bad frames were gone.

default scenecut, single pass:

https://i.imgur.com/uHxFovS.png

Running two pass with default scenecut took care of that remaining dip:

https://i.imgur.com/jHZIJ6f.png

-------------

I'm not an x264 expert by any means, so my best guess is that these low quality frames occur due to Fixed GOP and bad luck placement of the I-Frames that can only be fixed with two pass encoding. I'd like to avoid two pass just due to the fact that the average VMAF score with CBR and two pass is not worth it, but the bad frames that occur in single pass definitely are noticeable. Also, I found it strange that a slower preset made the bad quality frames even worse. No clue why that happens.

Curious to hear what some of you x264 experts make of it.

Thanks!

benwaggoner
24th June 2025, 17:22
Do you have your full command line you can share? There's a variety of things that can be tweaked.

For single pass encoding, setting --rc-lookahead to the same value as --keyint can help a lot, so rate control is always looking ahead a whole GOP so it can massage VBV to make quality more consistent.

Doing a --crf encode with --vbv-bufsize and --vbv-maxrate set can also leave a little more buffer available for those I-frames, also smoothing out quality. --crf can be pretty low, like 18, so you get good quality. You're basically telling the encoder not to let QP drop lower than provides perceptual value.
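
A minimal sketch of how the capped-CRF idea above maps onto ffmpeg arguments, assuming ffmpeg with libx264. The helper name is hypothetical; `-crf`, `-maxrate` and `-bufsize` are the standard ffmpeg options. The key point is that no `-b:v` target is passed.

```python
from typing import List


def capped_crf_args(crf: int, maxrate_kbps: int, bufsize_kbit: int) -> List[str]:
    """Rate-control arguments for a CRF encode constrained by VBV.

    No -b:v target is given: CRF sets the quality ceiling, and VBV
    (-maxrate / -bufsize) forces quality down wherever that ceiling
    would exceed the bitrate constraint.
    """
    if not 0 <= crf <= 51:
        raise ValueError("8-bit x264 CRF is expected to be in the range 0-51")
    return ["-crf", str(crf), "-maxrate", f"{maxrate_kbps}k", "-bufsize", f"{bufsize_kbit}k"]


print(capped_crf_args(18, 3000, 6000))
```

These arguments would replace the `-b:v 3000k -maxrate 3000k -bufsize 6000k` part of a bitrate-targeted command line.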

Also, Big Buck Bunny is an oddly easy title to encode, and doesn't respond to rate control like more common content. I recommend Tears of Steel as the blender.org project that's best representative of real-world non-CGI content types.

Asmodian
24th June 2025, 17:24
This extremely limited VBV scenario is exactly where two pass encoding has the most benefit.

For medium vs. very slow I think it is almost random which will be better. It depends on where exactly the VBV limits are hit and what the content is like past lookahead when it happens. It would be interesting to test different lookahead distances with single pass, perhaps a 250 frame lookahead would help without needing two passes?

Wallabee
25th June 2025, 03:02
Do you have your full command line you can share? There's a variety of things that can be tweaked.

Sure:

ffmpeg -v warning -hide_banner -stats -i "bbb.mov" -vf "scale=1280:720:flags=lanczos+accurate_rnd+bitexact,setsar=1" -c:v libx264 -x264-params "scenecut=0:keyint=48" -pix_fmt yuv420p -video_track_timescale 2400 -an -y -b:v 3000k -maxrate 3000k -bufsize 6000k -preset medium bbb-3000k-3000k-6000k-medium-onepass.mp4
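
For comparison, here is a sketch of what the two-pass variant of this command could look like, written as a small Python helper that builds the two ffmpeg argument lists. The two-pass commands were not posted in the thread, so this is an assumption about their shape: it uses ffmpeg's standard `-pass` and `-passlogfile` options, and the stats-file prefix and output name are hypothetical.

```python
import os
import shlex
from typing import List


def two_pass_commands(src: str, dst: str, preset: str = "medium") -> List[List[str]]:
    """Build ffmpeg argument lists for pass 1 and pass 2 with the settings above."""
    common = [
        "ffmpeg", "-hide_banner", "-y", "-i", src,
        "-vf", "scale=1280:720:flags=lanczos+accurate_rnd+bitexact,setsar=1",
        "-c:v", "libx264", "-preset", preset,
        "-x264-params", "scenecut=0:keyint=48",
        "-pix_fmt", "yuv420p", "-an",
        "-b:v", "3000k", "-maxrate", "3000k", "-bufsize", "6000k",
        "-passlogfile", "bbb-2pass",  # hypothetical stats-file prefix
    ]
    # Pass 1 only gathers statistics, so its video output is discarded.
    first = common + ["-pass", "1", "-f", "null", os.devnull]
    # Pass 2 reads the statistics and writes the real file.
    second = common + ["-pass", "2", "-video_track_timescale", "2400", dst]
    return [first, second]


for cmd in two_pass_commands("bbb.mov", "bbb-3000k-medium-twopass.mp4"):
    print(shlex.join(cmd))
```

Each list can be handed to `subprocess.run(cmd, check=True)` in order; the second pass must use the same filter chain and rate settings as the first.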

For single pass encoding, setting --rc-lookahead to the same value as --keyint can help a lot, so rate control is always looking ahead a whole GOP so it can massage VBV to make quality more consistent.

Tested by setting rc-lookahead=48, and the result was the same. VMAF graph was almost identical to whatever the preset sets by default. So setting it to the same as keyint didn't help (nor did increasing it to 250).

Doing a --crf encode with --vbv-bufsize and --vbv-maxrate set can also leave a little more buffer available for those I-frames, also smoothing out quality. --crf can be pretty low, like 18, so you get good quality. You're basically telling the encoder not to let QP drop lower than provides perceptual value.

I had actually tested with Capped CRF prior, and I wasn't quite sure how to do it correctly. I was under the impression you would just set CRF to 1 and let VBV handle lowering quality as needed while letting CRF use as high a quality as it could. I tested both CRF 18 and CRF 1 and both resulted in about the same encode speed, with CRF 1 giving a marginally higher average VMAF.

Here is the VMAF graph of CRF 1:

https://i.imgur.com/hAXr9VZ.png

and the VMAF graph for CRF 18:

https://i.imgur.com/NzeM25e.png

CRF 18 did take care of a few dips, but there are still quite a few. So Capped-CRF really isn't helping here either.

Also, Big Buck Bunny is an oddly easy title to encode, and doesn't respond to rate control like more common content. I recommend Tears of Steel as the blender.org project that's best representative of real-world non-CGI content types.

I tested Tears of Steel as well (same settings as my command above). Here is VMAF graph of single pass:

https://i.imgur.com/SFjrFFb.png

and two pass:

https://i.imgur.com/qsULOl7.png

While there are fewer dips than BBB, it still has the noticeable one that was fixed with two pass. Here is a side by side slider comparison between the single pass and two pass in that worst dip area:

https://imgsli.com/MzkyMDA0

Two pass was able to fix it again.

This extremely limited VBV scenario is exactly where two pass encoding has the most benefit.

For medium vs. very slow I think it is almost random which will be better. It depends on where exactly the VBV limits are hit and what the content is like past lookahead when it happens. It would be interesting to test different lookahead distances with single pass, perhaps a 250 frame lookahead would help without needing two passes?

I tried rc-lookahead=250 and the resulting VMAF graph was almost the same as whatever the default is for a particular preset. Didn't help fix the dips whatsoever.

EDIT: Trying to figure out if my rc-lookahead setting is applying. I'm seeing "rc_lookahead=48" in the NAL SEI unit, even though I'm specifying it as 250 in ffmpeg.

UPDATE: I think this line (https://github.com/mirror/x264/blob/c24e06c2e184345ceb33eb20a15d1024d9fd3497/encoder/encoder.c#L1116) limits rc-lookahead to my keyint.
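
A minimal sketch of how to check which settings were actually applied, by parsing the space-separated `key=value` list that follows `options:` in the settings string x264 writes into the stream. The helper names are hypothetical, and the sample string is shortened and illustrative (a real one lists many more options).

```python
from typing import Dict, Optional, Tuple


def parse_x264_options(sei_text: str) -> Dict[str, str]:
    """Parse the space-separated key=value list that follows 'options:'."""
    marker = "options:"
    pos = sei_text.find(marker)
    if pos < 0:
        return {}
    options = {}
    for token in sei_text[pos + len(marker):].split():
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value pairs
            options[key] = value
    return options


def mismatches(requested: Dict[str, str],
               applied: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {name: (requested, applied)} for settings the encoder changed."""
    return {k: (v, applied.get(k)) for k, v in requested.items() if applied.get(k) != v}


# Shortened, illustrative string; not copied from a real encode.
sample = "x264 - options: keyint=48 scenecut=0 rc_lookahead=48 vbv_maxrate=3000 vbv_bufsize=6000"
applied = parse_x264_options(sample)
print(mismatches({"rc_lookahead": "250", "keyint": "48"}, applied))
```

With the sample above, only `rc_lookahead` is reported, which mirrors the requested 250 coming back as 48.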

cubicibo
25th June 2025, 09:25
It's pointless for the lookahead to exceed the parameters of the VBV model. The data lifetime with your VBV settings is at most 2 seconds (vbv-bufsize/vbv-maxrate). Frame N is agnostic to the complexity of frame N+48 as they will never be in the buffer at the same time, nor can they refer to each other (GOP length of 48).
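
The arithmetic behind that 2-second figure, as a small sketch (function names are mine):

```python
def vbv_window_seconds(bufsize_kbit: float, maxrate_kbps: float) -> float:
    """Time needed to fill the whole VBV buffer at maxrate."""
    return bufsize_kbit / maxrate_kbps


def vbv_window_frames(bufsize_kbit: float, maxrate_kbps: float, fps: float) -> float:
    """The same window expressed in frames."""
    return vbv_window_seconds(bufsize_kbit, maxrate_kbps) * fps


# Settings from this thread: 6000 kbit buffer, 3000 kbit/s maxrate, 24 fps.
print(vbv_window_seconds(6000, 3000))     # 2.0
print(vbv_window_frames(6000, 3000, 24))  # 48.0
```

So with these settings the VBV window and the 48-frame GOP happen to be the same length.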

benwaggoner
25th June 2025, 19:11
Hmm. Given the speed of x264 on modern hardware, I'd try --preset veryslow. Generally you want to use the slowest preset that isn't too slow for your use case. And --tune film definitely helps Tears of Steel, and IIRC helps Big Buck Bunny. Everything you can do that lowers QPs in general will help your issue.

I generally recommend using Spline36 over Lanczos for downscaling, as Lanczos with default parameters results in higher frequencies, which raise QP on their own a bit.

Also, big picture, are you seeing these "bad frames" when playing the content back at 24p? x264 is a video encoder, and can take advantage of human visual system properties like not being acute for spatial detail after a big discontinuity. A few "bad" frames right after a scene cut can look bad in metrics or freeze frame but be unnoticeable when playing at 24 fps.

Wallabee
26th June 2025, 01:37
Hmm. Given the speed of x264 on modern hardware, I'd try --preset veryslow. Generally you want to use the slowest preset that isn't too slow for your use case. And --tune film definitely helps Tears of Steel, and IIRC helps Big Buck Bunny. Everything you can do that lowers QPs in general will help your issue.

I did try veryslow in the original post. It made the worst-quality frames even worse.

I generally recommend using Spline36 over Lanczos for downscaling, as Lanczos with default parameters results in higher frequencies, which raise QP on their own a bit.

I'll try a different downscaling algorithm, but I'm pretty confident that's not the root of the issue here.

Also, big picture, are you seeing these "bad frames" when playing the content back at 24p? x264 is a video encoder, and can take advantage of human visual system properties like not being acute for spatial detail after a big discontinuity. A few "bad" frames right after a scene cut can look bad in metrics or freeze frame but be unnoticeable when playing at 24 fps.

They are absolutely noticeable. That scene in BBB in particular, it's at least half a second of really bad looking frames and really noticeable in regular watching conditions.

At this point I'm just going to chalk it up to a problem that happens with Fixed GOP and bad luck of I-frame placement.

While I'm glad two pass can fix it, it's just unfortunate to have to waste all that extra encoding time to do it.

benwaggoner
26th June 2025, 20:08
I did try veryslow in the original post. It made the worst-quality frames even worse.
That is...odd. You should try --preset slow and slower and see which does best. We could then look at the different parameters between the presets and see if we can narrow down what the issue is.

This is all pretty surprising because BBB is really easy to encode and should look great at these bitrates.


I'll try a different downscaling algorithm, but I'm pretty confident that's not the root of the issue here.
It won't be the root, but it's something that can give an incremental nudge in the right direction. And quality optimization is often about the accumulation of a sufficient number of incremental nudges until they add up to a material improvement.

They are absolutely noticeable. That scene in BBB in particular, it's at least half a second of really bad looking frames and really noticeable in regular watching conditions.

At this point I'm just going to chalk it up to a problem that happens with Fixed GOP and bad luck of I-frame placement.
This is still weirding me out. I've encoded BBB many dozens of times with all sorts of encoders and codecs, and it's always looked pretty spectacular at these bitrates. I suppose those would mostly have been variable GOP, though. I've been crusading against fixed GOP even for adaptive streaming for nearly 20 years now. It's really only needed if you are stuck using terrible and old player heuristics.

While I'm glad two pass can fix it, it's just unfortunate to have to waste all that extra encoding time to do it.
The second pass in x264 itself is actually quite a bit faster than the first because it can reuse a bunch of the first pass work. You're probably being gated by repeating the preprocessing twice.

Wallabee
27th June 2025, 01:35
That is...odd. You should try --preset slow and slower and see which does best. We could then look at the different parameters between the presets and see if we can narrow down what the issue is.

Tried, and same result. Each slower preset you go, the better the average VMAF was, as expected, but the worst quality frames get worse (though slow and medium were about equal, but slower caused the VMAF in those dips to go almost down to 40)

This is all pretty surprising because BBB is really easy to encode and should look great at these bitrates.

And on average it does look great. I was happy until I out of the blue decided to just see what the VMAF showed, and that's when it showed me some areas with issues.

It won't be the root, but it's something that can give an incremental nudge in the right direction. And quality optimization is often about the accumulation of a sufficient number of incremental nudges until they add up to a material improvement.

Tried the spline downscaling algorithm, and it made no difference.

This is still weirding me out. I've encoded BBB many dozens of times with all sorts of encoders and codecs, and it's always looked pretty spectacular at these bitrates. I suppose those would mostly have been variable GOP, though. I've been crusading against fixed GOP even for adaptive streaming for nearly 20 years now. It's really only needed if you are stuck using terrible and old player heuristics.

The video is going to an HLS segmenter and they're pretty adamant about segmenting on 2 second GOPs.

The second pass in x264 itself is actually quite a bit faster than the first because it can reuse a bunch of the first pass work. You're probably being gated by repeating the preprocessing twice.

This has never been the case for me. Just testing now with medium preset and a single pass, I get an encode at 17.4x speed. For a two pass encode, I get 30.6x for the first pass, and 18x for the second pass. I know the first pass of a two pass encode is quicker because it uses faster settings behind the scenes.

---

Again, I'm no expert on x264, but my gut is saying it's something to do with Fixed GOP and luck of the draw where an I-Frame gets inserted. In my OP, I did a test where I removed noscenecut and let it insert extra I-Frames as it deemed fit, and it took care of all but one of the problem areas.

It's like it uses up most of its allocated bits on an I-frame, and then a complicated scene comes along very shortly after where it now doesn't have enough left over bits, so it can't do much. Two pass then fixes this by allocating them better. But this is pure conjecture on my part.

Maybe an x264 dev can shed some light on this. I'm not even sure how much development work is going on in x264 nowadays. I'm sure most have moved on to more modern codecs like H.265/VP9/AV1, etc.

Z2697
27th June 2025, 05:33
How about actually segmenting the video rather than doing it by keyint?

benwaggoner
1st July 2025, 02:47
Tried, and same result. Each slower preset you go, the better the average VMAF was, as expected, but the worst quality frames get worse (though slow and medium were about equal, but slower caused the VMAF in those dips to go almost down to 40)

And on average it does look great. I was happy until I out of the blue decided to just see what the VMAF showed, and that's when it showed me some areas with issues.
Well, something that looks good in practice is better than one that looks good in theory. If you didn't notice it until looking at a metric, it probably isn't that bad a problem.

It still is a weird one. BBB is easy in weird ways that could be freaking out rate control or something.

The video is going to a HLS segmenter and they're pretty adamant about segmenting on 2 second GOPs.
Is it a really old one? A modern HLS HEVC using fMP4 is fine with variable GOP durations. There are vanishingly few HLS players out there that need H.264 at all anymore.

Again, I'm no expert on x264, but my gut is saying it's something to do with Fixed GOP and luck of the draw where an I-Frame gets inserted. In my OP, I did a test where I removed noscenecut and let it insert extra I-Frames as it deemed fit, and it took care of all but one of the problem areas.
Yeah, but a long rc-lookahead normally gets around that issue. Fixed GOP is always worse for the reasons you describe; higher bitrates for same quality at best.

It's like it uses up most of its allocated bits on an I-frame, and then a complicated scene comes along very shortly after where it now doesn't have enough left over bits, so it can't do much. Two pass then fixes this by allocating them better. But this is pure conjecture on my part.
That is what happens, yeah. rc-lookahead normally smooths that out. Increasing --vbv-bufsize should as well, if that's an option.
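
To make that mechanism concrete, here is a toy leaky-bucket simulation. It is a deliberately simplified model and not x264's actual rate control: the buffer gains maxrate/fps kbit per frame interval and loses the size of each frame. The frame sizes and the 5400 kbit starting fullness are made-up assumptions.

```python
from typing import List


def simulate_vbv(frame_kbits: List[float], maxrate_kbps: float, bufsize_kbit: float,
                 fps: float, initial_kbit: float) -> List[float]:
    """Return buffer fullness (kbit) right after each frame is removed.

    Simplified leaky bucket: the buffer refills by maxrate/fps per frame
    interval (capped at bufsize) and drains by the size of each frame.
    A negative value means the frame did not fit, i.e. an underflow.
    """
    refill = maxrate_kbps / fps
    fullness = initial_kbit
    trace = []
    for size in frame_kbits:
        fullness -= size
        trace.append(fullness)
        fullness = min(bufsize_kbit, fullness + refill)
    return trace


# Made-up sizes: one large I-frame followed by a demanding scene whose
# frames each want 300 kbit, while only 125 kbit per frame flows back in.
frames = [4000.0] + [300.0] * 10
trace = simulate_vbv(frames, maxrate_kbps=3000, bufsize_kbit=6000, fps=24, initial_kbit=5400)
first_underflow = next((i for i, v in enumerate(trace) if v < 0), None)
print(trace[0], first_underflow)
```

In this toy run the buffer would underflow at the tenth frame of the scene. A real encoder cannot let that happen, so it has to raise QP on those frames instead, which is the kind of dip being discussed. A larger buffer, or a smaller I-frame, pushes that point further out.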

Maybe an x264 dev can shed some light on this. I'm not even sure how much development work is going on in x264 nowadays. I'm sure most have moved on to more modern codecs like H.265/VP9/AV1, etc.
Not much algorithmic work like rate control improvements has happened with x264 in recent years, no.

Z2697
1st July 2025, 17:35
The problem is probably that the keyframe needs to "compensate" for bits spent by previous frames, so doing separate encodes would work around some of that problem.

benwaggoner
2nd July 2025, 17:39
The problem is probably that the keyframe needs to "compensate" for bits spent by previous frames, so doing separate encodes would work around some of that problem.
That's a workaround with a decent chance of violating VBV when the segments are stitched back together, though.

If a higher VBV would work, better to just increase --vbv-bufsize, which should also fix the quality problem if it is VBV related.

HD MOVIE SOURCE
1st September 2025, 14:42
Interesting post. I'd like you to try a different one pass setting using CRF. You will never achieve the CRF value, but that's not the point of using this method; it still responds to bitrate control. It's something I use to avoid using 2 pass encoding.

Bitrate: 3000 kbit/s
vbv-maxrate: 3000 kbit/s
vbv-bufsize: 6000 kbit
keyint: 48 (BBB is 24FPS)
noscenecut

So same settings as before, but instead of using a bitrate-targeted one pass, use CRF. Just use a CRF of 8. Again, you won't ever reach that with bitrates so restricted, but we just want to see if CRF style one pass encoding suffers from the same issues as bitrate-targeted one pass encoding. This I'd like to see.


What are the b-frame settings? I find that anything other than 1 b-frame and you'll start to lose quality.
I'd be interested to see 1-bframe used also, because I wouldn't be surprised if that frame was caught on a b-frame.
I'd be willing to try b-frame=0 also, just to make sure that b-frames are the issue.
Overall quality would go down with no b-frames but, I wonder if the quality drops would be fixed?

noscenecut with x265 is very good, but with x264 and bframes being poor after 1, I'd definitely like to see only 1 used to see if the issues get fixed.

What program allows you to see bit-rates displayed in this way?

benwaggoner
3rd September 2025, 17:36
What are the b-frame settings? I find that anything other than 1 b-frame and you'll start to lose quality.
You mean if it is >1? Generally the lower the bitrate, the more helpful b-frames >> 1 are going to be. I'd want a minimum of 7 using reference B without strictly hierarchical pyramid and using --b-adapt 2 (although b-adapt 1 is a lot better than it used to be). I don't think I've used below 4 for anything in the last 15 years.

I've found even 16 b-frames helpful for very low bitrate motion graphics and animation content. I was able to do some 1080p24 simple line art animation at a high quality 100 Kbps using that and some other tricks.

I'd be interested to see 1-bframe used also, because I wouldn't be surprised if that frame was caught on a b-frame.
I'd be willing to try b-frame=0 also, just to make sure that b-frames are the issue.
Worth checking at least, but quality would be poorer, especially with a small vbv-bufsize.

Overall quality would go down with no b-frames but, I wonder if the quality drops would be fixed?

What program allows you to see bit-rates displayed in this way?
YUView can do that and a whole lot more, and is open source: https://github.com/IENT/YUView

You can see what's happening down to the block level. It's very helpful to see what's happening. And just capturing the console output of --verbose for analysis can reveal a lot. Like if particular frames spike to high QPs, and what type of frames those are. I wish x264 had taken the contribution of x265's --csv and --csv-loglevel. Having a CSV formatted dump of dozens of per frame values is incredibly powerful.

hellgauss
7th September 2025, 10:01
If OP is familiar with simple scripting languages (e.g. Python), I would follow Z2697's suggestion of separate encoding. This would also allow running multiple single-threaded encodes in parallel, which gives a small increase in quality. Optionally, remove the redundant SEI messages after merging.
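
A rough sketch of what such a script could start from, assuming ffmpeg with libx264 and a constant frame rate source. It only computes the 48-frame chunks and builds one ffmpeg argument list per chunk; the function names and output file naming are hypothetical, and the merge step is left out. As noted earlier in the thread, independently encoded segments may violate VBV once stitched together.

```python
from typing import List, Tuple


def segment_ranges(total_frames: int, keyint: int) -> List[Tuple[int, int]]:
    """Split [0, total_frames) into (first_frame, frame_count) chunks of keyint frames."""
    if total_frames < 0 or keyint <= 0:
        raise ValueError("total_frames must be >= 0 and keyint must be > 0")
    return [(start, min(keyint, total_frames - start))
            for start in range(0, total_frames, keyint)]


def segment_command(src: str, index: int, first_frame: int, count: int, fps: float) -> List[str]:
    """ffmpeg arguments to encode one chunk as its own single-threaded job."""
    start_seconds = first_frame / fps
    return [
        "ffmpeg", "-y", "-ss", f"{start_seconds:.6f}", "-i", src,
        "-frames:v", str(count), "-an",
        "-c:v", "libx264", "-preset", "veryslow",
        "-threads", "1",  # one thread per job; run several jobs in parallel
        "-x264-params", "scenecut=0:keyint=48",
        "-b:v", "3000k", "-maxrate", "3000k", "-bufsize", "6000k",
        f"seg_{index:05d}.mp4",  # hypothetical naming scheme
    ]


# 100 frames is an arbitrary illustrative length, not the length of BBB.
for i, (first, count) in enumerate(segment_ranges(100, 48)):
    print(" ".join(segment_command("bbb.mov", i, first, count, fps=24)))
```

Seeking by timestamp is only frame-accurate if the source behaves well under `-ss`; for anything serious, verify the frame counts of the resulting segments before merging.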

Some other possible suggestions:

- veryslow preset should be helpful
- higher bframes should also be helpful. I've found some issues in x264 with higher bframes and badapt=2, so you could try badapt=1. Perhaps, due to unusual gop size, you could also try to lower the default ip and pb ratio.
- If you need a one-size-fits-all method, I suggest a soft resize for difficult videos. 3000-6000 for maxrate-bufsize may not be enough for difficult scenes at 720p and 2s fixed GOP with sub-optimal key frame placement. I've had good results with bicublin resize, with a+2b=1 and 0<=a<=0.333. Also a light degrain filter before the resize should help to decrease the quantizer in noisy videos; try removegrain=1:2:2 or 1:1:1. You can do more complex resizing/filtering, e.g. through AviSynth.

Sharc
16th May 2026, 08:01
FWIW a similar (related?) case of sudden unexpected quality drop has been discussed here
https://forum.doom9.org/showthread.php?t=175662&highlight=mp3dom

Z2697
16th May 2026, 17:04
I'd say the implementation of VBV is not very good in x264 (in x265 it's even worse)...

Blue_MiSfit
22nd May 2026, 06:09
Which encoders are superior in your opinion in this scenario?

Z2697
22nd May 2026, 19:14
Which encoders are superior in your opinion in this scenario?

I don't have any...
But it looks like the lookahead doesn't work very well, or maybe it's an impossible task without multi pass.

rwill
25th May 2026, 11:31
I don't have any...
But it looks like the lookahead doesn't work very well, or maybe it's an impossible task without multi pass.

I have one for HEVC but cannot pass it around; it's integrated into Dolby Hybrik though. I know these VBV problems. Most of the time they are caused by short term frame size prediction failure, so improving size predictors helps.