
Advice on HEVC with one/few frames for 1+ hour audio track


LunaRabbit
26th December 2024, 10:16
I need some advice on videos that consist of one or a few still pictures plus an audio track that is sometimes 2 hours long. We've recently been translating Drama CDs. These need subtitles, so we must have some kind of video to overlay them on. What we've been doing is making a 1080p picture with some information about the Drama CD and a space at the bottom for the subtitles.

My settings might seem odd to some people, but the gist is that I use a closed GOP and a reasonable min-keyint/keyint to ensure seeking is fast. I typically use something like

--min-keyint=1 --keyint=120 --no-open-gop

This ensures fast seeking and has worked well on most set-top devices I've tried it on. It doesn't add much to the final file size for most content (actual animation) so I think it's a good trade off.

But when it comes to Drama CDs, I'm making something like this (example AviSynth script):

imagesource("S:\path\to\image.png",fps=23.976,start=0,end=83920, use_DevIL=true)
ConvertToYUV420()
Convertbits(16)

I use this as the video source and mux it together with a sub-100MB audio file, which is not ideal: the final file ends up as large as 600MB, all because that same frame is repeated as an I-frame every 120 frames.
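To put rough numbers on that, here is a minimal back-of-the-envelope sketch. It assumes ImageSource's end frame is inclusive, that a keyframe lands exactly every 120 frames, and that the video stream is about 500 MB (600 MB total minus roughly 100 MB of audio). The 500 MB figure and the function name are illustrative assumptions, not measurements.

```python
import math

def iframe_count(total_frames: int, keyint: int) -> int:
    """Number of keyframes when one is placed every `keyint` frames."""
    return math.ceil(total_frames / keyint)

total_frames = 83920 + 1   # start=0, end=83920, assuming `end` is inclusive
keyint = 120
video_mb = 500.0           # ASSUMPTION: ~600 MB file minus ~100 MB audio

n_iframes = iframe_count(total_frames, keyint)
print(f"I-frames in the stream: {n_iframes}")
# Upper bound: treats the inter frames as costing nothing at all.
print(f"approx. MB per I-frame: {video_mb / n_iframes:.2f}")
```

So nearly all of the video bitrate would be the same picture stored several hundred times over.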

I've noticed --frame-dup doesn't seem to do anything or decrease file size. I can use open GOPs and increase keyint, but seeking times suffer a lot. When I use keyint/min-keyint plus open GOP to have just one I-frame at the start, the file size drops, obviously. But the result won't play on a lot of devices, or if it does play, it freezes/hard-locks the moment you attempt to seek manually. My chapters still work if I manually insert I-frames there, but I'd prefer it to work as well as my usual settings.

I'm looking for a better way to handle this. Right now I'm having to live with 600+MB files that are one image+100MB of audio at most. I'd like to get the video portion smaller so I could give more bitrate to the audio.

Does anyone have any advice? Or is open GOP + endless keyint just not that well supported by low-powered set-top devices? I understand why: they have to decode a ton of frames, going all the way back to frame one, if you seek around.

Z2697
26th December 2024, 16:59
Try CQP (with extreme ratio)?

For example: --qp 51 --ipratio 20 --b-adapt 0 [--bframes 0] --no-sao --no-weightp --max-merge 1 --no-temporal-mvp --ref 1 --rskip 2

This will encode the I-frame at QP 25 and all predicted frames with basically nothing (with a good/OK quality I-frame and no motion, you don't need anything more). That is also why we don't need b-adapt. We don't even really need B-frames, as they only save a tiny amount of space overall (a few hundred bytes, even with bframes=16). Update: somehow fewer B-frames give a better result; in my test case bframes=3 gave the best result, saving... a few kilobytes.
no-sao and no-weightp speed up encoding and save bits. (Once these are disabled, B-frames have a negative effect on bit savings. You still need to disable b-adapt, as it somehow still slows down encoding.)
max-merge=1, no-temporal-mvp and ref=1 save bits without any other effect, by not signaling unnecessary flags (the effect is certain; the reason is my guess).
rskip=2 can increase encoding speed without any other effect when bframes=0.

You can maybe disable more unimportant features to speed up encoding or save bits.
And if you are ok with less speed you can disable wpp to save some extra bits.
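For anyone scripting this, a minimal sketch of assembling the suggested options into one command line. The flags are the ones listed above; the helper name and the input/output file names are hypothetical placeholders, and it assumes an x265 binary on PATH that can read the given input.

```python
import shlex

def build_x265_cmd(src: str, dst: str) -> list[str]:
    """Assemble the CQP settings suggested above into an argument list."""
    return [
        "x265",
        "--qp", "51", "--ipratio", "20",       # extreme I/P ratio in CQP mode
        "--b-adapt", "0", "--bframes", "0",    # no B-frame decision, no B-frames
        "--no-sao", "--no-weightp",
        "--max-merge", "1", "--no-temporal-mvp",
        "--ref", "1", "--rskip", "2",
        "--output", dst, src,                  # hypothetical file names
    ]

cmd = build_x265_cmd("drama_cd.y4m", "drama_cd.hevc")
print(shlex.join(cmd))   # paste into a shell, or pass `cmd` to subprocess.run
```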

I don't know about set-top devices, but on my computer even a GOP as large as 10000 frames seeks pretty fast in this scenario, with LWLibavSource threads=1. Well, faster than normal video would at least, though it still takes some time. I think something like 720 (30 seconds at 24fps) will be a nice balance.
I think that because all blocks in the inter frames are encoded as skip, the reference points straight to the I-frame, so if the I-frame is in the container's index (which it normally should be), seeking should be pretty fast. But if the video player is "seeking by keyframes", the seeking accuracy will be terrible.
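A small sketch of the trade-off being weighed here, assuming a decoder that has to start at the preceding keyframe and decode every frame up to the seek target. The function name is illustrative; the keyint values and the 24 fps rate are the ones mentioned in this thread.

```python
def gop_seek_cost(keyint: int, fps: float) -> tuple[float, int]:
    """Return (GOP duration in seconds, worst-case frames decoded per seek)."""
    # Worst case: the target is the last frame of a GOP, so the decoder
    # walks through the I-frame plus every inter frame after it.
    return keyint / fps, keyint

for keyint in (120, 720, 10000):
    seconds, worst = gop_seek_cost(keyint, fps=24.0)
    print(f"keyint={keyint:>5}: GOP = {seconds:6.1f} s, worst case = {worst} frames")
```

The same GOP duration is also the coarsest step a player takes if it only seeks to keyframes.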

If you are wondering, ipratio is not the actual ratio. Its effect is a negative QP offset of 6*log2(x). pbratio follows the same calculation but is a positive offset.
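As a quick check of that formula against the example above, here is a minimal sketch. The function name is illustrative, and the exact rounding the encoder applies is not modeled.

```python
import math

def ratio_to_qp_offset(ratio: float) -> float:
    """QP offset implied by an ipratio/pbratio value: 6 * log2(ratio)."""
    return 6.0 * math.log2(ratio)

base_qp = 51
ipratio = 20

offset = ratio_to_qp_offset(ipratio)
i_frame_qp = base_qp - offset      # ipratio lowers the I-frame QP
print(f"offset for ipratio={ipratio}: {offset:.2f}")
print(f"I-frame QP: about {i_frame_qp:.1f}")
```

This lands at roughly QP 25 for the I-frame, matching --qp 51 --ipratio 20. A ratio of 2 is worth exactly 6 QP.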


However, AV1 will be more suitable for this kind of use if you don't have compatibility problems with it; the speed is not much worse for static "video". I guess a big factor here is the maximum block size: fewer blocks need to be signaled, so the "overhead" of completely skipped inter frames is smaller.

And there's no reason to use that weird decimal framerate.