16th December 2020, 11:36 | #1 | Link |
Registered User
Join Date: Mar 2018
Location: Germany
Posts: 227
|
Variable Frame Rate (VFR) - basic questions
I recently stumbled upon a movie that has been encoded with VFR.
That turns out to be quite rare, so I would like to ask:
1. Why isn't VFR used more often? I ran some tests and found that it is MUCH smaller for movies with lots of still scenes (as you often find with audio uploads on YouTube). If I build a movie (H.264) from one identical image (1024 x 736), the VFR version uses 80 kB (independent of the number of frames). The CFR version uses 1.3 kB/frame (which gives large files once you consider the fps, as in the typical "audio movies" on YouTube).
2. Conversion from CFR to VFR: is there an easy way? (This question is usually asked the other way round, because if you want to use the timeline in a video editor, the editor has to handle VFR, which most of them don't.) One parameter would be a similarity limit, below which frames are treated as a "still scene". Or do I have to do it all manually (identify the dupes with ffmpeg, build a timecode file from that, remove the dupes, update the timecodes, and hope the audio stays in sync)? |
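To put the two figures from the post above side by side, here is a minimal sketch. It assumes the CFR cost really does grow linearly at 1.3 kB per frame, which is only the poster's measurement for one image; the clip length and frame rate in the example are hypothetical.

```python
# Figures reported in the post above (one repeated 1024 x 736 image, H.264).
VFR_TOTAL_KB = 80.0      # reported size of the VFR encode
CFR_KB_PER_FRAME = 1.3   # reported average cost per frame of the CFR encode


def cfr_size_kb(duration_s: float, fps: float) -> float:
    """Estimate the CFR stream size, ASSUMING cost grows linearly with frame count."""
    return duration_s * fps * CFR_KB_PER_FRAME


def break_even_frames() -> float:
    """Frame count at which the linear CFR estimate equals the reported VFR size."""
    return VFR_TOTAL_KB / CFR_KB_PER_FRAME


if __name__ == "__main__":
    # Hypothetical clip: 3 minutes at 25 fps.
    size = cfr_size_kb(duration_s=180, fps=25)
    print(f"CFR estimate: {size:.0f} kB, VFR: {VFR_TOTAL_KB:.0f} kB")
    print(f"Break-even at about {break_even_frames():.1f} frames")
```

The linear model is the weak point: real encoders spend very little on repeated frames between keyframes, so the per-frame figure depends heavily on the encoder settings.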
16th December 2020, 14:20 | #2 | Link |
Registered User
Join Date: Mar 2018
Location: Germany
Posts: 227
|
Finally found these sources:
http://forum.doom9.org/showthread.php?t=149339 http://avisynth.nl/index.php/Categor...rame_Detectors |
16th December 2020, 17:57 | #3 | Link | ||
Registered User
Join Date: Sep 2007
Posts: 5,424
|
Quote:
It's not completely independent: it also depends on the number of frames and the max keyframe interval in your encoding settings. The max keyframe interval is the point at which the encoder must place a new keyframe. E.g. if your fps was 24 and you had a 10 second video (240 original frames), you could theoretically represent that with 1 frame if your keyframe interval was 240 or more. But some devices and scenarios have limits on the max keyframe interval set by their specifications, e.g. a typical "24p" Blu-ray can only have a max keyframe interval of 1 second (so 24 frames). The default x264 setting is 250. You can set it to "infinite".
VFR can be OK for some types of end distribution, but:
- Impaired navigation. The ability to seek in a video is determined by keyframe placement. The entire GOP has to be decoded, so it can be slow. E.g. if you have a slideshow presentation with long periods of time represented by 1 keyframe, you might have to wait a long time, and sometimes you cannot seek at all (this depends partially on the playback software).
- It won't be "much smaller" for typical content using modern compression. There is some filesize reduction, but a typical movie with many duplicates (e.g. low fps animation) might only save 5-15%. A typical Hollywood movie maybe 0.5-1%. The reason is that b-frames do not "cost" a lot to encode. Yes, if you have a slideshow/presentation where there can be long periods of no change, the savings can be higher.
- Min fps on the YT re-encode was 6 FPS (it might have changed). Theoretically you could upload a 1-frame video at 0.000001 FPS (or some small number or functional equivalent), but YT re-encodes it to 6 FPS anyway. It can save you some bandwidth on the upload, and the YT version is still seekable to an extent, but it is not a massive saving on the YT end.
- Difficult to edit (so preferably not used for acquisition).
- Potential loss of real data. Small movements might not be detected in the VFR conversion, depending on how it's done. E.g. subtle changes like eye movement might be lost by automatic conversions if they are not QCed. Usually there is a threshold control for duplicates, but lossy compression can already cause differences between duplicate frames, and so can added grain in a movie (i.e. it can be difficult to determine automatically what is supposed to be a duplicate). Often the cutoff is a grey, overlapping area and needs human eyes to be accurate.
Quote:
2) I think HandBrake can do it, for "easy" and automatic without options.
I prefer AviSynth Dedup, because you have the ability to adjust thresholds (your "similarity limits") and debug with overlays (you can "see" what you're doing and adjust), plus settings to decide which frame to keep (e.g. sometimes the 1st duplicate frame is higher quality, sometimes the last). There are many options, like limiting strings of duplicates, and it generates timecodes for you. The main negative is that the max limit is strings of 20. It's meant for typical sources, so for slideshows/presentations with long gaps with no activity it's not as useful.
Manual timecodes are not bad when you have long gaps, such as in presentations, because there will only be a few entries. |
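The threshold-plus-run-limit idea described in this post can be sketched as follows. This is not the Dedup plugin's actual algorithm; the function name, the `max_run` default of 20 (read here as "at most 20 frames dropped in a row") and the sample difference values are all assumptions made for illustration.

```python
from typing import List, Tuple


def drop_duplicates(diffs: List[float], threshold: float, fps: float,
                    max_run: int = 20) -> Tuple[List[int], List[float]]:
    """Return kept frame indices and their start times in milliseconds.

    diffs[i] is the difference (in percent) between frame i and frame i - 1;
    diffs[0] is ignored because the first frame is always kept.
    """
    kept = [0]
    run = 0  # frames dropped in a row since the last kept frame
    for i in range(1, len(diffs)):
        if diffs[i] < threshold and run < max_run:
            run += 1        # treated as a duplicate of its predecessor: drop
        else:
            kept.append(i)  # a real change, or the run limit was reached
            run = 0
    times_ms = [i * 1000.0 / fps for i in kept]
    return kept, times_ms


if __name__ == "__main__":
    # Hypothetical per-frame differences for a 5-frame clip at 25 fps.
    kept, times = drop_duplicates([0.0, 0.1, 0.1, 5.0, 0.1], threshold=1.0, fps=25.0)
    print(kept, times)
```

The kept frames retain their original start times, which is what lets the audio stay in sync once those times are written to a timecode file.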
||
16th December 2020, 21:12 | #4 | Link |
Registered User
Join Date: Mar 2018
Location: Germany
Posts: 227
|
Thank you (once again) for your very valuable information!
To give an example: some time ago I downloaded this https://youtu.be/NOIbJFCa2xk, although in better quality. The size was 19kB, 75% of it the "video stream". Although the frames are identical, YT uses 25fps.
For this special case (only 1 frame) there was no need for VFR, as I can lower the constant frame rate until the one and only frame covers the complete length. I did that with VirtualDub's "remove frames" filter (from shekh himself). And I actually ran into the seek problem: it doesn't play at all if I remove all but one frame. About 20 frames have to be left.
For the general case, the question remains for me what the workflow looks like when there are a few different frames at arbitrary positions (not only one as above). I will use one of AviSynth's dedupe filters then. They will generate a timecode file, OK. After removing the dupes, will the movie have an adjusted constant fps so that it still has the same length as the audio? And after saving that, I use the timecode file to make it VFR, right? |
16th December 2020, 21:21 | #5 | Link | ||
Registered User
Join Date: Sep 2007
Posts: 5,424
|
Quote:
That was probably uploaded at 25fps, so YT keeps the same framerate and number of frames as the uploaded original.
What I mean is that the minimum FPS of the YT re-encode was 6fps. E.g. if you upload a 1 fps video of 10 seconds, that's 10 frames. YouTube will re-encode it to 6fps, 60 frames (with duplicates), 6x the number. That theoretically takes more bits to encode. You can upload an equivalent 0.000001 fps video (1 frame), but YouTube will re-encode it to 6fps (maybe ten thousand more frames). In those situations the savings are potentially on the user upload side, not as much on the YouTube streaming side, because of the MIN fps. YT probably does this for good reason (seeking / navigation). Maybe people want to skip to the chorus or other parts of a song or slideshow.
Quote:
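The frame counts in this post follow from a simple rule, sketched below. The 6 fps floor is only what the post reports (and says may have changed); the function name and the rule "keep the upload rate unless it is below the floor" are assumptions.

```python
import math

YT_MIN_FPS = 6  # minimum re-encode frame rate as reported in the post; may have changed


def reencoded_frames(duration_s: float, upload_fps: float,
                     min_fps: float = YT_MIN_FPS) -> int:
    """Frame count after a re-encode that enforces a minimum frame rate."""
    return math.ceil(duration_s * max(upload_fps, min_fps))


if __name__ == "__main__":
    print(reencoded_frames(10, 1))            # 10 s uploaded at 1 fps
    print(reencoded_frames(1800, 1 / 1800))   # 30 min uploaded as a single frame
    print(reencoded_frames(10, 25))           # 10 s uploaded at 25 fps
```

For a hypothetical 30 minute single-frame upload this gives 10800 frames, which is in line with the "maybe ten thousand more frames" estimate above.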
Last edited by poisondeathray; 16th December 2020 at 21:25. |
||
17th December 2020, 15:56 | #6 | Link | |
Registered User
Join Date: Mar 2018
Location: Germany
Posts: 227
|
Quote:
If the whole length is, say, 30 min, I prepare a timecode file manually for, say, 5 frames. The fps of the CFR "pre-movie" is so small (5/1800) that I probably cannot set it exactly to match the audio length. But that is necessary to combine the movie with the audio, and only after that do I turn it into VFR with the timecode file. |
|
17th December 2020, 22:47 | #7 | Link | |
Registered User
Join Date: Sep 2007
Posts: 5,424
|
Quote:
5 frames total for 30 min? That's roughly 1 frame per 6 min (not necessarily spaced evenly). I think that is quite large - most players will probably not play it properly (a very low equivalent FPS), and you might need to increase the "granularity" by adding a few duplicates. Also, most players will have difficulty seeking.
Look at the mkvtoolnix (mkvmerge) documentation on the formatting and types of timestamps, under the "external timestamp files" section. ("Timecodes" and "timestamps" are the same thing; the name has changed, they used to be called timecodes.) v2 timestamps require one entry per frame, and the entries must be sorted in ascending order. v4 timestamps need not be sorted, but sometimes they don't work correctly with some players.
You add the video and audio, select the video track, and add the timestamps file in the timestamps box. You can also add chapters to navigate by in the player, and use chapter names. Those chapter points must be I-frames to be seekable.
Other ways are to use mp4fpsmod for the MP4 container (but it does not support chapters directly), or to add the timecodes with --tcfile-in while encoding with x264. |
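A manual v2 file for a handful of long frames is short enough to generate with a few lines of code. This is a minimal sketch, assuming the layout from the mkvmerge documentation: a header line, then one start time in milliseconds per frame. The function name and the five durations are hypothetical, and the header keyword is a parameter because older tools may expect "timecode" rather than "timestamp".

```python
from typing import Iterable, List


def v2_timestamp_lines(durations_s: Iterable[float],
                       keyword: str = "timestamp") -> List[str]:
    """Build the lines of a v2 file: a header, then one start time (ms) per frame."""
    lines = [f"# {keyword} format v2"]
    t_ms = 0.0
    for duration in durations_s:
        lines.append(f"{t_ms:.3f}")   # start time of this frame
        t_ms += duration * 1000.0     # next frame starts when this one ends
    return lines


if __name__ == "__main__":
    # Hypothetical example: 5 unevenly spaced frames covering 30 minutes.
    durations = [300, 600, 120, 480, 300]
    print("\n".join(v2_timestamp_lines(durations)))
```

Only start times are written, so the length of the last frame is not expressed in the file; how a muxer or player treats that final frame is outside what this sketch covers.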
|
17th December 2020, 23:22 | #8 | Link | |
Registered User
Join Date: Mar 2018
Location: Germany
Posts: 227
|
Quote:
What is the application I do the muxing with? Surely not VirtualDub? Do I have to use the codec kind of stand-alone? |
|
17th December 2020, 23:38 | #9 | Link | |
Registered User
Join Date: Sep 2007
Posts: 5,424
|
Quote:
If you're using x264 with --tcfile-in, that's a command-line-only option.
I tried a very low fps like your 5 frames / 30 min example - it doesn't work for most players. With frames of around 1 min it starts to work fairly consistently in common players. 30 seconds is even better. I think the frame duration is just too long for most players past 1 minute.
They play in sync and with chapter markers; I placed audio tones at specific spots to check. |
|
18th December 2020, 00:19 | #10 | Link |
Registered User
Join Date: Mar 2018
Location: Germany
Posts: 227
|
Thank you very much for your help.
I was focusing on manual timecode editing, as AviSynth's DeDup has that 20-frame sequence length limitation.
In conjunction with shekh's Remove Frame filter I can still expand my personal CLI app to handle not only dupe patterns but also dupe sequences, and to produce a timecode file. Will work on that. |
18th December 2020, 01:31 | #11 | Link | |
Registered User
Join Date: Sep 2007
Posts: 5,424
|
Quote:
I'm seeing compatibility issues with some player and container combinations when you use very low equivalent FPS values.
I got 1 min frames (30 frames) to work for the 30 min example in MPC-HC and PotPlayer in the MKV container, but it needed to be remuxed to MP4 for VLC and mpv. It's a bit flaky behaviour. Going lower than this gets worse.
Pure duplicates, such as slideshows, don't "cost" much. Just increase --bframes to 16. A low FPS, e.g. 1 FPS, with duplicates, --bframes 16 and a long GOP will not have those flaky compatibility issues. Your audio stream will often be larger than your video stream.
In my test example:
- VFR, 1 min frames, all I-frames so seekable each minute, 30 frames, crf 20: video stream 397kb. Some players have issues with some containers (e.g. MP4 might not work for some, MKV for others).
- 1 fps CFR, 1800 frames, keyframes every minute (every 60 frames), crf 20, 16 b-frames: video stream 669kb. 60x more frames, about 1.7x the size, no compatibility issues because it is CFR.
- 1 fps CFR, 1800 frames, keyframes every 5 min (every 300 frames), crf 20, 16 b-frames: video stream 227kb. So this was smaller than the VFR I-frame encode, because of fewer keyframes.
I could have used a long GOP for the VFR encode, and that would make it smaller, but it was already showing some compatibility issues in some players.
In this example the audio was 7.55Mb of low-bitrate MP3, so proportionally much larger than the video streams. It will vary a bit depending on what the content is exactly, but pure duplicates "cost" very little. |
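The three results from this post can be compared with a few lines of arithmetic. The sketch below only uses the frame counts and stream sizes quoted above; the sizes are kept in the unit the post uses ("kb"), which does not matter for ratios. The labels and the function name are made up for illustration.

```python
# (label, frame count, video stream size in the post's "kb") as reported above.
RESULTS = [
    ("VFR, 1 min frames, all I-frames", 30, 397),
    ("CFR 1 fps, keyframe every 60 frames", 1800, 669),
    ("CFR 1 fps, keyframe every 300 frames", 1800, 227),
]


def relative_to_baseline(results):
    """Return (label, frame ratio, size ratio) relative to the first entry."""
    _, base_frames, base_size = results[0]
    return [(label, frames / base_frames, size / base_size)
            for label, frames, size in results]


if __name__ == "__main__":
    for label, frame_ratio, size_ratio in relative_to_baseline(RESULTS):
        print(f"{label}: {frame_ratio:.0f}x frames, {size_ratio:.2f}x size")
```

The CFR encode with keyframes every 60 frames comes out at roughly 1.69 times the VFR size despite having 60 times the frames, and the one with keyframes every 300 frames at roughly 0.57 times.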
|
18th December 2020, 10:35 | #12 | Link |
Registered User
Join Date: Mar 2018
Location: Germany
Posts: 227
|
Wow - thank you for providing these results.
I would like to add just one aspect I observed when the dupes are not pure (bit-wise identical). Strangely enough, this is the case for many already existing "still frame" movies where one might want to get rid of the uselessly huge amount of video data. I would have bet it works for them too, but a closer look reveals that the codecs produced slightly different frames. So AviSynth's ExactDeDup (no sequence length limit) won't do. (BTW, StainlessS pointed out that Lagarith produces null frames for pure dupes.) So it would be necessary to generate pure dupes from the similar ones first (with the original Dup filter).
I wonder why all these conditions (especially the ones poisondeathray worked out) aren't handled by a good CFR > VFR converter, so that nobody would need to care about compatibility and huge amounts of space would be saved.
Last edited by nji; 18th December 2020 at 10:40. |
18th December 2020, 12:47 | #13 | Link | |
Registered User
Join Date: Mar 2018
Location: Germany
Posts: 227
|
Quote:
... and it also seems to be outdated for some other reasons that are discussed for the other deduping filters...
So: which dedupe filter should be used?
And one more thing I wonder: do they all take into account that similarity is not transitive?
The sequence
10 11 12 13 14 15 16
has a max. neighbour distance of 1, but the distance between 10 and 16...
and the sequence
10 11 9 12 8 13 7 14 6
has a max. neighbour distance of 8, but the distance between 10 and 6...
Last edited by nji; 18th December 2020 at 20:09. Reason: typo + clarification |
|
18th December 2020, 15:59 | #14 | Link | |||
Registered User
Join Date: Sep 2007
Posts: 5,424
|
Quote:
Yes - for non-pure duplicates, the noisier they are, the more space is saved. It's pure duplicates that don't "cost" much. This can be a significant saving if there are many duplicates.
This is also the "grey" area mentioned earlier about the accuracy of duplicate detection. The threshold is not always accurate because of grain, noise, wobbly transfers and lossy compression. It takes some human eyes to get it right, or you might drop good frames or keep noisy duplicates. Pure or very similar duplicates are much easier to detect accurately.
Quote:
It's not huge amounts of space for typical "clean" scenarios. And for 1-frame "songs" or lectures of several frames, being able to seek within a song or a lecture is arguably more important than saving a few MB.
Playback is a player/splitter/decoder issue. It varies. They all work for "normal conditions" with variable FPS, such as Dedup output where you have strings of <20 duplicates. Basically all theatrical movies will be covered. It's only when you have very large gaps (very small equivalent FPS) that some begin to have issues.
It works with all versions, but it's x86 only. You can use it with mp_pipeline for avs+ x64 (avisynth 3.6.2).
Quote:
If you're referring to duplicates out of order, or later in a sequence: Dedup only works for consecutive duplicate frames. It keeps a reference placeholder frame for each string of duplicates. |
|||
18th December 2020, 19:38 | #16 | Link |
Registered User
Join Date: Sep 2007
Posts: 5,424
|
Maybe I don't understand the notation ?
In the 1st sequence, 13 and 13 are duplicates, and they are neighboring. This is OK. The 2nd "13" will be dropped, and the timecodes adjusted by Dedup. 10 and 16 are different frames.
The 2nd sequence has no duplicates? Or is this referring to something else? |
18th December 2020, 20:09 | #17 | Link |
Registered User
Join Date: Mar 2018
Location: Germany
Posts: 227
|
I'm terribly sorry... I made a typo in the first sequence...
I corrected it just now. And yes, you're misunderstanding. It's not about frames, but simply natural numbers and their differences. Last edited by nji; 18th December 2020 at 21:08. |
18th December 2020, 20:20 | #18 | Link | ||
Registered User
Join Date: Sep 2007
Posts: 5,424
|
Quote:
Quote:
If it's "not about frames", what do the numbers refer to? This implies each number has a relative "position", because you mention a "neighbor distance of 8".
Is it just about misordered frames? 6,7,8,9,10,11,12,13,14? |
||
18th December 2020, 21:04 | #19 | Link |
Registered User
Join Date: Mar 2018
Location: Germany
Posts: 227
|
Think of the numbers as the "characteristic values" of the frames they stand for.
The question is how the deduping algorithms define a "sequence of similar frames": are neighbouring frames compared with each other, or is the first frame of the sequence the reference that the others are compared to? If it's the first definition, then a gradual change with a difference of only 1 per step (10 11 12 13 etc.) has to be "cut off" at some point, as the difference between the first (10) and the last (13 or whatever) has become too large, ALTHOUGH the difference between neighbours is only 1. My suspicion is that this might be the reason for AviSynth DeDup's limit of 20. Last edited by nji; 18th December 2020 at 21:07. |
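The two definitions asked about here can be compared directly on the number sequences from the earlier post. The sketch below treats each number as a frame's "characteristic value", as the post suggests; both function names and the limits are made up for illustration and do not describe how any particular dedupe filter works.

```python
from typing import List


def group_adjacent(values: List[int], limit: int) -> List[List[int]]:
    """Start a new group when a value differs from its predecessor by more than limit."""
    groups = [[values[0]]]
    for prev, cur in zip(values, values[1:]):
        if abs(cur - prev) <= limit:
            groups[-1].append(cur)
        else:
            groups.append([cur])
    return groups


def group_by_reference(values: List[int], limit: int) -> List[List[int]]:
    """Start a new group when a value differs from the group's FIRST value by more than limit."""
    groups = [[values[0]]]
    for cur in values[1:]:
        if abs(cur - groups[-1][0]) <= limit:
            groups[-1].append(cur)
        else:
            groups.append([cur])
    return groups


if __name__ == "__main__":
    drift = [10, 11, 12, 13, 14, 15, 16]
    zigzag = [10, 11, 9, 12, 8, 13, 7, 14, 6]
    print(group_adjacent(drift, 1))       # one long group despite the drift
    print(group_by_reference(drift, 1))   # the drift gets cut into short groups
    print(group_adjacent(zigzag, 4))
    print(group_by_reference(zigzag, 4))
```

With neighbour comparison the slowly drifting sequence stays one group even though its ends differ by 6, which is the situation a run-length cap would guard against. With first-frame comparison the zigzag sequence stays one group even though neighbours differ by up to 8.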
18th December 2020, 21:25 | #20 | Link | |
Registered User
Join Date: Sep 2007
Posts: 5,424
|
Quote:
They usually compare similarity characteristics, either the difference from the previous frame or the difference to the next. Usually only small areas are sampled, not the entire frame, for speed. The "%" value is what is used for the threshold.
As mentioned earlier, for Dedup it's only the directly adjacent frame:
Position 10: difference from 11
Position 11: difference from 12
. . .
An example from a log looks like this:
Code:
frm 22: diff from frm 23 = 2.1809% at (160,352)
frm 23: diff from frm 24 = 3.5938% at (256,448)
frm 24: diff from frm 25 = 2.9563% at (128,352)
frm 25: diff from frm 26 = 3.1458% at (160,352)
frm 26: diff from frm 27 = 3.4811% at (256,448)
frm 27: diff from frm 28 = 0.2851% at (416,256)
frm 28: diff from frm 29 = 3.4487% at (256,448)
|
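A log in this shape is easy to screen with a script before checking the borderline cases by eye. This is a minimal sketch that parses the entries quoted above and lists the pairs below a threshold; the regular expression is written only against these sample lines, and the 1.0% threshold is an arbitrary assumption, not a recommended setting.

```python
import re
from typing import List, Tuple

# Pattern written against the sample log lines quoted above.
LINE_RE = re.compile(
    r"frm\s+(\d+):\s+diff from frm\s+(\d+)\s+=\s+([\d.]+)%\s+at\s+\((\d+),(\d+)\)"
)

LOG = """\
frm 22: diff from frm 23 = 2.1809% at (160,352)
frm 23: diff from frm 24 = 3.5938% at (256,448)
frm 24: diff from frm 25 = 2.9563% at (128,352)
frm 25: diff from frm 26 = 3.1458% at (160,352)
frm 26: diff from frm 27 = 3.4811% at (256,448)
frm 27: diff from frm 28 = 0.2851% at (416,256)
frm 28: diff from frm 29 = 3.4487% at (256,448)
"""


def parse_log(text: str) -> List[Tuple[int, int, float]]:
    """Return (frame, compared frame, difference in percent) for each log entry."""
    return [(int(a), int(b), float(pct))
            for a, b, pct, _x, _y in LINE_RE.findall(text)]


def likely_duplicates(entries: List[Tuple[int, int, float]],
                      threshold_pct: float) -> List[Tuple[int, int]]:
    """Frame pairs whose difference is below the threshold."""
    return [(a, b) for a, b, pct in entries if pct < threshold_pct]


if __name__ == "__main__":
    entries = parse_log(LOG)
    print(likely_duplicates(entries, threshold_pct=1.0))
```

In this sample only the pair 27/28 falls below 1%, so the result depends entirely on where the threshold sits relative to the noise level of the other pairs.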
|