To Xvid developers: Variable B-frames?
athos
6th March 2002, 14:56
Hi all! This is my first post (actually my second attempt at a first post, since I had some problems with cookies), although I have been lurking on this forum for quite some time.
I had written a very elaborate post, which got lost due to problems with my web browser and firewall [insert some harsh language about these two here] :\
Anyway, my point was that I had an idea about using a variable number of B-frames per sequence (by sequence I mean IPBPBPBPBPBPBPBPBPBPBPBPBPBP; refer to -h's post in "any opinions on divx 5?" (http://forum.doom9.org/showthread.php?s=&threadid=19132&pagenumber=2#post95500)) in Xvid. I read that DivXNetworks have decided on using one B-frame per sequence, whereas most MPEG-1/2 encoders use two. This leads me to believe that there is no single optimal number of B-frames for all video, but that it somehow depends on the content. Further, I assume that this would then be measurable at encode time.
As an example (note: this is not a real theory), I assume that this depends on the same factors that are used when deciding the bit budget for a part of the video in a VBR situation. In my example, content that requires lower bitrates would lend itself better to using more B-frames (maybe even 3 or 4 per sequence, again just an example), while more demanding content would be better off with fewer or even no B-frames. Because I understand that B-frame decoding is pretty CPU-demanding, this example has an added advantage: there would be fewer B-frames in the high-bitrate parts of the video and more in the low-bitrate parts.
This dynamic decision would probably be easier to make when doing 2 passes, so in a 1-pass VBR situation you might want to use a more conservative strategy, or a fixed number of B-frames.
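athos's example rule (less demanding content gets more B-frames, more demanding content fewer) can be written as a small mapping function. This is a minimal sketch only: the name `bframes_for_demand`, the idea of expressing demand as a segment's bit budget relative to the average, and the 0.5/1.5 breakpoints are all assumptions for illustration, not anything XviD implements.

```python
def bframes_for_demand(relative_demand: float, max_bframes: int = 4) -> int:
    """Hypothetical rule: map a segment's relative bit demand to a B-frame count.

    relative_demand is the segment's bit budget divided by the average budget
    (1.0 = average). Demand at or below 0.5 gets max_bframes, demand at or
    above 1.5 gets none, and values in between are interpolated linearly.
    """
    low, high = 0.5, 1.5  # assumed breakpoints, not tuned values
    clamped = min(max(relative_demand, low), high)
    fraction = (high - clamped) / (high - low)  # 1.0 for cheap content, 0.0 for demanding
    return round(fraction * max_bframes)
```

With the defaults, `bframes_for_demand(0.3)` returns 4, `bframes_for_demand(1.0)` returns 2 and `bframes_for_demand(2.0)` returns 0. A "conservatism" setting could, for example, lower `max_bframes` or narrow the breakpoints.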
Because we want XviD to be as user-configurable as possible, one would like to be able to set:
- whether to use B-frames at all, and if so, whether to use variable B-frames;
- the minimum (probably 0 or 1) and, more importantly, the maximum number of B-frames per sequence (if variable B-frames are chosen);
- a number or slider controlling how conservative the algorithm is (again, if variable B-frames are used);
- if a fixed number of B-frames is used, how many per sequence.
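These options could be grouped into one settings object with a basic consistency check. The sketch below is hypothetical: the class and field names (`BFrameSettings`, `conservatism` and so on) are invented for illustration and are not XviD's actual configuration.

```python
from dataclasses import dataclass


@dataclass
class BFrameSettings:
    """Hypothetical B-frame options mirroring the ones discussed in this thread."""
    use_bframes: bool = True
    variable: bool = False       # variable vs fixed B-frame count
    min_bframes: int = 0         # used only when variable is True
    max_bframes: int = 2         # used only when variable is True
    conservatism: float = 0.5    # slider: 0.0 = aggressive, 1.0 = very conservative
    fixed_bframes: int = 1       # used only when variable is False

    def validate(self) -> None:
        """Raise ValueError if the active options are inconsistent."""
        if not self.use_bframes:
            return               # nothing else matters when B-frames are off
        if self.variable:
            if not 0 <= self.min_bframes <= self.max_bframes:
                raise ValueError("need 0 <= min_bframes <= max_bframes")
            if not 0.0 <= self.conservatism <= 1.0:
                raise ValueError("conservatism must be between 0 and 1")
        elif self.fixed_bframes < 0:
            raise ValueError("fixed_bframes must not be negative")
```

Only the fields that matter for the chosen mode are checked, so a fixed-count setup is not rejected because of an unused min/max pair.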
A couple of relevant questions:
- Does the ISO MPEG-4 standard allow variable B-frames? If it is not specified explicitly, perhaps a workaround is possible while still conforming to the standard?
- What factor(s) decide the optimal number of B-frames per sequence?
- How large would the gain from implementing this idea be, assuming the above questions can be answered satisfactorily?
I am sorry that this post is not as well written as the first one was; I just got tired of rewriting the whole thing.
Keep up the good work with XviD! I hope the DivX scene will move away from DivX3 (which I think is obsolete and has no future) towards XviD, and perhaps Ogg audio instead of MP3.
Koepi
6th March 2002, 15:05
DXN chose the IBPBPBPIBPBP solution because you get into trouble with the AVI file format if you do otherwise.
Regards,
Koepi
athos
6th March 2002, 15:31
oh, ok.
Maybe it would be possible with another format, for example OGM, MP4 or MCF?
Koepi
6th March 2002, 15:35
The primary goal is to stay as compatible as possible.
So I guess we're trying some workarounds for .avi first. Another option is to write our own encoding application, with which we could use whatever we want. But it's a bad idea to write another format parser.
Just sit back and relax; starting on Sunday it'll get really interesting ;)
Regards,
Koepi
Originally posted by Koepi
Just sit back and relax; starting on Sunday it'll get really interesting ;)
Regards,
Koepi
Hmm... what surprises do you boys have in store for us? ;):D
Actron
6th March 2002, 16:10
I can't wait till Sunday :))
athos
6th March 2002, 16:32
I am also excited about Sunday!
But, back to the subject. I understand that variable or even multiple B-frames will not be implemented at this time, because of limitations in the AVI format and the drive for compatibility (which I totally agree with). Still, I am curious what you people think of the idea of variable B-frames. Would it be a good idea, assuming that it becomes possible to implement in the future?
Acaila
6th March 2002, 16:59
@Koepi:
So to be able to use B-frames to their fullest extent you need something that's not avi?
Tronic is hard at work on the first MCF parser at this very moment, and I expect him to have an alpha version done very soon. If XviD developers would work together with the MCF coders you would both benefit from it.
1- You would be able to make the codec a lot better than it already is, free of the restrictions of avi.
2- You would really help the launch of MCF once it's ready. Being implemented with one of the best codecs around is a great way to bring it to the masses.
Because MCF is still in the development stage, it's quite easy to add/change features. And XviD is also still in development, so as I see it your job wouldn't change a lot.
Hmm, the above post reminds me of ChristianHJW, maybe I've been hanging around with him a bit too much :D
saVe
6th March 2002, 17:09
I must admit that I have some problems imagining a way this could be achieved. Here's my personal Q&A:
How will the codec know where to use how many B-frames?
This could be solved scene by scene: scenes with more bits would receive more P-frames rather than B-frames.
How will the codec know where to insert the P-frames to get the best quality?
When a maximum amount of change in picture information is exceeded, the codec inserts a P-frame and encodes the skipped frames as B-frames.
Maybe I'm completely wrong and the codec can't operate scene by scene, but my idea would be to do a normal 1st pass using only I-frames and P-frames.
In the second pass, each scene (I-frame to I-frame) could be taken and processed on its own: compare the I-frame to the following P-frames, and when a certain (of course optimally tweaked) amount of information relative to the I-frame has changed, insert a P-frame and encode the frames in between as B-frames. Then the whole thing would be repeated, using the new P-frame the way the I-frame was used before, until the next I-frame is reached, where it starts all over again. There could also be one fixed value for high-motion scenes and one for low-motion scenes, with some sort of breaking point based on the average bitrate of the whole scene.
Would this be possible, or am I talking complete *edit*?
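saVe's second-pass idea (walk forward from the I-frame, and once a frame differs from the current reference by more than some threshold, make it a P-frame and code the frames in between as B-frames) can be sketched as follows. Assumptions: frames are given as flat lists of luma samples, the mean absolute difference stands in for "amount of information changed", and a `max_bframes` cap is added. `place_pframes` and `mean_abs_diff` are hypothetical helpers; a real encoder would more likely reuse statistics from the first pass.

```python
def mean_abs_diff(a: list[int], b: list[int]) -> float:
    """Average per-sample difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def place_pframes(frames: list[list[int]], threshold: float, max_bframes: int) -> str:
    """Return a frame-type string for one scene; frames[0] is the I-frame.

    A frame becomes a P-frame when it differs from the current reference by
    more than `threshold`, when `max_bframes` B-frames are already pending, or
    when it is the last frame of the scene (so every B-frame has a future
    anchor). Every other frame is coded as a B-frame.
    """
    types = ["I"] + ["B"] * (len(frames) - 1)
    ref = 0  # index of the current reference (I- or P-frame)
    for k in range(1, len(frames)):
        pending_b = k - ref - 1
        is_last = k == len(frames) - 1
        if (is_last or pending_b >= max_bframes
                or mean_abs_diff(frames[ref], frames[k]) > threshold):
            types[k] = "P"
            ref = k
    return "".join(types)
```

For a scene whose brightness goes 0, 1, 2, 30, 31, 32 (threshold 5, at most 3 B-frames), the quiet start becomes a run of B-frames and the jump forces a P-frame: `IBBPBP`. Comparing against the reference rather than the previous frame is what lets slow drift eventually trigger a new P-frame.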
No offense, Acailia, but when I first saw you posting on DivX.com I thought you were Christian using a different handle... you both have a very similar outlook & style....
.... :D
Cheers,
-Nic
ps
I'm keeping a lookout for the first parser; Christian has done amazing work pushing this along...
(....Tronic's posted at www.videocoding.de as well)
athos
6th March 2002, 17:28
save> I think the problem can be summarized as this: given a number of frames between two keyframes, how many should be P-frames and how many should be B-frames? OK, maybe I'm oversimplifying here, because some parts of this run of frames could be more complex than others.
Still, too many B-frames will lead to unnecessarily large P-frames, and too few B-frames will lead to too many P-frames. The optimal number of B-frames in the sequence is the one where adding B-frames makes the P-frames grow enough that the total number of bits used in the sequence (at a fixed quantizer) increases, and removing B-frames adds P-frames that also increase the total. So the optimal ratio between B-frames and P-frames is the one that yields the smallest file when using quality-based/fixed-quantizer encoding.
Now, this is only a description of the problem, not the solution, since you do not want to do several encodes of the same frames just to see which number of B-frames is optimal. There should be some way to determine this by looking at some factor, for example the same one used for determining the bit budget and quantizer settings.
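athos's definition of the optimum (the B-frame count that gives the fewest total bits at a fixed quantizer) amounts to trying each candidate count and keeping the cheapest. In the sketch below, the size models `p_size` and `b_size` are hypothetical stand-ins for trial encodes or estimates; in practice they are exactly the expensive part the post wants to avoid.

```python
from typing import Callable


def best_bframe_count(frames: int, max_bframes: int,
                      p_size: Callable[[int], float],
                      b_size: Callable[[int], float]) -> tuple[int, float]:
    """Try every fixed B-frame count from 0 to max_bframes for one run of frames.

    frames: number of frames after the leading I-frame.
    p_size(d): estimated bits for a P-frame whose reference is d frames back.
    b_size(d): estimated bits for a B-frame whose two anchors are d frames apart.
    Every anchor is treated as d frames from its reference, which may
    overestimate the final, possibly shorter, group.
    Returns (best B-frame count, estimated total bits).
    """
    best_n, best_bits = 0, float("inf")
    for n in range(max_bframes + 1):
        d = n + 1                        # distance between consecutive anchors
        p_count = -(-frames // d)        # ceil(frames / d): one anchor per group
        b_count = frames - p_count
        total = p_count * p_size(d) + b_count * b_size(d)
        if total < best_bits:
            best_n, best_bits = n, total
    return best_n, best_bits
```

With a toy model in which a P-frame grows by 60% and a B-frame by 30% for each extra frame of anchor distance (1000 and 300 bits at distance 1), a 12-frame run comes out cheapest with one B-frame between anchors: roughly 11940 bits, against 12000 for all P-frames and 12640 for two B-frames.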
pandv
6th March 2002, 17:38
About the B-frames thing, I am wondering whether it's possible to have IBBBBI (maybe reordered to IIBBBB in the stream), without P-frames?
This could be useful in transitions between two images (a scroll, for example), without the penalty of the bigger P-frames.
pandv.
saVe
6th March 2002, 17:45
I totally agree with you, athos.
I think the perfect number of B-frames between two I-frames or P-frames can only be found by trying. As I wrote before, this could maybe be determined by changes in picture information: when a P-frame from the 1st pass differs by more than xx% from the I-/P-frame it refers to, a P-frame would be inserted in the 2nd pass. Of course this value has yet to be found, but I think there is one. The developers could implement a slider as you suggested and let us users (aren't we beta testers by now? ;)) find the best settings for the percentages (by using the famous movie "The Replacements" *lol*).
athos
6th March 2002, 17:59
save> I was thinking maybe it is possible to estimate how large the P-frames need to be, given a number of B-frames, without processing the entire frame? I'm looking for a formula that would state something in the style of:
4 p-frames of size x
translates to
2 p-frames of size y and 2 b-frames of size z
Obviously, you would be looking for a ratio of B-frames and P-frames such that 2 * y + 2 * z is less than 4 * x.
Now, this estimate will probably not be exact, but hopefully it will be good enough to give good results.
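The comparison above generalizes to any split of a run into P- and B-frames. A minimal sketch, where `bframe_savings` is a hypothetical helper and x, y and z are the estimated frame sizes from the post; a positive result means the mixed pattern is estimated to be smaller.

```python
def bframe_savings(x: float, y: float, z: float,
                   p_count: int = 2, b_count: int = 2) -> float:
    """Estimated bits saved by coding a run of p_count + b_count frames as
    p_count P-frames of size y plus b_count B-frames of size z, instead of
    all P-frames of size x. Positive means the mixed pattern wins."""
    all_p = (p_count + b_count) * x
    mixed = p_count * y + b_count * z
    return all_p - mixed
```

With x = 1000, y = 1400 and z = 400 bits, the 4 -> 2 + 2 case saves 4000 - 3600 = 400 bits. If y grows to 1900 while z is 300, the result is -400, so in that case the B-frames would not be worth it.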
Acaila
6th March 2002, 19:48
@Nic:
When I started posting at DivX.com I was a total noob, but being compared to Chris is always a compliment in my view.
@Athos:
I hope I understand your thoughts correctly, but the only way to get good results (as I see it) would be for the codec to encode each frame between two I-frames as both a P-frame and a B-frame, then calculate which sequence of these P- and B-frames results in the smallest size and discard the rest. I think this will slow down encoding a lot, though.
But it would be a much more efficient use of space.
athos
6th March 2002, 20:54
Acalia> Well, this is one solution, slow but thorough. I can think of two alternatives (not sure if they are plausible, just ideas):
Encode P-frames only in the first pass and B-frames only in the second pass, and then compare these. This would only work in 2-pass mode, of course. I imagine this might not work, as the second pass is not identical to the first, but maybe the relative sizes are comparable?
Somehow (more or less heuristically) estimate the bit savings from using B-frames instead of P-frames. I am not very good at math, so I am not sure if this can be done, and if so, how well. I am thinking of some function that takes the factor that decides how useful B-frames are (complexity? motion? bitrate demand?) and the size of the P-frames, and returns an estimate of the savings (if any) from turning them into B-frames. A requirement of this function is of course that it is faster (less complex) than actually rendering the B-frames and comparing.
As you can see, I am thinking abstractly here, just throwing out ideas.
Even in the case that you describe (sort of a worst case), maybe the benefits in quality would make it worth implementing (in the future) as an option? If the double rendering is only needed in the second pass, rendering time would increase by roughly 50% in a 2-pass encode. Of course, if it is needed in both passes, it would increase by roughly 100%.
saVe
6th March 2002, 21:41
@athos:
Your first idea cannot work, because the first pass uses bigger quantizers over the whole movie than the second pass, so comparing sizes across the two passes would not lead anywhere. What we need to do is compare the size of second-pass P-frames to the size of second-pass B-frames.
I think your second idea has the problem that estimating bit savings depends a lot on the source. In high-motion scenes, too many B-frames would certainly be the worst thing to do, because, for example, the closing P-frame is very different from the first B-frame. I hope you get what I mean; maybe it's not the best way of describing it...
@acaila:
When you render all frames as B-frames instead of P-frames, you don't take into account that when you insert P-frames more often, the B-frames get smaller too. Again, if the distance between a B-frame and one of its P-frames gets too big, maybe the B-frame will come out bigger than a P-frame would have... In that case inserting another P-frame can save space, and the B-frames will differ from the larger number of B-frames that were encoded in the first place. The problem is just where the heck to insert the damn P-frames!
To you both: please don't take this as an offense, it's an amateur's opinion!
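Acaila's exhaustive approach, combined with saVe's objection that B-frame sizes depend on where the P-frames end up, can be framed as a dynamic program over P-frame positions: each candidate gap between two anchors is costed with its own P-frame and B-frames, and the cheapest chain of gaps wins. This is a sketch, not how any encoder discussed here works. `p_cost` and `b_cost` are hypothetical callbacks that would have to come from trial encodes, and the number of calls grows with the square of `max_bframes`, which is exactly the slowdown Acaila expects.

```python
from typing import Callable


def choose_anchors(n_frames: int, max_bframes: int,
                   p_cost: Callable[[int, int], float],
                   b_cost: Callable[[int, int, int], float]) -> tuple[float, list[int]]:
    """Choose P-frame positions among frames 1..n_frames (frame 0 is the I-frame).

    p_cost(ref, cur): bits to code frame `cur` as a P-frame predicted from `ref`.
    b_cost(prev, cur, nxt): bits to code `cur` as a B-frame between anchors
    `prev` and `nxt`. Frame n_frames is always an anchor, so no B-frame is left
    without a future reference. Returns (total bits, anchor positions).
    """
    best = [float("inf")] * (n_frames + 1)   # best[j]: cheapest cost with an anchor at j
    prev_anchor = [0] * (n_frames + 1)
    best[0] = 0.0
    for j in range(1, n_frames + 1):
        # the previous anchor i may be at most max_bframes + 1 frames back
        for i in range(max(0, j - max_bframes - 1), j):
            cost = best[i] + p_cost(i, j)
            cost += sum(b_cost(i, k, j) for k in range(i + 1, j))
            if cost < best[j]:
                best[j], prev_anchor[j] = cost, i
    anchors, j = [], n_frames
    while j > 0:                              # walk back through the chosen anchors
        anchors.append(j)
        j = prev_anchor[j]
    return best[n_frames], sorted(anchors)
```

With a toy cost where each extra frame of anchor distance adds 500 bits to a 1000-bit P-frame and every B-frame costs 300 bits, four frames come out cheapest as IBBBP (3400 bits) when up to three B-frames are allowed, and as IBPBP (3600 bits) when only one is.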
Acaila
6th March 2002, 22:13
@acaila:
When you render all frames as B-frames instead of P-frames, you don't take into account that when you insert P-frames more often, the B-frames get smaller too. Again, if the distance between a B-frame and one of its P-frames gets too big, maybe the B-frame will come out bigger than a P-frame would have... In that case inserting another P-frame can save space, and the B-frames will differ from the larger number of B-frames that were encoded in the first place. The problem is just where the heck to insert the damn P-frames!
Yes you are correct. I had also realised that before I wrote it down, but I refrained from adding that part because it would have over-complicated things.
I'm no coder, I was just passing an idea around so someone with understanding of codec internals could either use it as inspiration or dismiss it.
I do think the whole idea of variable B-frames is a very good one and would do XviD a lot of good if implemented (if it's possible).
Oh, and thank you for writing my name correctly. It seems many people use a personal variation instead of the real deal. I've seen like 3 different versions already today :D
saVe
6th March 2002, 22:26
So you got my point? Wow, someone understands what I'm saying! ;) I have to improve my explanation skills though....
I'm not a coder either, so maybe we should post this over at videocoding.de/xvid.org! The core coders are all there; maybe they'll find it interesting...
edit: -h, Koepi, Nic (in alphabetical order ;)), what do you think about this? Would it be possible? If yes, could one of you post it at videocoding.de? I think they would be more likely to listen to you!
Acaila
6th March 2002, 22:37
No need to clutter up that forum with trivia. If Koepi, -h or Nic think this is an interesting idea, they'll pass it along.
athos
6th March 2002, 22:50
save> Well, I figured there would be a problem with using the first encode directly, but maybe it could still be used for some sort of heuristic estimate.
Anyway, regarding the other option of somehow estimating the gain from more B-frames, I have tried to keep open which factor decides this. So this factor might very well be the motion of a scene: higher motion -> fewer B-frames, and vice versa. Maybe there is someone here with more in-depth knowledge of compression and MPEG-4 who could help us develop this idea?
Koepi (or someone with knowledge of MPEG-4 and AVI)> would it be possible to have variable B-frames in an AVI if the count could only vary between 0 and 1 (per sequence)? Or is it the variation itself that cannot be contained in an AVI? Maybe such a small variation would not gain much quality anyway?
Also, decoding content with B-frames is much heavier than without. I was wondering why this is? (This time I did search the forum, but did not find an answer.)
Acaila
6th March 2002, 22:57
also, decoding content with b frames is much heavier than without. i was wondering why this is? (this time i did a search in the forum but did not find an answer)
B-frames need both previous and future I/P frames for proper decoding, while P-frames only need previous frames. So in order to playback B-frames correctly the decoder has to process the future frame as well. That's two frames instead of one.
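The reordering Acaila describes can be made concrete: each B-frame has to be held back until the following I- or P-frame has been sent, because it is predicted from both. The helper below is a hypothetical illustration (frame types as a string in display order, output as display indices in transmission order), not code from any decoder.

```python
def coded_order(display_types: str) -> list[int]:
    """Return display indices in the order they must be coded and sent.

    Each B-frame is emitted after the next I- or P-frame (its future
    reference), so the decoder already has both references when it arrives.
    """
    order: list[int] = []
    pending_b: list[int] = []
    for idx, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(idx)        # wait for the future reference
        else:
            order.append(idx)            # send the I/P anchor first...
            order.extend(pending_b)      # ...then the B-frames that depend on it
            pending_b.clear()
    if pending_b:
        raise ValueError("trailing B-frames have no future reference frame")
    return order
```

`coded_order("IBBPBBP")` gives `[0, 3, 1, 2, 6, 4, 5]`: the P-frame at display position 3 is decoded once, before the two B-frames that use it, and is then held until its display time. pandv's IBBBBI example gives `[0, 5, 1, 2, 3, 4]`, i.e. the IIBBBB stream order.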
saVe
6th March 2002, 23:04
also, decoding content with b frames is much heavier than without.
Well, the decoder has to take two frames as references to decode one B-frame, instead of just one to decode a P-frame.
edit: just saw this question has already been answered. Sorry, I was writing this post at the time ;)
I have written this twice now, but no one has responded to it:
What do you think of the idea of calculating the percentage difference between the P-frames of the first pass and the I-frame they refer to, and inserting a P-frame when some value is reached? There must be an optimal value for this, I think. This way scenes with more action would get more P-frames, and scenes with less action more B-frames.
athos
6th March 2002, 23:27
Acaila (correctly spelled :)), saVe: so is the calculation using two frames instead of one more complex? Surely rendering the following P-frame ahead of time does not add any complexity, as this frame does not need to be rendered again when its position in the frame sequence is reached?
saVe: I think you might be on to something. One way to evaluate different methods for estimating the use of B-frames would be to compare them to the "base case" proposed by Acaila, where every frame (except I-frames) is rendered as both a P-frame and a B-frame and the sizes are compared. The goal is to make the estimation method produce results as close to this thorough algorithm as possible.
Unfortunately, this method won't work for 1-pass encodes. Perhaps you could use another method there: first encode an I-frame, then the next I-frame, and then a P-frame midway between them. Estimate the difference between the P-frame and each of the two I-frames. Use these estimates to determine the proportion of B-frames between the first I-frame and the P-frame, and between the P-frame and the second I-frame. To clarify: do two estimates, between I1 and P and between P and I2, as there might be a different need for B-frames between I1 and P than between P and I2.
Come to think of it, this might not work for 1-pass either, as the encoder has to know ahead of time where the next I-frame will be. Perhaps the only plausible solution for 1-pass is the thorough one.
saVe
6th March 2002, 23:53
This is getting pretty complex... maybe I should take some good old pen & paper to clean up my thinking! ;)
I don't think this will work well for 1-pass encodes in any case, because the codec always needs information about what the next frames are like. Well, maybe this could stop people from making 1-pass rips ;)
The only (not at all optimized) way to do this in just one pass could be to calculate the average bitrate before starting to encode, from the desired final size of the video. Every time the bitrate rises above average, the codec could skip one frame and fill it in as a B-frame; when the bitrate drops below average, it could do the same with 2-3 frames. This would give high-motion scenes (= higher bitrate) fewer B-frames and low-motion scenes (= lower bitrate) more. The numbers of B-frames used above and below average could depend on the average bitrate. But again, I'm not quite sure it would be possible for the codec to load so many frames in advance.
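saVe's 1-pass rule needs two pieces: an average video bitrate derived from the target file size, and a per-stretch choice of B-frame count. A minimal sketch, assuming the target is given in bytes, audio is a constant bitrate, container overhead is ignored, and "above average" gets one B-frame while "below average" gets two. `average_video_kbps` and `bframes_for_rate` are hypothetical names.

```python
def average_video_kbps(target_bytes: int, duration_s: float, audio_kbps: float) -> float:
    """Video bitrate (kbit/s) that fills target_bytes once constant-bitrate audio
    is subtracted. Container overhead is ignored."""
    total_kbps = target_bytes * 8 / duration_s / 1000
    return total_kbps - audio_kbps


def bframes_for_rate(recent_kbps: float, average_kbps: float,
                     above: int = 1, below: int = 2) -> int:
    """Fewer B-frames while the recent bitrate runs above average, more below it."""
    return above if recent_kbps > average_kbps else below
```

For a 700 MiB target (734,003,200 bytes), a 2-hour film and 128 kbit/s audio, the average video bitrate comes out at about 687.6 kbit/s, so a stretch running at 900 kbit/s would get one B-frame and one at 400 kbit/s would get two.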
what do you think about that?
athos
7th March 2002, 00:16
Well, first of all the average bitrate for the entire video might not be known when using fixed quantizers, and second, it might not be a good value to use, since the bitrate would be highly variable. Um, I'm too tired to think now; will continue this discussion tomorrow =)
saVe
7th March 2002, 00:24
In 1-pass encodes there is always an average bitrate, except when you don't aim at a certain file size... is that what you mean?
Yeah, I should go to bed now too; I have to get up very early tomorrow morning :(
A good excuse for coming late: XviD wouldn't let me sleep! ;)
Decoding B-frames shouldn't really be all that much harder than P-frames; it sounds like they've stuffed something up. But yes, encoding is definitely slower.
As far as variable B-frame frequency goes, I don't know if it's possible (don't see why it wouldn't be, it's probably specified in the VOL header, or even VOP, so we can change whenever we want), but it would create horrible side-effects if used in AVI - every time we want to use a B-frame, we have to "delay" the output to AVI, so frames start lagging behind audio. If we always use, say, 2 B-frames per P-frame, we'll have a 2-frame lag at the very beginning of the encode, but no more lags after that (audio will have to be delayed to stay in sync, though DivX5's dshow filter already caters for this audio sync issue). I couldn't find anything explicit in the MPEG4 spec about how to convey B-frame placement information, I'll have another look later.
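The lag -h describes is easy to quantify: with a fixed pattern of n consecutive B-frames, the output is delayed by n frames, so the audio has to be shifted by n frame durations. A small sketch; `bframe_lag_ms` is a hypothetical helper, and the frame rate is given as a fraction so NTSC-style rates such as 24000/1001 are exact.

```python
from fractions import Fraction


def bframe_lag_ms(consecutive_b: int, fps_num: int, fps_den: int = 1) -> Fraction:
    """Delay, in milliseconds, caused by holding back `consecutive_b` frames."""
    frame_ms = Fraction(1000 * fps_den, fps_num)  # duration of one frame
    return consecutive_b * frame_ms
```

At 25 fps, two B-frames mean an 80 ms lag; at 24000/1001 fps the same two frames give 1001/12 ms (about 83.4 ms). If the count varied from sequence to sequence, the lag would vary with it, which is the AVI problem -h describes.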
Then there's the question of whether or not this is useful. I haven't seen any tests showing that B-frames are worse in certain situations, though it is certainly possible if the interval between P-frames is large enough (the P-frame will blow out to huge sizes, or perhaps even become an I-frame). The best time to use long B-frame sequences would be in low-motion, "talking head" phases: in those cases the P-frames, despite being far from the reference I-frame, won't actually be inflated by much.
I haven't thought about this much yet, but it will require an MP4 file encoder to be even half useful. Making the decision "intelligent" via 2-pass analysis would be tremendously difficult, and I'm not sure whether it's within the scope of XviD's current development focus. Just random thoughts: you'd have to encode the first pass with all I/P frames, then during the second pass analyse the median motion vector displacements of sequences of P-frames, to decide how large a difference we can tolerate while still inserting a string of B-frames. Scary stuff.
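-h's second-pass analysis could look roughly like this: take the motion vectors of each P-frame from the first pass, reduce them to a median displacement, and only allow a run of B-frames where that median stays within a tolerance. Everything here is an assumption for illustration: the vector format ((dx, dy) in pixels), the tolerance value and the helper names.

```python
import math
from statistics import median


def median_displacement(vectors: list[tuple[float, float]]) -> float:
    """Median length of a frame's motion vectors, in pixels."""
    return median([math.hypot(dx, dy) for dx, dy in vectors])


def allow_bframe_run(pframe_vectors: list[list[tuple[float, float]]],
                     tolerance: float) -> bool:
    """True if every first-pass P-frame in the run has a median displacement
    of at most `tolerance` pixels."""
    return all(median_displacement(v) <= tolerance for v in pframe_vectors)
```

A "talking head" frame whose vectors are mostly 0-1 pixels passes a 2-pixel tolerance, while a pan whose typical vector is (8, 6), i.e. 10 pixels, does not. Using the median rather than the mean keeps a few large vectors on a moving mouth from vetoing an otherwise static shot.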
-h
athos
7th March 2002, 10:06
Perhaps we are making this overly complicated. Wouldn't it be possible to use the same value that is used to decide whether a frame should be an I-frame?
If this value for some frame is over some threshold, it will be an I-frame, just like now. If it is not over this threshold, test it against another (lower) threshold to see if it should be a P-frame; otherwise it becomes a B-frame. If this is possible and plausible, it should work wherever a variable keyframe interval works?
As for playback, an absolute maximum number of B-frames could be decided on. This maximum (say 5) would have to be enforced during encoding. Then the playback problem is the same as with a fixed number of B-frames (5 in this example): just always prebuffer this number of frames, and compensate for it when syncing with the other streams (audio, subtitles etc.).
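athos's two-threshold rule, together with the enforced maximum, fits in a few lines. This is a sketch under assumptions: `scores` stands for whatever per-frame change metric the encoder already uses for keyframe decisions (its scale and the threshold values below are made up), `classify_frames` is a hypothetical name, and the last frame is forced to be a P-frame so no B-frame is left without a future reference.

```python
def classify_frames(scores: list[float], i_threshold: float,
                    p_threshold: float, max_bframes: int) -> str:
    """Assign I/P/B types from a per-frame change score using two thresholds."""
    types: list[str] = []
    pending_b = 0  # consecutive B-frames since the last anchor
    for k, score in enumerate(scores):
        if k == 0 or score > i_threshold:
            frame_type = "I"              # first frame or scene change
        elif score > p_threshold or pending_b >= max_bframes:
            frame_type = "P"              # big enough change, or B-run is full
        else:
            frame_type = "B"
        pending_b = pending_b + 1 if frame_type == "B" else 0
        types.append(frame_type)
    if types and types[-1] == "B":
        types[-1] = "P"                   # trailing B-frames need a future anchor
    return "".join(types)
```

With scores `[0, 1, 1, 8, 1, 50, 1, 1, 1, 1]`, an I-threshold of 30, a P-threshold of 5 and at most 2 B-frames, the result is `IBBPBIBBPP`: the score of 8 forces a P-frame, the score of 50 starts a new I-frame, and the cap forces a P-frame after two quiet B-frames. Because the cap is fixed, the decoder can always prebuffer the same number of frames, as the post suggests.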
saVe> i was thinking of constant quality 1-pass.
Acaila
7th March 2002, 10:13
Even this limited variation in the number of B-frames would surely mean hell for AVI, but why stick with such an old and obviously outdated format? You're only cutting yourself short that way. Several new formats are in the works that could handle these B-frames much better.
Audio/video sync won't be a problem, even when pre-buffering a variable number of frames, if both video frames and audio frames have timestamps, which some future formats are already implementing.
Maybe playback will be slower, but newer cpu's will be able to handle it easily.
Yes, encoding will be slower, but isn't quality what really counts?
Ps. Forget about 1-pass modes, they will never be able to handle this correctly.
saVe
7th March 2002, 12:20
@athos: What you wrote is exactly what I mean, because whether a frame is an I-frame or a P-frame is decided by scene changes, meaning that the picture information changed by more than a certain threshold. I think we are agreeing without knowing it.... ;)
@acaila: i agree with you in your opinion about avi and alternative formats. just imagine an ogg container with xvid video and vorbis audio! a file full of gpl ;)!
Well, for 1-pass we could still use a constant number of B-frames, don't you think? The gain in extra quality for 2-pass will stop people from using 1-pass once and for all!
athos
7th March 2002, 13:32
I agree that one would probably have to stick to a constant number of B-frames in 1-pass; as you say, if one opts for maximum quality at a specified file size, one would use 2-pass anyway.
It is my opinion too that AVI is outdated and will hopefully soon be replaced by some other standard format. I don't know too much about the properties of each format, but we already have OGM, MP4 (well, sort of; maybe soon with AAC audio?) and MCF coming up. It is good to have these discussions of ideas for the future. Maybe variable B-frames are useful, maybe not; either way, we should not let the limitations of the AVI format stop us from discussing them.
It is understandable, however, that the developers for the time being concentrate on implementing things that are useful now, with the current formats.
I do some coding myself, but I am not a very advanced C/C++ programmer and have not studied the MPEG-4 standard or the XviD format, so I don't think I could do a good implementation of these ideas for the time being.
Jon Ingram
7th March 2002, 14:26
just imagine an ogg container with xvid video and vorbis audio! a file full of gpl ;)
I don't have to imagine it - this is already here, and working well.
(also, technically Vorbis is BSD, not GPL, but we'll let that pass :)
saVe
7th March 2002, 17:02
@jon: You're right, but Ogg doesn't work well for me since it can't be edited atm... :(
(also, technically Vorbis is BSD, not GPL, but we'll let that pass ;)
oh thank you mister! *sits down on his desk* ;);)