New metric for deciding on bitrates?
descartes
10th June 2007, 05:02
I've settled on some h264 settings that I like, but the prospect of adjusting the settings (or at the very least the bitrate) for every encode looks daunting. I'm trying to come up with a new metric for bitrate determination; see the posts below. Quoted is my original post:
After endless playing with x264 settings, I've settled on some that I'm happy with. But now that I've started backing up my collection, I've discovered that the resolutions of my source materials can vary. Is there some way of adjusting a given (resolution, bitrate) pair to give the same "quality" at a different resolution?
Thanks,
Austin
foxyshadis
10th June 2007, 06:45
Of course, use --crf (constant quality) modes. You only have at best approximate control over the bitrate, though. (Obviously, that also says nothing about the source's quality, some need a lot of help...)
Awatef
10th June 2007, 09:20
If you wanna have control over the bitrate (say size) and have acceptable quality, try not to go under 0.1 bit per pixel.
Say, if your source is 720x576 @ 25fps, you'd better not go below:
(720 x 576 x 25 x 0.1) / 1000 = 1037kbps
Of course, this is just an anchor if you don't have the time to test the compressibility of your source.
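As a minimal sketch of the bits-per-pixel rule of thumb above: the function name `min_bitrate_kbps` and its parameters are hypothetical, chosen only for illustration, and the 0.1 default is the floor suggested in this post rather than any codec-defined constant.

```python
def min_bitrate_kbps(width, height, fps, bits_per_pixel=0.1):
    """Bitrate floor in kbps for a frame size, frame rate and bits-per-pixel target."""
    # pixels per second times bits per pixel gives bits per second; /1000 gives kbps
    return width * height * fps * bits_per_pixel / 1000


# The 720x576 @ 25fps example from the post
print(round(min_bitrate_kbps(720, 576, 25)))  # 1037
```

The same call with another frame size and frame rate gives the corresponding floor, which is one rough way to carry a (resolution, bitrate) pair over to a different resolution.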
I came up with a little compressibility test of my own involving XviD. You could take a representative 5-minute clip out of your source and encode it w/ quantizer 2 in XviD (everything else at default). Cut the bitrate you get in half, then reduce the result by 25%. That would be the minimum bitrate you should use w/ x264.
Like if you get 1400kbps w/ XviD @ Q2, you shouldn't go below:
(1400 / 2) - 25% = 525kbps in x264
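The XviD-based estimate can be written out the same way. This is only a sketch of the arithmetic in this post; `x264_floor_from_xvid` is a hypothetical name, and the 25% figure is the poster's own efficiency assumption, not a measured value.

```python
def x264_floor_from_xvid(xvid_q2_kbps, efficiency_gain=0.25):
    """Halve the XviD quantizer-2 bitrate, then take off the assumed x264 efficiency gain."""
    return (xvid_q2_kbps / 2) * (1 - efficiency_gain)


print(x264_floor_from_xvid(1400))  # 525.0
```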
descartes
10th June 2007, 15:23
what i've been trying is to run a bunch of encodes of the same source material (say, 15000 frames of it) at various bitrates, then do regression analysis on the ssim values. since you aren't really supposed to compare the ssim values of distinct scenes, but only the ssim values of the same scene encoded differently, im coming up with a metric where instead of targeting a particular ssim value, im targeting an ssim rate of change. so the idea is "keep adding x to the bitrate of the encode until the ssim only increases over the previous bitrate by y".
this seems like a reasonable metric, and im hoping it WILL actually vary by source material (instead of working out to be some effectively constant bitrate). anyone have any complaints about that?
descartes
10th June 2007, 15:27
Of course, use --crf (constant quality) modes. You only have at best approximate control over the bitrate, though. (Obviously, that also says nothing about the source's quality, some need a lot of help...)
it was my understanding that you could only use a one-pass encode if you were using crf. i'm pretty attached to the idea of a two-pass encode, notwithstanding the ongoing "quest for true constant quality" thread. with crf, does two-pass no longer matter?
descartes
10th June 2007, 15:32
here are some bitrate/ssim pairs for different encodes. ill post the delta values in a sec...
deep space nine, season 1, disc 1:
100 = 0.9694508
200 = 0.9803445
300 = 0.9831403
400 = 0.9844795
500 = 0.9853174
600 = 0.98584
700 = 0.9863196
800 = 0.9866887
900 = 0.986999
1000 = 0.9873059
1100 = 0.9875991
1200 = 0.98786
1300 = 0.9881205
1400 = 0.9883649
garden state:
100 = 0.9550163
200 = 0.973199
300 = 0.9793873
400 = 0.9825119
500 = 0.9844211
600 = 0.9856942
700 = 0.9866228
800 = 0.9873405
900 = 0.9879248
1000 = 0.9884146
1100 = 0.9888465
1200 = 0.9892244
1300 = 0.9895655
1400 = 0.9898802
1500 = 0.9901772
1600 = 0.9904643
1700 = 0.9907261
1800 = 0.9909715
1900 = 0.9911981
2000 = 0.9914099
descartes
10th June 2007, 15:41
and here's the respective deltas:
deep space nine, season 1, disc 1:
100 --> 200: 0.0108937
200 --> 300: 0.0027958
300 --> 400: 0.0013392
400 --> 500: 0.0008379
500 --> 600: 0.0005226
600 --> 700: 0.0004796
700 --> 800: 0.0003691
800 --> 900: 0.0003103
900 --> 1000: 0.0003069
1000 --> 1100: 0.0002932
1100 --> 1200: 0.0002609
1200 --> 1300: 0.0002605
1300 --> 1400: 0.0002444
garden state:
100 --> 200: 0.0181827
200 --> 300: 0.0061883
300 --> 400: 0.0031246
400 --> 500: 0.0019092
500 --> 600: 0.0012731
600 --> 700: 0.0009286
700 --> 800: 0.0007177
800 --> 900: 0.0005843
900 --> 1000: 0.0004898
1000 --> 1100: 0.0004319
1100 --> 1200: 0.0003779
1200 --> 1300: 0.0003411
1300 --> 1400: 0.0003147
1400 --> 1500: 0.0002970
1500 --> 1600: 0.0002871
1600 --> 1700: 0.0002618
1700 --> 1800: 0.0002454
1800 --> 1900: 0.0002266
1900 --> 2000: 0.0002118
the deltas are different! for example, i think i might pick a minimum delta of 0.0005, and as soon as the ssim improvement drops below that threshold, stop increasing the bitrate. for these two examples, i'd choose 600 kbps and 900 kbps, respectively, as the final encoding bitrate.
the theory, again, is that even though the ssim metric isn't comparable between scenes, i believe you can still use it to detect when you get the point where you're wasting bits by increasing the bitrate.
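The stopping rule described here can be sketched in a few lines. The function name `pick_bitrate` is hypothetical, and the tables below are only the first rows of the bitrate/ssim pairs posted above, which is enough to reach the 0.0005 threshold for both sources.

```python
def pick_bitrate(ssim_by_bitrate, min_gain=0.0005):
    """Return the last bitrate whose step up from the previous one gained at least min_gain SSIM."""
    bitrates = sorted(ssim_by_bitrate)
    chosen = bitrates[0]
    for prev, cur in zip(bitrates, bitrates[1:]):
        # Stop at the first step whose SSIM improvement falls below the threshold
        if ssim_by_bitrate[cur] - ssim_by_bitrate[prev] < min_gain:
            break
        chosen = cur
    return chosen


# Bitrate (kbps) -> SSIM, copied from the posted measurements (truncated)
ds9 = {100: 0.9694508, 200: 0.9803445, 300: 0.9831403, 400: 0.9844795,
       500: 0.9853174, 600: 0.98584, 700: 0.9863196, 800: 0.9866887}
garden_state = {100: 0.9550163, 200: 0.973199, 300: 0.9793873, 400: 0.9825119,
                500: 0.9844211, 600: 0.9856942, 700: 0.9866228, 800: 0.9873405,
                900: 0.9879248, 1000: 0.9884146, 1100: 0.9888465}

print(pick_bitrate(ds9))           # 600
print(pick_bitrate(garden_state))  # 900
```

The result depends entirely on the chosen `min_gain` and on the bitrate step size, so the threshold is a matter of taste, not something the data determines.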
Manao
10th June 2007, 18:34
I don't want to discourage you, but that's not how a metric should be used ( at least not that one ).
A delta in SSIM can't be compared to another, at least definitely not in such a linear fashion. That's because :
the max SSIM is 1, so, necessarily, the higher the SSIM, the lower the derivative of SSIM(bitrate) will be ( since we can assume SSIM=1 is reached for very high bitrates ).
SSIM was never defined so that SSIM(a) - SSIM(b) means anything. It was designed to say SSIM(a) > SSIM(b) --> 'a' better than 'b' ( *not* 'a' better than 'b' by some f(ssim) amount of quality ).
descartes
10th June 2007, 19:11
SSIM was never defined so that SSIM(a) - SSIM(b) means anything. It was designed to say SSIM(a) > SSIM(b) --> 'a' better than 'b' ( *not* 'a' better than 'b' by some f(ssim) amount of quality ).
Hmmm... then I must be misunderstanding the wikipedia entry:
The SSIM index is a decimal value between 0 and 1. 0 would mean zero correlation with the original image, and 1 means the exact same image. 0.95 SSIM, for example, would imply half as much variation from the original image as 0.90 SSIM.
That seems to imply that, for example, the difference between SSIM=.90 and SSIM=.95 would be *twice* as big a deal as the difference between SSIM=.90 and SSIM=.925.
the max SSIM is 1, so, necessarily, the higher the SSIM, the lower the derivative of SSIM(bitrate) will be ( since we can assume SSIM=1 is reached for very high bitrates ).
I understand that SSIM asymptotically approaches 1 as the bitrate approaches infinity, but that's the whole point: you can never stop saying "well, if I increase the bitrate just a little bit more, I can improve my SSIM score" because you can *always* improve your SSIM by increasing the bitrate.
But you *can* say "well, I've reached some threshold of diminishing returns where I could continue to increase the bitrate, but the SSIM gains that I'm achieving by doing so aren't really meaningful anymore." That's the point I'm trying to determine.
Manao
10th June 2007, 19:25
That's the point I'm trying to determine.
That point is different for everybody (for two reasons: people make different quality/bitrate tradeoffs, and people don't have the same "quality tastes"), so there's no point in using SSIM to try to determine it. Use your eyes.
Finally, what rate control did you use ? CRF ? CQ ? Average bitrate ? Two passes ?
descartes
10th June 2007, 20:15
That point is different for everybody (for two reasons: people make different quality/bitrate tradeoffs, and people don't have the same "quality tastes"), so there's no point in using SSIM to try to determine it. Use your eyes.
Sure, that point will be different for everyone. Deciding where to stop is just as difficult as choosing a bitrate: both require you to use your eyes until you're satisfied. The difference is that every encode has a *different* bitrate at which you'd be satisfied, but my hope is that a collection of encodes, all of which have bitrates determined by the derivative of the SSIM curve, would look "about the same" to a viewer who only has to choose bitrate settings they like once.
Finally, what rate control did you use ? CRF ? CQ ? Average bitrate ? Two passes ?
Two pass. Here's the script I'm using:
mencoder -quiet -frames 15000 -ofps 24000/1001 -nosound -of rawvideo -ovc x264 -vf crop=704:480:6:0,scale,harddup -x264encopts bitrate=$bitrate:frameref=6:analyse=all:me=umh:subme=7:trellis=2:bframes=1:subq=7:brdo:mixed_refs:weight_b:bime:no_fast_pskip:direct_pred=auto:mixed_refs:nr=200:threads=auto:turbo=2:pass=1 -noskip $1 -o /dev/null
mencoder -quiet -frames 15000 -ofps 24000/1001 -nosound -of rawvideo -ovc x264 -vf crop=704:480:6:0,scale,harddup -x264encopts psnr:ssim:bitrate=$bitrate:frameref=6:analyse=all:me=umh:subme=7:trellis=2:bframes=1:subq=7:brdo:mixed_refs:weight_b:bime:no_fast_pskip:direct_pred=auto:mixed_refs:nr=200:threads=auto:pass=2 -noskip -o movie.264 $1
With changes to the crop size and framerate as necessary.
descartes
10th June 2007, 20:37
So, just to be really clear: the problem I think I'm solving is not "Oh no, I'm really lazy and want a computer to make all the hard decisions about how to encode my content."
It's "Oh no, I spent a lot of time deciding how to encode one or two pieces of content, but now I have a bajillion more, and I don't want to a) spend as much time fiddling as I already have, or b) just wing it and assume that the settings that were appropriate for a poorly mastered 1990's TV series are appropriate for the high def content I pulled off the air yesterday."
Manao
10th June 2007, 20:46
Everything you want to do makes me think that CRF is really what you want. You don't care about bitrate, you care about reaching what you consider acceptable quality. So don't bother configuring a bitrate-based encode, just use CRF as foxyshadis advised.
descartes
10th June 2007, 21:12
That seems reasonable to me. Any advice on choosing a quantizer?
PuzZLeR
10th June 2007, 22:05
@Descartes:
Now, I’m not sure whether deltas quantitatively play a role, however, serving them up in a regression function, and using calculus to find out where the derivative seems to converge to zero could be one answer.
Another good test, which is something I’m running right now is converting MPEG-1 video to H.264. As you know, MPEG-1 is rather limited and has only a certain quality level, and I do believe the threshold between least lossiness and “padding” in the H.264 conversion would be much more defined in this case. I have tried higher and higher bitrates and quality levels to some clips only to find that the law of diminishing returns always applies there because I know I’m overkilling at some point. I’m now trying to determine which maximum CRF level (quantizer) does this, because the bitrate maximum differs in each clip. This may give you an idea how to apply this to MPEG-2 clips as well.
However, when testing quality levels, your best bet is indeed CRF. Even if you’re a 2-pass advocate, you should know that it never guarantees you a certain quality, only size. Like you said, you can’t take that bitrate assumption from old 90s footage, and apply it to yesterday’s HD footage. The quality level should definitely be different for that given bitrate. CRF will adjust that for you.
CRF is a different mindset and should be used differently from bitrate. Don’t think “What quantizer gives me so-and-so bitrate?” Think instead, “What quality do I want, and what is the highest quantizer’s quality that I will accept if I want the smallest file size possible?” This is especially true in your case and once you find one you like, you may have your answer that will apply to all video.
You, and only you, can determine this quantizer. It's your "taste" as Manao pointed out.
Now, as for your question of whether CRF gives you the same quality as 2-pass at the same bitrate, a lot has been pointed out in the Quest for Constant Quality thread. Since then, in my own experimentation, I have found very little difference between the two, and no clear advantage for either. I do believe x264 has made great progress on this technology.
I guess if I were to pick the real winner it may be 2-pass in motion scenes (but only by a bit). I say “may” because I don’t have the greatest eyes in this. But does a reading from SSIM/PSNR really determine this? I don’t think so when that data is discrete points on a graph. Once more, you would need calculus to determine a rate of change between frames and motion detail. This would be a continuous function, not a discrete one.
@ Awatef:
BTW – I too have experimented with the DivX quantizers to find some correlation. I do like your formula with Xvid, but keep in mind though, that it’s all hypothetical and general and certainly not conclusive. However, it does give a good inference nevertheless. Thanks.
akupenguin
10th June 2007, 23:51
Now, I’m not sure whether deltas quantitatively play a role, however, serving them up in a regression function, and using calculus to find out where the derivative seems to converge to zero could be one answer.
If you pick a good regression function, you'll find that the derivative converges to 0 at qp=0, ssim=1. If you pick a simpler regression function it'll give some other answer and it'll be wrong.
There is no point of diminishing returns from any objective metric, there is only a point after which you personally can't see the difference.
The fact that the derivative looks like it's decreasing is just an artifact of how ssim is scaled. If you looked at psnr instead, you'd find that the derivative increases with bitrate, so much so that it goes asymptotically to infinity as you get to qp=0. This is not evidence for psnr being less accurate than ssim, it just means psnr is log scaled and ssim is linear scaled. If you measured log ssim, it would have the same pattern.
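To see how much the scaling matters, the same measurements can be re-expressed on a log scale before taking deltas. This is a sketch under one assumption: "log ssim" is taken here to mean -10*log10(1 - ssim), a decibel-style mapping that is one common choice and not something specified in this thread. The name `ssim_db` is hypothetical, and the data is the first part of the Deep Space Nine table posted earlier.

```python
import math


def ssim_db(ssim):
    """Map an SSIM value below 1 to a log (decibel-style) scale. Assumed scaling, see above."""
    return -10 * math.log10(1 - ssim)


# Bitrate (kbps) -> SSIM, copied from the posted Deep Space Nine measurements (truncated)
ds9 = {100: 0.9694508, 200: 0.9803445, 300: 0.9831403, 400: 0.9844795,
       500: 0.9853174, 600: 0.98584, 700: 0.9863196, 800: 0.9866887}

rates = sorted(ds9)
for lo, hi in zip(rates, rates[1:]):
    linear_gain = ds9[hi] - ds9[lo]
    log_gain = ssim_db(ds9[hi]) - ssim_db(ds9[lo])
    # Print both deltas side by side so the two scalings can be compared per step
    print(f"{lo:>4} -> {hi:<4}  linear: {linear_gain:.7f}  log: {log_gain:.4f} dB")
```

A fixed threshold on the linear deltas and a fixed threshold on the log deltas will generally pick different bitrates for the same source, which is the reason a cutoff on raw SSIM deltas says more about the scale than about the video.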
Another good test, which is something I’m running right now is converting MPEG-1 video to H.264.
Now that's different. There is a point of diminishing returns in that case, and it should be when the transcoded video's qp is approximately equal to the source video's qp (after applying mpeg<->h264 qp translation, of course). There is still some room for ratecontrol, and some tradeoff for how much you're willing to bloat the bitrate in order to slightly reduce the quality loss, but past a certain point you'll just be coding the mpeg1 artifacts and not real content. The ssim between the mpeg1 and h264 videos will continue to increase, so to objectively measure the tradeoff you have to have the original source, and measure ssim from the source and not from the mpeg1 video.
descartes
11th June 2007, 00:23
The fact that the derivative looks like it's decreasing is just an artifact of how ssim is scaled.
So the point at which the derivative of SSIM crosses some line has nothing to do with the encoded content? Obviously, if you draw your line at the point where the derivative is zero you haven't accomplished anything, but are all lines equally as meaningless?
The question is: if I took a bunch of samples and encoded them at a bunch of bitrates, plotted the derivative of SSIM vs. bitrate, and drew a line across that curve at some point > 0, would comparing those intersections be meaningful in any way? For example, could you say that the sources that required higher bitrates to reach that threshold were more complicated, or more difficult to encode? Or is it just completely meaningless?
Awatef
11th June 2007, 00:27
Well thanks Puzzler.
I actually go even further sometimes and let XviD's first pass go over the whole source to get an accurate bitrate, and I found that staying over half of that bitrate is a guarantee for acceptable results (quantizers come out the same always, proportionally to the percentage from the original bitrate ;)).
As H.264 is supposed to be about 25% more efficient than XviD, I came up with that -25% formula. H.264 would make no sense for me anyway if it weren't at least that efficient, considering the huge amount of extra time it needs.
akupenguin
11th June 2007, 01:03
The question is: if I took a bunch of samples and encoded them at a bunch of bitrates, plotted the derivative of SSIM vs. bitrate, and drew a line across that curve at some point > 0, would comparing those intersections be meaningful in any way?
If you scale ssim properly, i.e. multiply ssim by resolution*fps (or equivalently, use d(ssim)/d(bpp)), then yes it's meaningful. What you describe is Lagrangian rate-distortion optimization. Equalizing the derivatives of the (scaled) ssim-vs-bitrate curves for several movies is equivalent to maximizing the total ssim of all the movies for any given total filesize.
Now, maximizing total quality may or may not be what you want. It's not constant quality, but it is a valid thing to do.
Reminds me of the RDRC patch...
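A rough sketch of the scaled comparison described above, using finite differences for d(ssim)/d(bpp). The function names and the cutoff `lam` are hypothetical. The 704x480 frame size and 24000/1001 frame rate come from the mencoder script posted earlier and are assumed here to apply to the Deep Space Nine sample; other sources would need their own values.

```python
def slope_per_bpp(ssim_by_kbps, width, height, fps):
    """Finite-difference d(ssim)/d(bpp) for each step between adjacent bitrates."""
    pixel_rate = width * height * fps  # pixels per second
    rates = sorted(ssim_by_kbps)
    slopes = {}
    for lo, hi in zip(rates, rates[1:]):
        d_bpp = (hi - lo) * 1000 / pixel_rate  # kbps step converted to bits per pixel
        slopes[hi] = (ssim_by_kbps[hi] - ssim_by_kbps[lo]) / d_bpp
    return slopes


def bitrate_for_slope(ssim_by_kbps, width, height, fps, lam):
    """Last bitrate whose step still has slope >= lam (assumes slopes fall as bitrate rises)."""
    chosen = min(ssim_by_kbps)
    for rate, slope in slope_per_bpp(ssim_by_kbps, width, height, fps).items():
        if slope < lam:
            break
        chosen = rate
    return chosen


# Bitrate (kbps) -> SSIM, copied from the posted Deep Space Nine measurements (truncated)
ds9 = {100: 0.9694508, 200: 0.9803445, 300: 0.9831403, 400: 0.9844795,
       500: 0.9853174, 600: 0.98584, 700: 0.9863196, 800: 0.9866887}

# lam=0.04 is an arbitrary illustrative cutoff, not a recommended value
print(bitrate_for_slope(ds9, 704, 480, 24000 / 1001, lam=0.04))  # 600
```

Applying the same `lam` to every movie, each with its own resolution and frame rate, is the "equalize the derivatives" step: movies with more pixels per second get a larger slope for the same raw SSIM delta, so they are allowed more bitrate before hitting the cutoff.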
foxyshadis
11th June 2007, 01:33
So the point at which the derivative of SSIM crosses some line has nothing to do with the encoded content? Obviously, if you draw your line at the point where the derivative is zero you haven't accomplished anything, but are all lines equally as meaningless?
The question is: if I took a bunch of samples and encoded them at a bunch of bitrates, plotted the derivative of SSIM vs. bitrate, and drew a line across that curve at some point > 0, would comparing those intersections be meaningful in any way? For example, could you say that the sources that required higher bitrates to reach that threshold were more complicated, or more difficult to encode? Or is it just completely meaningless?
You can coarsely differentiate this way. You can probably tell whether one scene is much easier to code than another, but the closer the ratios get, the more that measurement error gets in the way - source quality, metric imperfection, stuff like that. I don't know any good rules of thumb (though the deltas you had earlier were probably well under that threshold), but always keep in mind whether you need a linear or log representation, and you'll probably want to work with 100-ssim instead of the raw values.