ROI encoding in XviD


RadicalEd
27th March 2004, 00:33
I had an interesting thought today while pondering the usefulness of adaptive quantization (in general, not in the currently implemented lumimasking sense). How well would region-of-interest encoding fare? In my experience with still-image examples, it works rather well when you look directly at the ROI. Naturally, it sucks when you don't. That makes it a bit questionable for still-image encoding. In video, however, the viewer's focus normally follows the relative motion in the frame, so it may be somewhat worthwhile after all.

So it's better to center the quality around the action and sacrifice the background/edges, as Doom has pointed out many a time in his codec comparisons. It should be possible to use the motion vectors to locate the center of action/motion and bias quality towards it. From there, quants could increase parabolically towards the edges of the frame; the larger the vector, the deeper the parabola. This needs a cutoff, of course: there's a point where high motion becomes harder to focus on than the surroundings. So for too large a motion vector, perhaps you could apply more quantization to it instead.
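
Just to make the parabola idea concrete, here's a rough Python sketch. Everything in it is an assumption, not anything XviD does: the per-macroblock MV grid as a list of (dx, dy) tuples, the motion-magnitude-weighted centroid as the "center of action", the mean MV magnitude setting the parabola depth, and the `max_offset`/`cutoff` values. It only produces desired quant offsets and ignores whatever limits the bitstream puts on MB-to-MB quant changes.

```python
import math

def motion_center(mvs):
    """Motion-weighted centroid of a macroblock grid (hypothetical helper).

    mvs[y][x] is an assumed (dx, dy) vector per macroblock.
    Returns (cx, cy) in macroblock units, or the frame center if nothing moves.
    """
    rows, cols = len(mvs), len(mvs[0])
    total = sx = sy = 0.0
    for y in range(rows):
        for x in range(cols):
            w = math.hypot(*mvs[y][x])  # weight each MB by its motion magnitude
            total += w
            sx += w * x
            sy += w * y
    if total == 0:
        return (cols - 1) / 2, (rows - 1) / 2
    return sx / total, sy / total

def roi_quant_offsets(mvs, max_offset=6.0, cutoff=16.0):
    """Desired per-MB quant offsets growing parabolically away from the action."""
    rows, cols = len(mvs), len(mvs[0])
    cx, cy = motion_center(mvs)
    mags = [math.hypot(*mv) for row in mvs for mv in row]
    mean_mag = sum(mags) / len(mags)
    # More motion -> deeper parabola, saturating once motion reaches the cutoff.
    depth = max_offset * min(mean_mag, cutoff) / cutoff
    # Normalize squared distance so the farthest MB from the center maps to 1.
    max_d2 = max((x - cx) ** 2 + (y - cy) ** 2
                 for y in range(rows) for x in range(cols)) or 1.0
    offsets = []
    for y in range(rows):
        row = []
        for x in range(cols):
            off = depth * ((x - cx) ** 2 + (y - cy) ** 2) / max_d2
            if math.hypot(*mvs[y][x]) > cutoff:
                off = depth  # too fast to follow: quantize it like the edges
            row.append(round(off))
        offsets.append(row)
    return offsets
```

A 3x3 grid with motion only in the middle MB gives zero offset there and the largest offsets in the corners. The offsets would still have to be added to a base quant and clamped to the legal range.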

Then of course there's the scenario of global motion, where you'd want the opposite -- the background is the moving part, the focus is static. Using the same method GMC uses to consolidate global motion, the proposed AQ effect could be inverted when such motion exists. Or, better yet, it could be calculated against the global motion. Then you'd have a relatively fail-proof center of action in any normal scenario.
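
Calculating against the global motion could look roughly like the hypothetical sketch below. It estimates global motion as the component-wise median of the MB vectors, which is a crude stand-in and not what XviD's GMC estimator actually does, and subtracts it from every vector. During a pan, the static object then becomes the block with the largest relative motion.

```python
from statistics import median

def global_motion(mvs):
    """Crude global-motion estimate: component-wise median of all MB vectors."""
    flat = [mv for row in mvs for mv in row]
    return median(dx for dx, _ in flat), median(dy for _, dy in flat)

def relative_motion(mvs):
    """Each MB vector minus the estimated global motion (hypothetical helper)."""
    gx, gy = global_motion(mvs)
    return [[(dx - gx, dy - gy) for dx, dy in row] for row in mvs]
```

The relative vectors can then feed whatever ROI weighting you use in place of the raw ones. A median only handles translation, so zooms or rotations would need a proper global-motion model.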

Ideas?

Manao
27th March 2004, 00:43
Interesting...

However, firstly, ROI encoding is suboptimal PSNR-wise, so it would be rather hard to test the benefit of such a method (as it is for every psychovisual enhancement).

Secondly, the MPEG-4 standard doesn't allow you to vary the quant too much inside a frame: you only have the +2, +1, 0, -1, -2 choices. So you would not be able to quantize the borders of the frame as heavily as you might want.

Yet, I'll keep the idea in mind for the next avisynth filter I'll make :)

RadicalEd
27th March 2004, 01:39
Hmm, I wasn't aware of the variation limit placed by the standard. Ghey.

Though I also failed to take into account that such a thing would only be beneficial with extremely short I-frame intervals. The unmoving parts of the frame aren't really being quantized except in I-frames, and I-frames have no MV information :|
Bah, you know what they say about genius and common sense being inversely proportional.

Still would have been somewhat useful for relative motion, though.

sysKin
27th March 2004, 02:09
Originally posted by RadicalEd
Then of course there's the scenario of global motion, where you'd want the opposite -- the background is the moving part, the focus is static. Using the same method GMC uses to consolidate global motion, the proposed AQ effect could be inverted when such motion exists. Or, better yet, it could be calculated against the global motion. Then you'd have a relatively fail-proof center of action in any normal scenario.

Can I point out that you'd be reversing your AQ just because GMC is used in a frame? I wouldn't say human visual perception depends on codec details; it depends only on the sequence.

Anyway, this is all great, but you'll be the one to code it. Generally it will be easy to add such stuff to XviD once I implement my idea of an HVS plugin system - you'll just copy an existing plugin (like lumimasking), write your own internals (you will get access to the picture, motion vectors etc.) and follow the rule for outputting HVS data: importance of HF components, LF components, visibility of false motion etc., for every block (including chroma). You'll have all the fun you can get :)
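
The plugin system described here doesn't exist yet, so the following is only a guess at its shape. `BlockHVS`, `HVSPlugin` and `ToyLumiMask` are hypothetical names, not XviD API. The toy plugin handles luma only, assumes frame dimensions that are multiples of 16, and uses a made-up dark/bright rule rather than XviD's real lumimasking formula.

```python
from dataclasses import dataclass
from typing import List, Protocol, Tuple

MV = Tuple[int, int]

@dataclass
class BlockHVS:
    """Hypothetical per-block output of an HVS plugin."""
    hf_importance: float            # how much high-frequency detail matters here
    lf_importance: float            # how much low-frequency detail matters here
    false_motion_visibility: float  # how visible a wrong motion vector would be

class HVSPlugin(Protocol):
    def analyze(self, luma: List[List[int]],
                mvs: List[List[MV]]) -> List[List[BlockHVS]]:
        ...

class ToyLumiMask:
    """Illustrative only: lowers importance in very dark or bright 16x16 MBs."""

    def __init__(self, dark: int = 40, bright: int = 200):
        self.dark, self.bright = dark, bright

    def analyze(self, luma, mvs):
        out = []
        for by, row in enumerate(mvs):
            out_row = []
            for bx, _ in enumerate(row):
                pixels = [luma[y][x]
                          for y in range(by * 16, by * 16 + 16)
                          for x in range(bx * 16, bx * 16 + 16)]
                mean = sum(pixels) / len(pixels)
                w = 0.5 if mean < self.dark or mean > self.bright else 1.0
                out_row.append(BlockHVS(w, w, 1.0))  # MVs unused by this toy
            out.append(out_row)
        return out
```

An ROI plugin would implement the same `analyze` method but derive its weights from the motion vectors instead of the luma.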

Originally posted by Manao
However, firstly, ROI encoding is suboptimal PSNR-wise, so it would be rather hard to test the benefit of such a method (as it is for every psychovisual enhancement).

Yup. But that won't stop us implementing psychovisual stuff, don't worry.

Originally posted by Manao
Secondly, the MPEG-4 standard doesn't allow you to vary the quant too much inside a frame: you only have the +2, +1, 0, -1, -2 choices. So you would not be able to quantize the borders of the frame as heavily as you might want.

Untrue. The quant can only change by up to 2 from one macroblock to the next, but it can change as much as you want within a frame.

Regards,
Radek
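
To illustrate the point about per-MB changes, here's a hypothetical greedy sketch. It takes a desired quant per macroblock, in coding order, and produces a sequence in which each step stays within ±2 (the dquant range for I/P-VOPs; B-VOPs are stricter, with dbquant only allowing -2, 0, +2) and every quant stays in 1..31. The first MB simply takes its desired value, as if that were the frame quant.

```python
def legal_quants(desired, qmin=1, qmax=31, max_step=2):
    """Follow the desired per-MB quants while limiting each MB-to-MB change
    to +/-max_step (hypothetical greedy helper)."""
    if not desired:
        return []
    q = min(max(desired[0], qmin), qmax)
    out = [q]
    for target in desired[1:]:
        step = max(-max_step, min(max_step, target - q))  # limit the change
        q = min(max(q + step, qmin), qmax)
        out.append(q)
    return out

print(legal_quants([2, 12, 12, 12, 12, 12, 2]))
```

The frame ends up spanning quants 2 to 12 even though no single step exceeds 2. Greedy tracking lags behind sharp changes, so a real encoder might look ahead and start ramping earlier.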

RadicalEd
27th March 2004, 03:15
Originally posted by sysKin
Can I point out that you'd be reversing your AQ just because GMC is used in a frame? I wouldn't say human visual perception depends on codec details; it depends only on the sequence.


What I meant was this: if we're using an algorithm that over-quantizes the static parts of a frame (normally the background) and under-quantizes the moving parts (normally the focus of the action), the concept fails during global motion. In the case of completely global motion, no quantization differences need be made. However, if there is a static object against a moving background (think the weapons scene), we need to do the opposite of what the algo normally does: under-quantize the static part and over-quantize the moving part. I was (too vaguely) referring to GMC as an analogous system for discerning these cases.
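
The three cases could be told apart with something as simple as the hypothetical sketch below. It counts the fraction of macroblocks whose motion exceeds a threshold. Both `moving_thresh` and `majority` are made-up values, and a real check would compare vectors against a global-motion model (as GMC does) rather than just counting moving blocks.

```python
import math

def classify_motion(mvs, moving_thresh=1.0, majority=0.5):
    """Rough scene classification from per-MB (dx, dy) vectors (hypothetical)."""
    flat = [mv for row in mvs for mv in row]
    moving = sum(1 for mv in flat if math.hypot(*mv) > moving_thresh)
    frac = moving / len(flat)
    if frac == 0.0:
        return "static"        # nothing moves: no ROI bias
    if frac == 1.0:
        return "global"        # everything moves: no quant differences
    if frac > majority:
        return "static_focus"  # moving background, static object: invert AQ
    return "moving_focus"      # usual case: favor the moving blocks
```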

Btw, the new plugin system sounds good. The quant-differences correction is good news as well. :up:

Manao
27th March 2004, 03:23
Originally posted by RadicalEd
The quant-differences correction is good news as well. :up:

Yep, my bad, I should have checked before speaking. But given what sysKin says, what exactly is the quant of a frame? The mean quantizer? The median quantizer?

sysKin
27th March 2004, 06:57
Originally posted by Manao
Yep, my bad, I should have checked before speaking. But given what sysKin says, what exactly is the quant of a frame? The mean quantizer? The median quantizer?

This is the quant we start with - normally the quant of the first macroblock, but in theory even the first MB is allowed to change it immediately, before its value is used (just a waste of bits, but possible).
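
A small sketch of what that means for a decoder. It assumes each MB's dquant (0 where none is sent) is applied to a running quant that starts at the frame header's value, and it ignores video packets, which can carry their own quant. `mb_quants` is a hypothetical helper. In the example the header says 4, yet no macroblock is actually coded at 4, and the mean and median give different "frame quants".

```python
from statistics import mean, median

def mb_quants(vop_quant, dquants):
    """Per-MB quantizers reconstructed from the frame quant and MB dquants."""
    q, out = vop_quant, []
    for d in dquants:
        q = min(max(q + d, 1), 31)  # defensive clamp to the legal 1..31 range
        out.append(q)
    return out

quants = mb_quants(4, [2, 2, 2, 0, 0, -2])
print(quants)  # [6, 8, 10, 10, 10, 8]
print(mean(quants), median(quants))
```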

Manao
27th March 2004, 09:14
In that case, the quant info shown by the status window, or by ffdshow when decoding, is useless, isn't it? Should we in fact not speak of the quant of a frame at all?