How to focus bitrate on video's POI? [Archive]

JELSTUDIO

19th February 2019, 16:53

I don't know if this is the place to ask, or if it really pertains to some other codec, but I will try.

Say we have a video of an interview.
2 people and an almost static background (perhaps some trees in gentle wind or a road with a car from time to time or a house with some smoke coming up the chimney)

How do I get the codec to focus its bitrate on the 'talking heads' and not share it equally between the less interesting background and the point-of-interest (the 'talking heads' in this case)?

Or, as another example, a static non-moving camera showing a video of a lawn with a cat walking through the frame.
Which settings would leave the frame mostly a still-image and use most bits on the moving cat?

Objective: to limit the needed bits to the minimum (I guess that's obvious :) ) required to make the POI look fair in quality (the cat or the 'talking head', while letting the background 'suffer' from lower bitrate)

Maybe it's simple and I just don't know the correct term, but I'm hoping somebody knows what I'm talking about (and perhaps how to achieve it or where to find further info on how to achieve it)

I'm testing using handbrake (perhaps not the optimal tool for this?), but the 'extra options' (what, and how to syntax it, in that box) is a bit over my head still.

Thanks
jacob.

benwaggoner

19th February 2019, 20:49

What's the command line you're using now? By default the higher presets do a lot of adaptive quantization that can help this use case. The --aq-motion experimental mode might help (or hurt).

What you are really asking for is ROI-specific adaptive quantization based on face detection and tracking, which is very common in videoconferencing. A very basic implementation would lower the QP of the ROI by 2, and then raise the background as needed to maintain bitrate. Patching the x265 library to allow ROI is probably not a huge task for a skilled engineer familiar with the code base.

We had ROI feature back in the PEP VC-1 encoder and it could make a huge difference in encoding some kinds of content, like mixed natural/synthetic.

Of course, you'd also need to know the coordinates of the face per frame, which is out of scope for x265. Bear in mind people move around, so the ROI would be dynamic. You'd also need a strategy to blend the ROI region with the surrounding region so you don't get a blocky seam (like tapering from -3 to -1 QP at the edge). Lowering the qg-size to get finer granularity would help.
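To make the tapering idea concrete, here is a minimal sketch of a per-block QP offset map. It is not x265 code: the function name `roi_qp_offsets`, the block-grid layout, and the linear fade are all assumptions made for illustration. Blocks inside the ROI get the strongest offset, the surrounding ring fades toward the weaker offset, and everything else is left at 0 for rate control to handle.

```python
def roi_qp_offsets(cols, rows, roi, core=-3, edge=-1, taper=2):
    """Return a rows x cols grid of QP offsets, one value per block.

    roi is (left, top, right, bottom) in block units, inclusive.
    Blocks inside the ROI get `core`. Blocks up to `taper` blocks away
    fade linearly from `core` toward `edge`. All other blocks get 0.
    (Hypothetical helper, not part of any encoder API.)
    """
    left, top, right, bottom = roi
    grid = []
    for y in range(rows):
        row = []
        for x in range(cols):
            # Chebyshev distance from this block to the ROI rectangle
            dx = max(left - x, 0, x - right)
            dy = max(top - y, 0, y - bottom)
            dist = max(dx, dy)
            if dist == 0:
                row.append(core)
            elif dist <= taper:
                row.append(round(core + (edge - core) * dist / taper))
            else:
                row.append(0)
        grid.append(row)
    return grid


# Example: an 8x6 block grid with a 2x2 ROI in the middle
for line in roi_qp_offsets(8, 6, (3, 2, 4, 3)):
    print(" ".join(f"{v:+d}" for v in line))
```

With the defaults, the offsets step -3, -2, -1, 0 moving outward from the ROI, which is the kind of seam-softening described above. A real implementation would recompute the rectangle every frame as the subject moves.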

If you have a way to define the ROI at least, blurring and applying noise reduction to the background area would reduce bits required there and thus increase quality in the face region, without having to patch x265. Bumping up --psy-rd in that case could also help.
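As a rough illustration of the pre-filtering approach, the sketch below blurs everything outside a rectangular ROI on a single grayscale frame. It is a toy in plain Python, assuming the frame is a list of rows of luma values and the ROI is already known; the name `blur_background` is made up here, and a real workflow would do this with a proper video filter before encoding.

```python
def blur_background(frame, roi):
    """Apply a 3x3 box blur to every pixel outside roi.

    frame: list of rows of luma values (ints, 0-255).
    roi: (left, top, right, bottom) in pixels, inclusive.
    Pixels inside the ROI are copied through unchanged.
    (Hypothetical helper for illustration only.)
    """
    height, width = len(frame), len(frame[0])
    left, top, right, bottom = roi
    out = [row[:] for row in frame]
    for y in range(height):
        for x in range(width):
            if left <= x <= right and top <= y <= bottom:
                continue  # keep the region of interest untouched
            total = count = 0
            # Average the pixel with its neighbours, clipped at the borders
            for ny in range(max(y - 1, 0), min(y + 2, height)):
                for nx in range(max(x - 1, 0), min(x + 2, width)):
                    total += frame[ny][nx]
                    count += 1
            out[y][x] = total // count
    return out


# Example: a noisy checkerboard with the centre 2x2 pixels protected
noisy = [[(x + y) % 2 * 100 for x in range(6)] for y in range(6)]
smoothed = blur_background(noisy, (2, 2, 3, 3))
```

Flattening the background this way removes high-frequency detail the encoder would otherwise spend bits on, which is why it can free up bitrate for the face region without any encoder changes.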

Motenai Yoda

20th February 2019, 00:01

IIRC x265 got a ROI patch some time ago

benwaggoner

20th February 2019, 00:07

IIRC x265 got a ROI patch some time ago
Oh, good.

Of course, determining what the ROI IS is the hard part.

JELSTUDIO

20th February 2019, 06:05

What's the command line you're using now? By default the higher presets do a lot of adaptive quantization that can help this use case. The --aq-motion experimental mode might help (or hurt).

I am using handbrake and it doesn't list what options are available to input in the 'extra options' box for the various codecs.

It also doesn't list how to syntax them so the chosen encoder can understand them.

However, your mention of 'higher presets' and '--aq-motion' helped me along with both handbrake and google :)

I found this web-page: https://x265.readthedocs.io/en/default/cli.html where there appears to be a list of all the available options (for x265 at least) and a description of what they do.

And in handbrake I discovered, by selecting a 'higher preset' (I hadn't really tried the presets before, but only been using the manual options), that some of them actually fill in information into the 'extra options' box. So that at least gives a hint about the syntax handbrake expects in that box.

So with that I can begin some experimentation :)

Thanks a lot :)

What you are really asking for ROI-specific adaptive quantization based on face detection and tracking, which is very common in videoconferencing.

No doubt face-detection, or perhaps skin-tone detection as a more simplistic method, would be a great option to have.

If you have a way to define the ROI at least, blurring and applying noise reduction to the background area would reduce bits required there and thus increase quality in the face region, without having to patch x265. Bumping up --psy-rd in that case could also help.

Currently the codec (when using the manual options in handbrake, such as speed and bitrate) seems to give more weight to 'background' than 'foreground'.

When I lower the bitrate, the first thing that gets 'ugly' is the faces and then the background.

I can use a lower bitrate with the background than the foreground and since I only have one bitrate setting this means I end up with more bitrate used on the background than necessary (per my own subjective quality needs of course :) )

So the ROI is more or less already defined by the codec. I just need to find a way to have the codec use less of the bits on the background and spend them on the foreground instead.

Background being large flat areas (such as walls and relatively static content), and foreground being all the moving things (such as facial expressions and people moving around)

A frequency separated weighting perhaps (Which I assume is what the codec already uses in some form. I just want to modify the weights given to low vs high frequencies, such that the single bitrate I end up using makes both foreground and background 'ugly' at the same setting, so to speak :) )

I will experiment a bit with this now that I have some info to go on, and see if anything comes off it :)
Thanks again for the help.

/////////////////////////
EDIT:

First few tests done.
Turning "--aq-mode" OFF (entering "aq-mode=0" in handbrake's 'extra options' box) appears to shift weight slightly more from static objects to moving objects.
I get a sharper face and more artifacts on walls. It's a beginning :)
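For anyone else translating options from the x265 documentation into the 'extra options' box: going by the strings seen in this thread, the box takes the option names without the leading dashes, joined by colons, with `name=value` for options that take a value. The small helper below only does that string conversion. It is a sketch inferred from that format, not a HandBrake API, and the function name `to_handbrake_options` is made up.

```python
def to_handbrake_options(cli_args):
    """Convert x265-style CLI text to a colon-separated option string.

    '--aq-mode 0 --rect' becomes 'aq-mode=0:rect'.
    (Hypothetical helper; it only reformats text and does not check
    whether an option actually exists.)
    """
    tokens = cli_args.split()
    parts = []
    i = 0
    while i < len(tokens):
        name = tokens[i].lstrip("-")
        i += 1
        # A following token that is not another --flag is this option's value
        if i < len(tokens) and not tokens[i].startswith("--"):
            parts.append(f"{name}={tokens[i]}")
            i += 1
        else:
            parts.append(name)
    return ":".join(parts)


print(to_handbrake_options("--aq-mode 0 --rect --amp --rd 6"))
```

Values that themselves contain a colon are passed through unchanged, so it is worth checking how the box handles those before relying on them.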

Boulder

20th February 2019, 17:57

I think that cutree does shift bits towards flat areas because they are referenced more often in other frames. You might want to try disabling that as well.

Motenai Yoda

21st February 2019, 02:35

https://bitbucket.org/multicoreware/x265/commits/a3a99fa18cbca8e00647a63ad6f498a828f78366

JELSTUDIO

21st February 2019, 19:25

benwaggoner

21st February 2019, 21:36

I did an extensive set of tests (including the cutree option, and many others) and compared until I found the settings I preferred.

This is the line I entered into the 'extra options' in Handbrake:

aq-mode=0:rd=6:rect:amp:cu-lossless:rd-refine:strong-intra-smoothing:rdoq-level=1:psy-rdoq=50:deblock=6:6
(it says ' psy-rdoq=50 ', not a smiley of course)

(Using X265 set to 'medium' and with fast-decode unchecked)

This gave me an image quality at around 2000 kbps that worked for the video I worked with (faces were sharp and walls and doors and floors were blurred a bit)
I would be stunned if --cu-lossless made that much of a difference unless you had static synthetic elements on the screen. If --cu-lossless actually helped, you could probably get even better results from --tskip.

The other options seem pretty nuts too, but possibly in interesting ways! Can you share your source? deblock 6:6 will reduce detail AND artifacts, certainly.

Did you compare just using --preset slower?

And then I discovered that the Chrome-browser can not play x265 videos :(

I should have checked that first, but oh well lesson learned.
Yeah, Firefox and Chrome don't support HEVC. They used to support passing on to a HW decoder, but then explicitly blocked that from working for political reasons.

So I googled around and found out which video-types the various web-browsers support, and am now focusing on WebM-VP9 instead.

You might find x264 works better than VP9 for your use case. x264 is MUCH more mature and flexible than libvpx.

SeeMoreDigital

21st February 2019, 22:31

...Yeah, Firefox and Chrome don't support HEVC. They used to support passing on to a HW decoder, but then explicitly blocked that from working for political reasons.

Bonkers, isn't it?

I've not checked recently but does anyone know if Youtube is HEVC friendly?

nevcairiel

21st February 2019, 22:33

I've not checked recently but does anyone know if Youtube is HEVC friendly?

If by friendly you mean that you can upload it, sure. On intake they take almost anything.

SeeMoreDigital

21st February 2019, 22:58

If by friendly you mean that you can upload it, sure. On intake they take almost anything.

How about Youtube HEVC playback via various browsers?

poisondeathray

21st February 2019, 23:29

How about Youtube HEVC playback via various browsers?

Youtube does not stream HEVC.

You can upload HEVC, but their re-encoded versions that end users get are never HEVC.

JELSTUDIO

22nd February 2019, 14:23

I would be stunned if --cu-lossless made that much of a difference unless you had static synthetic elements on the screen. If --cu-lossless actually helped, you could probably get even better results from --tskip.

The other options seem pretty nuts too, but possibly in interesting ways! Can you share your source? deblock 6:6 will reduce detail AND artifacts, certainly.

Did you compare just using --preset slower?

Yeah, Firefox and Chrome don't support HEVC. They used to support passing on to a HW decoder, but then explicitly blocked that from working for political reasons.

You might find x264 works better than VP9 for your use case. x264 is MUCH more mature and flexible than libvpx.

The source-video "retest.7z" (28 MegaBytes): https://1drv.ms/f/s!Ap4PGUC6dXtHgRUAbIFdK8aGuGbW

In there are 3 videos:

The source "SpejderSourceTestBest.mov" is a 10 second clip exported from Davinci at the setting called 'best'. The footage is originally from an old VHS after it has been denoised. This is the master-material.

And then 2 of the test-renders:

"01_x265_a200_v35_medium,53s.mp4" is with 'my' settings:
speed: medium
tune: none
fast decode: OFF
profile: auto
extra options (labeled 'v35' as it was my 35th test, and entered as this in handbrake): aq-mode=0:rd=6:rect:amp:cu-lossless:rd-refine:strong-intra-smoothing:rdoq-level=1:psy-rdoq=50:frame-threads=1:deblock=6:6
render-time: 53 seconds

"01_x265_a200_v0_slower,54s.mp4" is with blank settings in the 'extra options' box and just using the 'slower' preset instead of medium.
speed: slower
tune: none
fast decode: OFF
profile: auto
extra options (labeled 'v0' as it was my 1st test): none
render-time: 54 seconds

For both test-renders:
average bitrate used was 200 kbps single-pass.
constant framerate of 60 fps
audio-track not included
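To put those numbers in perspective, the average budget per frame follows directly from the bitrate and the frame rate. The snippet below is just that arithmetic, assuming 1 kbps means 1000 bits per second; the function name is made up for illustration.

```python
def average_bits_per_frame(bitrate_kbps, fps):
    """Average bit budget per frame for a given average bitrate."""
    return bitrate_kbps * 1000 / fps


bits = average_bits_per_frame(200, 60)
print(f"{bits:.0f} bits (~{bits / 8:.0f} bytes) per frame on average")
```

At 200 kbps and 60 fps that is roughly 3,300 bits, a little over 400 bytes, per frame on average, so every bit spent on the background is one the faces do not get.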

The differences are not huge (between using 'slower' preset with no options and 'my' preset with options), and on some frames 'my' settings produce a worse result. But overall, on the majority of frames, I prefer what I get with 'my' settings (for this video at least, and for what I choose as ROI of course :) )
A different combination of settings might very possibly be better, but testing all these variations obviously takes a lot of time and I only just figured out how to even input all these options into handbrake :)

I did not include the x264 render, because that one was just a block-fest at this bit-rate. It only took a few seconds to render though, even though I set it to 'slower', so the fastest, but ugliest, of all options.
In Handbrake I left the 'extra options' blank for the x264 render.

I did not include the WebM-VP9 either (although I'm pretty sure already this is the codec I will use from now on), but in my opinion it lands somewhere between the X265-slower and X265-JEL (the render with 'my' settings), and took 15 seconds to render, so quite a lot faster and with roughly similar overall quality (at my chosen ROIs) to the two x265 renders.
In Handbrake I used this line in the 'extra options' box for VP9, but I'm not sure all of them actually get parsed (the speed does for sure, but the weights appeared to make no difference at all. I need to test it further to be sure though, but I can't see any visual difference in the few tests I have done with VP9) : good:cpu-used=5:minsection-pct=0:maxsection-pct=1000:bias-pct=100:min-q=0:max-q=63:threads=1:static-thresh=10000

I included a picture here of the 4 test-renders (all done at 200 kbps-average single-pass). I upscaled it to 200% (using no interpolation) just to give a quick crude idea of how each of them looks.

And then a frame-grab (shown at 200% in Vdub) between x265-slower and x265-'my' settings, where I point out 1 ROI, the faces, and 1 'RONI' (region of non-interest :) ), the dot on the wall.

They both were too big for the forum though, so it converted them to jpg, but they still show the general idea.

And finally; the full movie (both the x265 and vp9 render) is uploaded here (just in case somebody is curious :) ) : https://archive.org/details/JELSTUDIO1993Selvsyn9ViRetterLinsenModEnSpejder

benwaggoner

22nd February 2019, 21:32

Net, net, HEVC for the win :).

As you increase in-loop deblocking strength, the filtering reduces frequencies and QP will drop. So you get a softer-looking image with fewer artifacts. The combination of really high RDOQ and really high deblocking is a novel one to me, and looks to defeat the purpose of --deblock 6:6. v0 certainly looks superior to v35 to me. The RONI is distractingly bad in v35, and the ROI shows a lot more ringing.

x265 has most of its tuning done around typical parameters, and they are typical because the HM and other experimentation have shown them to be generally appropriate. Going out to the edges means that you're far away from what's generally worked before, AND far away from what's gotten detailed psychovisual tuning.

Did you try --aq-mode 2 as well? I would think that might have been helpful.

JELSTUDIO

24th February 2019, 12:03

I tested aq-mode zero, 1, 2 and 3 and all except zero softened the image to my eyes.
Aq-mode at zero was the sharpest looking to me (for my chosen ROIs of course AND my subjective opinion :) )

Anyway, with Chrome not playing x265 videos my options with it are limited.