Suggestion for x265's --tune film [Archive]

littlepox

10th August 2015, 18:58

Several days ago we asked the x265 team how we could participate in the discussion, and they gave me an e-mail address. Now that we have finished writing the e-mail, we are also glad to post it here so that all of you can discuss the topic: how do you keep film grain and detail at higher quality when using x265?

We took several movie scenes with a moderate level of film grain, rich detail, and some dark scenes. Then we ran tests comparing x264-10bit and x265-10bit:

(NOTE: --input-depth 10 is set for both sets of parameters. Do NOT just copy and paste if you are not encoding 10-bit input.)

x264: --preset veryslow --tune film --crf 19.0 --qcomp 0.75 --input-depth 10 (a fairly trusted setting)

x265: --preset slower --ctu 32 --max-tu-size 16 --crf 20.0 --tu-intra-depth 2 --tu-inter-depth 2 --rdpenalty 2 --me 3 --subme 5 --merange 44 --b-intra --no-amp --ref 5 --weightb --keyint 360 --min-keyint 1 --bframes 8 --aq-mode 1 --aq-strength 1.0 --rd 5 --psy-rd 1.5 --psy-rdoq 5.0 --rdoq-level 1 --no-sao --no-open-gop --rc-lookahead 80 --scenecut 40 --max-merge 4 --qcomp 0.8 --no-strong-intra-smoothing --deblock -2:-2 --qg-size 16 --pbratio 1.2
(We've tested hundreds of parameter combinations and concluded that this set gives the best outcome.)

We compared quality by eye, NOT by psnr/ssim, which favor blurriness over grain retention.
The x264 encode is 118% the size of the x265 encode, with nearly the same visual quality. We could drop the x264 bit-rate a bit, but then the AVC output inevitably looks worse. At this point we believe x265 claims a victory, not in the extremely low bit-rate case where HEVC blurriness looks better than AVC blockiness, but in high-quality, near-transparent encoding.
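
To put the 118% figure in terms of savings, here is a small arithmetic sketch. It only restates the ratio quoted above; the function name is made up for illustration.

```python
def savings_from_ratio(size_ratio: float) -> float:
    """Fraction of size saved by the smaller encode, given larger/smaller."""
    return 1.0 - 1.0 / size_ratio


# 1.18 is the x264/x265 size ratio quoted in the post.
x264_over_x265 = 1.18
print(f"x265 size relative to x264: {100 / x264_over_x265:.1f}%")
print(f"x265 saving versus x264: {100 * savings_from_ratio(x264_over_x265):.1f}%")
```

In other words, an x264 file that is 118% of the x265 size means the x265 file is roughly 15% smaller, not 18% smaller.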

Our tests focus on several RC parameters: --ctu, --tu, --crf, --qcomp, --aq-mode, --aq-strength, --psy, --qg-size... These parameters can alter RC behavior with minor impact on speed. We used an iterative method based on genetic algorithms to select what are possibly the best parameter sets out of infinitely many.

Here are some of our comments:
--ctu, --max-tu-size: These two should be decreased for high-quality 1080p encoding. Keeping other parameters constant, changing --ctu 64->32 and --max-tu-size 32->16 gives even better quality with a nearly 15% size decrease in crf mode. We assume that larger CUs and TUs actually waste bit-rate by producing over-smoothed blocks, forcing the bit-rate up in order to keep a "constant rate-factor". So we'd better just keep them smaller, at least at 1080p.

--crf: x265's default value is 28. The resulting quality is far too low to compete with x264's default of 23. Many users believe that by tweaking crf they can choose between quality and size without other concerns, but they are wrong; sometimes tweaking other parameters is more efficient. --crf needs to be coupled with other properly tuned parameters.

--qcomp: the lower your crf (that is, the higher the quality), the higher you should set --qcomp. This is also true for x264. Our tests suggest --qcomp 0.8 is essential when x265_crf<23.

--aq-mode: keep it at 1. aq-mode=3 gives weird bit-rates for nothing.

--aq-strength: default 1.0 is pretty good.

--psy: the primary weapon to fight blurriness. --psy-rd 1.5 --psy-rdoq 5.0 --rdoq-level 1 works better than --tune grain's settings, trust me.

--deblock -2:-2: This is also a rule of thumb since x264's era: using smaller numbers if you prioritize quality.

--qg-size 16: The larger the number, the smaller the file, but at the cost of much worse quality. Just change it to 16 instead of 64; it is worth it.

--no-sao: SAO smooths everything up. Do NOT use it unless you really want to.

--merange 44: merange has a minor impact on speed when coupled with low levels of me and subme, but that's not true for --me 3 --subme 5. Decreasing it a bit at 1080p does you a favor with little sacrifice.

--no-rect --no-amp: rect and amp trade time for nothing. Do NOT enable them unless you really have time to waste.

So, here are our suggestions for x265 --tune film:

--ctu 32 --max-tu-size 16 --qcomp 0.8 --aq-mode 1 --aq-strength 1.0 --psy-rd 1.8 --psy-rdoq 5.0 --rdoq-level 1 --deblock -2:-2 --qg-size 16 --no-sao --merange 44 --no-rect --no-amp
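
For anyone scripting test encodes, the suggested set can be assembled into a command line as sketched below. This is only a sketch: the file names, the default crf and the preset are placeholders, and the motion search range is written as --merange, which is the spelling the x265 CLI uses.

```python
import shlex

# Flags and values copied from the suggested set above.
TUNE_FILM_SUGGESTION = [
    "--ctu", "32", "--max-tu-size", "16",
    "--qcomp", "0.8",
    "--aq-mode", "1", "--aq-strength", "1.0",
    "--psy-rd", "1.8", "--psy-rdoq", "5.0", "--rdoq-level", "1",
    "--deblock", "-2:-2",
    "--qg-size", "16",
    "--no-sao",
    "--merange", "44",
    "--no-rect", "--no-amp",
]


def build_x265_command(source, output, crf=20.0, preset="slower", input_depth=None):
    """Return an argv list for x265 with the suggested settings appended."""
    cmd = ["x265", "--input", source, "--output", output,
           "--preset", preset, "--crf", str(crf)]
    if input_depth is not None:
        # Only pass --input-depth when the input really has that bit depth.
        cmd += ["--input-depth", str(input_depth)]
    return cmd + TUNE_FILM_SUGGESTION


# Placeholder file names; nothing is executed here, the command is only printed.
print(shlex.join(build_x265_command("clip.y4m", "clip.hevc")))
```

Keeping the suggestion as a plain list makes it easy to swap a single value (for example --qcomp 0.75 at crf 21~22) without retyping the whole line.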

With these parameters you can probably set your crf somewhere between 19 and 22, and it gives you considerably good output at a small size. If you set crf=21~22, --qcomp 0.75 is recommended.
We don't see any advantage for x265 if you are targeting x264_crf<=16. At that point x265 requires even more bit-rate to keep the details.

Furthermore, here are some other recommendations:

1. Besides --tune and --preset, add another parameter "--expectation" and let the user specify whether they want high, medium or low quality (with the respective size). This could also be derived from --crf if it is specified. Then use the input to change the rc behavior. For example, if the user is expecting high quality rather than crf=28 blurriness, increase --qcomp to 0.8.

2. Change RC behavior and speed options based on resolution. 720p/1080p/1440p/4K don't optimally share the same set of parameters, for example --max-tu-size, --ctu and --merange.

3. Give less bias/more penalty to bigger CUs and TUs, especially at higher quality expectations and lower resolutions. --rdpenalty 2 is a very useful parameter, but it only prohibits a TU size of 32x32. Let's impose more penalties in similar cases.
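
To make recommendations 1 and 2 concrete, here is a rough sketch of how a front end might map an expectation level and a frame height to option overrides. "--expectation" does not exist in x265; the function, its name and the 1080-line cutoff are illustrative assumptions, and only the values already given in this post are used. Anything not covered is left at the encoder defaults.

```python
def suggest_rc_options(height: int, expectation: str) -> dict:
    """Hypothetical mapping from (frame height, expectation) to x265 overrides.

    Returns only the options this post gives concrete values for.
    """
    if expectation not in ("high", "medium", "low"):
        raise ValueError("expectation must be 'high', 'medium' or 'low'")

    options = {}
    if expectation == "high":
        # Recommendation 1: raise qcomp when high quality is expected.
        options["qcomp"] = 0.8
        if height <= 1080:
            # Recommendation 2: smaller CTU/TU and search range at 1080p.
            options["ctu"] = 32
            options["max-tu-size"] = 16
            options["merange"] = 44
    return options


for height in (1080, 2160):
    print(height, suggest_rc_options(height, "high"))
```

The point is only the shape of the logic: one user-facing knob plus the resolution selects a bundle of RC settings, instead of the user tuning each one by hand.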

Thank you for patiently reading this email, and I'm ready to conclude it:
So far the x265 team has done a fantastic job by offering the best encoder (at low bit-rate), but let us greedily ask for more. We wish x265 to be the best not only at low quality, but overall. At the same time, we are willing to help the team with our own effort.

Regards,
LittlePox

2015/08/11

Boulder

11th August 2015, 13:34

Is your test material regular movies or anime etc? I could try your suggested settings with my Hot Fuzz test clip and see if there is any improvement (and also compare to x264).

sneaker_ger

11th August 2015, 14:48

littlepox

11th August 2015, 18:04

Thank you for sharing! I added it to my latest grain test (http://forum.doom9.org/showthread.php?p=1733664#post1733664) but since this is a suggestion for --tune film (not --tune grain) I will also try a less grainy source in the near future. There's definitely a difference, often less blurry (in exchange for other artifacts, though).

Thanks for the test; a few points I'd like to raise:

1. I don't know whether your equivalent crf falls within my range or not. Can you experiment a bit and see approximately what crf is required to produce that bit-rate?

2. As I said, this is --tune film, not --tune grain. For now, don't even dream of using x265 for proper grain retention.

3. Can you point out what the other artifacts are? With your input we can probably make some adjustments accordingly.

PS: If you are talking about distortion, I'd say this is the side effect of psy. Or rather, this is what psy is supposed to do: people's eyes don't mind some distortion, since they don't compare the video with the source frame by frame; they just want a non-blurry, similarly complex video. We believe that for general film clips, the results look quite acceptable on their own.

sneaker_ger

11th August 2015, 18:30

1. crf 19.5~20 for --preset slower --tune grain, so below your target range
3. Look near his right ear (left from the viewer's POV):
http://abload.de/img/1060_org_gqayh.png
http://abload.de/img/1060_x265_pzstp.png
http://abload.de/img/1060_littlepox_fhxsf.png
But as you say it's expected.

Sagittaire

12th August 2015, 00:04

Well, I read your encoding profile:

x265: --preset slower --ctu 32 --max-tu-size 16 --crf 19.7 --tu-intra-depth 3 --tu-inter-depth 3 --rdpenalty 2 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 10 --aq-mode 1 --aq-strength 1.0 --rd 5 --psy-rd 0.7 --psy-rdoq 5.0 --rdoq-level 1 --no-sao --no-open-gop --rc-lookahead 80 --scenecut 40 --max-merge 4 --qcomp 0.80 --no-strong-intra-smoothing --input-depth 10 --deblock -2:-2 --qg-size 16 --pmode

--ctu, --max-tu-size --qg-size: at high quality, if you want to keep grain, it should theoretically be useful to use smaller values.

--qcomp: qcomp is RC compression. A value of 1.0 produces the highest possible bitrate variability (constant quantizer, without psy and without frame-type ratios). Using high bitrate variability for high-quality encoding is theoretically good (high-complexity scenes will have better quality).

--deblock -2:-2: deblocking is itself adaptive to the quantizer. A low quantizer means less deblocking. Anyway, -2:-2 is not a bad value and will preserve high frequencies better, but with more blocking/ringing too.

--no-rect --no-amp: bad idea. rect and amp are theoretically always useful for quality and for keeping high frequencies. They are really bad for speed, though.

--bframes 10: really bad idea. More than 3 B-frames is theoretically bad for high quality and for keeping noise. Moreover, it's really bad for speed.

--keyint 720: bad idea. More than 10x the frame rate doesn't produce measurably better quality. Moreover, it's really bad for decoding.

You forgot some really useful settings for high-quality encoding if you want to keep noise: --ipratio 1.1 --pbratio 1.1. It's like qcomp. At high quality it's useful for more constant quality (no PBP or IBP flickering, noise kept in B-frames).

littlepox

12th August 2015, 02:13

Well, I read your encoding profile:

--ctu, --max-tu-size --qg-size: at high quality, if you want to keep grain, it should theoretically be useful to use smaller values.

--qcomp: qcomp is RC compression. A value of 1.0 produces the highest possible bitrate variability (constant quantizer, without psy and without frame-type ratios). Using high bitrate variability for high-quality encoding is theoretically good (high-complexity scenes will have better quality).

--deblock -2:-2: deblocking is itself adaptive to the quantizer. A low quantizer means less deblocking. Anyway, -2:-2 is not a bad value and will preserve high frequencies better, but with more blocking/ringing too.

--no-rect --no-amp: bad idea. rect and amp are theoretically always useful for quality and for keeping high frequencies. They are really bad for speed, though.

--bframes 10: really bad idea. More than 3 B-frames is theoretically bad for high quality and for keeping noise. Moreover, it's really bad for speed.

--keyint 720: bad idea. More than 10x the frame rate doesn't produce measurably better quality. Moreover, it's really bad for decoding.

You forgot some really useful settings for high-quality encoding if you want to keep noise: --ipratio 1.1 --pbratio 1.1. It's like qcomp. At high quality it's useful for more constant quality (no PBP or IBP flickering, noise kept in B-frames).

--ctu --max-tu-size --qg-size: Tests suggest this is the best combination, and --qg-size is already set to the smallest value. If you wish to refine the quality further from this set, drop crf instead of changing these.

--qcomp/--deblock: Our choice should balance everything pretty well. Using a qcomp larger than 0.8 is no better than dropping crf at this point.

--no-rect --no-amp: We've tested multiple times, in different scenarios, and unfortunately we have never seen any benefit from rect and amp in a single case. Quality and size are indistinguishable with them on or off.

--keyint 720: I agree about the decoding speed, but in practice, the larger your keyint is, the more efficient you get. x264/x265's scenecut detection is nearly perfect, so IDR frames will be inserted at the optimal locations. Forcing an I-frame at an undesired position may benefit quality but hurts overall efficiency. Anyway, dropping it to something like 360 doesn't change much, so you don't need to follow this exactly.

--bframes 10: Again, it worked during the test, better than --bframes 3/5. We don't see any need to go beyond 10, and the speed penalty is indeed very small: bframes 3->10 adds less than 5% to encoding time.

--ipratio 1.1 --pbratio 1.1: We are not yet aiming for near-lossless encoding. We've included these two in the test, along with a less aggressive pair (1.3, 1.2), but we found that they are not as useful as thought. Better just drop crf instead of these two if you wish for even better quality.

Sagittaire

12th August 2015, 12:02

--keyint 720: I agree about the decoding speed, but in practice, the larger your keyint is, the more efficient you get. x264/x265's scenecut detection is nearly perfect, so IDR frames will be inserted at the optimal locations. Forcing an I-frame at an undesired position may benefit quality but hurts overall efficiency. Anyway, dropping it to something like 360 doesn't change much, so you don't need to follow this exactly.

No, that's false. An I-frame is between 2x and 3x the size of a P-frame. A simple calculation shows that if you replace keyint 240 with keyint 720, the theoretical maximum size gain is only about 0.5%. In real encoding it's even less, because the actual keyframe interval is usually far shorter than 720 or even 240 frames. Even a 24-frame GOP costs less than 8% in size efficiency.
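
The calculation referred to above can be sketched as follows. This is a simplified model, not something from the encoder: it assumes every non-keyframe costs the same as a P-frame, ignores B-frames and scene cuts, and takes the 2x to 3x I/P size ratio from the post. The function names are made up for illustration.

```python
def relative_size(keyint: int, i_to_p_ratio: float) -> float:
    """Average cost per frame, in P-frame units, with one I-frame per keyint."""
    return (keyint - 1 + i_to_p_ratio) / keyint


def saving(short_keyint: int, long_keyint: int, i_to_p_ratio: float) -> float:
    """Fraction of size saved by moving from short_keyint to long_keyint."""
    short = relative_size(short_keyint, i_to_p_ratio)
    long_ = relative_size(long_keyint, i_to_p_ratio)
    return 1.0 - long_ / short


for ratio in (2.0, 3.0):
    print(f"I/P ratio {ratio}: "
          f"240 -> 720 saves {100 * saving(240, 720, ratio):.2f}%, "
          f"24 -> 720 saves {100 * saving(24, 720, ratio):.2f}%")
```

Under these assumptions the 240 to 720 change saves roughly 0.3% to 0.6%, and even a 24-frame GOP stays under 8%, which is in line with the figures quoted.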

--bframes 10: Again, it worked during the test, better than --bframes 3/5. We don't see any need to go beyond 10, and the speed penalty is indeed very small: bframes 3->10 adds less than 5% to encoding time.

No, that's false, because in real encoding x265 will never place 10 consecutive B-frames. x265 generally uses 0, 1, 2 or 3 B-frames (in more than 95% of real situations). Using 10 B-frames instead of 3 is perhaps useful for really flat anime, certainly not for a grainy movie (in that case x265 never uses 10 consecutive B-frames). You can expect a size difference of less than 1% between 5 and 10 B-frames in any real scenario.

--ipratio 1.1 --pbratio 1.1: We are not yet aiming for near-lossless encoding. We've included these two in the test, along with a less aggressive pair (1.3, 1.2), but we found that they are not as useful as thought. Better just drop crf instead of these two if you wish for even better quality.

If you want to make high-quality encodes, you must use low ratios between frame types. The default ratios are better for overall efficiency but introduce a massively higher quantizer for B-frames (something like -1 for B-frames, -2 for P-frames and -3 for I-frames). This implies that you can have good noise retention in I-frames and P-frames but not in B-frames and b-frames.

Sagittaire

12th August 2015, 12:18

Moreover, I don't understand this:

Better just drop crf instead

If you want to run a serious test, you must compare encodes at the same size (multipass mode), not at the same crf.

I am skeptical about the seriousness of your test when you say that 10 B-frames is really useful for quality, because in real encoding x265 never uses 10 consecutive B-frames (B-frame placement is adaptive, and x265 will certainly use just 0, 1, 2 or 3 B-frames in the majority of cases for a noisy source). 10 B-frames is just a placebo effect here. Moreover, if you compare B-frame counts at the same crf, x264 will always produce better quality without B-frames (or with fewer B-frames) but at a larger output size, simply because B-frames use a much higher quantizer than P-frames in crf mode.

littlepox

12th August 2015, 13:56

OK, let me explain the "genetic algorithms" we applied in order to clarify things; this is also my suggestion for parameter testing:

x265 parameter sets form one species, and a single set of parameters is an individual of the species.
An individual is characterized by its genetic strands, and a parameter set consists of parameters.
An individual can produce children with most of their genetic strands similar to the parent's, and one parameter set can be modified into more sets by changing one or several parameters each.
In most cases we don't have abrupt genetic mutations, but occasionally we do. We tweak the parameters step by step, but occasionally we add weird changes, hoping to make a difference.
Children with different genetic strands are subject to natural selection, and parameter sets are subject to quality benchmarks, either by psnr/ssim or by human eyes.

We start from a small protobiont, a parameter set with very low settings like --crf 30 --aq-strength 0.3 --qcomp 0.4...... You don't expect anything good from it except a very small bit-rate.
We then drive this individual to grow and evolve by strengthening some parameters, for example:
dropping crf a bit gives better quality, but also bigger size;
increasing aq a bit gives better quality, but also bigger size;
--ipratio 1.1/pbratio 1.1 give better quality, but also bigger size;
......

You make each mutation once, keep the mutated individuals at similar sizes (in our test the threshold was max/min<105%; if it was violated, say because we dropped crf too much and got a larger outcome than the other children, we redid the mutation with a smaller crf change to bring the size in line with its siblings), and you select the best individual(s).
You then keep the best one alive and let it grow further. Most of the children come from changing one parameter, but some are made by changing multiple parameters together.

In every batch you have children of similar size, and you choose the best one(s) to produce the next generation. Gradually the bit-rate increases, the quality improves, and the parameter sets become more and more optimal.
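
The procedure described above can be sketched roughly as follows. This is not the actual test harness: toy_encode is a made-up stand-in for a real encode plus a quality judgment (which in our case was done by eye), the step sizes are arbitrary, and only single-parameter mutations are shown. What the sketch does reproduce is the loop itself: mutate one parameter per child, shrink any mutation that breaks the max/min<105% size rule, then keep the best child.

```python
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]
Encode = Callable[[Params], Tuple[float, float]]  # returns (size, quality)


def toy_encode(p: Params) -> Tuple[float, float]:
    """Hypothetical placeholder: a real run would encode and score a clip."""
    size = (100.0 * 2 ** ((30.0 - p["crf"]) / 6.0)
            * (1 + 0.2 * p["aq-strength"]) * (1 + 0.3 * p["qcomp"]))
    quality = (30.0 - p["crf"]) + 3.0 * p["aq-strength"] + 4.0 * p["qcomp"]
    return size, quality


def mutate(parent: Params, name: str, step: float) -> Params:
    child = dict(parent)
    child[name] += step
    return child


def breed(parent: Params, steps: Params, encode: Encode,
          max_ratio: float = 1.05) -> List[Tuple[str, Params, float, float]]:
    """One generation: one child per parameter, all kept at similar size."""
    steps = dict(steps)
    for _ in range(50):  # safety bound on the retry loop
        brood = []
        for name, step in steps.items():
            child = mutate(parent, name, step)
            size, quality = encode(child)
            brood.append((name, child, size, quality))
        sizes = [b[2] for b in brood]
        if max(sizes) / min(sizes) < max_ratio:
            return brood
        # Redo the oversized mutation with a smaller change.
        biggest = max(brood, key=lambda b: b[2])[0]
        steps[biggest] /= 2
    raise RuntimeError("could not bring sibling sizes within the threshold")


def evolve(seed: Params, steps: Params, encode: Encode, generations: int) -> Params:
    best = seed
    for _ in range(generations):
        brood = breed(best, steps, encode)
        best = max(brood, key=lambda b: b[3])[1]  # natural selection
    return best


SEED = {"crf": 30.0, "aq-strength": 0.3, "qcomp": 0.4}
STEPS = {"crf": -1.0, "aq-strength": 0.1, "qcomp": 0.05}
print(evolve(SEED, STEPS, toy_encode, generations=5))
```

Because siblings are compared at nearly equal size, picking the best-looking child is effectively a same-size comparison, which is why a parameter can lose to "just drop crf" even though it improves quality on its own.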

When I say "dropping crf a bit instead of tweaking ibpratios", I mean:
1. if you tweak ibpratios alone, you get better output with bigger size
2. but you can also drop crf, increase aq, increase qcomp... to reach a similar, bigger size.
3. We found that, starting from the set we gave, dropping crf is still better than tweaking the ratios, and better than many other possible choices you can name.
4. Probably when the species grows even bigger, say when we reach a situation near x264_crf=17, tweaking the ratios will become the best choice at that evolutionary stage.

Furthermore, I'm not that interested in the topic of --keyint and --bframes. In the test their behavior classified them as unimportant genetic strands, meaning they don't have much impact on the results. Set whatever reasonable numbers you want; it just doesn't matter much. Even though I said --bframes 10 is better than 3/5, the difference is indeed very small, about 1% as you said. The converse is also true: changing --bframes 10 to 5 makes very little difference, both quality-wise and time-wise.

I also haven't included them in the recommended parameters, meaning they are outside the scope of --tune film. It's perfectly fine to set --keyint 250 --bframes 5 if you wish, but there is not much to criticize if I set them higher. And there is a reason for doing so in a test: I just want to set them high enough that they can't possibly become a bottleneck.

Keiyakusha

12th August 2015, 14:56

I agree about coding unit size. I wouldn't use more than 16 for SD and more than 32 for hd. I vote for a somewhat dynamic preset/tune, not fixed value for all sources.
qcomp adjustment is really a must have for high quality encodes. As well as removing that nasty cu-tree, but the thing is, all x264 settings and I think x265 as well aren't really tuned for high quality encodes. They are tuned to keep as much quality as possible by throwing away everything that average people might not notice during realtime playback. In my book "high quality" is when you can't see the difference with the naked eye even in a frame-by-frame comparison. Or if you do, you can't tell which one is the source, no matter if it's a static or fast-moving (part of the) scene. But then I only use this for anime myself. Making live action virtually lossless while keeping it at reasonable (for me) sizes is not really possible. So I think this will not be accepted.
--deblock -2:-2: This is also a rule of thumb since x264's era: using smaller numbers if you prioritize quality
Never heard about such a rule and can't imagine using more than (less than?) -1:-1, unless you are dealing with tons of grain. -2:-2 might produce better SSIM/PSNR but it is not uncommon for it to create more obvious blocks in high complexity scenes, which can be hidden by the grain, but otherwise pretty nasty.
As for the other options, I either haven't played with them or consider them not ready. Speaking of readiness, I don't see much reasons to tweak presets while some options and encoder itself is not really ready for general use. Some time later you'll end up tweaking them again.

Edit: about keyint, I think that 250 is not enough. 20*framerate is a better option. Personally I would even use infinite, but some restrictions are required for general presets/tunings so that it won't produce unseekable streams.

littlepox

12th August 2015, 15:33

I vote for a somewhat dynamic preset/tune, not fixed value for all sources.

but the thing is, all x264 settings and I think x265 as well aren't really tuned for high quality encodes.

That's why I recommended a parameter like "--expectation" and then tweak rc parameters accordingly. That shall satisfy users who wish for high quality encoders, as well as users who desperately want the smallest bit-rate.

Never heard about such a rule and can't imagine using more than (less than?) -1:-1

In the old days we didn't need values that low for x264, true. But it seems x265 does need such values, at least at this quality level.

Speaking of readiness, I don't see much reasons to tweak presets while some options and encoder itself is not really ready for general use. Some time later you'll end up tweaking them again.

There have to be pioneers, though. By doing such research, we not only make x265 (partially) more practical, gradually extending its usage from low-quality cases to high-quality cases, but also find some systematic ways of enhancement that we can then feed back to the developers. Feedback is required to accelerate its maturity; we aren't just wasting our time for nothing.

benwaggoner

12th August 2015, 18:10

Very interesting stuff, and much more productive than just complaining about x265's default behavior. I have a few comments:

Comparisons should really be done at the same CBR bitrate (with --bitrate, --vbv-maxrate, and --vbv-bufsize all set) if we really want apples-to-apples comparisons. That way the local differences you see really are local differences, and do not reflect whole-file rate control or crazy peaks. For the most accurate comparison of encoders, use the lowest legal HEVC level's maximum bufsize. For a fairer comparison, you could let H.264 have a bigger bufsize since the spec allows it.
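
As a rough sketch of that setup, the helper below builds the same three rate-control flags for both encoders, with the option spelled --vbv-bufsize as both CLIs spell it. The helper name and the numbers are placeholders for illustration; in particular the buffer size shown is not taken from any level table and should be replaced with the limit you actually want to test against.

```python
def cbr_args(bitrate_kbps: int, bufsize_kbit: int) -> list:
    """CBR-style rate control: the VBV maxrate is pinned to the target bitrate."""
    return [
        "--bitrate", str(bitrate_kbps),
        "--vbv-maxrate", str(bitrate_kbps),
        "--vbv-bufsize", str(bufsize_kbit),
    ]


# Placeholder values, identical for both encoders so the comparison is like for like.
shared = cbr_args(bitrate_kbps=4000, bufsize_kbit=8000)
x264_cmd = ["x264"] + shared
x265_cmd = ["x265"] + shared
print(" ".join(x264_cmd))
print(" ".join(x265_cmd))
```

Generating both command lines from one function avoids the easy mistake of changing the buffer size for one encoder and not the other.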

Focusing on 10-bit is interesting for archiving and high-end PC use, but there are and will be a lot of 8-bit HEVC decoders in the wild. Plus comparing 10-bit streams on 8-bit displays can be pretty fraught, since there are so many different dithering and conversion modes that can be applied. I recommend testing in 8-bit, or defining a particular display mechanism. Or just using a 10-bit pipeline to a 10-bit DisplayPort monitor, which is what I do, but I know a lot of folks don't have capable hardware.

Also, x265 seems to do quite a bit better than x264 for banding etcetera in 8-bit. I don't know that it provides any benefit to use 10-bit x265 encoding when coming from an 8-bit 4:2:0 source, and it might actually hurt. De-dithering and then re-dithering at playback has its own downsides. Just because it was a big win in H.264 doesn't mean we should assume the same in HEVC.

--aq-mode 3 is really there for 8-bit SDR encodes. I wouldn't expect it to be valuable elsewhere.

Is the goal accurate feel more than true accuracy? High psy-rdoq is going to reduce mathematical accuracy, but adds energy to the encode that'll make it seem sharper and more detailed.

I like the ideas about making parameters more adaptive to frame size and content. It's a lot of complex code, however, and having good examples of how different parameters are optimal in different contexts will be very helpful in adding automation.

Sagittaire

12th August 2015, 18:59

about keyint, I think that 250 is not enough. 20*framerate is a better option. Personally I would even use infinite, but some restrictions are required for general presets/tunings so that it won't produce unseekable streams.

One more time, it's completely false. If you compare keyint "250" vs "infinite" at exactly the same output quality, you can expect a 0.5% reduction in output size, and that's the best theoretical case. For real sources, x264 and x265 use far less than 10*fps in most cases thanks to scene-cut detection, and in that case the size will be exactly the same for exactly the same output quality.

-2:-2 might produce better SSIM/PSNR

-2:-2 doesn't produce the best psnr/ssim result. -2:-2 still produces a really strong deblocking effect at high quantizers too. Once again, deblocking is itself adaptive. A lower quantizer always implies less deblocking. If you encode at a quantizer of q10 or less with H.264 or HEVC, you get no deblocking in most cases, because you are below the deblocking threshold even if you choose +6:+6 for deblocking.

Keiyakusha

12th August 2015, 19:07

One more time, it's completely false. If you compare keyint "250" vs "infinite" at exactly the same output quality, you can expect a 0.5% reduction in output size, and that's the best theoretical case. For real sources, x264 and x265 use far less than 10*fps in most cases thanks to scene-cut detection, and in that case the size will be exactly the same for exactly the same output quality.

Nothing is false here. It would be a bad idea if it was making encoder significantly slower, but it shouldn't be the case. How much size reduction it will bring is irrelevant (even if zero). Every little bit helps as long as you don't pay for it with some more hours of encoding.

-2:-2 doesn't produce the best psnr/ssim result.
In case of animated content and x264 they often do. Not best, it is likely not the same with every source, but better than -1:-1 and -3:-3. Some material actually benefits more from 0:0.
But looking at x265 reports, it actually feels like -2:-2 is a good idea. I don't know. My only point was that -2:-2 for x264 is not a rule that I know of.

Sagittaire

12th August 2015, 19:15

Sagittaire

12th August 2015, 19:32

Nothing is false here. It would be a bad idea if it was making encoder significantly slower, but it shouldn't be the case. How much size reduction it will bring is irrelevant (even if zero). Every little bit helps as long as you don't pay for it with some more hours of encoding.

You have a major decoding issue if you use a really long GOP for HEVC, and it's useless for coding efficiency. In most cases, for real movie sources, scene cuts occur less than 250 frames apart. That's just how it is. Use PSNR to run the test at the same size in ABR mode. Metrics are perfect for demonstrating this, because in long-GOP encoding you simply replace I-frames with P-frames. Run the test if you want. Using more than 250 for keyint is simply a placebo effect ... :devil:

Keiyakusha

12th August 2015, 19:43

You have a major decoding issue if you use a really long GOP for HEVC

This is very interesting. I didn't know that. Can you elaborate on what that is and why x264 has no problems of this sort? Also, are you sure this is not some kind of decoder-side issue?
Edit: so far with x264 I had absolutely no problems with gops as long as 4000+ frames (1 keyframe at the beginning per ~3min video)

In most cases, for real movie sources, scene cuts occur less than 250 frames apart. That's just how it is. Use PSNR to run the test at the same size in ABR mode. Metrics are perfect for demonstrating this, because in long-GOP encoding you simply replace I-frames with P-frames. Run the test if you want. Using more than 250 for keyint is simply a placebo effect ...
HEVC as a whole is, for the most part, a placebo. They could simply have used 10-bit AVC level 5.1 for a new HD standard and all the average users would be happy. The thing is, I acknowledge that this may be a placebo (see where I mentioned that it's fine even if there is 0 gain). But there's nothing wrong with a placebo if it doesn't cost us processing speed and has at least the potential to improve something (not counting the issue that you mentioned earlier).

benwaggoner

12th August 2015, 23:12

Interesting PDF from ATEME about 10-bit encoding for 8-bit sources

http://extranet.ateme.com/download.php?file=1114
Yeah, no doubt it is useful for H.264.

But I've not seen any papers or real-world demonstrations that the same is true for HEVC. Maybe it is, but I don't think we should assume it, because there are downsides to going to 10-bit. Among other things, relying on the playback device to do a high quality conversion to its native color space, which outside of recent high-end TVs, is generally 8-bit in pipeline, link, or panel.

So, let's validate that 10-bit works better than 8-bit for 8-bit sources in HEVC before we march on assuming it is.

benwaggoner

12th August 2015, 23:22

This is very interesting. I didn't know that. Can you elaborate on what that is and why x264 has no problems of this sort? Also, are you sure this is not some kind of decoder-side issue?
Edit: so far with x264 I had absolutely no problems with gops as long as 4000+ frames (1 keyframe at the beginning per ~3min video)
ALL decoders have this problem. If you want random access to the last frame of a GOP, you have to decode all the frames it references, and all the frames those frames reference, back to the start of the GOP. Having a classic IPBb frame hierarchy helps, since only prior P-frames need to be decoded. But with a 4000-frame GOP with an average of 75% B-frames, that's still ~999 frames to decode before you can display the last one. At 24fps, 4000 frames is about 2.8 minutes. If you actually have a shot that long, the average random seek into it will require decoding ~500 frames, which would take >5 seconds with many decoders.
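
The numbers above can be reproduced with a short sketch. It assumes that 75% of the frames in the GOP are B-frames that nothing else references, so only the I-frame and the P-frames form the decode chain. The 100 fps decode speed is a placeholder chosen to match the ">5 seconds" remark, not a measurement, and the function names are made up for illustration.

```python
def frames_before_last(gop_length: int, b_fraction: float) -> int:
    """Reference frames (I + P) that must be decoded before the last one."""
    reference_frames = round(gop_length * (1.0 - b_fraction))
    return reference_frames - 1


def seek_delay_seconds(frames_to_decode: float, decode_fps: float) -> float:
    """Time spent decoding the chain at a given decoder speed."""
    return frames_to_decode / decode_fps


gop, fps = 4000, 24
worst_case = frames_before_last(gop, b_fraction=0.75)
average = worst_case / 2  # a random seek lands mid-GOP on average

print(f"GOP duration: {gop / fps / 60:.1f} minutes")
print(f"worst case: {worst_case} frames, average: about {average:.0f} frames")
print(f"average delay at an assumed 100 fps decode: {seek_delay_seconds(average, 100):.1f} s")
```

Plugging in a 10-second keyint instead of 4000 frames shows how quickly the worst-case chain shrinks, which is the argument for capping GOP length by the seek delay you can tolerate.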

Yes, for typical content it won't matter. But maximum GOP duration for single-file encoding should be based on the maximum random access delay you can put up with. If it's 10 seconds, most IDR frames will come from scenecut. But those handful of "forced" IDR frames will really help random access into those sections, with vanishingly tiny encoder efficiency issues.

Bear in mind that most commercial video that gets watched has a fixed GOP duration of just a few seconds. The good modern encoders are really very well tuned for not having keyframe strobing anymore.

HEVC as a whole is, for the most part, a placebo. They could simply have used 10-bit AVC level 5.1 for a new HD standard and all the average users would be happy. The thing is, I acknowledge that this may be a placebo (see where I mentioned that it's fine even if there is 0 gain). But there's nothing wrong with a placebo if it doesn't cost us processing speed and has at least the potential to improve something (not counting the issue that you mentioned earlier).
That was certainly an option, but no one went with it because the quality improvements weren't worth it. For real-world delivery, HEVC delivers big advantages and bigger future improvements. And also can do UHD and HDR. Comparing x264 and x265, I'd say x264 needs 2.5x the bitrate to get to decent quality at UHD frame sizes. And HDR bitstream flags are only available in HEVC.

Sagittaire

12th August 2015, 23:29

Yeah, no doubt it is useful for H.264.

But I've not seen any papers or real-world demonstrations that the same is true for HEVC. Maybe it is, but I don't think we should assume it, because there are downsides to going to 10-bit. Among other things, relying on the playback device to do a high quality conversion to its native color space, which outside of recent high-end TVs, is generally 8-bit in pipeline, link, or panel.

So, let's validate that 10-bit works better than 8-bit for 8-bit sources in HEVC before we march on assuming it is.

There are some tests here on doom9 with x265, I believe. HEVC behaves the same way as H.264: 10 or 12 bits on an 8-bit source produces better output (metric tests). Moreover, 10-bit encoding seems to produce better results for banding on 8-bit sources, and better HVS results too. The explanation for this result seems to be exactly the same for all the MPEG codecs.

Keiyakusha

12th August 2015, 23:31

ALL decoders have this problem.
I believe Sagittaire was talking about a different kind of issue, as we were talking about a case where such long GOPs will not be created. When I mentioned 4000 frames, I was obviously saying that there were no issues other than the inability to seek fast.

In most cases, for real movie sources, scene cuts come less than 250 frames apart.

But those handful of "forced" IDR frames will really help random access into those sections, with vanishingly tiny encoder efficiency issues.

That's if random access has a significant value for you. But let's not forget that my suggestion was not "infinite" gop. I said I would use that myself, but for preset i suggested 20*framerate. Which should cover all your seeking needs.

benwaggoner

12th August 2015, 23:40

That's if random access has a significant value for you. But let's not forget that my suggestion was not "infinite" gop. I said I would use that myself, but for preset i suggested 20*framerate. Which should cover all your seeking needs.
Maybe I seek more than others, but I almost always use something in the 4-10 second range.

Keiyakusha

12th August 2015, 23:48

Maybe I seek more than others, but I almost always use something in the 4-10 second range.

This is an interesting thing by the way. How much do people seek? Personally I only seek to chapters, and these always have keyframes inserted through QPfile. I can't really see why would you need anything else... in case you, let's say, missed part of the show and want to rewind a little, my suggested value should work good enough, I think...

Motenai Yoda

13th August 2015, 01:00

I'm not sure dynamic/adaptive parameters are a good idea; they will make everything much hazier.
Take the animation tune of x264: it increases b-frames by 2 and doubles ref if > 1
- animation (psy tuning):
--bframes {+2} --deblock 1:1
--psy-rd 0.4:<unset> --aq-strength 0.6
--ref {Double if >1 else 1}
So with it, if you set 0, 1, 2 b-frames you'll get 2, 3, 4 b-frames; to get 0 b-frames you have to set -2!
As for ref, you can't get 0 ref because it sets it to 1.
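
A minimal sketch of the two relative adjustments as written in the help text quoted above. The function name is hypothetical, and the sketch says nothing about the order in which a real encoder applies presets, tunes and explicit options; it only models the arithmetic being discussed.

```python
def apply_animation_tune(bframes, ref):
    """Apply '--bframes {+2}' and '--ref {Double if >1 else 1}' to the given values."""
    tuned_bframes = bframes + 2
    tuned_ref = ref * 2 if ref > 1 else 1
    return tuned_bframes, tuned_ref

# Requested b-frames 0, 1, 2 end up as 2, 3, 4 under this model.
for requested in (0, 1, 2):
    print(requested, "->", apply_animation_tune(requested, ref=3)[0])
```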

Also, it isn't true that .8 qcomp is always better "if the user is expecting high quality". Assuming a real-world movie, it can be full of hi-complexity scenes, so you have to reduce qcomp to improve those few low-complexity scenes (if you have to);
but vice versa to get a decent quality on few hi-comp ones on a (very)low-comp/low-motion movie you have to increase it.

On the GOP topic, most scenes (90%) don't last more than 8s and many (50%) are less than 4s, so at least you can spare 10/20 I-frames out of ~1000 per hour.
Maybe animation stuff can take more advantage of exaggerated gops and/or +8 b-frames, but (imho) only a little (1-2%).

sonnati

13th August 2015, 14:51

Maybe I seek more than others, but I almost always use something in the 4-10 second range.

I agree with Ben: if you have to deal with encoding in professional OTT scenarios, the keyint range has to be much shorter than if you encode "simply" for ripping. Chunk size in ABR or trick-mode on STBs (for progressive download) usually sets a limit on keyint.

littlepox

13th August 2015, 16:48

A few more points I'd like to discuss:

--keyint is usually unimportant since x264/x265 make good use of scenecut. If you restrict it lower, the encoder always chooses the point to maximize the efficiency where an I-frame helps to improve quality. The only noticeable situation is where you have something like a talk-interview, where a person just sits there talking for 10 minutes, and the camera never moves. In this case, bigger keyint does save you about 5% of the size, but you will suffer a lot if you do seeking. In real-life situation, very big keyint won't help that much that it only saves overall bit-rate up to ~1%, but sometimes it can cause seeking latency. My recommendation is to use 10x fps for online video and 20x fps for ripping.

I'm definitely not interested in doing CBR testing, and there are two reasons:
1. Temporal rate-control is a significant point for testing. We want to see how a parameter set performs when it comes to deal with different scenes. As long as we get diversified samples across the timeline, we can still make a fair comparison.
2. In reality I never do CBR encoding. If I never eat apples, why would I compare apples to apples? Even if comparing pears to oranges is less fair, it makes sense for me to decide which one to buy.

8bit/10bit is not that problematic. Even if you use 8bit encoding, you still need to go through the process of Chroma Upscaling->Converting YUV to RGB->Dithering down to RGB24. In our test we make the process as follows:

1. All the procedures are done in 16-bit integer precision. After decoding, the YUV data is padded to 16-bit integers.
2. Chroma upscaling is done with non-ringing lanczos 4
3. After conversion to RGB48 (RGB channels are 16 bits each), a serpentine Floyd-Steinberg error diffusion algorithm converts it to RGB24 for inspection.
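
Step 3 can be sketched roughly as below for a single 16-bit channel. This is a generic serpentine Floyd-Steinberg implementation written for illustration, not the actual filter used in the test; chroma upscaling and the YUV to RGB conversion are left out, and the function name is hypothetical.

```python
import math

def dither_16_to_8(plane):
    """Reduce one channel from 16-bit to 8-bit samples with serpentine
    Floyd-Steinberg error diffusion. `plane` is a list of rows of ints."""
    height = len(plane)
    width = len(plane[0]) if height else 0
    # Work in 8-bit units as floats so the rounding error can be carried along.
    work = [[v / 257.0 for v in row] for row in plane]
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        step = 1 if y % 2 == 0 else -1          # alternate scan direction per row
        xs = range(width) if step == 1 else range(width - 1, -1, -1)
        for x in xs:
            old = work[y][x]
            new = min(255, max(0, math.floor(old + 0.5)))
            out[y][x] = new
            err = old - new
            ahead, behind = x + step, x - step
            if 0 <= ahead < width:
                work[y][ahead] += err * 7 / 16
            if y + 1 < height:
                if 0 <= behind < width:
                    work[y + 1][behind] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if 0 <= ahead < width:
                    work[y + 1][ahead] += err * 1 / 16
    return out

# A flat 16-bit level that falls between two 8-bit codes (about 128.5).
flat = [[33024] * 16 for _ in range(16)]
result = dither_16_to_8(flat)
mean = sum(map(sum, result)) / 256
print(sorted({v for row in result for v in row}), round(mean, 2))
```

The output mixes the two neighbouring 8-bit codes so that their average stays close to the original level, which is how the extra precision survives on an 8-bit display.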

It's a process with high quality and high precision. With it, 10-bit precision is fully utilized, even though the final output is RGB24. I suppose no one here believes 10-bit video = 10-bit display. This was a common misunderstanding when we began to enjoy the benefit of 10-bit x264. And today, playback filters like LAV/MadVR are there to do the job, so you can fully enjoy high-precision YUV encoding even if you are using an 8-bit monitor.

In fact, I do know most TVs and BD players are not that nice; their pipelines are 8-bit. I have not set up any test regarding that, and I believe it's not the encoder's responsibility to worry about the playback procedure.

I have already validated that, with high-quality conversion, x265 10-bit works better than 8-bit, ruling out artifacts caused by low precision such as color banding and blocking in dark areas. This should be very easy to check, especially if you get some anime sources where you don't have much grain to mask the color banding.

Boulder

13th August 2015, 17:00

I don't think they're talking about doing a CBR encode, they mean a VBR one which should be almost identical compared to a CRF encode that produces the same average bitrate. At least that's how it is with x264.

littlepox

13th August 2015, 17:07

He does mean a CBR encode:
Comparisons should really be done at the same CBR bitrate (with --bitrate, --vbv-maxrate, and --vbv-bufsize all set) if we really want to see differences in apples-to-apples comparisons. That will provide local differences that are really local differences, and not reflective of whole-file rate control or crazy peaks.

As I said in my earlier posts, I did make sure the bit-rates were the same (less than 5% difference) for the comparison, so that should not be a problem.

Boulder

13th August 2015, 17:13

Oh, missed that one. I'd say it's a weird way to compare because the ratecontrol quality can and will affect the final quality quite a lot.

I have tested your settings and I still have the same issues with details lost. It seems that the B-frames get a quite big hit as they are quite blurred. It is interesting to see that x265 makes quite different decisions compared to x264 (finally understood to use FFInfo to see what frame type is being shown).

littlepox

13th August 2015, 17:23

Currently this is the best we can derive, and at least it is far better than the x265 default or --tune grain.
But as I said, this parameter set is only roughly equivalent to x264_crf=19, where you still need to tolerate some loss of detail, even with x264.
For even higher quality, just forget about x265 for the moment.

Boulder

13th August 2015, 17:28

I compare against CRF 18.5 or 18, that's what I use with x264. It is true and understandable that x264 also loses some detail as that is necessary to get a lower average bitrate. Personally I think that the psychovisual things are not yet at the same level with x265, but as I've said many times, it is very much possible that they will be addressed when some more pressing matters are finished.

littlepox

13th August 2015, 17:38

I compare against CRF 18.5 or 18, that's what I use with x264.

Should you be interested, you can test again with the following changes:

crf=18
ipratio=1.3
pbratio=1.2

I'm pretty much confident it shall produce a result as good as x264, but I cannot guarantee its bit-rate.

Boulder

13th August 2015, 17:43

Sure, I can try those.

EDIT: one thing: does it hurt to disable open GOP or is it basically meaningless? In x264 it's disabled by default but in x265 it's enabled.

littlepox

13th August 2015, 17:53

one thing: does it hurt to disable open GOP or is it basically meaningless? In x264 it's disabled by default but in x265 it's enabled.

It does not matter at all.

Boulder

13th August 2015, 18:09

Unfortunately it does not help, there's still a great deal of detail/noise/grain removed.

http://abload.de/img/x264_frame419s19.png
http://abload.de/img/x265_frame4yds47.png

benwaggoner

13th August 2015, 21:00

Sure, I can try those.

EDIT: one thing: does it hurt to disable open GOP or is it basically meaningless? In x264 it's disabled by default but in x265 it's enabled.
Open GOP can help quality a bit when an IDR comes in the middle of a scene. This was a big deal with MPEG-2, where the max GOP duration for DVD was 0.6 seconds. That leading B-frame in Open GOP can smooth discontinuities between the previous GOP and the new I-frame.

Basically, if you're having keyframe strobing issues, Open GOP can help. For the bitrates and GOP durations we're talking about here, I doubt it would have meaningful impact.

littlepox

14th August 2015, 02:07

Unfortunately it does not help, there's still a great deal of detail/noise/grain removed.

http://abload.de/img/x264_frame419s19.png
http://abload.de/img/x265_frame4yds47.png

Not that bad though.
Can you post your encoding parameters and sizes for both clips? And if possible, upload the source so we can test based on that.

Boulder

14th August 2015, 09:52

littlepox

14th August 2015, 10:54

The source is here: https://drive.google.com/open?id=0BzeF_1syecQwakhsX3RuZGhWcjA

The parameters for x264 were pretty much --preset veryslow --tune film, 2-pass encode and bitrate 4800 kbps (equal to what CRF 18.5 produced, IIRC). The parameters for x265 were according to your suggestions, also a 2-pass encode at 4800 kbps to compare directly. I've cropped the mattes and downsized to 1280x528 with bicubic (b=-0.6,c=0.3) in the script that was fed to the encoder.

Thanks for the info.

I've noticed that you are doing a < 720p encode. I'd recommend you use x264 all the way; the major benefits of x265 contribute little in 720p encoding. Our test was conducted at 1080p, so it differs from your case.
So far, we have not set up any test regarding 720p, but empirically the above holds.

Sagittaire

14th August 2015, 11:11

--keyint is usually unimportant since x264/x265 make good use of scenecut. If you restrict it lower, the encoder always chooses the point to maximize the efficiency where an I-frame helps to improve quality. The only noticeable situation is where you have something like a talk-interview, where a person just sits there talking for 10 minutes, and the camera never moves. In this case, bigger keyint does save you about 5% of the size, but you will suffer a lot if you do seeking. In real-life situation, very big keyint won't help that much that it only saves overall bit-rate up to ~1%, but sometimes it can cause seeking latency. My recommendation is to use 10x fps for online video and 20x fps for ripping.

The calculation of the potential gain is really simple: if you replace a P-frame with an I-frame, the frame size grows by between x2 and x3. A little comparison for a movie at 24 fps:

- keyint 240 vs keyint 480
gain = (2 * 300 + 478 * 100) / (300 + 479 * 100) = 1.004, i.e. 0.4%

- keyint 240 vs infinite keyint
gain = 0.8%

That is the maximum theoretical gain, and in practice it is much less. First, you have scenecut detection, and for a real movie the average GOP is much shorter than 240 frames, so the codec will place only a small number of I-frames because of the keyint limit. Second, a really long GOP can imply more error propagation: at the same quantizer, the quality of an I-frame will always be better than that of the equivalent P-frame, so it will be a better reference frame for the following P-frames. For these reasons, the potential gain for a really long GOP is ... ~0%.
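
The two figures above can be reproduced with a small sketch. It uses the same simplified model as the post (only I- and P-frames, an I-frame costing 300 units and a P-frame 100 units); the function name is illustrative.

```python
def window_size(frames, i_frames, i_size=300, p_size=100):
    """Total size of a window of `frames` frames containing `i_frames` I-frames,
    with every other frame counted as a P-frame."""
    return i_frames * i_size + (frames - i_frames) * p_size

# keyint 240 vs keyint 480: over 480 frames, two I-frames instead of one.
gain_480 = window_size(480, 2) / window_size(480, 1) - 1

# keyint 240 vs infinite keyint: over 240 frames, one I-frame instead of none.
gain_inf = window_size(240, 1) / window_size(240, 0) - 1

print(f"{gain_480:.1%} {gain_inf:.1%}")
```

Raising `i_size` relative to `p_size` shows how quickly the result moves once the I/P ratio is larger than x3.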

littlepox

14th August 2015, 11:33

The calculation of the potential gain is really simple: if you replace a P-frame with an I-frame, the frame size grows by between x2 and x3. A little comparison for a movie at 24 fps:

- keyint 240 vs keyint 480
gain = (2 * 300 + 478 * 100) / (300 + 479 * 100) = 1.004, i.e. 0.4%

- keyint 240 vs infinite keyint
gain = 0.8%

That is the maximum theoretical gain, and in practice it is much less. First, you have scenecut detection, and for a real movie the average GOP is much shorter than 240 frames, so the codec will place only a small number of I-frames because of the keyint limit. Second, a really long GOP can imply more error propagation: at the same quantizer, the quality of an I-frame will always be better than that of the equivalent P-frame, so it will be a better reference frame for the following P-frames. For these reasons, the potential gain for a really long GOP is ... ~0%.

The point is that when you have B-frames, very often there is a 10x/20x difference, especially when an IDR is forced rather than decided at a long, static scene. Furthermore, p-frames itself can be very saving as well. For assumptions under "a long, static scene", you'll find out the ratio between I/P is generally 5x or even more, and so it is with b-frames that size ratios can be even higher than usual. We've done with these cases before so we have the statistics, where in those cases your model is over-simplified.

The number 5% came from a case when we encoded an editor interview in a Blu-Ray. We encoded the 24fps clip in x264 first with keyint=240, then 720, finally infinite. It did matter, and was way beyond your "theoretical maximum gain". Due to forum rules we are not supposed to post the original materials here; but similar sources should be easy to find.
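
To see how the ratios mentioned above change the picture, here is a rough sketch with B-frames included. The sizes are illustrative assumptions, not measurements: I/P = 5x and I/B = 20x as in the static-scene argument, and a 75% B-frame share.

```python
I_SIZE, P_SIZE, B_SIZE = 2000, 400, 100   # assumed: I/P = 5x, I/B = 20x
B_SHARE = 0.75                            # assumed share of B-frames among non-I frames

def window_size(frames, i_frames):
    """Size of a window where non-I frames are a P/B mix at the assumed share."""
    avg_inter = (1 - B_SHARE) * P_SIZE + B_SHARE * B_SIZE
    return i_frames * I_SIZE + (frames - i_frames) * avg_inter

# One forced I-frame every 240 frames versus none in a long static scene.
overhead = window_size(240, 1) / window_size(240, 0) - 1
print(f"{overhead:.1%}")
```

Under these assumed ratios the overhead of forced keyframes lands in the same range as the ~5% observed on the interview clip.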

At last, let me state again that --keyint isn't really anything important here. So set anything you're happy with. That's enough.

benwaggoner

15th August 2015, 18:50

The point is that when you have B-frames, very often there is a 10x/20x difference, especially when an IDR is forced rather than decided at a long, static scene. Furthermore, p-frames itself can be very saving as well. For assumptions under "a long, static scene", you'll find out the ratio between I/P is generally 5x or even more, and so it is with b-frames that size ratios can be even higher than usual. We've done with these cases before so we have the statistics, where in those cases your model is over-simplified.

The number 5% came from a case when we encoded an editor interview in a Blu-Ray. We encoded the 24fps clip in x264 first with keyint=240, then 720, finally infinite. It did matter, and was way beyond your "theoretical maximum gain". Due to forum rules we are not supposed to post the original materials here; but similar sources should be easy to find.
Yeah, that scenario makes sense. However, that content would be so easy to encode that the bitrate savings aren't likely to be material.

Relatedly, static shots like that will have LOTS of B-frames, and thus will have better random access than more typical content at the same keyint.

Maybe we really want "maximum reference frame dependency" which would start a new GOP when the worst case number of frames to decode any given frame is exceeded. That would really control for worst-case decode complexity. With a little RDO, it would probably use 16 B-frames in those long sequences in order to avoid expensive IDRs.

-Ben Waggoner (via TapaTalk)

shinchiro

20th August 2015, 04:53

Unfortunately it does not help, there's still a great deal of detail/noise/grain removed.

http://abload.de/img/x264_frame419s19.png
http://abload.de/img/x265_frame4yds47.png

how about disabling cutree with ctu 32 (--no-cutree --ctu 32)? Does it improve quality a bit?

x265_Project

20th August 2015, 16:17

littlepox

20th August 2015, 17:37

We are following this thread, and we are very interested in your feedback. We really appreciate all the efforts that people are making to help us figure out what works best under various conditions.

This thread includes many suggestions for improving detail. Some of these suggestions may be appropriate for our default presets (they may improve visual quality under any conditions), and some are more focused on film. We think it's best to start with the general case, and then tune for the specific case. So I'd like to hear your suggestions for an improved default setting (--preset medium), and then for slower or faster presets. What changes would you make to improve visual quality with little tradeoff in encoding speed or bit rate?

Let's assume the quality segment to be around x264_crf=24~19, higher than the default x265_crf=28 (which we are not really interested in because of its terrible quality, so our test was not based on it).

Parameters that generally improve visual quality with little tradeoff in encoding speed or bit-rate:
--no-sao
--deblock -2:-2

Parameters that generally do NOT affect visual quality but reduce encoding time:
--merange 44 (under 1080p)
--no-rect --no-amp (we don't recommend it unless for --preset placebo)
--ctu 32 --max-tu-size 16 (in fact, at resolutions up to 1080p, this not only reduces time but ALSO reduces file size)

Parameters that will increase file size, but which we believe are worth it, since they give more gain in visual quality than just raising --bitrate or lowering --crf:

--qcomp 0.8
--psy-rd 0.7 --psy-rdoq 5.0 --rdoq-level 1
--qg-size 16

Note that you may observe some increase in encoding time if you are using CRF mode. This is because these parameters increase both the bit-rate and the visual quality compared to not setting them (let me emphasize that the same crf never guarantees the same visual quality). Bigger files take more computation for entropy coding.

For example, if the outcome with ~5000Kb/s falls within the quality segment of our assumption(a bit higher than the default crf setting), and we have these two (N-pass) sets of parameters:

A: --bitrate 5000 --preset faster/fast/medium/slow/slower/veryslow/placebo (you can add --tune grain, which we believed to be terribly designed)
B: --bitrate 5000 --preset faster/fast/medium/slow/slower/veryslow/placebo --ctu 32 --max-tu-size 16 --qcomp 0.8 --aq-mode 1 --aq-strength 1.0 --psy-rd 0.7 --psy-rdoq 5.0 --rdoq-level 1 --deblock -2:-2 --qg-size 16 --no-sao --merange 44 --no-rect --no-amp

Our tests make us almost sure that, for resolutions lower than or equal to 1080p:
B has better visual quality than A, AND B encodes faster than A. None of the parameters we recommend is going to increase the computational time.
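
For anyone who wants to run the A/B comparison, the two command lines could be assembled along these lines. This is only a sketch: the file names are hypothetical, the multi-pass options (--pass, --stats) are left out, and the motion search range is spelled --merange, as in the parameter set at the top of the thread.

```python
import shlex

# Extra options of set B, as listed above.
SET_B_OPTIONS = [
    "--ctu", "32", "--max-tu-size", "16",
    "--qcomp", "0.8",
    "--aq-mode", "1", "--aq-strength", "1.0",
    "--psy-rd", "0.7", "--psy-rdoq", "5.0", "--rdoq-level", "1",
    "--deblock", "-2:-2",
    "--qg-size", "16",
    "--no-sao",
    "--merange", "44",
    "--no-rect", "--no-amp",
]

def build_x265_command(source, output, bitrate_kbps, preset, extra=()):
    """Return the x265 argument list for one encode (nothing is executed here)."""
    return ["x265", "--input", source, "--output", output,
            "--bitrate", str(bitrate_kbps), "--preset", preset, *extra]

# Hypothetical input/output names; same bitrate and preset for both sets.
cmd_a = build_x265_command("source.y4m", "a.hevc", 5000, "medium")
cmd_b = build_x265_command("source.y4m", "b.hevc", 5000, "medium", SET_B_OPTIONS)

print(shlex.join(cmd_a))
print(shlex.join(cmd_b))
```

Keeping the bitrate and preset identical means any visible difference comes from the extra options alone.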

Our test was not focused on films alone, but also included some anime. The recommended tuning for film and anime is slightly different, but the general trend is the same, and you can take my advice above as applicable for general purposes.

Ely

21st August 2015, 07:54

For 1080p, would you still recommend "--ctu 32 --max-tu-size 16" ? I'm confused as when referring to these, you say "under 1080p".

littlepox

21st August 2015, 08:00

For 1080p, would you still recommend "--ctu 32 --max-tu-size 16" ? I'm confused as when referring to these, you say "under 1080p".

OK let me clarify this:
"For resolutions lower or equal to 1920x1080"
That should make no confusion as we are not talking about UHD contents.

Dclose

30th September 2016, 18:07

--bframes 10: Again it worked during the test. better than --bframes 3/5. We don't see any need to go beyond 10, and speed punishment is indeed very little. bframes 3->10 adds less than 5% of encoding time.

Do you still recommend 10 bframes with newer builds? Some people disagree with 10, but my own tests before reading this thread had similar results to yours. My current test sample is 720p live video, with lots of jungle and small rocks and plenty of people moving around. The test gets more than 5% encode time added, but it's a complicated scene.

benwaggoner

1st October 2016, 20:52

Do you still recommend 10 bframes with newer builds? Some people disagree with 10, but my own tests before reading this thread had similar results to yours. My current test sample is 720p live video, with lots of jungle and small rocks and plenty of people moving around. The test gets more than 5% encode time added, but it's a complicated scene.
For actual film/video content with meaningful amounts of grain/noise, >8 b-frames are overkill; --preset placebo only goes up to 8. I've found some benefit with greater values at very low, highly compressed bitrates where noise isn't going to be preserved anyway.

Historically lots of b-frames were bad with highly noisy content with x265, but that's gotten way better in 2016.