Suggestion for x265's --tune film

littlepox · 10th August 2015, 18:58

Several days ago we asked the x265 team how we could participate in the discussion, and they gave me an e-mail address. Now that we have finished writing the email, we are glad to post it here as well, so all of you are welcome to discuss the topic: how to keep film grain and details at higher quality when using x265?

We took several movie scenes with a moderate level of film grain, rich details, and some dark scenes. Then we ran tests comparing x264-10bit and x265-10bit:

(NOTE: both sets of parameters assume --input-depth 10. Do NOT just copy and paste if you are not encoding 10-bit input.)

x264: --preset veryslow --tune film --crf 19.0 --qcomp 0.75 --input-depth 10 (a fairly trusted setting)

x265: --preset slower --ctu 32 --max-tu-size 16 --crf 20.0 --tu-intra-depth 2 --tu-inter-depth 2 --rdpenalty 2 --me 3 --subme 5 --merange 44 --b-intra --no-amp --ref 5 --weightb --keyint 360 --min-keyint 1 --bframes 8 --aq-mode 1 --aq-strength 1.0 --rd 5 --psy-rd 1.5 --psy-rdoq 5.0 --rdoq-level 1 --no-sao --no-open-gop --rc-lookahead 80 --scenecut 40 --max-merge 4 --qcomp 0.8 --no-strong-intra-smoothing --deblock -2:-2 --qg-size 16 --pbratio 1.2
(We've tested hundreds of parameter combinations and concluded that this set gives the best outcome)

We compare the quality with our eyes, NOT with PSNR/SSIM, which favor blurriness over grain retention.
The x264 output is 118% the size of the x265 output, with nearly the same visual quality. We could also drop the x264 bit-rate a bit, but then the AVC outcome inevitably looks worse. At this point we believe x265 claims a victory, not in the extremely low bit-rate cases where HEVC blurriness looks better than AVC blockiness, but in high-quality, near-transparent encoding.

Our tests focus on several RC parameters: --ctu, --max-tu-size, --crf, --qcomp, --aq-mode, --aq-strength, --psy-rd, --psy-rdoq, --qg-size... These parameters can alter RC behavior with minor impact on speed. We used an iterative method with genetic algorithms to select what is possibly the best parameter set out of infinitely many.

Here are some of our comments:
--ctu, --max-tu-size: These two should be decreased for 1080p high-quality encoding. Keeping other parameters constant, changing --ctu 64->32 and --max-tu-size 32->16 gives even better quality with a size decrease of nearly 15% in crf mode. We assume that adopting larger CUs and TUs actually wastes bit-rate by producing over-smoothed blocks, forcing the bit-rate to rise in order to keep a "constant rate-factor". So we'd better just keep them smaller, at least at 1080p.

--crf: x265 uses a default value of 28. That quality is way too low to compete against x264's default value of 23. Many users believe that by tweaking crf they can choose between quality and size without other concerns, but they are wrong; sometimes tweaking other parameters is more efficient. --crf needs to be coupled with other, properly tuned parameters.

--qcomp: the lower your crf (that is, the higher the quality), the higher you should set --qcomp. This is also true for x264. Our tests suggest --qcomp 0.8 is essential when x265_crf<23.

--aq-mode: keep it at 1. aq-mode=3 gives weird bit-rate for nothing.

--aq-strength: default 1.0 is pretty good.

--psy: the primary weapon to fight blurriness. --psy-rd 1.5 --psy-rdoq 5.0 --rdoq-level 1 works better than --tune grain's settings, trust me.

--deblock -2:-2: This has also been a rule of thumb since the x264 era: use smaller numbers if you prioritize quality.

--qg-size 16: The larger the number you set, the smaller the file you get, but at the cost of much worse quality. Just change it to 16 instead of 64; it is worth it.

--no-sao: SAO smooths everything out. Do NOT use it unless you really want to.

--merange 44: merange has minor impact on speed when coupled with low levels of me and subme, but that's not true for --me 3 --subme 5. Decreasing it a bit at 1080p does you a favor with little sacrifice.

--no-rect --no-amp: They trade time for nothing. Do NOT enable them unless you really have time to waste.

So, here are our suggestions for x265 --tune film:

--ctu 32 --max-tu-size 16 --qcomp 0.8 --aq-mode 1 --aq-strength 1.0 --psy-rd 1.8 --psy-rdoq 5.0 --rdoq-level 1 --deblock -2:-2 --qg-size 16 --no-sao --merange 44 --no-rect --no-amp
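To make the suggested set easier to reuse without copy-paste slips, here is a minimal sketch that turns it into an argument list. The helper name build_x265_args and the dict layout are made up for illustration; the values are the ones suggested above, and the motion search range is spelled --merange, as in the full x265 command line earlier in the post. It only builds the list, it does not run x265.

```python
def build_x265_args(params):
    """Turn {'ctu': 32, 'sao': False} into ['--ctu', '32', '--no-sao']."""
    args = []
    for name, value in params.items():
        if value is True:
            args.append(f"--{name}")          # plain switch, e.g. --weightb
        elif value is False:
            args.append(f"--no-{name}")       # negated switch, e.g. --no-sao
        else:
            args += [f"--{name}", str(value)]  # option with a value
    return args


# Values as listed in the suggestion for --tune film.
TUNE_FILM = {
    "ctu": 32, "max-tu-size": 16, "qcomp": 0.8,
    "aq-mode": 1, "aq-strength": 1.0,
    "psy-rd": 1.8, "psy-rdoq": 5.0, "rdoq-level": 1,
    "deblock": "-2:-2", "qg-size": 16,
    "sao": False, "merange": 44, "rect": False, "amp": False,
}

if __name__ == "__main__":
    print(" ".join(build_x265_args(TUNE_FILM)))
```

Keeping the set in one dict also makes it easy to change a single value (say, crf or qcomp) between test encodes.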

By setting these parameters, you can probably set your crf to around 19~22, and it gives you considerably good output at a small size. If you set crf=21~22, --qcomp 0.75 is recommended.
We don't see any advantage for x265 if you are targeting x264_crf<=16. At that point x265 requires even more bit-rate to keep the details.

Furthermore, here are some other recommendations:

1. Besides --tune and --preset, add another parameter "--expectation" and let the user specify whether they want high, medium or low quality (with the respective size). This could also be derived from --crf if it is specified. Then use that input to change the RC behavior. For example, if the user expects high quality rather than crf=28 blurriness, increase --qcomp to 0.8.

2. Change RC behavior and speed options based on resolution. 720p/1080p/1440p/4K don't optimally share the same set of parameters, for example --max-tu-size, --ctu and --merange.

3. Give less bias/more penalty to bigger CUs and TUs, especially at higher quality expectations and lower resolutions. --rdpenalty 2 is a very useful parameter, but it only prohibits a TU size of 32x32. Let's impose more penalties in similar cases.
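To illustrate recommendations 1 and 2, here is a small sketch of what such a resolution- and quality-aware selection could look like. This is purely hypothetical: x265 has no "--expectation" option, the function name suggest_params is invented, and the thresholds are only my reading of the numbers in this post (smaller --ctu/--max-tu-size and --merange 44 up to 1080p, --qcomp 0.8 for low crf, 0.75 around crf 21~22). Above 1080p the post gives no values, so the sketch leaves the encoder defaults alone.

```python
def suggest_params(height, crf):
    """Hypothetical picker: overrides to apply on top of the x265 defaults."""
    params = {}
    if height <= 1080:
        # Values the post recommends for 1080p high-quality encodes.
        params.update({"ctu": 32, "max-tu-size": 16, "merange": 44})
    # Higher quality expectation (lower crf) -> higher qcomp.
    if crf < 21:
        params["qcomp"] = 0.8
    elif crf < 23:
        params["qcomp"] = 0.75
    return params


if __name__ == "__main__":
    print(suggest_params(1080, 20.0))
    print(suggest_params(2160, 28.0))  # nothing overridden
```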

Thank you for patiently reading this email, and I'm ready to conclude it:
So far the x265 team has done a fantastic job by offering the best encoder (at low bit-rate), but let us greedily urge for more. We wish x265 to be the best not only at low quality, but overall. At the same time, we are willing to help the team with our own effort.

Regards,
LittlePox

2015/08/11

Boulder · 11th August 2015, 13:34

Is your test material regular movies or anime etc? I could try your suggested settings with my Hot Fuzz test clip and see if there is any improvement (and also compare to x264).

sneaker_ger · 11th August 2015, 14:48

Thank you for sharing! I added it to my latest grain test but since this is a suggestion for --tune film (not --tune grain) I will also try a less grainy source in the near future. There's definitely a difference, often less blurry (in exchange for other artifacts, though).

littlepox · 11th August 2015, 18:04

Quote:

Originally Posted by sneaker_ger

Thank you for sharing! I added it to my latest grain test but since this is a suggestion for --tune film (not --tune grain) I will also try a less grainy source in the near future. There's definitely a difference, often less blurry (in exchange for other artifacts, though).

Thanks for the test; a few points I'd like to raise:

1. I don't know whether your equivalent crf fits my intervals or not. Can you try a bit and see approximately what crf is required to generate such a bit-rate?

2. As I said, this is for --tune film, not --tune grain. For now, don't even dream of using x265 for proper grain retention.

3. Can you point out what the other artifacts are? With your input we can probably make some adjustments accordingly.

PS: If you are talking about distortion, I'd say this is the side effect of psy. Or rather, this is what psy is supposed to do: people's eyes don't care about some distortion since they don't compare it with the source frame by frame; they just want a non-blurry, similarly complex video. We believe that for general film clips, the output on its own looks quite acceptable.

sneaker_ger · 11th August 2015, 18:30

1. crf 19.5~20 for --preset slower --tune grain, so below your target range
3. Look near his right ear (left from the viewer's POV):
http://abload.de/img/1060_org_gqayh.png
http://abload.de/img/1060_x265_pzstp.png
http://abload.de/img/1060_littlepox_fhxsf.png
But as you say it's expected.

Sagittaire · 12th August 2015, 00:04

Well, I read your encoding profile:

Quote:

x265: --preset slower --ctu 32 --max-tu-size 16 --crf 19.7 --tu-intra-depth 3 --tu-inter-depth 3 --rdpenalty 2 --me 3 --subme 5 --merange 44 --b-intra --no-rect --no-amp --ref 5 --weightb --keyint 720 --min-keyint 1 --bframes 10 --aq-mode 1 --aq-strength 1.0 --rd 5 --psy-rd 0.7 --psy-rdoq 5.0 --rdoq-level 1 --no-sao --no-open-gop --rc-lookahead 80 --scenecut 40 --max-merge 4 --qcomp 0.80 --no-strong-intra-smoothing --input-depth 10 --deblock -2:-2 --qg-size 16 --pmode

--ctu, --max-tu-size, --qg-size: at high quality, if you want to keep grain, it must be theoretically useful to use smaller values.

--qcomp: qcomp is RC compression. A value of 1.0 must produce the highest possible bitrate variability (constant quantizer without psy and without frame type ratios). Using high bitrate variability for high quality encoding is theoretically good (high complexity scenes will have better quality).

--deblock -2:-2: deblocking is itself adaptive to the quantizer. A low quantizer means less deblocking. Anyway, -2:-2 is not a bad value and will preserve high frequencies better, but with more blocking/ringing too.

--no-rect --no-amp: bad idea. Rect and AMP are theoretically always useful for quality and for keeping high frequencies. Anyway, they are really bad for speed.

--bframes 10: really bad idea. More than 3 B-frames for high quality and noise retention is theoretically bad. Moreover it's really bad for speed.

--keyint 720: bad idea. More than 10x the frame rate doesn't produce measurably better quality. Moreover it's really bad for decoding.

You forgot a really useful setting for high quality encoding if you want to keep noise: --ipratio 1.1 --pbratio 1.1. It's like qcomp. It's useful at high quality to get more constant quality (no PBP or IBP flickering, noise kept in B-frames).

littlepox · 12th August 2015, 02:13

Quote:

Originally Posted by Sagittaire

Well, I read your encoding profile:

--ctu, --max-tu-size, --qg-size: at high quality, if you want to keep grain, it must be theoretically useful to use smaller values.

--qcomp: qcomp is RC compression. A value of 1.0 must produce the highest possible bitrate variability (constant quantizer without psy and without frame type ratios). Using high bitrate variability for high quality encoding is theoretically good (high complexity scenes will have better quality).

--deblock -2:-2: deblocking is itself adaptive to the quantizer. A low quantizer means less deblocking. Anyway, -2:-2 is not a bad value and will preserve high frequencies better, but with more blocking/ringing too.

--no-rect --no-amp: bad idea. Rect and AMP are theoretically always useful for quality and for keeping high frequencies. Anyway, they are really bad for speed.

--bframes 10: really bad idea. More than 3 B-frames for high quality and noise retention is theoretically bad. Moreover it's really bad for speed.

--keyint 720: bad idea. More than 10x the frame rate doesn't produce measurably better quality. Moreover it's really bad for decoding.

You forgot a really useful setting for high quality encoding if you want to keep noise: --ipratio 1.1 --pbratio 1.1. It's like qcomp. It's useful at high quality to get more constant quality (no PBP or IBP flickering, noise kept in B-frames).

--ctu --max-tu-size --qg-size: Tests suggest this is the best combination, and --qg-size is already set to the smallest value. If you wish to refine the quality further based on this set, drop crf instead of changing these.

--qcomp/--deblock: Our choice should balance everything pretty well. Using a qcomp larger than 0.8 is no better than dropping crf at this point.

--no-rect --no-amp: We've tested multiple times, in different scenarios; unfortunately never in a single case did we see any benefit from these two parameters. Quality and size are indistinguishable between turning them on and off.

--keyint 720: I'd agree about the decoding speed, but in practice, the larger your keyint is, the more efficient you get. x264/x265's scenecut detection is nearly perfect, so IDR frames will be inserted at the optimal locations. Manually forcing an I frame at an undesired position may benefit quality but hurts overall efficiency. Anyway, dropping it to something like 360 doesn't change much, so you don't need to follow it exactly.

--bframes 10: Again, it worked in the test, better than --bframes 3/5. We don't see any need to go beyond 10, and the speed penalty is indeed very small: bframes 3->10 adds less than 5% of encoding time.

--ipratio 1.1 --pbratio 1.1: We are not ready for near-lossless encoding yet. We've included these two in the test, including a less aggressive pair (1.3, 1.2), but we find that they are not as useful as thought. Better just drop crf instead of changing these two if you wish for even better quality.

Sagittaire · 12th August 2015, 12:02

Quote:

--keyint 720: I'd agree about the decoding speed, but in practice, the larger your keyint is, the more efficient you get. x264/x265's scenecut detection is nearly perfect, so IDR frames will be inserted at the optimal locations. Manually forcing an I frame at an undesired position may benefit quality but hurts overall efficiency. Anyway, dropping it to something like 360 doesn't change much, so you don't need to follow it exactly.

No, it's false. An I-frame is between 2x and 3x the size of a P-frame. A simple calculation shows that if you replace keyint 240 with keyint 720, you theoretically gain 0.5% in size at most. In real encoding it's even less, because the overall keyframe interval is really less than 720 frames or even 240 frames. Even using a 24-frame GOP costs less than 8% in size efficiency.
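The "simple calculation" can be sketched as follows. It is a deliberately crude model, and the assumptions are mine: every frame is either an I-frame or a P-frame, an I-frame costs 2x or 3x a P-frame, keyframes are placed only by keyint (no scenecuts, no B-frames), and the clip length of 7200 frames is just a convenient number divisible by all three intervals.

```python
import math


def relative_size(num_frames, keyint, i_to_p):
    """Stream size in P-frame units with one I-frame every keyint frames."""
    i_frames = math.ceil(num_frames / keyint)
    return (num_frames - i_frames) + i_frames * i_to_p


def saving(num_frames, short_keyint, long_keyint, i_to_p):
    """Fraction of size saved by going from short_keyint to long_keyint."""
    short = relative_size(num_frames, short_keyint, i_to_p)
    long_ = relative_size(num_frames, long_keyint, i_to_p)
    return (short - long_) / short


if __name__ == "__main__":
    for ratio in (2.0, 3.0):
        print(f"I = {ratio:.0f}x P:",
              f"240 -> 720 saves {saving(7200, 240, 720, ratio):.2%},",
              f"24 -> 720 saves {saving(7200, 24, 720, ratio):.2%}")
```

Under these assumptions the 240 -> 720 saving stays around half a percent, and real scenecuts only shrink it further since they insert keyframes regardless of keyint.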

Quote:

--bframes 10: Again, it worked in the test, better than --bframes 3/5. We don't see any need to go beyond 10, and the speed penalty is indeed very small: bframes 3->10 adds less than 5% of encoding time.

No, it's false, because x265 in real encoding will never place 10 consecutive B-frames. x265 generally uses 0, 1, 2 or 3 B-frames in most cases (more than 95% of real situations). Using 10 B-frames vs 3 B-frames is perhaps useful for really flat anime, certainly not for grainy movies (in that case x265 never uses 10 consecutive B-frames). You can expect less than 1% size gain for 5 B-frames vs 10 B-frames in all real scenarios.

Quote:

--ipratio 1.1 --pbratio 1.1: We are not ready for near-lossless encoding yet. We've included these two in the test, including a less aggressive pair (1.3, 1.2), but we find that they are not as useful as thought. Better just drop crf instead of changing these two if you wish for even better quality.

If you want to make high quality encodes, you must use low ratios between frame types. The default ratios are better for overall efficiency but introduce a massively higher quantizer for B-frames (something like -1 for B-frames, -2 for P-frames and -3 for I-frames). This implies that you can have good noise retention for I-frames and P-frames but not for B-frames and b-frames.
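For a feel of what these ratios mean in QP terms, here is a rough sketch. It relies on the x264/x265 convention that the quantizer scale doubles every 6 QP, so a ratio r between two frame types corresponds to about 6*log2(r) QP. The x265 defaults (--ipratio 1.4, --pbratio 1.3) and the two alternative pairs are the ones mentioned in this thread; the QPs the encoder actually picks also depend on AQ, cutree and the rest of rate control, so treat the numbers as nominal offsets only.

```python
import math


def qp_offset(ratio):
    """Nominal QP difference implied by a quantizer-scale ratio."""
    return 6.0 * math.log2(ratio)


if __name__ == "__main__":
    pairs = [
        ("x265 defaults", 1.4, 1.3),
        ("milder pair", 1.3, 1.2),
        ("suggested", 1.1, 1.1),
    ]
    for label, ipratio, pbratio in pairs:
        print(f"{label}: I is ~{qp_offset(ipratio):.1f} QP below P, "
              f"B is ~{qp_offset(pbratio):.1f} QP above P")
```

With 1.1/1.1 all frame types sit within about one QP of each other, which is the "more constant quality" argument made above.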

Sagittaire · 12th August 2015, 12:18

Moreover, I don't understand this:

Quote:

Better just drop crf instead

If you want to make a serious test, you must compare encodes at the same size (multipass mode) and not at the same crf.

I am skeptical about the seriousness of your test when you say that 10 B-frames are really useful for quality, because in real encoding x265 never uses 10 consecutive B-frames (B-frame placement is adaptive, and x265 will certainly use just 0, 1, 2 or 3 B-frames in the majority of cases for a noisy source). 10 B-frames is just a placebo effect here. Moreover, if you compare B-frame counts at the same crf, x264 will always produce better quality without B-frames (or with fewer B-frames) but with a larger output size, simply because B-frames use a much higher quantizer than P-frames in crf mode.

littlepox · 12th August 2015, 13:56

OK let me explain the "genetic algorithms" we applied in order to clarify something, and this is my suggestion for parameter testing:

x265 parameter sets are of one species, and a single set of parameters is an individual in the species.
An individual is characterized by its genetic strands, and a parameter set consists of parameters.
An individual can reproduce its children with most of their genetic strands similar to the parent, and one parameter set can be modified to more sets by changing one or several parameters each.
In most cases we don't have abrupt genetic mutations, but occasionally we do: we tweak the parameters step by step, but occasionally we add weird changes hoping to make a difference.
Children with different genetic strands are subject to natural selection, and parameter sets are subject to quality benchmarks, either by PSNR/SSIM or by human eyes.

We start from a small protobiont, a parameter set with very low settings like --crf 30 --aq-strength 0.3 --qcomp 0.4...... You don't expect anything good from it except a very small bit-rate.
We then drive this individual to grow and evolve by strengthening some parameters, for example:
dropping crf a bit gives better quality, but also bigger size;
increasing aq a bit gives better quality, but also bigger size;
--ipratio 1.1/pbratio 1.1 give better quality, but also bigger size;
......

You make each mutation once, properly controlling the mutated individuals to be of similar size (in our test, the threshold is max/min<105%. If violated, say we dropped crf too much and got a larger outcome than the other children, we remade the mutation with a smaller change to crf in order to fit the size of the other siblings), and you select the best individual(s).
You then keep this best boy alive and let it grow further. Most of the children come from changing one parameter, but some are made by changing multiple parameters together.

Every batch you have children with similar size, and you choose the best one(s) to reproduce the next generation. Gradually, the bit-rate increases, quality refines, and parameter sets become more and more optimal.
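The selection loop described above can be sketched roughly like this. It is a simplified illustration, not the actual test harness: the names mutate, next_generation and toy_encode are made up, children that break the 105% size limit are simply dropped instead of being re-made with a smaller step, and toy_encode is an invented stand-in for "run the encoder, then judge the result" (in the real test the score came from human eyes).

```python
import random


def mutate(parent, steps, rng):
    """Copy the parent and nudge one randomly chosen parameter by its step."""
    child = dict(parent)
    name = rng.choice(sorted(steps))
    child[name] = round(child[name] + steps[name], 3)
    return child


def next_generation(parent, steps, encode, n_children, rng, max_ratio=1.05):
    """Breed children, keep those of comparable size, return the best one."""
    results = []
    for _ in range(n_children):
        child = mutate(parent, steps, rng)
        size, score = encode(child)
        results.append((child, size, score))
    smallest = min(size for _, size, _ in results)
    # Only siblings within max/min < 105% are compared against each other.
    comparable = [r for r in results if r[1] / smallest < max_ratio]
    return max(comparable, key=lambda r: r[2])[0]


def toy_encode(params):
    """Made-up size/quality model so the sketch runs without an encoder."""
    size = 1000.0 * 2 ** ((30 - params["crf"]) / 6) * (1 + 0.2 * params["aq-strength"])
    score = size * (1 + params["qcomp"])
    return size, score


if __name__ == "__main__":
    rng = random.Random(0)
    steps = {"crf": -0.5, "aq-strength": 0.1, "qcomp": 0.05}
    best = {"crf": 30.0, "aq-strength": 0.3, "qcomp": 0.4}  # the "protobiont"
    for _ in range(5):
        best = next_generation(best, steps, toy_encode, 6, rng)
    print(best)
```

Replacing toy_encode with a function that actually encodes a clip and returns its file size plus a viewer's rating would give the manual procedure described in this post.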

When I say "dropping crf a bit instead of tweaking ibpratios", I mean:
1. if you tweak ibpratios alone, you get better output with bigger size
2. but you can also drop crf, increase aq, increase qcomp... to reach a similar, bigger size.
3. We found that, starting from the set we gave, dropping crf is still better than tweaking the ratios, and better than many other possible choices you can name.
4. Probably when the species grows even bigger, say we reach the situation near x264_crf=17, tweaking the ratios becomes the best choice at that evolution stage.

Furthermore, I'm not very interested in the topic of --keyint and --bframes. In the test, their behavior classified them as unimportant genetic strands, meaning they don't have much impact on the results. Set whatever reasonable numbers you want; it just doesn't matter much. Even though I said --bframes 10 is better than 3/5, the difference is indeed very small, about 1% as you said. The converse is also true: changing --bframes 10 to 5 makes very little difference, both quality-wise and time-wise.

I've also not included them in the recommended parameters, meaning they are out of the scope of --tune film. It's perfectly fine if you set --keyint 250 --bframes 5 as you wish, but there is not much to criticize if I set them higher. And there is a reason for doing so in a test: I just want to set them high enough that they can't possibly become a bottleneck.

Keiyakusha · 12th August 2015, 14:56

I agree about coding unit size. I wouldn't use more than 16 for SD and more than 32 for hd. I vote for a somewhat dynamic preset/tune, not fixed value for all sources.
qcomp adjustment is really a must-have for high quality encodes. As well as removing that nasty cu-tree, but the thing is, all x264 settings and I think x265 as well aren't really tuned for high quality encodes. They are tuned to keep as much quality as possible by throwing away everything that average people might not notice during realtime playback. In my book, "high quality" is when you can't see the difference with the naked eye even in a frame-by-frame comparison. Or if you do, you can't tell which one is the source, no matter if it's a static or fast-moving (part of the) scene. But then I only use this for anime myself. Making live action virtually lossless while keeping it at reasonable (for me) sizes is not really possible. So I think this will not be accepted.

Quote:

--deblock -2:-2: This has also been a rule of thumb since the x264 era: use smaller numbers if you prioritize quality

Never heard about such a rule and can't imagine using more than (less than?) -1:-1, unless you are dealing with tons of grain. -2:-2 might produce better SSIM/PSNR but it is not uncommon for it to create more obvious blocks in high complexity scenes, which can be hidden by the grain, but otherwise pretty nasty.
The other options I either haven't played with or consider not ready. Speaking of readiness, I don't see much reason to tweak presets while some options and the encoder itself are not really ready for general use. Some time later you'll end up tweaking them again.

Edit: about keyint, I think that 250 is not enough. 20*framerate is a better option. Personally I would even use infinite, but some restrictions are required for general presets/tunings so that it won't produce unseekable streams.

littlepox · 12th August 2015, 15:33

Quote:

Originally Posted by Keiyakusha

I vote for a somewhat dynamic preset/tune, not fixed value for all sources.

but the thing is, all x264 settings and I think x265 as well aren't really tuned for high quality encodes.

That's why I recommended a parameter like "--expectation", with the RC parameters tweaked accordingly. That should satisfy users who want a high-quality encoder, as well as users who desperately want the smallest bit-rate.

Quote:

Originally Posted by Keiyakusha

Never heard about such a rule and can't imagine using more than (less than?) -1:-1

In the old days we didn't need to go that low for x264, true. But it seems x265 does need such values, at least at this quality level.

Quote:

Originally Posted by Keiyakusha

Speaking of readiness, I don't see much reason to tweak presets while some options and the encoder itself are not really ready for general use. Some time later you'll end up tweaking them again.

There have to be pioneers though. By doing such research, we not only make x265 (partially) more practical, gradually extending its usage from low-quality cases to high-quality cases, but also find some systematic ways of enhancement, and then we can give feedback to the developers. Feedback is required to accelerate its maturity; we aren't just wasting our time for nothing.

benwaggoner · 12th August 2015, 18:10

Very interesting stuff, and much more productive than just complaining about x265's default behavior. I have a few comments:

Comparisons should really be done at the same CBR bitrate (with --bitrate, --vbv-maxrate, and --vbv-bufsize all set) if we really want to see differences in apples-to-apples comparisons. That will provide local differences that are really local differences, and not reflective of whole-file rate control or crazy peaks. For the most accurate comparison of encoders, use the lowest legal HEVC level's maximum bufsize. For a fairer comparison, you could let H.264 have a bigger bufsize since the spec allows it.
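As a small sketch of what "all set" means in practice: for a CBR-style encode the VBV maximum rate is set equal to the target bitrate, and the buffer size is given explicitly. The helper name cbr_args and the example numbers are placeholders of mine, not recommendations; the buffer size option is spelled --vbv-bufsize in both x264 and x265, and rates are in kbit/s with the buffer in kbit.

```python
def cbr_args(bitrate_kbps, bufsize_kbit):
    """Rate control arguments for a CBR-style encode (x264 or x265)."""
    return [
        "--bitrate", str(bitrate_kbps),
        "--vbv-maxrate", str(bitrate_kbps),  # maxrate == bitrate gives CBR
        "--vbv-bufsize", str(bufsize_kbit),
    ]


if __name__ == "__main__":
    # Placeholder numbers; pick the bufsize from the level limits you target.
    print(" ".join(cbr_args(6000, 6000)))
```

Passing the same list to both encoders keeps the rate control constraints identical, so the remaining differences come from the encoders themselves.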

Focusing on 10-bit is interesting for archiving and high-end PC use, but there are and will be a lot of 8-bit HEVC decoders in the wild. Plus comparing 10-bit streams on 8-bit displays can be pretty fraught, since there are so many different dithering and conversion modes that can be applied. I recommend testing in 8-bit, or defining a particular display mechanism. Or just using a 10-bit pipeline to a 10-bit DisplayPort monitor, which is what I do, but I know a lot of folks don't have capable hardware.

Also, x265 seems to do quite a bit better than x264 for banding etcetera in 8-bit. I don't know that it provides any benefit to use 10-bit x265 encoding when coming from an 8-bit 4:2:0 source, and it might actually hurt. De-dithering and then re-dithering at playback has its own downsides. Just because it was a big win in H.264 doesn't mean we should assume the same in HEVC.

--aq-mode 3 is really there for 8-bit SDR encodes. I wouldn't expect it to be valuable elsewhere.

Is the goal accurate feel more than true accuracy? High psy-rdoq is going to reduce mathematical accuracy, but adds energy to the encode that'll make it seem sharper and more detailed.

I like the ideas about making parameters more adaptive to frame size and content. It's a lot of complex code, however, and having good examples of how different parameters are optimal in different contexts will be very helpful in adding automation.

Sagittaire · 12th August 2015, 18:59

Quote:

about keyint, I think that 250 is not enough. 20*framerate is a better option. Personally I would even use infinite, but some restrictions are required for general presets/tunings so that it won't produce unseekable streams.

Once more, that's completely false. If you compare keyint "250" vs "infinite" at exactly the same output quality, you can expect a 0.5% size reduction, and that's the best theoretical case. For real sources, x264 and x265 use much less than 10*fps in most cases thanks to scenecut detection, and in that case the size will be exactly the same for exactly the same output quality.

Quote:

-2:-2 might produce better SSIM/PSNR

-2:-2 doesn't produce the best PSNR/SSIM result. -2:-2 still produces a really strong deblocking effect at high quantizers. Once more, deblocking is itself adaptive: a lower quantizer always implies less deblocking. If you encode at a quantizer of q10 or less with H.264 or HEVC, you get no deblocking, because in most cases you are outside the deblocking threshold even if you choose +6:+6 for deblocking.

Keiyakusha · 12th August 2015, 19:07

Quote:

Originally Posted by Sagittaire

Once more, that's completely false. If you compare keyint "250" vs "infinite" at exactly the same output quality, you can expect a 0.5% size reduction, and that's the best theoretical case. For real sources, x264 and x265 use much less than 10*fps in most cases thanks to scenecut detection, and in that case the size will be exactly the same for exactly the same output quality.

Nothing is false here. It would be a bad idea if it made the encoder significantly slower, but that shouldn't be the case. How much size reduction it brings is irrelevant (even if zero). Every little bit helps as long as you don't pay for it with some more hours of encoding.

Quote:

Originally Posted by Sagittaire

-2:-2 doesn't produce the best PSNR/SSIM result.

In case of animated content and x264 they often do. Not best, it is likely not the same with every source, but better than -1:-1 and -3:-3. Some material actually benefits more from 0:0.
But looking at x265 reports, it actually feels like -2:-2 is a good idea. I don't know. My only point was that -2:-2 for x264 is not a rule that I know of.

Sagittaire · 12th August 2015, 19:15

Quote:

Originally Posted by benwaggoner

Very interesting stuff, and much more productive than just complaining about x265's default behavior. I have a few comments:

Comparisons should really be done at the same CBR bitrate (with --bitrate, --vbv-maxrate, and --vbv-bufsize all set) if we really want to see differences in apples-to-apples comparisons. That will provide local differences that are really local differences, and not reflective of whole-file rate control or crazy peaks. For the most accurate comparison of encoders, use the lowest legal HEVC level's maximum bufsize. For a fairer comparison, you could let H.264 have a bigger bufsize since the spec allows it.

Focusing on 10-bit is interesting for archiving and high-end PC use, but there are and will be a lot of 8-bit HEVC decoders in the wild. Plus comparing 10-bit streams on 8-bit displays can be pretty fraught, since there are so many different dithering and conversion modes that can be applied. I recommend testing in 8-bit, or defining a particular display mechanism. Or just using a 10-bit pipeline to a 10-bit DisplayPort monitor, which is what I do, but I know a lot of folks don't have capable hardware.

Also, x265 seems to do quite a bit better than x264 for banding etcetera in 8-bit. I don't know that it provides any benefit to use 10-bit x265 encoding when coming from an 8-bit 4:2:0 source, and it might actually hurt. De-dithering and then re-dithering at playback has its own downsides. Just because it was a big win in H.264 doesn't mean we should assume the same in HEVC.

--aq-mode 3 is really there for 8-bit SDR encodes. I wouldn't expect it to be valuable elsewhere.

Is the goal accurate feel more than true accuracy? High psy-rdoq is going to reduce mathematical accuracy, but adds energy to the encode that'll make it seem sharper and more detailed.

I like the ideas about making parameters more adaptive to frame size and content. It's a lot of complex code, however, and having good examples of how different parameters are optimal in different contexts will be very helpful in adding automation.

Interesting PDF from ATEME about 10-bit encoding for 8-bit sources:

http://extranet.ateme.com/download.php?file=1114

Sagittaire · 12th August 2015, 19:32

Quote:

Originally Posted by Keiyakusha

Nothing is false here. It would be a bad idea if it made the encoder significantly slower, but that shouldn't be the case. How much size reduction it brings is irrelevant (even if zero). Every little bit helps as long as you don't pay for it with some more hours of encoding.

You have major decoding issues if you use a really long GOP for HEVC, and it's useless for coding efficiency. In most cases, for real movie sources, scene cuts come less than 250 frames apart. It's like that. Use PSNR to run the test at the same size in ABR mode. Metrics are perfect to demonstrate it here, because long-GOP encoding simply replaces I-frames with P-frames. Run the test if you want. Using more than 250 for keyint is simply a placebo effect ...

Keiyakusha · 12th August 2015, 19:43

Quote:

Originally Posted by Sagittaire

You have major decoding issues if you use a really long GOP for HEVC

This is very interesting. I didn't know that. Can you elaborate on what that is and why x264 has no problems of this sort? Also, are you sure this is not some kind of decoder-side issue?
Edit: so far with x264 I had absolutely no problems with gops as long as 4000+ frames (1 keyframe at the beginning per ~3min video)

Quote:

Originally Posted by Sagittaire

In most cases, for real movie sources, scene cuts come less than 250 frames apart. It's like that. Use PSNR to run the test at the same size in ABR mode. Metrics are perfect to demonstrate it here, because long-GOP encoding simply replaces I-frames with P-frames. Run the test if you want. Using more than 250 for keyint is simply a placebo effect ...

HEVC as a whole is for the most part a placebo. They could simply have used 10-bit AVC level 5.1 for a new HD standard and all the average users would be happy. The thing is, I acknowledge that this may be a placebo (see where I mentioned that it's fine even if there is 0 gain). But there's nothing wrong with a placebo if it doesn't cost us processing speed and has at least the potential to improve something (not counting the issue that you mentioned earlier).

benwaggoner · 12th August 2015, 23:12

Quote:

Originally Posted by Sagittaire

Interesting PDF from ATEME about 10-bit encoding for 8-bit sources:

http://extranet.ateme.com/download.php?file=1114

Yeah, no doubt it is useful for H.264.

But I've not seen any papers or real-world demonstrations that the same is true for HEVC. Maybe it is, but I don't think we should assume it, because there are downsides to going to 10-bit. Among other things, relying on the playback device to do a high quality conversion to its native color space, which outside of recent high-end TVs, is generally 8-bit in pipeline, link, or panel.

So, let's validate that 10-bit works better than 8-bit for 8-bit sources in HEVC before we march on assuming it is.

benwaggoner · 12th August 2015, 23:22

Quote:

Originally Posted by Keiyakusha

This is very interesting; I didn't know that. Can you elaborate on what the issue is and why x264 has no problems of this sort? Also, are you sure this isn't some kind of decoder-side issue?
Edit: so far with x264 I have had absolutely no problems with GOPs as long as 4000+ frames (one keyframe at the beginning of a ~3 min video).

ALL decoders have this problem. If you want to do random access to the last frame of a GOP, you have to decode all the frames it references, and all the frames those frames reference, back to the start of the GOP. Having a classic IPBb frame hierarchy helps, since only prior P-frames need to be decoded. But with a 4000-frame GOP with an average of 75% B-frames (so 25% P-frames), that's still ~999 frames to decode before you can display the last one. At 24 fps, 4000 frames is about 2.8 minutes. If you actually have a shot that long, the average random seek into it will require decoding ~500 frames, which would take >5 seconds with many decoders.
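
To make the arithmetic above concrete, here is a minimal Python sketch. It assumes a simplified I P B b pattern in which every fourth frame is a reference (I or P) frame and the B-frames in between are never referenced. The function name and the `ref_interval` parameter are illustrative only and do not come from any decoder API.

```python
def reference_frames_before(target: int, ref_interval: int = 4) -> int:
    """Count the I/P frames a decoder must decode before frame `target`.

    Simplified model (an assumption, not a real decoder): frame 0 is the
    IDR frame, every `ref_interval`-th frame after it is a P-frame, and
    the B-frames in between are never used as references.
    """
    if target <= 0:
        return 0
    return (target - 1) // ref_interval + 1


gop_length = 4000
fps = 24

# Worst case: seeking to the last frame of the GOP.
worst = reference_frames_before(gop_length - 1)

# Average over a uniformly random seek target inside the GOP.
average = sum(reference_frames_before(t) for t in range(gop_length)) / gop_length

print(f"GOP duration: {gop_length / fps / 60:.1f} min")
print(f"worst case:   {worst} reference frames (1 I + {worst - 1} P)")
print(f"average seek: {average:.0f} reference frames")
```

Under these assumptions the worst case is the I-frame plus 999 P-frames, and the average seek needs about 500 reference frames. Real streams with B-pyramids or multiple reference frames will differ.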

Yes, for typical content it won't matter. But the maximum GOP duration for single-file encoding should be based on the maximum random access delay you can put up with. If it's 10 seconds, most IDR frames will still come from scenecut. But those handful of "forced" IDR frames will really help random access into those long sections, at a vanishingly small cost in encoder efficiency.
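
As a rough illustration of that rule, the sketch below converts a tolerable GOP duration into a `--keyint` value. The helper name is hypothetical; `--keyint` is the existing x264/x265 option for the maximum keyframe interval in frames. Scenecut detection can still insert IDR frames earlier than this limit.

```python
from fractions import Fraction


def keyint_for_duration(max_gop_seconds: int, fps: Fraction) -> int:
    """Largest whole number of frames that fits in `max_gop_seconds`.

    Hypothetical helper: it only does the seconds-to-frames conversion
    and rounds down so the GOP never exceeds the requested duration.
    """
    return max(1, int(Fraction(max_gop_seconds) * fps))


# 10-second maximum GOP at common film frame rates.
for label, fps in [("24", Fraction(24)), ("23.976", Fraction(24000, 1001))]:
    keyint = keyint_for_duration(10, fps)
    print(f"{label} fps: --keyint {keyint}")
```

Using `Fraction` keeps NTSC-style rates such as 24000/1001 exact, so the rounding is done only once at the end.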

Bear in mind that most commercial video that gets watched has a fixed GOP duration of just a few seconds. Good modern encoders are well tuned to avoid keyframe strobing.

Quote:

HEVC as a whole is for the most part a placebo. They could simply have used 10-bit AVC level 5.1 for a new HD standard and all the average users would be happy. The thing is, I acknowledge that this may be a placebo (see where I mentioned that it's fine even if there is zero gain). But there's nothing wrong with a placebo if it doesn't cost us processing speed and has at least the potential to improve something (not counting the issue you mentioned earlier).

That was certainly an option, but no one went with it because the quality improvements weren't worth it. For real-world delivery, HEVC delivers big advantages now and has more room for future improvement. It can also do UHD and HDR. Comparing x264 and x265, I'd say x264 needs 2.5x the bitrate to get to decent quality at UHD frame sizes. And HDR bitstream flags are only available in HEVC.


