x265 HEVC Encoder [Archive] - Page 88

LigH

26th October 2016, 20:51

To disappoint you slightly: x265 is already pretty well optimized in its use of highly specialized CPU instruction sets, most importantly the AVX family. New and more efficient instruction sets will probably be used in the future as well, once CPUs support them and the developers learn to use them.

But the HEVC technology in general is not perfectly parallelizable. It is limited by a lot of dependencies between several intermediate results. There is no magic to circumvent such dependencies. And there are more of them when you try to achieve more efficient encoding to spare bitrate and preserve more quality per bitrate.

Dclose

26th October 2016, 23:53

I've noticed the same issue with blurriness when increasing inter/intra depth.

So, the bottom line is - if areas with lots of movement are important in video (anime, sport, porn), try decreasing these options to get better defined edges and less blur. For general purpose encoding, Hollywood motion pictures, documentaries, etc, where scenery is as important as the action itself, feel free to use larger values to get smaller files.

My current test sample barely has any motion in it, and the file is bigger with 2 Intra than with 1 Intra. Maybe the small amount of motion on the screen draws my attention to the blur more, but even when only a character's mouth is moving, the entire video makes me want to rub my eyes of a haze.

The difference between Intra 1 and 2 isn't as pronounced as between Coding Unit 32 and 64. I encoded many hours of (animated) video at 1fps using CU64 to then watch it and think my media player had a blur filter turned on.

You said such Intra/Inter effects are more pronounced with higher b-frames, and most of my testing was with 8 or 10 b-frames. Though, again, my current video has little action. I've done plenty of testing of other video with CRF of 23-33 to try to maximize file reduction vs. "tolerable" quality, but this is with CRF of 16-18, so I can't blame that.

Previous live-action video I encoded I used 10 b-frames since it was obvious to me at the CRF 23+ I was using that the image was simply "tighter." Perhaps more easily noticed during action such as someone walking, but even during no action there is still usually at least slight movement on the screen from an actor or the camera movement.

Actually, for this cartoon, most of my testing was with 10 b-frames, but I dropped that down to 8 since 10 seemed slightly sharper (or maybe just less "fluid" during motion) than the source video. 6 b-frames seems to make the motion blurrier than the source. Perhaps mainly/only during motion. As if the black edges of the drawn characters have thicker black edges during motion.

That coincides with your saying more Intra/Inter tries to re-use material from previous frames and keep it on screen longer.

From my previous tests, even more b-frames helped sharpen the video, but the diminishing returns of encoding time vs. quality gain was too much for me. And I suppose maybe "10 b-frames almost never matters" might be other people's way of saying, "It helps but isn't worth the encoding time to me."

I know some will say most video only usually has x number of b-frames in a row or whatever, but I consider my eyes to be an impartial judge since I couldn't give a technical description for "b-frame" or "b-pyramid" or "weighted-b-frame" if I tried and am only going by what I see. And it's not like I want to increase encoding time by using more b-frames. It'd be great if 2 b-frames was better since it encodes faster.

Dclose

27th October 2016, 00:20

Max Merge Candidates.

The setting for that increases with slower presets, but I haven't seen much if any discussion about what it does to the quality. A higher setting does help "concentrate" and "focus" the video with the low bitrates I've mainly tested it on, but it seems like a smoothing filter in that it makes bad video more watchable but removes detail, and so maybe isn't the best setting to turn up if trying to preserve detail.

I stopped using x265 for a long time because the presets, (or the x265 build at the time), would destroy detail compared to x264 -- regardless of CRF setting or bitrate.

Just thought I'd mention it since I've seen little discussion of it.

gamebox

27th October 2016, 05:03

@Dclose

Max merge has something to do with neighboring blocks, that is - including them in analysis of currently encoded one. I haven't experimented much around that option.

Deblock -3 helps me preserve more sharpness and detail, and I also try to keep the resolution as high as is "reasonable" (576p for SD content, for example). Since decoding is done in software, I looked at the options there too. For example, I found that Media Player Classic by default uses a poor resize algorithm (Bilinear) to scale video to the monitor's fullscreen resolution during playback, so I set it to Bicubic. And, of course, the newest version of the encoder is a must; options like --no-rskip helped a lot with quality, even if they also increased encoding time significantly.

Boulder

27th October 2016, 11:07

A question regarding recursion skip: it is disabled in --tune grain. How big a difference would it theoretically make (i.e. how important it is in keeping grain) compared to enabling it? It makes a really big difference performance-wise at least in --preset slower.

MeteorRain

27th October 2016, 20:48

I am not a programmer and do not know methods of software optimization. My questions are, is it possible that x265 could become noticeably faster in the future, or are there any realistic possibilities for optimizing the program code?

Only if there are some new instructions specifically optimized for HEVC encoding, and x265 can use them.

Another way to make it run faster, is to get a better CPU -- Yes I'm talking about expensive-as-hell E5s.

aymanalz

28th October 2016, 11:39

I am not a programmer and do not know methods of software optimization. My questions are, is it possible that x265 could become noticeably faster in the future, or are there any realistic possibilities for optimizing the program code? Do people optimize the speed of x265 in this phase of development? And is it conceivable that x265 could in future use the computational power of a modern graphics card as a teammate together with the CPU?

Hehe, I asked the exact same question over a year ago, on this very thread. At that time, the latest version was 1.6, and the developers replied that they did have some plans to improve the speed. And they were right - by version 1.9, encoding was significantly faster.

Perhaps it is time to ask that question again - is there a roadmap to make the encoding noticeably faster in future? From a couple of responses to your question, apparently not. Which would be a shame, because I can see that x265 is significantly better in quality than any previous encoder - and yet, I can't help feeling that the encoding time is still too slow for most people.

If it is true, as selur and another poster have said, that x265 has already been optimized to the maximum possible extent for today's hardware, then I get a gut feeling that x265 would not be able to dethrone x264 as the reigning monarch of compressed video. (Especially for storing home video, or for streaming websites like Youtube.)

*******************************

The second part of your question is very pertinent to me - about the use of graphics cards. I hope that in future it would be possible to use my otherwise idle GPU to take part in the encoding process. If some calculations can be offloaded to the GPU, and that can make a 20% difference to the encoding time, that would be a significant shot in the arm for x265 adoption.

LigH

28th October 2016, 12:01

"Home video" with resolutions up to FullHD at most is not the main market for HEVC. You may be able to save a bit more bitrate, especially as tuning for sharpness and grain improves further. But the main target is UHD resolutions, where HEVC can play its trump cards: bigger coding units and more flexible partitioning of them.

My samples (https://www.mediafire.com/folder/ldwl20fppplbx/samples) are already quite old, but the "in_to_tree" videos give a good hint of where x264 could not compete with x265 as early as April 2014. The resolutions are "only" 1080p, but in this sample you can compare a low bitrate HEVC result (CRF 30) with about the same, 2x, 3x, and 4x the bitrate in AVC, and decide how much was required to let x264 handle the sky above the trees well enough not to throw in the towel... Of course, this example appears to be quite academic. I believe we should repeat this test now with current versions and see if x265 has fixed a few flaws in the meantime, handling the challenge of lots of leaves even better now with grain tuning or similar options.

x265_Project

28th October 2016, 17:20

I am not a programmer and do not know methods of software optimization. My questions are, is it possible that x265 could become noticeably faster in the future, or are there any realistic possibilities for optimizing the program code? Do people optimize the speed of x265 in this phase of development? And is it conceivable that x265 could in future use the computational power of a modern graphics card as a teammate together with the CPU?

Yes. We're always working on 2 things...
1 - improve compression efficiency (achieve the highest possible visual quality at any given bit rate; or, stated differently, achieve the lowest possible bit rate for any target level of visual quality).
2 - improve performance (without compromising compression efficiency, make x265 go faster). We can improve performance algorithmically, through smarter decisions that avoid unnecessary computations, and we are always looking to optimize x265 for the platforms it runs on (avoiding any bottlenecks).

Of course, x265 will benefit from advances in CPU performance from Intel, AMD, IBM and ARM. For example, the next generation of Intel Xeon chips (the Skylake Xeons, code-named Purley) will include AVX3 instructions which operate on 512 bits of data per clock cycle. But there are other possible ways to accelerate x265, and we're working on them.

Boulder

28th October 2016, 19:18

A question regarding recursion skip: it is disabled in --tune grain. How big a difference would it theoretically make (i.e. how important it is in keeping grain) compared to enabling it? It makes a really big difference performance-wise at least in --preset slower.

One more thing in addition to this: what about deblocking, does disabling it reduce the smoothing effect or will it cause ill effects elsewhere? I've been unable to determine it by a frame-by-frame comparison.

CruNcher

28th October 2016, 20:21

Yes. We're always working on 2 things...
1 - improve compression efficiency (achieve the highest possible visual quality at any given bit rate; or, stated differently, achieve the lowest possible bit rate for any target level of visual quality).
2 - improve performance (without compromising compression efficiency, make x265 go faster). We can improve performance algorithmically, through smarter decisions that avoid unnecessary computations, and we are always looking to optimize x265 for the platforms it runs on (avoiding any bottlenecks).

Of course, x265 will benefit from advances in CPU performance from Intel, AMD, IBM and ARM. For example, the next generation of Intel Xeon chips (the Skylake Xeons, code-named Purley) will include AVX3 instructions which operate on 512 bits of data per clock cycle. But there are other possible ways to accelerate x265, and we're working on them.

Hopefully they will be better than your decoder optimizations on release ;)

gamebox

29th October 2016, 09:17

@Boulder:

Deblocking filter in HEVC, in my opinion, should always be set lower than what you've been using in H264. For example, I used values of -1 (sometimes -2) in H264, but now use -2 and also, often, -3 in HEVC. Reducing the strength of this filter does increase the visibility of coding artifacts like blocking and (even more pronounced) ringing, but they seem to be less disturbing and confined to smaller areas than in H264, possibly because of the different block sizes. I also prefer a sharper, detail-rich image with some artifacts over a completely smooth one, especially as newer video content (HD) is often recorded differently than older media, which helps reduce the visibility of artifacts: objects in the foreground are very sharp/focused against a smooth/unfocused/simplified background, which means lower visual complexity and better suitability for low-bitrate encoding. I haven't tried encoding with the deblock filter completely off, and some people advised keeping it on at all times anyway, but set to the lowest value (-6) if desired.

brumsky

29th October 2016, 13:12

My current test sample barely has any motion in it, and the file is bigger with 2 Intra than with 1 Intra. Maybe the small amount of motion on the screen draws my attention to the blur more, but even when only a character's mouth is moving, the entire video makes me want to rub my eyes of a haze.

Just a shot in the dark here but are the sources interlaced? I've seen the same thing when the source is not properly deinterlaced.

Barough

30th October 2016, 18:56

x265 v2.1+36-d216cb9b3b47 (http://www109.zippyshare.com/v/9vWVNhQd/file.html) (MSYS/MinGW, GCC 6.2.0, 32 & 64bit 8/10/12bit multilib EXEs)

LigH

30th October 2016, 21:26

Me too:

x265 2.1+36-d216cb9b3b47 (https://www.mediafire.com/file/eogh5j4zkoq2oi8/x265_2.1+36-d216cb9b3b47.7z) (MSYS/MinGW, GCC 6.2.0, 32 + 64 bit, 8+10+12 bit single EXE + DLL and multi-lib EXE)

More advanced options:

--[no-]opt-qp-pps              Discard optional HRD timing information from the bitstream. Default enabled
--[no-]opt-ref-list-length-pps Discard optional HRD timing information from the bitstream. Default enabled
--[no-]multi-pass-opt-rps      Enable storing commonly used RPS in SPS in multi pass mode. Default disabled

Dclose

30th October 2016, 22:56

Just a shot in the dark here but are the sources interlaced? I've seen the same thing when the source is not properly deinterlaced.

Nope. And I went through part of it frame by frame.

Also, as for size, I've been doing testing with 1 minute from The Matrix lately. People mostly standing around talking, lots of fine textures and a decent amount of grain.

Using Hybrid, 1080p, CRF 20... Intra/Inter 2 is slightly bigger file than Intra/Inter 1. I did the test again to double check, CRF23, again Intra 2 is slightly bigger. I thought it's supposed to be the opposite.

brumsky

31st October 2016, 15:51

Nope. And I went through part of it frame by frame.

Also, as for size, I've been doing testing with 1 minute from The Matrix lately. People mostly standing around talking, lots of fine textures and a decent amount of grain.

Using Hybrid, 1080p, CRF 20... Intra/Inter 2 is slightly bigger file than Intra/Inter 1. I did the test again to double check, CRF23, again Intra 2 is slightly bigger. I thought it's supposed to be the opposite.

It's hard to say, it's most likely getting more fine detail. Can you share a small sample file of the motion blur you are talking about? Maybe of the source as well, I'd like to see if I get the same effects.

Also, what are your full settings?

nandaku2

1st November 2016, 03:26

A question regarding recursion skip: it is disabled in --tune grain. How big a difference would it theoretically make (i.e. how important it is in keeping grain) compared to enabling it? It makes a really big difference performance-wise at least in --preset slower.

Since it was first introduced, rskip has now been optimized significantly to prevent loss of detail. It is definitely worth testing to see if we could re-enable it in tune grain and improve speed.

Kavitha

1st November 2016, 07:24

@x265_project

Thank you guys so much! I've been wishing for an option like this for some time.

A few quick tests show a performance drop of ~.5% with a file size ~2% smaller. This is comparing inter 1 vs inter 4 + limit tu 1. Without limit tu, inter 4 is ~ 20% slower.

So far I see no visible quality difference.

Great job!!!

Could you explain the differences between limit tu 1 + 2 a bit more in depth, please? :)

The TU nodes in the quad tree are traversed by a depth-first search to find the best TU partition.
The aim of the limit-tu feature is to limit the depth search range. By limiting the depth search range, the encoder can exit early, improving performance with minimal compromise in quality.
In limit-tu 1, the depth search is limited using breadth-first traversal: all partitions of the current depth are processed before deciding whether the next depth should be traversed.
In limit-tu 2, the depth search is limited by allowing the first partition to recurse fully to the maximum allowed depth (the --tu-inter-depth/--tu-intra-depth value determines the maximum TU depth the encoder is allowed to traverse), and the depth search of the other partitions is limited by reusing the maximum best depth that the first partition chooses.
Since --tu-inter-depth 1 allows the encoder to traverse only up to the current depth, limit-tu has no scope to optimize the depth range. Hence limit-tu is enabled only if tu-inter-depth > 1.
From the test results:
performance : limit-tu 2 > limit-tu 1
quality : limit-tu 1 > limit-tu 2
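
As a rough illustration of the limit-tu 2 description above, here is a toy Python sketch. It is not x265's implementation: the cost callback is a hypothetical stand-in for the encoder's rate-distortion cost, all function names are invented for this example, and the limit is applied only at the top level of the tree. It only shows how reusing the first partition's best depth cuts the number of blocks that get evaluated.

```python
def full_search(x, y, size, depth, max_depth, cost):
    """Exhaustive depth-first search of a TU quadtree.

    Returns (best_cost, deepest_depth_used). `cost(x, y, size)` is a
    hypothetical stand-in for the encoder's rate-distortion cost.
    """
    whole = cost(x, y, size)
    if depth >= max_depth or size <= 4:
        return whole, depth
    half = size // 2
    split_cost, deepest = 0, depth
    for cy in (y, y + half):
        for cx in (x, x + half):
            c, d = full_search(cx, cy, half, depth + 1, max_depth, cost)
            split_cost += c
            deepest = max(deepest, d)
    # Keep the split only if it is strictly cheaper than the unsplit block.
    if split_cost < whole:
        return split_cost, deepest
    return whole, depth


def limited_search(x, y, size, depth, max_depth, cost):
    """Toy take on the limit-tu 2 idea, applied at the top level only."""
    whole = cost(x, y, size)
    if depth >= max_depth or size <= 4:
        return whole, depth
    half = size // 2
    children = [(x, y), (x + half, y), (x, y + half), (x + half, y + half)]
    # First partition: recurse fully down to the maximum allowed depth.
    split_cost, first_depth = full_search(
        children[0][0], children[0][1], half, depth + 1, max_depth, cost)
    deepest = first_depth
    # Other partitions: never search deeper than the first partition chose.
    for cx, cy in children[1:]:
        c, d = full_search(cx, cy, half, depth + 1, first_depth, cost)
        split_cost += c
        deepest = max(deepest, d)
    if split_cost < whole:
        return split_cost, deepest
    return whole, depth


def count_evaluations(search, size=32, max_depth=2):
    """Run `search` on a flat cost surface and count cost evaluations."""
    calls = 0

    def flat_cost(x, y, s):
        nonlocal calls
        calls += 1
        return s * s  # splitting never beats coding the block whole

    result = search(0, 0, size, 0, max_depth, flat_cost)
    return result, calls
```

On this flat cost surface both searches pick the same partition, but `count_evaluations(limited_search)` needs fewer cost evaluations than `count_evaluations(full_search)`. On real content the limited search can miss a better partition, which matches the quality ranking given above.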

Selur

1st November 2016, 12:24

Is 'multi-pass-opt-rps' only needed during the 1st pass of a 2pass encoding?

benwaggoner

1st November 2016, 15:12

The difference between Intra 1 and 2 isn't as pronounced as between Coding Unit 32 and 64. I encoded many hours of (animated) video at 1fps using CU64 to then watch it and think my media player had a blur filter turned on.
Note that your --ctu size implicitly sets the max tu size, and thus the starting point for further recursion to smaller tu sizes set by tu-inter/intra.

So, the difference between ctu 64 with tu-inter/intra 2 and ctu 32 with tu-inter/intra 1 is that the first gives the option of a 32x32 tu. Both can go down to 16x16 and 8x8.

I strongly suspect that some of the benefit of a smaller ctu with faster presets is that it allows smaller tu sizes. The default CTU 64 and tu-inter/intra of 1 only gives the option of 32x32 and 16x16 tu sizes. Comparing ctu size impact on its own merits should probably use tu-inter 4 and tu-intra 4 to make sure that the minimum tu size is always 4x4.

benwaggoner

1st November 2016, 15:16

Using Hybrid, 1080p, CRF 20... Intra/Inter 2 is slightly bigger file than Intra/Inter 1. I did the test again to double check, CRF23, again Intra 2 is slightly bigger. I thought it's supposed to be the opposite.
Comparing quality and ABR changes together is quite challenging. To really understand the quality impact of parameter changes, I strongly recommend using 2-pass VBR encodes in which the alternate parameter sets all use the same --bitrate, --vbv-maxrate, and --vbv-bufsize. That way we are only comparing the quality @ bitrate. Once we figure out optimal settings for that, we can then work back to the right CRF.

It is completely predictable that different parameters can have different optimal CRF values to hit the same perceptual quality or the same ABR.
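
A minimal sketch of that workflow in Python, assuming x265 is on the PATH and the source is a Y4M file. The file names, the variant names, and the rate values (in kbps) are placeholders invented for this example, not recommendations. Each variant gets its own stats file, and only the parameters under test differ between variants.

```python
import os


def two_pass_commands(name, source, extra_args,
                      bitrate=3000, maxrate=4500, bufsize=6000):
    """Build x265 argument lists for a 2-pass encode of one parameter variant.

    Rate values are in kbps and are arbitrary placeholders; `name` and the
    derived file names are hypothetical.
    """
    shared = [
        "x265", "--input", source,
        "--bitrate", str(bitrate),
        "--vbv-maxrate", str(maxrate),
        "--vbv-bufsize", str(bufsize),
        "--stats", f"{name}.stats",
    ] + list(extra_args)
    first = shared + ["--pass", "1", "--output", os.devnull]
    second = shared + ["--pass", "2", "--output", f"{name}.hevc"]
    return first, second


# The only thing that changes between variants is the setting under test.
VARIANTS = {
    "tu1": ["--tu-inter-depth", "1", "--tu-intra-depth", "1"],
    "tu2": ["--tu-inter-depth", "2", "--tu-intra-depth", "2"],
}

if __name__ == "__main__":
    for name, args in VARIANTS.items():
        for command in two_pass_commands(name, "clip.y4m", args):
            print(" ".join(command))
```

The printed commands can be run by hand or passed to `subprocess.run`; the resulting files then have (nearly) the same size, so any visible difference comes from the parameters rather than the bitrate.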

benwaggoner

1st November 2016, 15:20

@Boulder:

Deblocking filter in HEVC, in my opinion, should always be set lower than what you've been using in H264. For example, I used values of -1 (sometimes -2) in H264, but now use -2 and also, often, -3 in HEVC. Reducing the strength of this filter does increase the visibility of coding artifacts like blocking and (even more pronounced) ringing, but they seem to be less disturbing and confined to smaller areas than in H264, possibly because of the different block sizes. I also prefer a sharper, detail-rich image with some artifacts over a completely smooth one, especially as newer video content (HD) is often recorded differently than older media, which helps reduce the visibility of artifacts: objects in the foreground are very sharp/focused against a smooth/unfocused/simplified background, which means lower visual complexity and better suitability for low-bitrate encoding. I haven't tried encoding with the deblock filter completely off, and some people advised keeping it on at all times anyway, but set to the lowest value (-6) if desired.
Note that this is really an alpha/beta pair, and that using different numbers is supported and often effective. I've seen some promising results with --deblock -1:1 for example. That reduces the strength of the deblocking (-1) but increases how much it gets used (+1).

benwaggoner

1st November 2016, 15:25

One more thing in addition to this: what about deblocking, does disabling it reduce the smoothing effect or will it cause ill effects elsewhere? I've been unable to determine it by a frame-by-frame comparison.
It can be complex. Less deblocking means less compression efficiency, so at the same bitrate, it will push up QP, which might trigger more SAO. Turning off both may reduce smoothness but increases QP. Thus at low-moderate bitrates true details can be lost, perhaps psychovisually balanced by the increased false detail of DCT "sizzle" (mild ringing and maybe blocking).

Again, I recommend making these comparisons at fixed ABR, or even CBR, since using CRF is going to yield simultaneous changes in ABR and perceptual quality, and we're better off understanding first how changes impact quality at the same bitrate.

brumsky

1st November 2016, 15:51

The TU nodes in the quad tree are traversed by a depth-first search to find the best TU partition.
The aim of the limit-tu feature is to limit the depth search range. By limiting the depth search range, the encoder can exit early, improving performance with minimal compromise in quality.
In limit-tu 1, the depth search is limited using breadth-first traversal: all partitions of the current depth are processed before deciding whether the next depth should be traversed.
In limit-tu 2, the depth search is limited by allowing the first partition to recurse fully to the maximum allowed depth (the --tu-inter-depth/--tu-intra-depth value determines the maximum TU depth the encoder is allowed to traverse), and the depth search of the other partitions is limited by reusing the maximum best depth that the first partition chooses.
Since --tu-inter-depth 1 allows the encoder to traverse only up to the current depth, limit-tu has no scope to optimize the depth range. Hence limit-tu is enabled only if tu-inter-depth > 1.
From the test results:
performance : limit-tu 2 > limit-tu 1
quality : limit-tu 1 > limit-tu 2

Thank you for the explanation, that was exactly what I was looking for!

Boulder

2nd November 2016, 20:32

Since it was first introduced, rskip has now been optimized significantly to prevent loss of detail. It is definitely worth testing to see if we could re-enable it in tune grain and improve speed.

I did some tests with the noisy, blocky Star Trek TOS stuff, resized to 720p. I cannot tell which one looks better when it comes to comparing to the original, because it seems to depend on the frame. I might even say that with rskip it's generally better, and considering the heavily increased encoding time without it, I'll use rskip with --tune grain as well.

However, I need to make some more tests with a better source such as The Hobbit.

While I'm writing, I'd like to say thanks for the much improved encoder and --tune grain. It now looks like I can switch to x265 for good. Hopefully I can get myself an Intel NUC that supports HW decoding of 10-bit HEVC soon :)

Boulder

2nd November 2016, 20:33

Note that this is really an alpha/beta pair, and that using different numbers is supported and often effective. I've seen some promising results with --deblock -1:1 for example. That reduces the strength of the deblocking (-1) but increases how much it gets used (+1).

Thanks, I ended up using -6:1 to get only light deblocking, but applied more often than with -6:-6.

burfadel

3rd November 2016, 04:26

performance : limit-tu 2 > limit-tu 1
quality : limit-tu 1 > limit-tu 2

How different is the quality between 1 and 2, at least in regards to your reckoning? Is using 2 worth it over none, and over 1?

Blue_MiSfit

3rd November 2016, 05:53

Damn. --tune grain is amazing.

With a very grainy movie I was able to achieve a 2:1 reduction versus x264 with no quality loss visible during motion. Still frame a:b comparison reveals differences, but nothing earth shattering.

jlpsvk

3rd November 2016, 19:27

Hi....what would be faster in encoding?? Core i7-4790K or 2xXeon E5-2670 (with ASUS Z9PE-D8 WS motherboard and 64GB DDR3 RAM)?

microchip8

3rd November 2016, 20:01

Hi....what would be faster in encoding?? Core i7-4790K or 2xXeon E5-2670 (with ASUS Z9PE-D8 WS motherboard and 64GB DDR3 RAM)?

obviously the latter...

jlpsvk

3rd November 2016, 20:03

even if Xeon E5-2670 is missing AVX2.0?

hajj_3

3rd November 2016, 20:20

even if Xeon E5-2670 is missing AVX2.0?

those xeons have 8 cores each. 16 slower cores will provide much more performance than a single quad core cpu.

microchip8

3rd November 2016, 20:36

even if Xeon E5-2670 is missing AVX2.0?

Don't be fooled into thinking that all you need for x265 is an AVX2.0 CPU. Yes, AVX2.0 provides a performance boost for x265, but it isn't everything. The more cores you can throw at x265, the faster things will get, AVX2.0 or not. Obviously, if you can throw as many AVX2.0 cores at it as you can get, that'll be great. But here you're comparing 16 cores to 4 cores.

As your core count increases, it may be time to enable --pmode (and possibly --pme)

Boulder

5th November 2016, 10:50

Would it be possible to have some more statistics after the encoding has finished? I'm talking about things like number of refs used, percentages of CTU and TU sizes etc.

x265_Project

5th November 2016, 18:24

Would it be possible to have some more statistics after the encoding has finished? I'm talking about things like number of refs used, percentages of CTU and TU sizes etc.
Have you tried "--csv statlogfile.csv --csv-log-level 2"? This tracks the CU types used by prediction type (inter or intra) and size category (lumping rectangles and asymmetric partitions into the larger square partition size category).

If this doesn't give you everything you want - technically, of course, with respect to gathering and reporting statistics, anything is possible. Someone just has to write the code to gather and report the additional info to the console or to the csv log file. Contributions are welcomed.
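
For anyone who wants to post-process that log, here is a small Python sketch that averages every numeric column of a CSV file. It makes no assumption about which columns x265 writes: the column names are taken from the file's own header row, non-numeric columns are skipped, and the function name is invented for this example.

```python
import csv


def mean_numeric_columns(lines):
    """Average every numeric column of a CSV given as an iterable of lines.

    Column names come from the header row; cells that do not parse as a
    number (after dropping a trailing '%') are ignored.
    """
    sums, counts = {}, {}
    for row in csv.DictReader(lines):
        for key, value in row.items():
            if key is None or value is None:
                continue  # ragged row: extra or missing cells
            try:
                number = float(value.strip().rstrip("%"))
            except (ValueError, AttributeError):
                continue
            name = key.strip()
            sums[name] = sums.get(name, 0.0) + number
            counts[name] = counts.get(name, 0) + 1
    return {name: sums[name] / counts[name] for name in sums}


if __name__ == "__main__":
    # File name taken from the command line suggested above.
    with open("statlogfile.csv", newline="") as handle:
        for name, mean in mean_numeric_columns(handle).items():
            print(f"{name}: {mean:.2f}")
```

Averaging per-frame percentages this way weights every frame equally; for size-weighted statistics the rows would need to be weighted by a frame-size column, if the log provides one.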

Barough

5th November 2016, 18:29

x265 v2.1+46-583fc74fc0a2 (http://www96.zippyshare.com/v/uEW322FS/file.html) (MSYS/MinGW, GCC 6.2.0, 32 & 64bit 8/10/12bit multilib EXEs)

Selur

6th November 2016, 11:10

--[no-]multi-pass-opt-rps

Enable storing commonly used RPS in SPS in multi pass mode. Default disabled.

Can someone elaborate on this. I mean:
- is it only meant to be used during the 1st pass of a 2pass encoding?
- what should be the effects of this when enabled? (should it optimize frame placement?)

benwaggoner

8th November 2016, 22:57

those xeons have 8 cores each. 16 slower cores will provide much more performance than a single quad core cpu.
For lower resolutions (720p and below), my 6700K Skylake beats my 16 core Sandy Bridge in encoding speed for a single stream at a time with some specific high quality settings. It depends on what you're doing.

jlpsvk

10th November 2016, 20:14

nevermind... :)

Going for i7-6700K with NZXT Kraken X61 + 32GB DDR4 RAM + GeForce GTX 1060, so I'm gonna test the HEVC 10bit NVENC also. :)

pradeeprama

11th November 2016, 11:31

Can someone elaborate on this. I mean:
- is it only meant to be used during the 1st pass of a 2pass encoding?
- what should be the effects of this when enabled? (should it optimize frame placement?)

This option should be enabled in both 1st and 2nd pass of encoding.

When enabled, it optimizes bitrate in the 2nd pass by storing the 64 most common RPSes from each GOP into the SPS that is emitted at the start of the GOP. This way, slice headers may just signal an index in the RPSes held in the SPS instead of sending the entire RPS; if the slice's RPS isn't in the SPS, then it needs to signal its own RPS in its header. This would result in bit-rate savings in the 2nd pass, without any change to visual quality.
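
The idea can be sketched as a toy model in Python. This is not x265 code: an RPS is represented here simply as a tuple of POC deltas, and the function and constant names are invented for this example. It only shows the "most common entries go in a shared table, everything else is sent explicitly" logic described above.

```python
from collections import Counter

MAX_SPS_RPS = 64  # table size per GOP, as stated in the post above


def plan_rps_signalling(gop_rps, limit=MAX_SPS_RPS):
    """Decide per slice whether to signal a table index or a full RPS.

    `gop_rps` holds one hashable RPS description per slice of the GOP
    (here: a tuple of POC deltas, a simplification for illustration).
    """
    # First-pass knowledge: which RPSes occur most often in this GOP.
    table = [rps for rps, _ in Counter(gop_rps).most_common(limit)]
    index_of = {rps: i for i, rps in enumerate(table)}
    plan = []
    for rps in gop_rps:
        if rps in index_of:
            plan.append(("index", index_of[rps]))  # cheap: index into the SPS
        else:
            plan.append(("explicit", rps))  # RPS sent in the slice header
    return table, plan
```

With a table limit of 2 and a GOP where one RPS appears three times, another twice, and a third once, the first two end up in the table and only the rare one is signalled explicitly.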

pradeeprama

11th November 2016, 11:33

nevermind... :)

Going for i7-6700K with NZXT Kraken X61 + 32GB DDR4 RAM + GeForce GTX 1060, so I'm gonna test the HEVC 10bit NVENC also. :)

In the Maxwell generation, we noticed that nvenc didn't support B-frames. Is this fixed with Pascal?

I hope you will be sharing some of your findings in this comparison on this forum..

Selur

11th November 2016, 19:12

@pradeeprama: Thanks for the explanation.

Barough

12th November 2016, 20:27

x265 v2.1+47-a378efc939e3 (http://www20.zippyshare.com/v/pWUGdhWB/file.html) (MSYS/MinGW, GCC 6.2.0, 32 & 64bit 8/10/12bit multilib EXEs)

cojj

17th November 2016, 04:32

Quick question - Reading through the x265 documentation, I'm seeing a lot of references to NxN. What does N refer to?

e.g. Enable analysis of rectangular motion partitions Nx2N and 2NxN (50/50 splits, two directions).

Jamaika

17th November 2016, 07:59

There is no problem with yuv420p10le, but with yuv422p10le there is much more banding and the film is darker (colormatrix bt2020, colorrange full). Decoder: LAVFilters + madVR. I don't know why that is.

jlpsvk

17th November 2016, 14:32

Ok. Did some tests with the X-Men - Apocalypse Blu-ray on a GeForce GTX 1060 GPU (Pascal), encoding to HEVC 10bit.

Screens:
http://www101.zippyshare.com/v/IMP3zy2H/file.html

NvEncC64 (3.01) encode settings in latest StaxRip Nightly:
--cqp I:P:B --codec h265 --ref 4 --gop-len 240 --max-bitrate 160000 --aq --colormatrix bt709 --colorprim bt709 --transfer bt709 --cabac --no-deblock --fullrange --output-depth 10 --enable-ltr --lookahead 32

Did tests with I:P:B values of 18:20:24, 19:21:25, and 20:22:26.
Screens from the untouched BD are also included.

B-frame settings are useless here, as Pascal supports B-frames in AVC/H.264 only. So no B-frames for HEVC. :( Still, I think Pascal is doing a very good job in terms of speed/quality. I am getting encoding speeds around 220fps.

Resulting bitrates:
Bluray - 23.4Mbit/s
18:20:24 - 5.9Mbit/s
19:21:25 - 4.7Mbit/s
20:22:26 - 3.9Mbit/s

Hoping for some opinions.
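
To put those numbers in perspective, a few lines of Python give each encode's bitrate as a share of the Blu-ray source. The figures are the ones listed above; the function name is invented for this example.

```python
SOURCE_MBPS = 23.4  # Blu-ray bitrate from the list above

ENCODES_MBPS = {
    "18:20:24": 5.9,
    "19:21:25": 4.7,
    "20:22:26": 3.9,
}


def relative_bitrate(encoded_mbps, source_mbps=SOURCE_MBPS):
    """Return the encoded bitrate as a percentage of the source bitrate."""
    return 100.0 * encoded_mbps / source_mbps


if __name__ == "__main__":
    for qp_triplet, mbps in ENCODES_MBPS.items():
        print(f"{qp_triplet}: {relative_bitrate(mbps):.1f}% of source")
```

That works out to roughly 25%, 20%, and 17% of the source bitrate for the three QP settings.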

Motenai Yoda

17th November 2016, 17:25

Quick question - Reading through the x265 documentation, I'm seeing a lot of references to NxN. What does N refer to?

e.g. Enable analysis of rectangular motion partitions Nx2N and 2NxN (50/50 splits, two directions).

Replace N with 4, 8, 16, 32 and 2N with 8, 16, 32, 64.
I'm not sure whether it splits 4x4 blocks too.
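
To make the notation concrete, here is a small Python sketch that lists the partition shapes for a given coding unit size, treating the CU as 2Nx2N. The function name is invented for this example, and it only covers the symmetric shapes mentioned in this exchange, not the asymmetric (AMP) ones.

```python
def symmetric_partitions(cu_size):
    """Return (width, height) of each partition for a 2Nx2N coding unit.

    `cu_size` is the CU edge length, i.e. 2N.
    """
    n = cu_size // 2
    return {
        "2Nx2N": [(cu_size, cu_size)],      # no split
        "2NxN": [(cu_size, n)] * 2,         # two halves, one above the other
        "Nx2N": [(n, cu_size)] * 2,         # two halves, side by side
        "NxN": [(n, n)] * 4,                # four quarters
    }


if __name__ == "__main__":
    for size in (8, 16, 32, 64):
        shapes = symmetric_partitions(size)
        print(size, shapes["2NxN"], shapes["Nx2N"])
```

For a 16x16 CU, for example, N is 8, so 2NxN means two 16x8 partitions and Nx2N means two 8x16 partitions.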

JohnLai

18th November 2016, 03:04

Ok. Did some tests with the X-Men - Apocalypse Blu-ray on a GeForce GTX 1060 GPU (Pascal), encoding to HEVC 10bit.

Screens:
http://www101.zippyshare.com/v/IMP3zy2H/file.html

NvEncC64 (3.01) encode settings in latest StaxRip Nightly:
--cqp I:P:B --codec h265 --ref 4 --gop-len 240 --max-bitrate 160000 --aq --colormatrix bt709 --colorprim bt709 --transfer bt709 --cabac --no-deblock --fullrange --output-depth 10 --enable-ltr --lookahead 32

Did tests with I:P:B values of 18:20:24, 19:21:25, and 20:22:26.
Screens from the untouched BD are also included.

B-frame settings are useless here, as Pascal supports B-frames in AVC/H.264 only. So no B-frames for HEVC. :( Still, I think Pascal is doing a very good job in terms of speed/quality. I am getting encoding speeds around 220fps.

Resulting bitrates:
Bluray - 23.4Mbit/s
18:20:24 - 5.9Mbit/s
19:21:25 - 4.7Mbit/s
20:22:26 - 3.9Mbit/s

Hoping for some opinions.

Wait... Pascal only manages 220fps for 1080p HEVC encoding? Sounds like a CPU decoding bottleneck. Did you use hardware-accelerated decoding? It is located at Basic-->Decoder-->nvencc (Native).

*Did you know one can use a high-quality resizer through the --vpp-resize option (assuming you resize the video)? You need to extract the NPP files to the nvencc location to use cubic_bspline, cubic_catmull, cubic_b05c03, super, lanczos, bilinear, or spline36.

*NVENC only makes use of a single reference frame for HEVC encoding. It doesn't matter how many refs you specify.

*There is a slight quality issue with CQP mode. It is hard to explain, so... nah, most people won't notice it anyway. Too lazy to explain.

jlpsvk

19th November 2016, 20:51

just about CPU... i7-3930K (6-core with HT) is about 15% slower in encoding with x265 2.1+47 to HEVC 10bit 3840x1600 than i5-6600 (4-cores, no HT).