x265 HEVC Encoder [Archive] - Page 104

Krautmaster

5th April 2017, 14:22

Oh, wow, I wish someone would look at my custom line as well.

I am using a HandBrake nightly right now; it seems they include version 2.3 of x265 too. I'm transcoding 30 GB of x264 material down to a few GB for direct streaming with Plex. Single pass. Quality mode. 5.1 Opus audio @ 320 kb/s.
My params are:

1. medium preset
+
rc-lookahead=32:ssim-rd=1:ctu=32:crf=23:pbratio=1.22:subme=3:aq-mode=3:aq-strength=1.2:qcomp=0.64:no-sao=1:level-idc=4.1:high-tier=1:rd=4:psy-rd=2.5:psy-rdoq=3.5:rdoq-level=2:bframes=8:no-strong-intra-smoothing=1:weightb=1:b-intra=1:rect=1:limit-modes=1:rskip=1

thanks for testing and commenting!

I tried the grain tune as well, but it really sucks: very blocky in dark areas, which my line handles pretty well. You guys may try it. I get 10-12 FPS with it, on a Ryzen as well as on an i7 5820K @ 4.3 GHz.
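
A long colon-separated option line like the one above is easier to review when split into one option per line. The following is only a minimal sketch in Python, assuming the usual key=value:key=value layout; the function name is made up for illustration and is not part of HandBrake or x265.

```python
def parse_x265_params(param_string):
    """Split a 'key=value:key=value' option string into a dict (insertion ordered)."""
    params = {}
    for item in param_string.split(":"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        # Assumption: a bare key without '=' is treated as a switch set to "1".
        params[key.strip()] = value.strip() if sep else "1"
    return params


line = ("rc-lookahead=32:ssim-rd=1:ctu=32:crf=23:pbratio=1.22:subme=3:"
        "aq-mode=3:aq-strength=1.2:qcomp=0.64:no-sao=1:level-idc=4.1:"
        "high-tier=1:rd=4:psy-rd=2.5:psy-rdoq=3.5:rdoq-level=2:bframes=8:"
        "no-strong-intra-smoothing=1:weightb=1:b-intra=1:rect=1:"
        "limit-modes=1:rskip=1")

params = parse_x265_params(line)
for key, value in params.items():
    print(f"{key:28} {value}")
```

Listing the options this way makes it easier to compare two custom lines side by side.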

need4speed

5th April 2017, 17:13

Hi again,
first off thanks to everybody for all the hints, finally managed to get the speed/quality I was looking for.
Same topic but a bit different: any special hint or tweak for animated contents/anime? 1080p.
TIA

need4speed

5th April 2017, 17:20

For example, you're increasing the # of reference frames from the default setting of 3 frames to --ref 6. This is a big waste of time. You'll get diminishing returns (in terms of quality for the speed impact) with each additional reference frame, and almost no improvement above 4 ref frames. You'll get a better overall result just by moving to a higher quality preset, like --preset slow.

Just my two cents really, but have experimented a lot lately and my personal experience is that 6 ref frames and 8 bframes are a total waste of time.
Wondering why you're leaving SAO active. As for deblock, I have had mixed results, but at the end of the day I disable it, together with SAO and strong intra smoothing.
As for qcomp1, really no idea; my default is 0.7 and going above has given me absolutely no benefit.

benwaggoner

5th April 2017, 17:54

How do you know you were looking at b frames and not p frames or even I frames?
If you are looking at single frames, you're not watching video anyway. I strongly feel that video quality only matters when it's moving at the fps that end users will see it at.

benwaggoner

5th April 2017, 17:55

Just my two cents really, but have experimented a lot lately and my personal experience is that 6 ref frames and 8 bframes are a total waste of time.
They can matter a lot if trying to get the best quality at a given bitrate, instead of just trying to get the best quality regardless of bitrate.

Wondering why you're leaving SAO active. As for deblock, I have had mixed results, but at the end of the day I disable it, together with SAO and strong intra smoothing.
As for qcomp1, really no idea; my default is 0.7 and going above has given me absolutely no benefit.
Also features that are useful when trying to maximize quality at a given bitrate.

benwaggoner

5th April 2017, 17:59

Are you using ABR or 2-pass? If so, increased compression due to b-frames would free up extra bits to be used for increased quality.

While it's very useful, I question the general methodology of using ABR or 2-pass for an apples to apples comparison of all parameters. In this case, wouldn't any parameter that increases compression create surplus bits and thus increase quality?
Well, yes, freeing up bits from where they aren't needed in order to spend them where they are is why codecs have psychovisual models and rate distortion optimization.

It does get down to what are the independent and dependent variables. I prefer testing at fixed bitrate or file size so that the independent variable(s) are the ones I am testing. Testing with CRF means both quality AND file size are changing together. Which makes sense if trying to tune for quality irrespective of file size, but not if trying to tune for optimal quality given a bitrate or file size constraint.
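
To make the two test designs concrete, here is a rough Python sketch that builds x265 command lines for the same pair of variants, once at a fixed bitrate and once at a fixed CRF. The file names, the 3000 kb/s target, the CRF value and the two variants are placeholders chosen for illustration, not recommendations from this thread.

```python
def build_command(source, rate_args, test_args, output):
    """Assemble one x265 command line as an argument list."""
    return ["x265", "--input", source, "--preset", "medium",
            *rate_args, *test_args, "--output", output]


# Hypothetical variants under test; only these options differ between runs.
VARIANTS = {
    "ref3_b4": ["--ref", "3", "--bframes", "4"],
    "ref6_b8": ["--ref", "6", "--bframes", "8"],
}


def build_matrix(source, rate_args, suffix):
    """One command per variant, all sharing the same rate control arguments."""
    return [build_command(source, rate_args, args, f"{name}_{suffix}.hevc")
            for name, args in VARIANTS.items()]


# Fixed bitrate: file size is held constant, so quality is the only outcome.
fixed_bitrate = build_matrix("source.y4m", ["--bitrate", "3000"], "abr")
# Fixed CRF: both quality and file size move with the tested options.
fixed_quality = build_matrix("source.y4m", ["--crf", "23"], "crf")

for command in fixed_bitrate + fixed_quality:
    print(" ".join(command))
```

With the bitrate runs, any quality difference can be attributed to the tested options alone; with the CRF runs, the output sizes have to be compared as well.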

Tests and comparisons are always specific to a given context. So it's important to know what are your apples and what are other fruit.

benwaggoner

5th April 2017, 18:02

Thanks for that explanation. I have found CRF to be preferable as far as quality is concerned, but resulting file size is variable depending on the complexity and detail of the source material. I mostly recompress for archiving so prefer to limit file size by setting an ABR.
If you want to do CRF with a max file size, set --vbv-maxrate to the maximum bitrate you want to use, and --vbv-bufsize to something reasonable, and no more than the max allowed by Profile @ Level.

That will give you output that never exceeds --vbv-maxrate, but will use lower bitrates than that if your CRF target can be achieved at a smaller file size.
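
As a rough illustration of the capped-CRF approach, the Python sketch below builds the three options and estimates the ceiling on the video stream size. The numbers (CRF 23, 8000 kb/s, a 16000 kbit buffer, a two-hour runtime) are placeholders, and the size estimate is an approximation: it ignores audio, container overhead and short-term VBV buffer effects.

```python
def capped_crf_args(crf, maxrate_kbps, bufsize_kbit):
    """x265 options for CRF encoding with a VBV ceiling."""
    return ["--crf", str(crf),
            "--vbv-maxrate", str(maxrate_kbps),
            "--vbv-bufsize", str(bufsize_kbit)]


def approx_max_video_size_mb(maxrate_kbps, duration_s):
    """Approximate ceiling for the video stream in decimal megabytes."""
    # kbit/s * s = kbit; / 8 = kilobytes; / 1000 = megabytes
    return maxrate_kbps * duration_s / 8 / 1000


args = capped_crf_args(crf=23, maxrate_kbps=8000, bufsize_kbit=16000)
print(" ".join(["x265", "--input", "source.y4m", *args, "--output", "out.hevc"]))
print(approx_max_video_size_mb(8000, 2 * 60 * 60), "MB at most, roughly")
```

The buffer size should still be checked against the limit for the chosen profile and level, as noted above.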

need4speed

5th April 2017, 19:11

They can matter a lot if trying to get the best quality at a given bitrate, instead of just trying to get the best quality regardless of bitrate.

Also features that are useful when trying to maximize quality at a given bitrate.
Didn't mean to sound rude, sorry.
Thing is, HEVC is becoming more and more popular and worth it, but personally speaking, maxing out all settings and getting 3 fps or so is fine for testing, and the final results are surely worth the wait, but is the difference really that clear?
Nothing against endless encoding, but in this forum we read about mixed results.
Quality is a personal perception beyond some point, and 3 or 6 fps does make a difference for most people, I guess.
Hope this clears up my previous post!


brumsky

5th April 2017, 21:26

If you are looking at single frames, you're not watching video anyway. I strongly feel that video quality only matters when it's moving at the fps that end users will see it at.

I agree for the most part. However, I do feel that comparing b frame to b frame has its role.

brumsky

5th April 2017, 21:32

They can matter a lot if trying to get the best quality at a given bitrate, instead of just trying to get the best quality regardless of bitrate.

This is typically my goal, max quality per bit. I don't like just throwing bitrate at an encode just because. For example, if <5% increase in quality costs 20-25% more bitrate, then it's not worth it to me.

I've chased near perfect transparency, in b frames, before which resulted in encodes 50% larger. However, when played side by side it was very difficult to see the difference.

So both playback & b frame comparisons have their place - IMO.

Krautmaster

5th April 2017, 23:08

Hi guys, new here!

1st: sound is a good point. It easily made up 30% of the size before I noticed that something was wrong with my video quality / result size ratio. With several GB of DTS-HD tracks being passed through, that made sense :)

2nd: I found the grain tune preset and the default too blocky in dark scenes.

3rd: I tried 30 movies over the last two days and finally settled on these params:

rc-lookahead=32:ssim-rd=1:ctu=32:crf=23:pbratio=1.22:subme=3:aq-mode=3:aq-strength=1.2:qcomp=0.64:no-sao=1:level-idc=4.1:high-tier=1:rd=4:psy-rd=2.5:psy-rdoq=3.5:rdoq-level=2:bframes=8:no-strong-intra-smoothing=1:weightb=1:b-intra=1:rect=1:limit-modes=1:rskip=1

+ high quality sound via Opus 7.1 @ 320 kb/s. Kodi handles it well over HDMI with my 7.1 AV receiver. The bare medium preset is fine, but dark stuff is blocky. It would be interesting to know what "x265_Project" thinks about those params. :)

... the above setting gives good quality in dark scenes ... that was the main issue for me. I have some movies which scale badly, but most are around 5 GB.
Feel free to test and report, please.

Edit: for batch jobs I'm using a HandBrake nightly. It seems to use x265 2.3. Fine? Disadvantages? The setting above does ~10 FPS at awesome quality, way better than the same with the grain tune, in terms of detail and all. I focus on 1-pass encoding + HQ dark scenes + good quality / size.

If someone needs a daily fresh master build of ffmpeg - I build them here:
https://1drv.ms/f/s!Ar_eIBtD4lGqg85_iDGvRCjP6xkc7g

Selur

7th April 2017, 22:31

Add dynamic rate-control reconfiguration
and this is intended to do what?

LigH

8th April 2017, 08:33

I suspect a feature not for a CLI encoder recoding one file, but rather for an integrated solution encoding live footage.

Or maybe, who knows ... zones support?

Selur

8th April 2017, 09:46

@LigH: Zones are supported for quite some time -> http://x265.readthedocs.io/en/latest/cli.html?highlight=zones#cmdoption-zones
But you might be right with the changing of the rate control during live encoding through the API and it really doesn't look like anything intended for CLI. :)
-> so probably not that interesting for normal users.

x265_Project

8th April 2017, 23:22

and this is intended to do what?

Dynamic rate control is only useful through the x265 API. It is for live encoding applications where you need to change the target bit rate on the fly, such as when the channel bandwidth varies, or for statistical multiplexing (statmux) of multiple video channels through a single transmission channel.

Statmux is a technique designed to maximize quality by taking advantage of the fact that while the complexity of one video varies by a large degree over time, the average complexity of multiple video signals varies much less. If you encode one video with constant quality, you get a widely varying bit rate. If you encode one video with constant bit rate, you get widely varying quality. With statistical multiplexing, you can encode multiple video signals at a constant channel bit rate with something much closer to constant quality.
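
The averaging effect described above can be illustrated with a toy simulation. This is only a sketch in Python under strong assumptions: the per-channel complexity values are independent uniform random numbers rather than real video statistics, and the proportional bit allocation is a simplification, not how any particular statmux controller or the x265 API works.

```python
import random
import statistics


def coefficient_of_variation(values):
    """Relative spread: population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.mean(values)


rng = random.Random(0)  # fixed seed so the run is repeatable
CHANNELS = 8
STEPS = 2000

# Hypothetical complexity per time step and channel, uniform in [1, 10].
complexity = [[rng.uniform(1.0, 10.0) for _ in range(CHANNELS)]
              for _ in range(STEPS)]

single = [row[0] for row in complexity]
average = [sum(row) / CHANNELS for row in complexity]

cv_single = coefficient_of_variation(single)
cv_average = coefficient_of_variation(average)
print(f"relative variation, one channel:       {cv_single:.3f}")
print(f"relative variation, average of {CHANNELS}:      {cv_average:.3f}")


def statmux_allocate(step_complexity, channel_kbps):
    """Split a constant channel bitrate in proportion to current complexity."""
    total = sum(step_complexity)
    return [channel_kbps * c / total for c in step_complexity]


shares = statmux_allocate(complexity[0], 40000)
print([round(s) for s in shares])
```

The shares change at every time step while their sum stays at the channel rate, which is the situation where changing an encoder's target bitrate on the fly becomes necessary.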

Selur

8th April 2017, 23:44

Thanks for the info. :)

albt

9th April 2017, 05:15

--tune grain omits --rskip. Try adding that one back and see what happens both performance and quality-wise. I use it with --tune grain and have not noticed any difference in the quality of the result.

I'm doing the same, --rskip only wastes time.

albt

9th April 2017, 05:21

aymanalz

9th April 2017, 09:00

I'm doing the same, --rskip only wastes time.

To clarify, don't you mean NOT having --rskip wastes time?

albt

10th April 2017, 01:20

Sorry, I meant --no-rskip.

cojj

10th April 2017, 07:23

Yes. Soon. Some time next week.
Has this been postponed? Would appreciate a simple update :) (simple update as in short message on the current status)

@LigH sorry for confusing you. I wish I was skilled enough haha

@aymanalz thank you for the clarification, that's what I meant!

LigH

10th April 2017, 07:29

If it was as simple as you believe, you could provide the patch... ;)

aymanalz

10th April 2017, 08:02

If it was as simple as you believe, you could provide the patch... ;)

I think he meant that a simple message would be appreciated, updating us on the progress. Not that the update of x265 itself is simple. Two different meanings for "update" - Updating the code, versus updating us about the progress.

Barough

10th April 2017, 12:42

x265 v2.3+30-c7b7c736696f (http://www88.zippyshare.com/v/XImWrLSA/file.html) (MSYS/MinGW, GCC 6.3.0, 32 & 64bit 8/10/12bit multilib EXEs)

x265 [info]: HEVC encoder version 2.3+30-c7b7c736696f
x265 [info]: build info [Windows][GCC 6.3.0][32 bit/64 bit] 8bit+10bit+12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2

https://bitbucket.org/multicoreware/x265/commits/branch/default
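
For anyone comparing builds in scripts, the version string in the log above can be split into its parts. A small Python sketch, assuming the layout is tag, then the number of commits since that tag, then a short changeset hash; the function name is made up for illustration.

```python
import re

# Assumed layout: <tag>+<commits since tag>-<short changeset hash>
VERSION_RE = re.compile(
    r"^(?P<tag>\d+(?:\.\d+)*)\+(?P<commits>\d+)-(?P<hash>[0-9a-f]+)$")


def parse_build_version(version):
    """Return (tag, commits_since_tag, changeset_hash) for a build version."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return match["tag"], int(match["commits"]), match["hash"]


print(parse_build_version("2.3+30-c7b7c736696f"))
```

Comparing the tag first and the commit count second is enough to order two builds from the same branch.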

Motenai Yoda

11th April 2017, 00:20

into source\common\param.cpp
line 1432
++ BOOL(p->bSsimRd, "ssim-rd");
line 1532/1533
++ BOOL(p->bSsimRd, "ssim-rd");

LigH

11th April 2017, 07:32

I just noticed there isn't any userdata or cli output to check if ssim-rd is enabled or not.

That has been fixed hereby, I believe.

BTW, line 1432: "TOOLOPT(param->bSsimRd, "ssim-rd");"

Magik Mark

11th April 2017, 08:43

Guys,

Has the 10-bit lambda been released?

LigH

11th April 2017, 08:48

No. Go check the x265 commit log (https://bitbucket.org/multicoreware/x265/commits/all) before asking. ;)

pingfr

11th April 2017, 16:51

microchip8

11th April 2017, 16:57

Out of sheer curiosity;

Are the presets defined in the docs available at https://x265.readthedocs.io/en/default/presets.html still "current" and accurate after the changes brought up at commit a0eee4b?

Thanks in advance.

x265_project said that due to the new lambda tables, presets need to be adjusted. I don't follow the commit history, so I have no idea if this is done already. I suspect it is not done yet.

Selur

11th April 2017, 18:54

Also, the docs for the newest release are normally under http://x265.readthedocs.io/en/latest/, not https://x265.readthedocs.io/en/default/

Motenai Yoda

11th April 2017, 20:36

That has been fixed hereby, I believe.

BTW, line 1432: "TOOLOPT(param->bSsimRd, "ssim-rd");"

Asd I was so sleepy I pasted twice the same line

Barough

12th April 2017, 12:44

x265 v2.3+32-1ed218717877 (http://www76.zippyshare.com/v/tUu4Xqnt/file.html) (MSYS/MinGW, GCC 6.3.0, 32 & 64bit 8/10/12bit multilib EXEs)

x265 [info]: HEVC encoder version 2.3+32-1ed218717877
x265 [info]: build info [Windows][GCC 6.3.0][32 bit/64 bit] 8bit+10bit+12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX AVX2 FMA3 LZCNT BMI2

https://bitbucket.org/multicoreware/x265/commits/branch/default

LigH

12th April 2017, 13:13

CLI: informs if '--ssim-rd' is used;
Add dynamic rate-control reconfiguration (API only);
Improved sao implementation by limiting sao types:
--[no-]limit-sao Limit Sample Adaptive Offset types. Default disabled

x265 2.3+32-1ed218717877 (https://www.mediafire.com/file/3j72g2ed5pl9og0/x265_2.3%2B32-1ed218717877.7z)

x265 2.3+28-08a05ca9fd16 (http://www.mediafire.com/file/56aq5t6zhlvqv26/x265_2.3%2B28-08a05ca9fd16.7z) (belated)
_

Makes me wonder: Is it possible to summarize in simple terms which SAO types get disabled when limited, and which tradeoff between quality and speed can be reported by the developers ... and whether this will change presets?

stax76

12th April 2017, 13:43

@LigH

the change summary helps, thanks.

@x265_Project

there is an issue with the docs: http://x265.readthedocs.io/en/latest/cli.html#cmdoption-limit-sao

avs/vpy reader would still be useful...

LigH

12th April 2017, 14:08

there is an issue with the docs: http://x265.readthedocs.io/en/latest/cli.html#cmdoption-limit-sao

Forwarded this to the developer mailinglist.

aymanalz

13th April 2017, 06:43

Improved sao implementation by limiting sao types:
--[no-]limit-sao Limit Sample Adaptive Offset types. Default disabled

Interesting, I wonder if this would address the complaints against SAO on this thread, namely that it causes too much blurring.

LigH

13th April 2017, 07:19

From the description in the online docs (adaptive early opt-outs), I would only expect some speed-up with rather little visual difference. But I have not tested it yet; I have little spare time and only old CPUs.

NikosD

13th April 2017, 15:29

@all

I did earlier today a test comparison of latest 2.3+33 x265 compilation using MS VC 2017 AVX/AVX2 optimized binaries from here http://msystem.waw.pl/x265/ on my two systems - a Sandybridge and a Haswell.

I tested different SIMD architectures using these parameters --crf 24 --preset medium --tune grain --ssim --psnr --pme on a 4K sample.

Sandybridge and Haswell gained almost the same using SSE2fast (61% for Sandy and 72% for Haswell) and SSE4(4.2) (77% for Sandy and 88% for Haswell)

The gain is compared against the previous SIMD architecture starting from --no-asm and going like this:
NO ASM, MMX2, SSE2fast, SSSE3, SSE4.2, AVX, AVX2, FMA3, FMA3 LZCNT, FMA3 LZCNT BMI2

MMX2 and AVX give ~0% for both Intel architectures (Sandy and Haswell)

SSSE3 gives ~13% for both and AVX2 gives only 18% for Haswell compared to SSE4.2 (AVX gives nothing)

So, first myth busted.

x265 doesn't gain a lot using AVX2, only 18% on Haswell compared to SSE4.2

The huge advantage is using SSE2fast and even more using SSE4.2

Now, using FMA3 the performance drops to the level of SSE4.2.
It loses any gains of AVX2.

Adding LZCNT and BMI2 changes nothing.

ATTENTION to x265 developers.

You should put FMA3 SIMD architecture before AVX2, just like x264 does, because it drops performance a lot.
Or maybe remove it completely.

Using LZCNT and BMI2 add nothing to FMA3 performance.

@all

Please, confirm or deny my results because I think we have a major performance issue here.
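
Since each gain above is measured against the previous SIMD level, the overall speed-up over --no-asm can be estimated by compounding the steps. A short Python sketch, using the percentages from this post (the "~0%" and "~13%" figures are taken as exact) and assuming the steps multiply; the FMA3 and later rows are left out because the post gives no percentage for them.

```python
# Per-step gains in percent, each relative to the previous SIMD level.
STEP_GAINS = {
    "Sandy Bridge": [("MMX2", 0), ("SSE2fast", 61), ("SSSE3", 13),
                     ("SSE4.2", 77), ("AVX", 0)],
    "Haswell": [("MMX2", 0), ("SSE2fast", 72), ("SSSE3", 13),
                ("SSE4.2", 88), ("AVX", 0), ("AVX2", 18)],
}


def cumulative_speedup(steps):
    """Compound per-step percentage gains into a factor relative to no asm."""
    factor = 1.0
    table = []
    for name, gain_percent in steps:
        factor *= 1 + gain_percent / 100
        table.append((name, factor))
    return table


for cpu, steps in STEP_GAINS.items():
    print(cpu)
    for name, factor in cumulative_speedup(steps):
        print(f"  up to {name:9} {factor:.2f}x")
```

Read this way, most of the total comes from the SSE2fast and SSE4.2 steps, which matches the conclusion drawn in the post.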

brumsky

13th April 2017, 17:51

@NikosD

thank you for posting this information! I am going to run some of my own tests now!! ;)

nevcairiel

13th April 2017, 18:55

Now, using FMA3 the performance drops to the level of SSE4.2.
It loses any gains of AVX2.

I did a search through the sources, and I couldn't find a single function that uses FMA3, so it doesn't seem like much of a surprise if it's not any faster.

As an additional point of information, the console output may list AVX2 first in the list, however if you set it to FMA3, it does not actually use AVX2.
It uses the logical implications that the CPUs implementing those usually fulfill, i.e. an FMA3 CPU implements AVX and all the SSEs (and MMX, naturally), and that's what x265 uses. It only uses AVX2 if you tell it to use AVX2 (or don't tell it anything at all and let it use everything).

The "main" optimizations like MMX, SSE* and AVX/2 all work like that, they imply the ones that came before it, so if you set it to SSE4.1, it automatically activates everything from MMX to SSE4.1. FMA3/4 came after AVX, so they imply AVX - but they do not imply AVX2.

So sure, if they want to, they could re-order the listing of things to be in the same logical order, but it's only a cosmetic issue.
If you just run x265 without specifying simd features (which you should), there is no "slowdown" from FMA3.

To activate all options (i.e. the default), you would need to specify "AVX2,FMA3,FMA4,XOP,LZCNT,BMI1,BMI2" - a bit uncomfortable to write out every time. :) (Disclaimer: I do not know if BMI or LZCNT are ever used anywhere, but this is the full list of instructions the command line currently takes)
It looks like specifying options your CPU doesn't support might crash though, so care has to be taken.
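
The implication rule described in this post can be modelled in a few lines. This is only an illustrative Python sketch of the behaviour as described here, not x265's actual CPU detection code; the level names are labels, and treating XOP, LZCNT, BMI1 and BMI2 as implying nothing is an assumption, since the post only states the rule for FMA3/FMA4.

```python
# Main chain: each level implies every level before it.
MAIN_CHAIN = ["MMX2", "SSE2", "SSE3", "SSSE3", "SSE4.1", "SSE4.2", "AVX", "AVX2"]

# Extensions outside the main chain and the main-chain level they imply.
EXTENSION_IMPLIES = {
    "FMA3": "AVX",   # stated in the post: implies AVX, not AVX2
    "FMA4": "AVX",   # stated in the post
    "XOP": None,     # assumption: not stated in the post
    "LZCNT": None,   # assumption
    "BMI1": None,    # assumption
    "BMI2": None,    # assumption
}


def resolve(requested):
    """Return the set of instruction sets enabled by a list of requested names."""
    enabled = set()
    for name in requested:
        if name in MAIN_CHAIN:
            enabled.update(MAIN_CHAIN[:MAIN_CHAIN.index(name) + 1])
        elif name in EXTENSION_IMPLIES:
            enabled.add(name)
            base = EXTENSION_IMPLIES[name]
            if base is not None:
                enabled.update(MAIN_CHAIN[:MAIN_CHAIN.index(base) + 1])
        else:
            raise ValueError(f"unknown instruction set: {name}")
    return enabled


print("AVX2" in resolve(["FMA3"]))          # AVX2 is not pulled in by FMA3 alone
print("AVX2" in resolve(["AVX2", "FMA3"]))  # it is when requested explicitly
```

Under this model, a benchmark that requests only FMA3 is effectively an AVX run, which would explain the reported drop back to roughly SSE4.2/AVX speed.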

NikosD

13th April 2017, 19:45

Yes, if by putting nothing next to x265 there is no drop in performance, then we are OK.

From my comparison tests, both x265 and x264 should stop at AVX2 and completely remove FMA3, LZCNT and BMI2.

Using x264 I saw a small drop-off in performance using those instructions along with AVX2 compared to plain AVX2.

So, for x264 you should definitely set --asm AVX2 explicitly and not leave it on auto (nothing next to the binary), because the performance drops.

Luckily we are writing in the x265 thread, and there is no drop in performance, as you say, using FMA3, LZCNT, BMI2.

Have you done the test of setting --asm AVX2 using x265 and then auto (nothing next to x265) to see the difference?

Better, null or worse ?

A reverse of FMA3 and AVX2 order would make things better or the removal of FMA3 and the other weird instructions would be best (if it drops performance)

nevcairiel

13th April 2017, 20:19

FMA3/4, LZCNT and BMI1 don't seem to have any functions, the only that exist are a few in BMI2 and a few in XOP. XOP was AMD only and Zen even dropped support for it again, so it can mostly be ignored entirely.
The BMI2 functions used are only 2 small functions, so their performance impact is probably negligible (could test --asm AVX2 vs --asm AVX2,BMI2).

Either way, all of those are very "special" instruction sets designed for a particular niche, and their performance advantage is often not very huge - but no reason not to use them if available. Hopefully the developers do test if they are faster on most major CPUs. :)
I only know how this works in FFmpeg, where such functions are typically tested on a variety of generations of CPUs when submitted as patches - I don't follow how development of x265 works for this.
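
The A/B test suggested above could be set up along these lines. A hedged Python sketch: the clip name and preset are placeholders, the --asm values are written as they appear in this thread, and the fps figures in the example call are invented purely to show the arithmetic, not measured results.

```python
def asm_command(source, asm_levels, output="out.hevc"):
    """x265 command line restricted to the given instruction sets."""
    return ["x265", "--input", source, "--preset", "medium",
            "--asm", ",".join(asm_levels), "--output", output]


def percent_change(baseline_fps, candidate_fps):
    """Speed difference of the candidate relative to the baseline, in percent."""
    return (candidate_fps - baseline_fps) / baseline_fps * 100.0


baseline = asm_command("clip.y4m", ["AVX2"])
candidate = asm_command("clip.y4m", ["AVX2", "BMI2"])
print(" ".join(baseline))
print(" ".join(candidate))

# Placeholder numbers; replace with the fps reported by each run.
print(f"{percent_change(20.0, 21.0):+.1f}%")
```

Running each command several times and comparing the averages helps to separate a real difference from run-to-run noise.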

Natty

13th April 2017, 23:44

so GCC 7.0 - SSE4.1 is the fastest ?

nevcairiel

14th April 2017, 00:00

so GCC 7.0 - SSE4.1 is the fastest ?

Personally I try to avoid pre-release compilers (which GCC 7 is).
People have also said MSVC is faster for them, fwiw, but I have not compared compilers.

Generally, use whatever your CPU supports though, if it can do AVX(2), use that.

Dclose

15th April 2017, 12:38

There have been posts lately about updates to 10-bit. I don't know if there were updates before that in the past months, but I've been trying it again after not using it for a while and am surprised at the results. 10-bit was always helpful, particularly with color banding, but I avoided it in x265 because x265 is so slow.

Now trying it, it's like a slight layer of grease is removed from the screen and things are sharper. And, I'm not sure yet, but it may not have the big fps hit it had 6+ months ago.

Well, it is extremely slow on one of my computers for some reason. A 720p video is over 2fps using 10-bit. In my other desktop, 10-bit drops the fps to .5. Both are using updated Hybrid, both have similar components other than one has older (slower) hard drives which shouldn't matter at this low fps. Both have the same i7 3770k CPU and are doing 100% usage. The main difference is the faster one is Windows 8.1 64-bit and the slow one is Windows 7 32-bit. I don't know why that would matter, and only matter for 10-bit not for 8-bit, but something strange is happening with it when I turn 10-bit on. If anyone has ideas on that, I'm listening.

When is SSIM RD used? The Hybrid tooltip says it's only used for certain presets, and then says RD over 3. I swear checking the box on and off made a difference months ago when I tried it, but trying it now with most settings in x265 maxed out the file sizes come out the same whether it's on or off. I'm assuming it's not just a Hybrid error.

mariush

15th April 2017, 12:50

The 32bit version of x265 may lack some assembly optimizations that are present in the 64bit version of x265.

The 32bit version would also be more memory constrained, since it would be limited to 2-3 GB of memory... 64 bit versions wouldn't have such limitations.

qyot27

15th April 2017, 15:47

The 32bit version of x265 may lack some assembly optimizations that are present in the 64bit version of x265.
Close. It's actually that there is no support for high bit depth 32-bit builds, and doing so you have to turn the asm off completely. So 8-bit builds of x265 32-bit have assembly, >8-bit builds don't. 64-bit builds have assembly, regardless of bit depth.

https://github.com/videolan/x265/blob/master/source/CMakeLists.txt#L346
https://github.com/qyot27/mpv/blob/extra-new/DOCS/crosscompile-mingw-tedious.txt#L2406

Dclose

15th April 2017, 17:51

Close. It's actually that there is no support for high bit depth 32-bit builds, and doing so you have to turn the asm off completely. So 8-bit builds of x265 32-bit have assembly, >8-bit builds don't. 64-bit builds have assembly, regardless of bit depth.
It sounds like Windows 32-bit doesn't do ASM in 10-bit which means 10-bit that runs at over 2 fps on Windows 64-bit is going to run at .5 fps on Windows 32-bit.

Damn. And just when I was getting excited about using 10-bit again. Well, at least I know what the problem is now. Thanks, guys.

Regarding other things, I had posted about increasing the AQ Strength to get more detail. RDOQ2 instead of RDOQ1 lessened the need for that. Switching to RDOQ2 from 1 has helped fix various problems that I was trying to fix with band-aids like AQS and more PSY.

LigH

15th April 2017, 22:39

The reason not to implement assembler optimized routines in 32 bit x265 versions is that it would be a waste of development time. Bit depths >8 would require twice the amount of RAM compared to the 8 bit version (addressing 16 bit even if only 10 or 12 bits precision are relevant), and already the allocated memory for 8 bit depth may be too much to encode 1080p video with a 32 bit build, depending on the complexity (preset).
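
The doubling mentioned above follows directly from the sample size. A back-of-the-envelope Python sketch for a single picture, assuming 4:2:0 chroma and ignoring padding and any per-frame analysis data the encoder also keeps; the function name is made up for illustration.

```python
def frame_bytes(width, height, bit_depth):
    """Approximate size of one 4:2:0 picture in memory."""
    # Samples deeper than 8 bits are stored in 16-bit words.
    bytes_per_sample = 1 if bit_depth <= 8 else 2
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)  # two planes at quarter size
    return (luma + chroma) * bytes_per_sample


for depth in (8, 10, 12):
    size = frame_bytes(1920, 1080, depth)
    print(f"1080p, {depth:2}-bit: {size / 1e6:.2f} MB per picture")
```

An encoder holds many such pictures at once (lookahead, reference frames, frames in flight), so the total multiplies quickly against the 2-3 GB available to a 32-bit process.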