x265 HEVC Encoder [Archive] - Page 42

LigH

1st April 2015, 10:29

I have not tried this software yet; personally, I will keep relying on freeware in the future. But for those users who prefer it simple and specific, there is a limited-time offer to "purchase" it with a 100% discount: you only have to register to obtain a bundle of an "MP4 to HEVC"* converter plus a DirectShow HEVC decoder.

*I guess the result will be HEVC video + supported audio again in an MP4 container. But I have seen a modified x265 including L-SMASH as MP4 multiplexer already. And I can multiplex raw HEVC into MP4 later too, with MP4Box or L-SMASH. If you can handle our known-good free converters (like MeGUI, Hybrid, StaxRip), you may have little need for "x265 HEVC Update".

NikosD

1st April 2015, 10:56

Since there is no UHDcode plugin HEVC decoder thread yet, I have evaluated the decoder in my thread.
For anyone interested it's here:
http://forum.doom9.org/showthread.php?p=1715668#post1715668

LigH

1st April 2015, 13:01

Enabling the STATIC_LINK_CRT cmake option does not (yet) generate a Windows executable with statically linked CRT. I believe it is just "prepared support" for when it will be implemented...

Kurtnoise

1st April 2015, 13:31

Until now, HEVC has been a technology for experts and hard-core video enthusiasts only. We have a fast, reliable HEVC decoder, called UHDcode, but we needed to find a way to make this available for everyone to try. x265 was available in source code form, but we didn't have an application available directly from the x265 team.

To make it easier for anyone to try x265 and experience the benefits of HEVC, today we're launching the x265 HEVC Upgrade (https://x265.com). We've created a new Windows 64 bit application that lets you convert MP4 files to HEVC. There is a new Basic Mode which any non-technical person should be able to use. In Advanced Mode we expose the full feature set of x265.

The x265 Encoder application is bundled with our UHDcode HEVC decoder, packaged as a Windows DirectShow filter; a plug-in for Windows Media Player. This upgrades Windows Media Player (64 bit) to play back HEVC video files (raw bitstreams or MP4 files with HEVC).

The MSRP of the x265 HEVC Upgrade is $29.95, but for a limited time you can download for free at www.x265.com (https://x265.com). We hope you find these applications valuable.
Activation does not work for me...I got an ERR_NETWORK message. Probably because I'm behind a proxy ?

1st April 2015, 16:23

Enabling the STATIC_LINK_CRT cmake option does not (yet) generate a Windows executable with statically linked CRT. I believe it is just "prepared support" for when it will be implemented...

You can unpack the attached ma.7z into the build folder and use "./ma.sh" for a native toolchain or "./ma-c.sh" for a cross-compile toolchain. It works in the "build/ma" folder instead of "build/msys".

If it works, you can patch cmake by updating one file from cmake-mod.7z. It removes the default -O3 optimization option, which is appended to the end of the compiler options and makes it impossible to specify any level below -O3.

grumpy

1st April 2015, 20:15

x265_Project

1st April 2015, 21:23

So: a partially crippled decoder, a limited x265 GUI, an installer that messes with file associations, requires activation and God knows what else, and a cart system that is iffy and requires email registration. Also little information on where this is going.
Grumpy,
As you know, most consumers only have 8 bit capable graphics chips/drivers, and displays. True 10 bit capable systems tend to be rather expensive and uncommon. UHDcode automatically accelerates on AMD OpenCL capable graphics (latest generation), including both APUs and discrete GPUs. We're working to extend this acceleration to Intel and NVIDIA graphics. I'm not sure why we didn't fare so well in a head to head with LAV, but I suspect that the problem is in the DirectShow communication. Our DShow filter decodes frames and gets them ready for the player, which requests them as needed. It wasn't built for benchmarking (decode as fast as you can... don't wait for the player). I'm just speculating... I'll definitely check into this.

We'll be happy to address any concerns about how the x265 HEVC upgrade sets file associations. I'll take this up with our dev team.

What limitations are you referring to in the x265 Encoder UI?

Our cart system doesn't require email registration (it doesn't make you confirm your email address before it lets you get the product). You can put a fake email address in, and still get the product. However, you won't be able to log into the cart later to see your order or retrieve your key if you use a fake email address.

Our license activation server suffered some hiccups yesterday, and our team has taken care of it. We'll keep an eye on this. In the meantime, if anyone had a problem, PM me the email you used when you checked out, and I can check into it. I can send you another key if needed.

I think we've been fairly open about what this is, and isn't, and I'm happy to shed more light on where it's going. It's a useful ready-to-use implementation of our HEVC libraries. We're not heading down a path of competing with companies that build complete video encoding systems (our customers). We simply wanted to make x265 and UHDcode more accessible to the average person. If you're a power user who is already happy with our command-line interface, or one of the many applications that incorporates x265, this is probably not for you.

vivan

1st April 2015, 21:49

As you know, most consumers only have 8 bit capable graphics chips/drivers, and displays. True 10 bit capable systems tend to be rather expensive and uncommon.
So what? 10-bit encoding benefits (perfect gradients) are obvious not only on 8 bit displays, but even on cheap 6-bit displays - as long as real dithering is used.

I'm not sure why we didn't fare so well in a head to head with LAV, but I suspect that the problem is in the DirectShow communication. Our DShow filter decodes frames and gets them ready for the player, which requests them as needed. It wasn't built for benchmarking (decode as fast as you can... don't wait for the player). I'm just speculating... I'll definitely check into this.
If low performance was caused by DS stalling then CPU load would be low. Which isn't true.
Please don't say that you never tested your "fastest in the world" claim against the most popular decoder, this is just silly.

huhn

1st April 2015, 21:55

it doesn't matter if you don't have a 10 bit display. it is well known that encoding in 10 bit is more efficient and a common practice these days. even the nvidia 960 GTX can do ASIC 10 bit HEVC, and of course even public broadcast is 10 bit.

BTW. nearly all directx 10 GPUs can output 10 bit or more using DP/HDMI by using a directx fullscreen surface.

MeteorRain

2nd April 2015, 01:48

Grumpy,
As you know, most consumers only have 8 bit capable graphics chips/drivers, and displays. True 10 bit capable systems tend to be rather expensive and uncommon.

Simple thing. Japan UHD broadcasting is HEVC 10 bit. Period.

benwaggoner

2nd April 2015, 02:27

Simple thing. Japan UHD broadcasting is HEVC 10 bit. Period.
And AFAIK, all UHD Smart TVs support 10-bit decode. At least all the ones I've been involved with.

Motenai Yoda

2nd April 2015, 03:42

So what? 10-bit encoding benefits (perfect gradients).
10 bit doesn't guarantee "perfect gradients"; it is only much less prone to visible banding at sane bitrate/crf (e.g. under 30).

@LigH have you tried to compile ffmpeg as shared to get x265's dll?
also which ver do you use, 3.5 doesn't work at all for me.

@x265_Project I noticed crf 22 is much faster than other values, is that right?
and the -f (-f/--frames) switch seems to do nothing, but --frames works.

Ajvar

2nd April 2015, 05:09

If you want 10 bit then HEVC Upgrade is for a different audience. It's for those who don't see anything wrong with "convert mp4 to HEVC", although I think that phrase should still be changed.
Those who got it for free got it for feedback purposes, but the feedback should be constructive.

LigH

2nd April 2015, 07:40

@LigH have you tried to compile ffmpeg as shared to get x265's dll?

I won't build x265 regularly with GCC 4.9.2 via media-autobuild_suite, too many dependencies which could update to a failing state.

also which ver do you use, 3.5 doesn't work at all for me.

Which version of ... what? :confused: I don't remember anything having version 3.5 at the moment.

burfadel

2nd April 2015, 08:41

I mentioned something along this line in the Staxrip thread...

I was thinking about the --merange, and how the default is 57. This makes sense with UHD material, but if you are encoding lower resolution material, this value is 'too high' for standard encoding. Effectively, the lower the resolution, proportionally the higher the merange of the image as a whole, so such a value becomes pointless at lower resolutions. Even at 1080P, 24 seems good with hardly any efficiency loss, and is significantly faster. If encoding 720P or even 480P like many people, having a merange of 24 is very beneficial in terms of speed, and on the faster presets even a value of 16 would probably make sense at 480P or lower.

I think an effective 'solution' to this would be an auto option as default. Under auto, it would encode at say, 16 range at less than 480P source material, 24 range at up to 1080P, then scale it towards 57 up to UHD material. Maybe even re-evaluating 57 as a target range at UHD under the standard preset. I hope that makes sense! Just I don't see a 'one-size-fits-all' as appropriate since a merange of 57 at 720x480P is considerably greater than 57 at 3840x2160.

Ajvar

2nd April 2015, 15:33

@burfadel, if I recall correctly, the devs increased the ME range on faster presets because it didn't affect speed.

littlepox

2nd April 2015, 17:42

burfadel

2nd April 2015, 21:54

Agreed on the criticism about me_range. I'm currently using me_range=25 for 1080p, which is about 2% larger in size, but with 30% speed improvement, and I don't see any noticeable visual quality loss.

However, --rect and --amp are the two parameters I believe to be least useful. they double your encoding time with zero gain; I don't see any change for the output, neither quality-wise nor size-wise. Maybe they bring you slightly better psnr/ssim, whatever.

Exactly my point :). A setting of 57 would fall more in the very slow preset range, at least at 1080P. I can understand it at 2160P, but without native (not upscaled) source material at that resolution I can't guarantee that. In any case, for 'normal' preset the difference in speed doesn't warrant the performance decrease.

Also try using the following settings (rest as default, except for including your --merange 25), and see what you think versus standard settings.

--tu-inter-depth 2
--b-intra
--rc-lookahead 80
--max-merge 5
--weightb
--aq-mode 2
--nr-intra 400
--nr-inter 400
--bframes 5
--ref 5

Interestingly enough, --fast-intra is fractionally slower, at least for me, than not using it at all...

I believe the noise reduction noticeably reduces output size, doesn't really affect encoding speed, and doesn't affect the visual quality noticeably (especially at 400). I believe it removes 'high-level' noise from the picture? This of course will affect SSIM since it's a change from the original picture, but if lowering the SSIM is always bad, then psy-rd is bad (which it isn't) ;).
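The option list in this post can be assembled into a single x265 invocation. The following is only a minimal sketch: the file names `input.y4m` and `output.hevc` and the helper `build_command` are hypothetical, and the option values are simply the ones listed above, not recommendations.

```python
import shlex

# Options exactly as listed in the post (plus the suggested --merange 25).
# A value of None marks a plain on/off switch without an argument.
OPTIONS = [
    ("--merange", "25"),
    ("--tu-inter-depth", "2"),
    ("--b-intra", None),
    ("--rc-lookahead", "80"),
    ("--max-merge", "5"),
    ("--weightb", None),
    ("--aq-mode", "2"),
    ("--nr-intra", "400"),
    ("--nr-inter", "400"),
    ("--bframes", "5"),
    ("--ref", "5"),
]


def build_command(source, output, options=OPTIONS):
    """Return an argv list for x265; source and output are placeholder paths."""
    argv = ["x265"]
    for flag, value in options:
        argv.append(flag)
        if value is not None:
            argv.append(value)
    # x265 takes the input file as a positional argument and the output via -o.
    argv += [source, "-o", output]
    return argv


command = build_command("input.y4m", "output.hevc")
print(shlex.join(command))
```

Keeping the options in a list makes it easy to drop or change a single entry (for example --weightb) when comparing runs.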

x265_Project

2nd April 2015, 21:58

x265 1.6 has been released. The changes from the 1.5 release are mostly performance oriented, with heavy improvements for AVX2 capable platforms (Haswell and later Intel CPUs) and work efficiency improvements for multiple-socket machines.

= API changes =

--threads N replaced by --pools N,N and --lookahead-slices N
--[no-]rdoq-level N - finer control over RDOQ effort
--min-cu-size N - trade-off compression for performance
--max-tu-size N - trade-off compression for performance
--[no-]temporal-layers - code unreferenced B frames in temporal layer 1
--[no-]cip aliases added for --[no-]constrained-intra

Added support for new color transfer functions "smpte-st-2084" and "smpte-st-428"

--limit-refs N was added, but not yet implemented

Deprecated x265_setup_primitives() was removed from the public API and is no longer exported by DLLs

See the online documentation for full descriptions:

http://x265.readthedocs.org/en/1.6/

= Threading changes =

The x265 thread pool has been made NUMA aware. The --threads parameter, which used to specify a global pool size, has been replaced with a --pools parameter which allows you to specify a pool size per NUMA node (aka CPU socket or package). The default is still to allocate one pool worker thread per logical core on the machine, but with --pools one can isolate those threads to a given socket.

Other than socket isolation, the biggest visible change in the NUMA aware thread pools is the increase in work efficiency. The total utilization will generally decrease but the performance will increase since worker threads spend less time context switching. Also, the threading of the lookahead was made more work-efficient. Each lookahead job is a much larger piece of work.

Before (1.5):

disable thread pool: --threads 1
default thread pool: --threads 0
restrict to 4 threads: --threads 4

After (1.6):

disable thread pools: --pools 0
default thread pools: --pools *
restrict to 4 threads: --pools 4
restrict to 4 threads on socket 1: --pools -,4
restrict to all threads on socket 0: --pools +,-
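For scripts that still carry a 1.5-style --threads value, the before/after table above could be applied mechanically. This is just a sketch of that mapping; `threads_to_pools` is a hypothetical helper, not part of x265, and it covers only the three single-pool cases listed, not the per-socket forms.

```python
def threads_to_pools(threads):
    """Translate an x265 1.5 --threads value into the 1.6 --pools form.

    Mapping taken from the release notes above:
      --threads 1 (pool disabled) -> --pools 0
      --threads 0 (default pool)  -> --pools *
      --threads N (N workers)     -> --pools N
    """
    if threads < 0:
        raise ValueError("thread count must not be negative")
    if threads == 0:
        return "--pools *"  # quote the * when passing it through a shell
    if threads == 1:
        return "--pools 0"
    return f"--pools {threads}"


for old in (1, 0, 4):
    print(f"--threads {old}  ->  {threads_to_pools(old)}")
```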

= Multi-lib interface =

In order to support runtime selection of a libx265 shared library, we have introduced an x265_api structure and an x265_api_get() function.
Applications which use this interface to acquire the libx265 functional interface will be able to use shim libraries to bind a particular build of libx265 at run time. See the API documentation for full details.

stax76

3rd April 2015, 16:33

I think an effective 'solution' to this would be an auto option as default. Under auto, it would encode at say, 16 range at less than 480P source material, 24 range at up to 1080P, then scale it towards 57 up to UHD material. Maybe even re-evaluating 57 as a target range at UHD under the standard preset. I hope that makes sense! Just I don't see a 'one-size-fits-all' as appropriate since a merange of 57 at 720x480P is considerably greater than 57 at 3840x2160.

I could add it like so:

Property MErange As New NumParam With {
.Switch = "--merange",
.Text = "ME Range (0=auto):",
.Help = "Motion search range. The default is derived from the default CTU size (64) minus the luma interpolation half-length (4) minus maximum subpel distance (2) minus one extra pixel just in case the hex search method is used. If the search range were any larger than this, another CTU row of latency would be required for reference frames. Range of values: an integer from 0 to 32768.",
.MinMaxStep = {0, 32768, 1},
.ArgsFunc = Function() If(MErange.Value = 0, " --merange " & CInt(Calc.GetYFromTwoPointForm(480, 16, 2160, 57, p.TargetHeight)), If(MErange.Value <> MErange.defaultvalue, " --merange " & CInt(MErange.Value), ""))}

This would behave exactly like before except when you enter 0 it would use your suggested calculation:

Calc.GetYFromTwoPointForm(480, 16, 2160, 57, p.TargetHeight)
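For readers without the StaxRip source at hand, the calculation can be sketched in Python. This assumes that GetYFromTwoPointForm is plain linear interpolation through the points (480, 16) and (2160, 57), which is what the name suggests; the function names below are illustrative and the actual StaxRip implementation may differ.

```python
def y_from_two_point_form(x1, y1, x2, y2, x):
    """Evaluate the straight line through (x1, y1) and (x2, y2) at x."""
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)


def auto_merange(target_height):
    """Assumed 'auto' rule: 16 at 480 lines, 57 at 2160 lines, linear between."""
    # round() uses round-half-to-even, as VB's CInt does.
    return round(y_from_two_point_form(480, 16, 2160, 57, target_height))


for height in (480, 720, 1080, 2160):
    print(height, auto_merange(height))
```

Note that a straight line gives about 31 at 1080 lines rather than the 24 suggested earlier, and that it keeps extrapolating below 16 for sources under 480 lines and above 57 for sources over 2160 lines unless the result is clamped.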

x265_Project

3rd April 2015, 17:03

I could add it like so:

Property MErange As New NumParam With {
.Switch = "--merange",
.Text = "ME Range (0=auto):",
.Help = "Motion search range. The default is derived from the default CTU size (64) minus the luma interpolation half-length (4) minus maximum subpel distance (2) minus one extra pixel just in case the hex search method is used. If the search range were any larger than this, another CTU row of latency would be required for reference frames. Range of values: an integer from 0 to 32768.",
.MinMaxStep = {0, 32768, 1},
.ArgsFunc = Function() If(MErange.Value = 0, " --merange " & CInt(Calc.GetYFromTwoPointForm(480, 16, 2160, 57, p.TargetHeight)), If(MErange.Value <> MErange.defaultvalue, " --merange " & CInt(MErange.Value), ""))}

This would behave exactly like before except when you enter 0 it would use your suggested calculation:

Calc.GetYFromTwoPointForm(480, 16, 2160, 57, p.TargetHeight)
First, we've got one more week before we head to the NAB show in Las Vegas, so we're pretty busy, focused on finishing up certain things that we're working on. So thanks for your patience if I don't read/respond to every thread every day.

We're certainly willing to look at this. There are 2 schools of thought on dynamic/variable values. One group would say you should never do this, because the user won't know (remember) what setting they would be getting. The other group (me included) would say that it's OK - do whatever it takes to automatically get the best speed vs quality at any preset. Besides, the user can always manually set any option.

Feel free to share any test data you have on the effects on performance and quality when you vary merange at each picture size. The recent change in merange was prompted by testing a large batch of 1080P clips, where we noticed that merange 57 had slightly better quality with no tradeoff in speed. But we can never do as much testing as the whole x265 community, and as an open source project, we love the benefit of being able to extend our own R&D by crowd-sourcing some of the tuning / performance presets. This feedback is very valuable.

Atak_Snajpera

3rd April 2015, 17:28

Under auto, it would encode at say, 16 range at less than 480P source material, 24 range at up to 1080P, then scale it towards 57 up to UHD material. Maybe even re-evaluating 57 as a target range at UHD under the standard preset. I hope that makes sense!
Ok what you say makes sense but why exactly 24 for 1920x1080? I think "correct" value would be 28.

In delphi code I would use this simple formula
x265_auto_merange := Trunc( Height * Width / (3840 * 2160) * 57 * 2 );

[1.77:1] 3840x2160 => 57
[1.85:1] 3840x2076 => 54
[2.35:1] 3840x1632 => 43
[2.40:1] 3840x1600 => 42
[1.33:1] 2880x2160 => 42

[1.77:1] 1920x1080 => 28
[1.85:1] 1920x1038 => 27
[2.35:1] 1920x816 => 21
[2.40:1] 1920x800 => 21
[1.33:1] 1440x1080 => 21

[1.77:1] 1280x720 => 12
[1.85:1] 1280x692 => 12
[2.35:1] 1280x544 => 9
[2.40:1] 1280x534 => 9
[1.33:1] 960x720 => 9

Of course, the question is whether we should calculate this from the total number of pixels or just from width or height.
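Evaluating the posted formula literally may help when comparing it with the table. The sketch below is a Python transcription, using integer floor division in place of Trunc (equivalent for non-negative values); the function names are made up for illustration. Under that reading, the 1920- and 1280-wide rows match the formula as posted, while the 3840- and 2880-wide rows match the same expression without the final factor of 2, and the posted formula itself gives 114 at 3840x2160.

```python
UHD_PIXELS = 3840 * 2160


def posted_formula(width, height):
    """Trunc(Height * Width / (3840 * 2160) * 57 * 2), in integer arithmetic."""
    return (width * height * 57 * 2) // UHD_PIXELS


def without_doubling(width, height):
    """Same expression without the trailing '* 2' (an assumption, see lead-in)."""
    return (width * height * 57) // UHD_PIXELS


for width, height in [(3840, 2160), (3840, 1600), (1920, 1080), (1280, 720)]:
    print(f"{width}x{height}: posted={posted_formula(width, height)}, "
          f"without *2={without_doubling(width, height)}")
```

Whichever variant was intended, a rule based on pixel count drops off much faster at small frame sizes than one based on height alone, which is the question raised at the end of the post.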

x265_Project

3rd April 2015, 19:08

Ok what you say makes sense but why exactly 24 for 1920x1080? I think "correct" value would be 28.

In delphi code I would use this simple formula
x265_auto_merange := Trunc( Height * Width / (3840 * 2160) * 57 * 2 );

[1.77:1] 3840x2160 => 57
[1.85:1] 3840x2076 => 54
[2.35:1] 3840x1632 => 43
[2.40:1] 3840x1600 => 42
[1.33:1] 2880x2160 => 42

[1.77:1] 1920x1080 => 28
[1.85:1] 1920x1038 => 27
[2.35:1] 1920x816 => 21
[2.40:1] 1920x800 => 21
[1.33:1] 1440x1080 => 21

[1.77:1] 1280x720 => 12
[1.85:1] 1280x692 => 12
[2.35:1] 1280x544 => 9
[2.40:1] 1280x534 => 9
[1.33:1] 960x720 => 9

Of course, the question is whether we should calculate this from the total number of pixels or just from width or height.
The ideal merange setting can't be calculated theoretically. An optimization like this should be based on lots of actual test data, showing the ideal setting given the speed vs. quality tradeoff at the affected performance presets (presumably, only at faster presets). A large basket of test content (with a variety of content types) would need to be used at every picture size.

burfadel

4th April 2015, 05:42

My observations of merange were kind of like littlepox's: it did show a very noticeable speed difference with a lower value without much impact on the file size. Mind you, it could be settings-dependent as well, and depend on where other bottlenecks exist etc.

My observations were based on the following settings, which were based on the standard preset.
--tu-inter-depth 2
--b-intra
--rc-lookahead 60
--max-merge 5
--aq-mode 2
--nr-intra 400
--nr-inter 400
--bframes 5
--ref 5

The --tu-inter-depth 2 gave small gains with negligible performance loss, this was not the case with --tu-intra-depth 2. So, I used 2 for the tu-inter-depth, and the standard 1 for the tu-intra-depth. I used a rc-lookahead value of 60 (mistakenly thought it was 80 before this edit), because it seemed to provide some benefit over 20. I would have presumed the rc-lookahead should be at least a little higher than the minimum keyframe interval?...

The max-merge of 5 (maximum) was because again it showed some improvement with negligible performance loss, although this may not be the case at higher resolutions. Reference frames and b-frames are at 5 because that seems to be the ideal speed/file size tradeoff numbers for me. I noticed the number of consecutive b-frames in the different test clips were reasonable numbers up until 5, after which the percentages were very low (such as 1.5 percent), so 5 became my preferred setting. In reading the stats, I realise the first number is actually '0' consecutive b-frames, as the first b-frame isn't consecutive as it has nothing to be consecutive to. Therefore, there are six numbers when consecutive B-frames is set to 5 :) (just in case people aren't aware of that).

The last three settings I chose all affect SSIM, so using SSIM to determine whether these are any good doesn't really mean much. The aq-mode 2 is a little harder to quantify since it's purely a perceptual thing; I believe it is beneficial over 1. This has been improved upon in x264 recently with the addition of --aq-mode 3. This is the same as --aq-mode 2, but biases dark scenes. I would probably choose --aq-mode 3 if it was available!

The noise reduction settings have a real impact on SSIM, since they purposefully 'change' the picture. Having 400 set for both (inter and intra) is because it seems to reduce the final file size by a noticeable amount, and perceptually it doesn't affect the picture. Of course, that may differ from person to person. A figure higher than 400 gives diminishing file size benefits. Probably a key thing to take away from the noise reduction is whether you can actually get a better image at the same file size, since you can lower the CRF (the lower the CRF the better the quality) to make up for the file size difference!

I've dropped --weightb from my earlier decision (default is disabled anyway) as it turned out it wasn't beneficial.

So, that's why I've chosen the settings I've chosen, with good speed vs quality in mind.

Motenai Yoda

4th April 2015, 14:01

x265_1.6+64-335c728bbd62_x64_gcc4.9.2 (UPX-EXE) (http://www.mediafire.com/download/3br6idwnwvxxfcv/x265_1.6+64-335c728bbd62_x64_gcc4.9.2.zip)

benwaggoner

4th April 2015, 19:41

it doesn't matter if you don't have a 10 bit display. it is well known that encoding in 10 bit is more efficient and a common practice these days. even the nvidia 960 GTX can do ASIC 10 bit HEVC, and of course even public broadcast is 10 bit.
Do we actually know that HEVC is more efficient at 10-bit than 8-bit? We know H.264 and x264 are more efficient coding 8-bit content in 10-bit mode since the higher precision quantization yields smaller rounding errors and you wind up with fewer "1" values scattered in the lower right of the quant tables.

But I don't know if that's been demonstrated with HEVC in general. I know that x265 was less efficient in 10-bit than 8-bit mode a while back, although that gap has at least closed.

BTW. nearly all directx 10 GPUs can output 10 bit or more using DP/HDMI by using a directx fullscreen surface.
The tricky part is figuring out which players will actually DO that. The most reliable way I've found to get 10-bit output is to use the Adobe Creative Suite products out to a monitor with 10-bit DisplayPort support. However, the Adobe products require OpenGL to do 10-bit output, and with NVidia 10-bit OpenGL requires a Quadro GPU.

Does anyone have a good mechanism to get 10-bit HEVC playback to a 10-bit screen in all 10 bits?

huhn

4th April 2015, 20:02

Do we actually know that HEVC is more efficient at 10-bit than 8-bit? We know H.264 and x264 are more efficient coding 8-bit content in 10-bit mode since the higher precision quantization yields smaller rounding errors and you wind up with fewer "1" values scattered in the lower right of the quant tables.

But I don't know if that's been demonstrated with HEVC in general. I know that x265 was less efficient in 10-bit than 8-bit mode a while back, although that gap has at least closed.
this here is an old test which said clearly yes 10 bit is way more efficient:
http://forum.doom9.org/showpost.php?p=1691801&postcount=45

examples:
http://abload.de/img/70_5s_10drool.png
http://abload.de/img/70_5s_8c9ju6.png
http://abload.de/img/70_orgcmslk.png

The tricky part is figuring out which players will actually DO that. The most reliable way I've found to get 10-bit output is to use the Adobe Creative Suite products out to a monitor with 10-bit DisplayPort support. However, the Adobe products require OpenGL to do 10-bit output, and with NVidia 10-bit OpenGL requires a Quadro GPU.

Does anyone have a good mechanism to get 10-bit HEVC playback to a 10-bit screen in all 10 bits?
MPDN can send 8, 10 and 16 bit to the GPU driver. i don't have a screen with 10 bit so i can't make sure if it works.

http://forum.doom9.org/showthread.php?t=171120

and in the end we get float point data with a YCbCr source so we need to dither this data down anyway. so a 8 bit panel is not that bad to judge such things.

sneaker_ger

4th April 2015, 20:11

There's also this on 8 bit vs 10 bit:
http://phenix.it-sudparis.eu/jct/doc_end_user/current_document.php?id=7417

How much this holds true for current x265 I cannot say.

benwaggoner

4th April 2015, 20:53

There's also this on 8 bit vs 10 bit:

http://phenix.it-sudparis.eu/jct/doc_end_user/current_document.php?id=7417

How much this holds true for current x265 I cannot say.

I don't know how relevant a 10-bit PSNR is in this comparison. It'll catch lots of invisible differences, since only 25% of 8-bit values have a matching 10-bit value.
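One way to read the 25% figure, sketched below: if 8-bit code values are promoted to 10 bit by a left shift of two bits (a common convention, but an assumption here, as is the variable naming), then only every fourth 10-bit code value corresponds exactly to an 8-bit one.

```python
# Promote every 8-bit code value to 10 bit by shifting left two bits (x4).
promoted = {value << 2 for value in range(256)}

# Share of the 1024 possible 10-bit codes that an 8-bit source can hit exactly.
share = len(promoted) / 1024

print(f"{len(promoted)} of 1024 ten-bit codes ({share:.0%}) have an exact 8-bit counterpart")
```

Any 10-bit reconstruction that lands on one of the other codes counts as an error in a 10-bit PSNR, even though it would round to the same 8-bit value.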

LazyNcoder

4th April 2015, 21:00

Not much related to what is going on here, but I did some test and here is the results:

x265 [info]: HEVC encoder version 1.5
x265 [info]: build info [Windows][GCC 4.9.0][64 bit] 8bpp
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2

--pmode 1 --pme 1 --threads 16 --frame-threads 16
average encoding speed: 8.823878 fps

--pmode 1 --frame-threads 16
average encoding speed: 7.477486 fps

x265 [info]: HEVC encoder version 1.6
x265 [info]: build info [Windows][GCC 4.9.0][64 bit] 8bpp
x265 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2

--pmode 1 --pme 1 --frame-threads 16
average encoding speed: 10.062576 fps

--pmode 1 --frame-threads 16
average encoding speed: 10.709824 fps

Seems like --pme is slowing me down now in v1.6 :confused:
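The relative differences between these four runs can be worked out directly from the reported frame rates. The snippet below is only arithmetic on the numbers quoted above; the dictionary layout and the helper `percent_change` are illustrative. Keep in mind that the 1.5 run with --pme also passed --threads 16, so the comparison is not perfectly like-for-like.

```python
# Average encoding speeds (fps) as reported in the post,
# keyed by (x265 version, whether --pme was enabled).
fps = {
    ("1.5", True): 8.823878,
    ("1.5", False): 7.477486,
    ("1.6", True): 10.062576,
    ("1.6", False): 10.709824,
}


def percent_change(new, old):
    """Relative change from old to new, in percent."""
    return (new / old - 1.0) * 100.0


pme_effect_15 = percent_change(fps[("1.5", True)], fps[("1.5", False)])
pme_effect_16 = percent_change(fps[("1.6", True)], fps[("1.6", False)])
upgrade_no_pme = percent_change(fps[("1.6", False)], fps[("1.5", False)])
upgrade_pme = percent_change(fps[("1.6", True)], fps[("1.5", True)])

print(f"--pme effect in 1.5: {pme_effect_15:+.1f}%")
print(f"--pme effect in 1.6: {pme_effect_16:+.1f}%")
print(f"1.5 -> 1.6 without --pme: {upgrade_no_pme:+.1f}%")
print(f"1.5 -> 1.6 with --pme: {upgrade_pme:+.1f}%")
```

On these figures --pme helped by roughly 18% in 1.5 and cost roughly 6% in 1.6, while 1.6 was faster than 1.5 in both configurations.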

benwaggoner

4th April 2015, 21:05

Seems like --pme is slowing me down now in v1.6 :confused:

Totally expected for normal use cases. You need a high threads/rows ratio before --pme starts being helpful. I think I found an improvement doing 384x288 with 16 threads once, but it wasn't huge.

Ajvar

4th April 2015, 21:37

benwaggoner

4th April 2015, 21:44

Why didn't you use --pools 16 in the 1.6 version?

For myself, I use pools to specify which socket to use, and use --pools "+,-" and --pools "-,+" for a dual-socket system.

Also, why the super-high frame-threads? That's also impacting performance, and is way above optimal for speed/perf.

sneaker_ger

4th April 2015, 21:46

He has two CPUs and probably does not want to use "--pools 16" because it would only allocate one pool if I understand correctly. Most interesting would be how fast default settings would be.

For myself, I use pools to specify which socket to use, and use --pools "+,-" and --pools "-,+" for a dual-socket system.
That's not how I understand the docs.

http://x265.readthedocs.org/en/latest/cli.html#cmdoption--pools

/edit:
Or do you actually want to use only a single socket at a time?

BadFrame

4th April 2015, 22:08

x265 1.6 has been released.

Looks like another great release.

I sure don't envy you guys, a lot of people seem to think x264 became as good as it is today in no time at all (memory sure is subjective), it took many many years (it's over ten years old if I recall correctly) for it to reach what is now the pinnacle of what h264 can offer, x265 is an infant project by comparison (what is it, two years?).

kolak

4th April 2015, 23:12

vivan

4th April 2015, 23:18

and in the end we get float point data with a YCbCr source so we need to dither this data down anyway. so a 8 bit panel is not that bad to judge such things.
Also 10 bit YCbCr holds only ~9 bit of valid RGB data (for TV range BT.709 only 172122707 values are valid (result in RGB in 0..1 range), log2 (values) / 3 = 9.12).
Also chroma is subsampled, 2x2 block gives 4 values (+2 bits of precision). But luma is not, so 2*2/3 = +4/3 bit of display precision.
So when you don't scale 8 bit display is enough to cover all unique 10-bit 4:2:0 values (they give unique result or something).

But once you add floating point conversions (which have to be performed), chroma scaling (and image scaling) and, most importantly, dithering, it stops making any sense to compare video and display bitdepth.
Higher display bitdepth (refresh rate and resolution) gives less noise. E.g. 2160p 8-bit display should be comparable to 10-bit 1080p display.
Higher video bitdepth gives better gradients. People that know what they are doing (unlike the professionals that don't and get (http://i.imgur.com/rZV2Aod.png) paid (http://i.imgur.com/PkHiQaZ.png) for (http://i.imgur.com/Ksrb443.png) doing (http://i.imgur.com/2Xn8acd.png) this (http://i.imgur.com/InppoFB.png)) proved it a long time ago, and only now are we starting to get 10-bit HW decoders.

x265 is an infant project by comparison (what is it, two years?).
Don't forget that they took all the x264 experience/codebase and also x264 name (for all the false (http://forum.doom9.org/showpost.php?p=1677487&postcount=635) advertising (https://x265.com/) they do to their commercial consumers).

Motenai Yoda

5th April 2015, 00:10

Also 10 bit YCbCr holds only ~9 bit of valid RGB data (for TV range BT.709 only 172122707 values are valid (result in RGB in 0..1 range), log2 (values) / 3 = 9.12).
Err, if I'm not wrong that "9.12" should be the minimum number of bits needed to represent all those values, so it isn't "10 bit YCbCr holds only ~9 bit of valid RGB data" but "to hold all valid RGB data in YCbCr, more than 9.12 bits are needed" (without counting the ~0.66 bit that should be added because of the 16/235 range restrictions).
Also I don't get the chroma downsample part (both of them are 420, and bitdepth or reproducible values aren't related to chroma downscaling).

And yep, a 10 bit 1080p and an 8 bit 2160p display can be comparable, but only regarding how many colors can be achieved (10 bit vs 8 bit + dither); a 2160p display is better in many other aspects.
Finally, those banded pics show only the failure to dither when converting to 8 bit.

I don't know how relevant a 10-bit PSNR is in this comparison. It'll catch lots of invisible differences, since only 25% of 8-bit values have a matching 10-bit value.
Imho psnr is too sensitive to grain/noise and not optimal for evaluating a dithered picture vs a non-dithered one; they should at least dedither first (or use ssim).

Nevilne

5th April 2015, 00:19

professionals that don't and get (http://i.imgur.com/rZV2Aod.png) paid (http://i.imgur.com/PkHiQaZ.png) for (http://i.imgur.com/Ksrb443.png) doing (http://i.imgur.com/2Xn8acd.png) this (http://i.imgur.com/InppoFB.png)

Is that Crunchyroll stuff? lol.

vivan

5th April 2015, 01:25

Err, if I'm not wrong that "9.12" should be the minimum number of bits needed to represent all those values, so it isn't "10 bit YCbCr holds only ~9 bit of valid RGB data" but "to hold all valid RGB data in YCbCr, more than 9.12 bits are needed" (without counting the ~0.66 bit that should be added because of the 16/235 range restrictions).
Well, yes, but since videos are supposed to be shown on RGB displays going off range is wasting bitrate...
I'm not sure where you got 0.66 bits, log2 (880 / 1024) = -0.22.
With fullrange 10-bit I got 259728971, which is around 9.32 bpp.
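The arithmetic behind these figures can be checked quickly. This sketch only redoes the logarithms on the numbers quoted in the thread (the variable and function names are made up); it does not recount the valid values.

```python
import math

def bits_per_component(valid_triplets):
    """Bits per component needed to index this many valid triplets."""
    return math.log2(valid_triplets) / 3

limited = bits_per_component(172_122_707)   # TV range count quoted in the thread
full = bits_per_component(259_728_971)      # full range count quoted in the thread

# Cost of using roughly 880 of 1024 code values, per component.
range_loss = math.log2(880 / 1024)

print(round(limited, 2), round(full, 2), round(range_loss, 2))

# Possibly the source of the 0.66 figure: the same loss summed over 3 components.
print(round(3 * abs(range_loss), 2))
```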

Also I don't get the chroma downsample part (both of them are 420, and bit depth or reproducible values aren't related to chroma downscaling).
It's dithering (maybe even without it, just bilinear upsampling will do something similar), and it's like the 1080p @ 10 -> 2160p @ 8 analogy. You have a 2x2 pixel block that has to display 1 sample. It could display (in a unique way) a sample with 2 bits higher bit depth.
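A minimal sketch of the 2x2 idea, assuming the simplest possible scheme (this is an illustration, not how any particular scaler or renderer does it; the function name is made up): one 10-bit sample is spread over four 8-bit pixels so that their sum reproduces the 10-bit value.

```python
def spread_over_2x2(value_10bit):
    """Split one 10-bit sample into four 8-bit pixels whose sum equals it."""
    # Four 8-bit pixels can sum to at most 4 * 255 = 1020,
    # so 1021..1023 have no exact equivalent and are clamped.
    value = min(value_10bit, 4 * 255)
    base, extra = divmod(value, 4)
    # 'extra' of the four pixels get one code value more than the rest.
    return [base + 1 if i < extra else base for i in range(4)]

block = spread_over_2x2(513)
print(block, sum(block))
```

Every 10-bit value up to 1020 gets its own distinct block sum, which is the "2 bit higher bitdepth" from the block as a whole.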

Finally, those banded pics only show the failure to dither when converting to 8 bit.
They just show terrible production and encoding that professionals (crunchyroll) do. Guess what other professionals (funimation) do? They serve fullrange (http://i.imgur.com/u4wFb4p.png) videos tagged as limited range (http://i.imgur.com/dy4BIOn.png), which, of course, screws even their own players :rolleyes:
With proper encoding banding is less severe, but still quite noticeable. You can get rid of it only by adding noise, which inflates bitrate.

MeteorRain

5th April 2015, 01:38

Yes, but x264 code is out there and can be 'applied' to x265.
Can you demonstrate how to just "apply" that? And then we all can save years on developing x265.

:script:

vivan

5th April 2015, 01:47

Can you demonstrate how to just "apply" that? And then we all can save years on developing x265.

:script:
While it's not "just", inventing and porting are completely different things. Psy, aq, etc. - even other H.264 encoders are terrible at them.

MeteorRain

5th April 2015, 10:04

While it's not "just", inventing and porting are completely different things. Psy, aq, etc. - even other H.264 encoders are terrible at them.

True. But that still takes time. There are so many commits to the repo every day, and I believe they will (eventually) port these to x265 when they are happy with the speed optimization. ;)

Ajvar

5th April 2015, 10:47

I would be happy with OpenCL on the Nvidia 6xx series and an Ivy processor, but that won't happen any time soon.

Is anyone else worried about where Ma is? His latest build is 1.6+3 and, more importantly, he is missing the opportunity for a benchmark comparison of 1.6 vs 1.5 on an AVX CPU. That is disturbing.

@x265, if you want to become really popular, then create a fancy benchmark for all the geeks who have powerful hardware which isn't nearly loaded by games. Introduce X265 BMARK. Fire around those letters.
Important rules:
- results must be consistent on same machine;
- database with online graphs where someone already has insane results (but possible to outperform on heavily OCed 6-core 4770K).
Not a joke. A 4-minute test: at the end the user watches the video that was encoded in that time (longer = better score), and at the beginning the user can click a button to play the original video (at insane Mbps, 30-60 fps, preferably PC game scenes in one, with an epic dragon teaser at the end and glowing X265 letters), so users would want to get a better score to see the 2nd half, which may be shown only as a score result.
And 2 modes: 360p at ultra-low bitrate (but still better than x264 in comparison) and 720p at a good bitrate, OR 240p and 480p.

kolak

5th April 2015, 11:55

Yes, but x264 code is out there and can be 'applied' to x265. Still disappointed about x265 and grain/details retention.

Yes, it's better at very low bitrates, but if we think about 4K Blu-ray it may end up that using x264 gives better quality, which would be 'crazy' :)

Not a programmer, but you can apply the same ideas or adjust them. Saves years of development, exactly. Looks like many ideas have already been adjusted and put into x265, but even so it's disappointing in some scenarios.

How do you explain this:

http://forum.doom9.org/showpost.php?p=1716059&postcount=159

other than disappointing?

MeteorRain

5th April 2015, 12:36

Not a programmer, but you can apply the same ideas or adjust them. Saves years of development, exactly. Looks like many ideas have already been adjusted and put into x265

No. That only saves you years of thinking and testing, not developing. There are huge differences between these two, especially in the code design. Adapting all the ideas takes time, bro.

huhn

5th April 2015, 12:55

How do you explain this:

http://forum.doom9.org/showpost.php?p=1716059&postcount=159

other than disappointing?
a bug? not yet well optimized?
the x265 decoder is simply not perfect yet

kolak

5th April 2015, 13:20

No. That only saves you years of thinking and testing, not developing. There are huge differences between these two, especially in the code design. Adapting all the ideas takes time, bro.

Yes, but at least half the job is done, so it's easier than starting from scratch.
Even so, x265 is very disappointing at preserving details and grain.

20mbit to have same details as 5mbit x264? Looks like room for improvements is huuuuge :)
H265 is designed to "make money", not to deliver good quality, but this is standard these days :)
4K Blu-ray will deliver relatively waaay worse video than current HD Blu-ray. At the moment it's not looking promising at all for those who want to watch movies at very good quality at home. If the 4K BD standard were released today, I don't see a single h265 encoder which could deliver reference encodes; you would be way better off with x264 :) This is not about x265 itself, but about the h265 standard overall.

huhn

5th April 2015, 13:53

it's not like all BDs are created with x264. there is a high chance they are created with an encoder not as powerful as x264 in the first place.

and now the next thing. yeah, sneaker_ger found some serious issues with 1000, 5000 and 20000, but even 20000 is low for a BD.
and as a reminder: http://forum.doom9.org/showpost.php?p=1715578&postcount=134

"It becomes harder and harder to find places were x265 loses against x264. I actively have to search for them now instead of just posting random screenshots. One remaining problem is some larger blocks turning flat."

x265 is doing pretty darn well for the time it has existed, and they have more than 3 months to fix this issue