x264 development [Archive] - Page 42

Selur

21st December 2014, 22:20

So for now you can use builds made by komisar or others.
I wrote:
in regard to Mac OS X binaries?
didn't know that you and/or komisar provide Mac OS X builds (+ can't find any )

MasterNobody

21st December 2014, 23:39

Sorry, I don't know anyone making Mac OS X builds. This info was for everyone who comes to this topic asking where to get a build and why there are none at the videolan site (most people with such questions are Windows users, because Linux users usually don't have a problem building it themselves).

the_weirdo

22nd December 2014, 04:32

and also avoiding messages like "Program cannot start because avcodec-56.dll is not found" :confused:

"--enable-static --disable-shared". However, I think they should be on by default. If your build configure has "--enable-shared" and/or "--disable-static", remove them.

Selur

22nd December 2014, 06:47

... because Linux users usually don't have a problem building it themselves) and why there are none at the videolan site.
Ehmm,.. there were static up-to-date builds for: freebsd10-x86_64, freebsd9-x86_64, linux-i386, linux-x86_64, macosx-x86-64, win32, win64
over at videolan, not just Windows builds.

the_weirdo

22nd December 2014, 10:02

Ehmm,.. there were static up-to-date builds for: freebsd10-x86_64, freebsd9-x86_64, linux-i386, linux-x86_64, macosx-x86-64, win32, win64
over at videolan, not just Windows builds.

I think you've misunderstood what he meant. He meant that if someone comes to ask where to get the builds and why there are none at the videolan site, then his previous post has the answer for them. Those people are likely Windows users, because "Linux users usually don't have a problem building it themselves" if they can't find the builds. He didn't mean that there are only Windows builds over at videolan.

Selur

22nd December 2014, 14:24

You are right, seems like I misunderstood that. :)

Selur

23rd December 2014, 09:26

btw. win32 got updated over at http://download.videolan.org/x264/binaries/ (hoping win64 and mac will follow soon :))
-> they now went back to an older version of the folder :/

MasterNobody

25th February 2015, 19:36

A little bit late (sorry), but the official buildbot is working again and the latest version (r2538-121396c) is already uploaded to http://download.videolan.org/x264/binaries/

Selur

25th February 2015, 21:00

Thanks for the info!

tfouto

4th March 2015, 18:29

Should the anime preset for H.264 encoding be used for all CGI movies, like Pixar's? Or is it more for flat 2D anime?

For Pixar movies, should I use the anime or the film preset?

Many thanks

Meanwhile I found the answer.

http://forum.doom9.org/showthread.php?p=1682913#post1682913

jpsdr

27th March 2015, 18:22

Here is part of the OreAQ code:

get_image_mb( h, mb_x, mb_y, frame, PARAM_INTERLACED, &luma, &bluePoint );
x264_emms();

_energy_y = X264_MAX( energy[0], 1 );
_energy_uv = X264_MAX( energy[1], 1 );

h->rc->aq3_threshold = logf( powf( h->param.rc.f_aq3_sensitivity, 4 ) / 2.0 );
// logf(energy) = 1.0397 * x264_log2( energy ) / 1.5
// energy_y = 1.2 * (logf(_energy_y ) - ((h->rc->aq3_threshold + 2*(BIT_DEPTH-8)) * .91) + 0.5);
// energy_uv = 0.8 * (logf(_energy_uv) - ((h->rc->aq3_threshold + 2*(BIT_DEPTH-8)) * .91) + 0.5);
energy_y = 0.83176f * x264_log2( _energy_y ) - ((h->rc->aq3_threshold + 2*(BIT_DEPTH-8)) * 1.092f) + 0.5f;
energy_uv = 0.55451f * x264_log2( _energy_uv ) - ((h->rc->aq3_threshold + 2*(BIT_DEPTH-8)) * 0.728f) + 0.5f;
f_qp_adj = 0.f;

if( luma > h->param.rc.i_aq3_boundary[0] )
{
    // *** Bright ***
    // Y & UV Flat       -> qp up
    // Y Flat / UV Bump  -> qp y up
    // Y Bump / UV Flat  -> even
    // Y & UV Bump       -> qp down
    mode = 0x00;

    // qp up
    if( !bluePoint && energy_y < 0 && energy_uv < 0 )
        f_qp_adj = X264_MIN( energy_y, energy_uv );
    // qp y up
    else if( !bluePoint && energy_y < 0 && energy_uv >= 0 )
        f_qp_adj = energy_y;
    // qp down
    else if( energy_y >= 0 && energy_uv >= 0 )
        f_qp_adj = X264_MAX( energy_y, energy_uv ) * 0.5f;
}

energy is the result of x264_ac_energy_mb.

I found the idea interesting, but I think it's missing a "middle" case where nothing is done (not flat, but not bumpy enough).
Here it uses a formula with a single threshold at 0; I want to change it to two thresholds.

So, my question for experienced devs: what are standard/classical/average limit values of energy[0] and energy[1] that can be used to separate "flat" from "bumpy"?
Here they are limited, but what are the values without limits?

:thanks:

Edit :
Maybe I can create two threshold values (high and low) from the "threshold" parameter, for example +/-25%, and with these calculate a "high" and a "low" value of energy_y and energy_uv.
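
A minimal sketch of the two-threshold idea described in the edit above, not code from OreAQ or x264. It assumes the energy has already been mapped to a log scale and that the threshold is a positive value on that same scale; the names classify_energy, aq_class_t and margin are hypothetical.

```c
#include <assert.h>

/* Hypothetical three-way classification with a "middle" dead zone. */
typedef enum { AQ_FLAT = -1, AQ_MIDDLE = 0, AQ_BUMPY = 1 } aq_class_t;

/* log_energy: log-scaled AC energy of the block (assumption).
 * threshold:  positive center value on the same scale (assumption).
 * margin:     relative half-width of the dead zone, e.g. 0.25f for +/-25%. */
aq_class_t classify_energy( float log_energy, float threshold, float margin )
{
    assert( threshold > 0.f && margin >= 0.f && margin < 1.f );

    float low  = threshold * (1.f - margin);  /* below this: flat   */
    float high = threshold * (1.f + margin);  /* above this: bumpy  */

    if( log_energy < low )
        return AQ_FLAT;
    if( log_energy > high )
        return AQ_BUMPY;
    return AQ_MIDDLE;                         /* in between: leave qp alone */
}
```

With threshold 8 and margin 0.25, values below 6 count as flat, values above 10 as bumpy, and everything in between leaves f_qp_adj at 0. Whether +/-25% is a sensible width is exactly the open question of the post.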

Dan203

21st July 2015, 00:01

Does anyone here know if you can build x264 in Visual Studio 2015 without the Intel compiler? Apparently VS2015 adds more C99 support, and it's now possible to build ffmpeg with VS2015, so I was wondering if anyone had tried building x264 standalone using VS2015 yet? We have a commercial license and currently build using VS2010 and the Intel compiler. But we're looking at upgrading to VS2015, and we're wondering if we need the Intel compiler any more?

MasterNobody

21st July 2015, 06:06

It should be possible to build x264 with MSVS 2013 Update 2 (http://git.videolan.org/gitweb.cgi?p=x264.git;a=commit;h=6fbbb5b0c05a1d95cbd6efa7f01808ea87a39dc9) or newer (but I haven't tested Visual Studio 2015 yet to see whether they broke something).

jpsdr

17th September 2015, 20:20

Thanks a lot! I think it's a very useful tutorial.
Here is a 7mod x264 version which is very similar to tmod and is still updated from the official git.
https://github.com/Freecom/x264
Thanks. The potentially interesting things I've noticed that are not included in t_mod are k-means and MasterNobody's VBV patch. The latter is from 2014, so I'm wondering why it still hasn't been put in the standard release.
I'll work on including them in my t_mod branch on github. Could be interesting, but I'm wondering if k-means is Blu-ray compliant, not caring about buggy chipsets...

jpsdr

18th September 2015, 10:08

Is there information somewhere about the k-means weightp mode? What is it supposed/intended to do? Is it an improvement on weightp mode 2? Was it made with a specific purpose in mind?
If anyone has any information, I'm interested.

kotuwa

30th September 2015, 19:36

FULLRANGE
In build 2597, when --fullrange is used, it says unknown parameter fullrange.
Why is that? Has it been removed?

sneaker_ger

30th September 2015, 19:49

It was replaced years ago:
--range <string>        Specify color range ["auto"]
                            - auto, tv, pc
--input-range <string>  Specify input color range ["auto"]
                            - auto, tv, pc
Use --fullhelp to see all options.

Romario

28th April 2017, 02:31

Hello guys, one question for the x264 devs. Is it possible to optimize x264 specifically for AMD Ryzen, and to add optimizations for the upcoming AVX-512 instruction set, which now comes in the new Xeon processors?

sneaker_ger

28th April 2017, 06:47

The first AVX-512 optimizations are already in the development git. But this is not something that's implemented once-and-for-all. It's being added in small increments, function for function. I don't know how much faster these optimizations are at this point.

jpsdr

28th April 2017, 09:36

For now it's a bit of trouble for me to build the new version (once it's committed to the official release): the required nasm is 2.13rc2, and the nasm provided with msys2 is 2.12... :(
But maybe this will be solved by the time the new release arrives... :D

Midzuki

28th April 2017, 17:40

.....the nasm required is 2.13rc2, and the nasm provided with msys2 is 2.12... :(

Don't you know that you can update nasm.exe manually? :confused:

http://www.nasm.us/pub/nasm/releasebuilds/2.13rc22/

Romario

28th April 2017, 18:03

The first AVX-512 optimizations are already in the development git. But this is not something that's implemented once-and-for-all. It's being added in small increments, function for function. I don't know how much faster these optimizations are at this point.

From what I know, AVX-512 was written specifically for video encoding and 3D/CAD/CAM. So I deeply believe that x264 (and x265, of course) will see visible improvements when AVX-512 is applied correctly.

jpsdr

28th April 2017, 18:20

I'm only using "pacman --needed" and "pacman -Syuu". What happens if I change the nasm version manually and then do a "pacman -Syuu"? Won't the version check be screwed up (or anything similar)?

benwaggoner

28th April 2017, 19:49

From what I know, AVX-512 was written specifically for video encoding and 3D/CAD/CAM. So I deeply believe that x264 (and x265, of course) will see visible improvements when AVX-512 is applied correctly.
After all, an 8x8 block with 8-bit samples is exactly 512 bits!

Although from talking to codec developers, they seem to believe that HEVC will get a bigger perf improvement than H.264 from AVX-512, since there are lots of bigger block sizes. One row of a 32x32 TU in >8-bit is also 512 bits. And you can get an 8x8 chroma plane in a 16x16 TU.
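
The arithmetic behind those three cases can be checked with a small sketch. It assumes that >8-bit samples are stored in 16-bit words and that the 8x8 chroma plane comes from 4:2:0 subsampling of a 16x16 block; bits_for_samples is a hypothetical helper, not an x264 or x265 function.

```c
#include <assert.h>

/* Bits occupied by `samples` samples stored in `bytes_per_sample` bytes each.
 * Assumption: 8-bit video uses 1 byte per sample, >8-bit video uses 2 bytes. */
unsigned bits_for_samples( unsigned samples, unsigned bytes_per_sample )
{
    assert( bytes_per_sample == 1 || bytes_per_sample == 2 );
    return samples * bytes_per_sample * 8u;
}
```

bits_for_samples( 8 * 8, 1 ) covers the 8x8 block of 8-bit samples, bits_for_samples( 32, 2 ) one row of a 32x32 TU at high bit depth, and bits_for_samples( (16 / 2) * (16 / 2), 1 ) the 8-bit 4:2:0 chroma plane of a 16x16 TU; each fills one 512-bit register.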

NikosD

3rd May 2017, 19:09

Latest heavily optimized H.264 binaries since 2013 (first year of AVX2), have a speed-up of around 3%-5% using AVX2.

Expect AVX512 to add nothing to H.264 performance.

dbart

7th May 2017, 14:57

Latest heavily optimized H.264 binaries since 2013 (first year of AVX2), have a speed-up of around 3%-5% using AVX2.

Expect AVX512 to add nothing to H.264 performance.

which version ?

LoRd_MuldeR

7th May 2017, 16:10

which version ?

Initial support for AVX2 assembler code has been added to x264 in late 2012, around revision 2243/2242. But work to implement more AVX2-optimized functions was going on until mid 2013 (revision 2352).

commit ccda1ba4d8d902945c68aa25ec20867055d1b079 [revision 2243]
Author: Fiona Glaser <fiona@x264.com>
Date: Mon Nov 12 10:28:53 2012 -0800

AVX2/FMA3 version of mbtree_propagate
First AVX2 function for testing.
Bump yasm version to 1.2.0 for AVX2 support.

commit 8a9608bbbdf77ceb3ee537271549111468175a2b [revision 2242]
Author: Henrik Gramner <hengar-6@student.ltu.se>
Date: Tue Dec 11 16:05:34 2012 +0100

x86inc: Use VEX-encoded instructions in AVX functions
Automatically use VEX-encoding in AVX/AVX2/XOP/FMA3/FMA4 functions for all instructions that exists in a VEX-encoded version.
This change makes it easier to extend existing code to use AVX2.
Also add support for AVX emulation of a few instructions that were missing before.

nevcairiel

7th May 2017, 19:38

Initial support for AVX2 assembler code has been added to x264 in late 2012, around revision 2243/2242. But work to implement more AVX2-optimized functions was going on until mid 2013 (revision 2352).

There have been new AVX2 optimizations as recently as January 2017, fwiw.

sneaker_ger

11th June 2017, 19:19

Looks like lossless is gonna break again...

http://git.videolan.org/?p=x264/x264-sandbox.git;a=commit;h=9b6adb376b32ab68fa9bada32225652b0fce93a7
http://git.videolan.org/?p=x264/x264-sandbox.git;a=commit;h=45ee092bfa5e52391740a0cdcc3c2891bf02611d

nevcairiel

11th June 2017, 19:35

avcodec just needs a check to switch to spec-compliant mode with newer x264 versions, it already checks for x264 as the encoder to enable the x264-compatibility decoding mode.

sneaker_ger

11th June 2017, 19:36

Hmm, yes. So "only" "old" decoders gonna break. Personally, I don't think the compression gain is worth it (and it also comes at a speed cost).

Since the changelog mentions some mixed lossy/lossless mode: is that something new/yet to come?

MasterNobody

11th June 2017, 19:49

First of all, it is not decided yet when "Remove compatibility workarounds" will be pushed (but it would probably be after avcodec is able to decode such streams). And yes, avcodec will need a check for old x264 versions to decode old streams (there could be problems if someone removed the x264 SEI). Same for 4:4:4 decoding.

P.S. Imho default (i.e. without SEI) decoding in avcodec should be according to specs. But that is debatable.
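
A rough sketch of such a version check, not the actual avcodec code. It assumes the decoder has the text of x264's version SEI at hand, which starts with "x264 - core <build>"; the function names and the cutoff parameter are hypothetical, and the real build number at which the workarounds go away was not known at the time of this post.

```c
#include <assert.h>
#include <stdio.h>

/* Returns the x264 core build number, or -1 if the text is missing
 * or was not written by x264 (e.g. the SEI has been stripped). */
int parse_x264_build( const char *sei_text )
{
    int build = -1;
    if( !sei_text || sscanf( sei_text, "x264 - core %d", &build ) != 1 )
        return -1;
    return build;
}

/* Returns 1 if the stream should be decoded in x264-compatibility mode.
 * first_compliant_build is a placeholder cutoff supplied by the caller.
 * Without a usable SEI the decoder falls back to spec-compliant decoding. */
int use_legacy_x264_mode( const char *sei_text, int first_compliant_build )
{
    assert( first_compliant_build > 0 );
    int build = parse_x264_build( sei_text );
    return build >= 0 && build < first_compliant_build;
}
```

The fallback branch is where the SEI problem shows up: an old out-of-spec stream whose SEI was removed gets decoded per spec and breaks, which is the price of making spec-compliant decoding the default.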

sneaker_ger

11th June 2017, 19:53

For default settings, what is the difference between --partitions p8x8,b8x8,i4x4 and --no-8x8dct? What should we enable to match current lossless (and 4:4:4 lossy?) behavior?

MasterNobody

11th June 2017, 20:03

For default settings, what is the difference between --partitions p8x8,b8x8,i4x4 and --no-8x8dct? What should we enable to match current lossless (and 4:4:4 lossy?) behavior?
1) --partitions only influences inter-frame analysis.
2) Inter macroblocks can also use the 8x8dct transform.
To be compatible with current decoders you will need --no-8x8dct (but it wouldn't exactly match current behavior).

sneaker_ger

11th June 2017, 20:05

but it wouldn't exactly match current behavior
Why not?

MasterNobody

11th June 2017, 20:17

Why not?
1) Because currently 8x8dct is allowed in inter macroblocks in lossless encoding and only disabled for intra blocks (i8x8 is disabled in intra/inter frames).
2) 4:4:4 encoding is currently out of spec with cabac+8x8dct, and you wouldn't be able to return to the out-of-spec behavior.

sneaker_ger

11th June 2017, 20:21

I see. Thx.

And yes, avcodec will need a check for old x264 versions to decode old streams (there could be problems if someone removed the x264 SEI).
On the other hand, 2014 ~ 2017 ffmpeg will break if you don't remove the SEI. Fun times. :D

LoRd_MuldeR

11th June 2017, 20:29

IMO it is much better to break old libavcodec/ffmpeg once (and disable the compatibility workarounds in new libavcodec/ffmpeg), instead of continuing to produce out-of-spec streams for ever and ever.

What x264 currently produces in "lossless" mode probably has never been working with any H.264 decoders, except for libavcodec/ffmpeg...

MasterNobody

11th June 2017, 20:52

What x264 currently produces in "lossless" mode probably has never been working with any H.264 decoders, except for libavcodec/ffmpeg...
Wrong. Currently lossless produces correct streams; the out-of-spec feature was disabled 3 years ago (commit (http://git.videolan.org/?p=x264.git;a=commitdiff;h=af8e768e2bd3b4398bca033998f83b0eb8874914))
4:4:4+cabac+8x8dct is out of spec now so I would recommend anyone encoding 4:4:4 with cabac to disable 8x8dct.

LoRd_MuldeR

11th June 2017, 21:03

Wrong. Currently lossless produces correct streams; the out-of-spec feature was disabled 3 years ago (commit (http://git.videolan.org/?p=x264.git;a=commitdiff;h=af8e768e2bd3b4398bca033998f83b0eb8874914))
4:4:4+cabac+8x8dct is out of spec now so I would recommend anyone encoding 4:4:4 with cabac to disable 8x8dct.

Thanks for clarification.

Anyway, I think it's safe to assume that disabling the "out of spec" features in lossless mode costs some compression efficiency. So it's preferable to finally have it fixed and re-enabled.

sneaker_ger

11th June 2017, 21:41

Efficiency loss in lossless mode is 1%, if that.

asarian

19th December 2018, 13:14

(Maybe I should post this here instead)

Hmm, just tried the latest x264, with 10-bit encoding:

x264 [warning]: OpenCL: not compiled with OpenCL support, disabling

That's disappointing. OpenCL works just fine for 8-bit encodings. Is there a particular reason OpenCL can't/doesn't work using 10bit encodings?

LoRd_MuldeR

19th December 2018, 13:23

That's disappointing. OpenCL works just fine for 8-bit encodings. Is there a particular reason OpenCL can't/doesn't work using 10bit encodings?

Let's have a look at x264 code (https://git.videolan.org/?p=x264.git;a=blob;f=encoder/encoder.c;h=074c4a5c2ba8f0a9df3778d2f1d593b22e09fa20#l593) (current master):
int validate_parameters( x264_t *h, int b_open )
{
    [...]

#if !HAVE_OPENCL
    x264_log( h, X264_LOG_WARNING, "OpenCL: not compiled with OpenCL support, disabling\n" );
    h->param.b_opencl = 0;
#elif BIT_DEPTH > 8
    x264_log( h, X264_LOG_WARNING, "OpenCL lookahead does not support high bit depth, disabling opencl\n" );
    h->param.b_opencl = 0;
#else
    if( h->param.i_width < 32 || h->param.i_height < 32 )
    {
        x264_log( h, X264_LOG_WARNING, "OpenCL: frame size is too small, disabling opencl\n" );
        h->param.b_opencl = 0;
    }
#endif

    [...]
}

So, no OpenCL support for bit-depths greater than 8-Bit, or for very small frames.

I'd assume that's either because nobody bothered porting the OpenCL code to "high bit-depth", or because GPUs tend to be orders of magnitude slower when doing calculations on data types that they haven't been optimized for, and therefore OpenCL may not actually be worth it at "high bit-depth" (on most GPUs). For example, FP64 (double precision) math is 24 to 32 times slower than FP32 (single precision) math on Kepler/Maxwell GPUs (details (https://arrayfire.com/explaining-fp64-performance-on-gpus/)).

asarian

19th December 2018, 13:30

^^ That code is pretty self-explanatory, I guess. Thx. Except I would then expect to get the error msg for '#elif BIT_DEPTH > 8', and not the one for not having OpenCL ('#if !HAVE_OPENCL'), which is the one I got, right?

asarian

19th December 2018, 13:37

So, no OpenCL support for bit-depths greater than 8-Bit, or for very small frames.

I'd assume that's either because nobody ever bothered porting the OpenCL code to "high bit-depth". Or it's because of the fact that GPUs tend to be orders of magnitude slower when doing calculations on data types that they haven't been optimized for, and therefore OpenCL may not actually be worth it at "high bit-depth" (on most GPUs). Or a combination of both reasons.

Sorry, I had missed that part of your post. Good explanation. Makes sense. Thanks. :goodpost:

LoRd_MuldeR

19th December 2018, 14:02

^^ That code is pretty self-explanatory, I guess. Thx. Except I would then expect to get the error msg for '#elif BIT_DEPTH > 8', and not the one for not having OpenCL ('#if !HAVE_OPENCL'), which is the one I got, right?

Pre-processor macros like BIT_DEPTH or HAVE_OPENCL are set at compile-time, not run-time.

Also, since the "8/10 bits unification", exactly the same source code files are compiled twice, once to generate the machine code for "8-Bit" encoding, and once to generate the machine code for "10-Bit" encoding.

Of course, pre-processor macros will be set differently for "8-Bit" and "10-Bit" compilation, so the generated machine code will actually be different for the "8-Bit" and "10-Bit" paths.

Now, it would seem that HAVE_OPENCL simply was not defined at the time when the "10-Bit" version has been compiled – which makes some sense considering that we know beforehand that OpenCL is for 8-Bit only.

(The "BIT_DEPTH > 8" check may seem a bit redundant then. But maybe it's not guaranteed that HAVE_OPENCL will always be unset for "BIT_DEPTH > 8" in every possible situation)

[UPDATE]

Indeed, HAVE_OPENCL is not simply defined as "0" or "1". It is actually defined as "(BIT_DEPTH == 8)", when building x264 with OpenCL support enabled; would probably be defined to "0" otherwise.

So, it may actually be preferable to change the code to:
#if !HAVE_OPENCL
#if BIT_DEPTH > 8
    x264_log( h, X264_LOG_WARNING, "OpenCL lookahead does not support high bit depth, disabling opencl\n" );
#else
    x264_log( h, X264_LOG_WARNING, "OpenCL: not compiled with OpenCL support, disabling\n" );
#endif
    h->param.b_opencl = 0;
#else
    [...]

(But that's nitpicking, I suppose)
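
To make the compile-time behavior concrete, here is a stand-alone sketch of the suggested nesting. It is not x264 code: WITH_OPENCL, check_opencl and the enum are hypothetical stand-ins, and BIT_DEPTH defaults to 10 here only to imitate the "10-Bit" compilation pass.

```c
#include <assert.h>

/* Assumptions: in a real build these come from the build system. */
#ifndef BIT_DEPTH
#define BIT_DEPTH 10            /* pretend this is the "10-Bit" pass */
#endif
#ifndef WITH_OPENCL
#define WITH_OPENCL 1           /* hypothetical "OpenCL was enabled" switch */
#endif

#if WITH_OPENCL
#define HAVE_OPENCL (BIT_DEPTH == 8)   /* as described in the update above */
#else
#define HAVE_OPENCL 0
#endif

enum opencl_status
{
    OPENCL_OK,
    OPENCL_NOT_COMPILED,
    OPENCL_HIGH_BIT_DEPTH,
    OPENCL_FRAME_TOO_SMALL
};

/* The preprocessor keeps exactly one of the branches below, so the reason
 * reported for a given binary is fixed when that binary is compiled. */
enum opencl_status check_opencl( int width, int height )
{
    assert( width > 0 && height > 0 );
#if !HAVE_OPENCL
#if BIT_DEPTH > 8
    return OPENCL_HIGH_BIT_DEPTH;
#else
    return OPENCL_NOT_COMPILED;
#endif
#else
    if( width < 32 || height < 32 )
        return OPENCL_FRAME_TOO_SMALL;
    return OPENCL_OK;
#endif
}
```

Compiled as is, check_opencl( 1920, 1080 ) reports the high-bit-depth reason regardless of frame size; building the same file with -DBIT_DEPTH=8 keeps only the frame-size check, and -DWITH_OPENCL=0 -DBIT_DEPTH=8 keeps only the "not compiled" branch.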

asarian

19th December 2018, 14:12

Pre-processor macros like BIT_DEPTH or HAVE_OPENCL are set at compile-time, not run-time.

Doh on me! :o

Also, since the "8/10 bits unification", exactly the same source code files are compiled twice, once to generate the machine code for "8-Bit" encoding, and once to generate the machine code for "10-Bit" encoding.

Of course, pre-processor macros will be set differently for "8-Bit" and "10-Bit" compilation, so the generated machine code will actually be different for "8-Bit" and "10-Bit" .

Now, it would seem that HAVE_OPENCL simply was not defined at the time when the "10-Bit" version has been compiled – which makes some sense, because we know beforehand that OpenCL is for 8-Bit only.

(The "BIT_DEPTH > 8" check may seem a bit redundant then. But maybe it's not guaranteed that HAVE_OPENCL will always be unset for "BIT_DEPTH > 8" in every possible situation)

As usual, thanks for the deep insight. :)

hydra3333

20th December 2018, 02:00

:goodpost: :thanks:

FranceBB

20th December 2018, 08:24

I'd assume that's either because nobody bothered porting the OpenCL code to "high bit-depth", or because GPUs tend to be orders of magnitude slower when doing calculations on data types that they haven't been optimized for, and therefore OpenCL may not actually be worth it at "high bit-depth" (on most GPUs). For example, FP64 (double precision) math is 24 to 32 times slower than FP32 (single precision) math on Kepler/Maxwell GPUs (details (https://arrayfire.com/explaining-fp64-performance-on-gpus/)).

Very interesting article indeed.
NVIDIA performs better in FP16 (half precision) and FP32 (single precision), but worse in FP64 on consumer-grade GPUs, because there aren't as many FP64-capable units as FP32 ones, while AMD consumer-grade GPUs have better FP64 performance due to more FP64-capable units, at the expense of the FP32 and FP16 ones.
However, at the enterprise level, NVIDIA has better performance than AMD in both FP32 and FP64.
An interesting thing is that NVIDIA GPUs have FP32-capable units that are also FP16-capable, so no space is wasted on dedicated FP16 units.
The White Paper (https://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf) at page 12 says "One new capability that has been added [...] is the ability to process both 16-bit and 32-bit precision instructions and data, as described later in this paper. FP16 operation throughput is up to twice FP32 operation throughput". Page 14: "Using FP16 computation improves performance up to 2x compared to FP32 arithmetic, and similarly FP16 data transfers take less time than FP32 or FP64 transfers."

LigH

20th December 2018, 10:12

Just a side note ... the mentioned precision may be sufficient for video processing, but there are applications which would gain a significant speed boost from GPGPU parallelization if it just had the precision they require (like astronomical multi-body simulations, see Universe Sandbox forums: PhysX had to be rejected, OpenCL is only partially used).