Go Back   Doom9's Forum > Video Encoding > MPEG-4 AVC / H.264

Old 11th June 2017, 19:36   #2081  |  Link
sneaker_ger
Registered User
 
Join Date: Dec 2002
Posts: 5,565
Hmm, yes. So "only" "old" decoders are going to break. Personally, I don't think the compression gain is worth it (and it also comes at a speed cost).

Since the changelog mentions some mixed lossy/lossless mode: is that something new/yet to come?

Last edited by sneaker_ger; 11th June 2017 at 19:51. Reason: I didn't think it through.
sneaker_ger is offline   Reply With Quote
Old 11th June 2017, 19:49   #2082  |  Link
MasterNobody
Registered User
 
Join Date: Jul 2007
Posts: 552
First of all, it is not yet decided when "Remove compatibility workarounds" will be pushed (but it will probably be after avcodec is able to decode the new streams). And yes, avcodec will need to check for the old x264 version to decode old streams (there could be problems if someone has removed the x264 SEI). The same goes for 4:4:4 decoding.

P.S. Imho default (i.e. without SEI) decoding in avcodec should be according to specs. But that is debatable.

Last edited by MasterNobody; 11th June 2017 at 19:51.
MasterNobody is offline   Reply With Quote
Old 11th June 2017, 19:53   #2083  |  Link
sneaker_ger
Registered User
 
Join Date: Dec 2002
Posts: 5,565
For default settings, what is the difference between --partitions p8x8,b8x8,i4x4 and --no-8x8dct? What should we enable to match current lossless (and 4:4:4 lossy?) behavior?
sneaker_ger is offline   Reply With Quote
Old 11th June 2017, 20:03   #2084  |  Link
MasterNobody
Registered User
 
Join Date: Jul 2007
Posts: 552
Quote:
Originally Posted by sneaker_ger View Post
For default settings, what is the difference between --partitions p8x8,b8x8,i4x4 and --no-8x8dct? What should we enable to match current lossless (and 4:4:4 lossy?) behavior?
1) --partitions only influences inter-frame analysis.
2) Inter macroblocks can also use the 8x8 DCT transform.
To be compatible with current decoders you will need --no-8x8dct (but it wouldn't exactly match the current behavior).
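For reference, a sketch of the flags under discussion on an actual command line (file names are made up; whether you need these flags at all depends on your x264 version, per the posts above):

```shell
# Lossy 4:4:4 with CABAC: turn off the 8x8 DCT so the stream stays
# within spec for non-libavcodec decoders (per the advice above).
x264 --output-csp i444 --no-8x8dct -o out_444.mkv input.y4m

# Restricting partitions alone is NOT equivalent: --partitions only
# affects inter-frame analysis and does not stop inter macroblocks
# from using the 8x8 DCT.
x264 --output-csp i444 --partitions p8x8,b8x8,i4x4 -o out_parts.mkv input.y4m
```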
MasterNobody is offline   Reply With Quote
Old 11th June 2017, 20:05   #2085  |  Link
sneaker_ger
Registered User
 
Join Date: Dec 2002
Posts: 5,565
Quote:
Originally Posted by MasterNobody View Post
but it wouldn't exactly match current behavior
Why not?
sneaker_ger is offline   Reply With Quote
Old 11th June 2017, 20:17   #2086  |  Link
MasterNobody
Registered User
 
Join Date: Jul 2007
Posts: 552
Quote:
Originally Posted by sneaker_ger View Post
Why not?
1) Because currently the 8x8 DCT is allowed in the inter macroblocks of lossless encoding and is only disabled for intra blocks (i8x8 is disabled in intra/inter frames).
2) 4:4:4 encoding with cabac+8x8dct is currently out of spec, and you wouldn't be able to return to the out-of-spec behavior.
MasterNobody is offline   Reply With Quote
Old 11th June 2017, 20:21   #2087  |  Link
sneaker_ger
Registered User
 
Join Date: Dec 2002
Posts: 5,565
I see. Thx.

Quote:
Originally Posted by MasterNobody View Post
And yes, avcodec will need to check for the old x264 version to decode old streams (there could be problems if someone has removed the x264 SEI).
On the other hand, 2014 ~ 2017 ffmpeg will break if you don't remove the SEI. Fun times.
sneaker_ger is offline   Reply With Quote
Old 11th June 2017, 20:29   #2088  |  Link
LoRd_MuldeR
Software Developer
 
LoRd_MuldeR's Avatar
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
IMO it is much better to break old libavcodec/ffmpeg once (and disable the compatibility workarounds in new libavcodec/ffmpeg) than to keep producing out-of-spec streams forever.

What x264 currently produces in "lossless" mode has probably never worked with any H.264 decoder other than libavcodec/ffmpeg...
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊

Last edited by LoRd_MuldeR; 11th June 2017 at 20:32.
LoRd_MuldeR is offline   Reply With Quote
Old 11th June 2017, 20:52   #2089  |  Link
MasterNobody
Registered User
 
Join Date: Jul 2007
Posts: 552
Quote:
Originally Posted by LoRd_MuldeR View Post
What x264 currently produces in "lossless" mode has probably never worked with any H.264 decoder other than libavcodec/ffmpeg...
Wrong. Lossless currently produces correct streams; the out-of-spec feature was disabled 3 years ago (commit).
4:4:4+cabac+8x8dct is out of spec now, so I would recommend anyone encoding 4:4:4 with CABAC to disable the 8x8 DCT.
MasterNobody is offline   Reply With Quote
Old 11th June 2017, 21:03   #2090  |  Link
LoRd_MuldeR
Software Developer
 
LoRd_MuldeR's Avatar
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
Quote:
Originally Posted by MasterNobody View Post
Wrong. Lossless currently produces correct streams; the out-of-spec feature was disabled 3 years ago (commit).
4:4:4+cabac+8x8dct is out of spec now, so I would recommend anyone encoding 4:4:4 with CABAC to disable the 8x8 DCT.
Thanks for clarification.

Anyway, I think it's safe to assume that disabling the "out of spec" features in lossless mode costs some compression efficiency. So it's preferable to finally have them fixed and re-enabled.
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊
LoRd_MuldeR is offline   Reply With Quote
Old 11th June 2017, 21:41   #2091  |  Link
sneaker_ger
Registered User
 
Join Date: Dec 2002
Posts: 5,565
Efficiency loss in lossless mode is 1%, if that.
sneaker_ger is offline   Reply With Quote
Old 19th December 2018, 13:14   #2092  |  Link
asarian
Registered User
 
Join Date: May 2005
Posts: 1,462
(Maybe I should post this here instead)

Hmm, just tried the latest x264, with 10-bit encoding:

x264 [warning]: OpenCL: not compiled with OpenCL support, disabling

That's disappointing. OpenCL works just fine for 8-bit encodings. Is there a particular reason OpenCL can't/doesn't work with 10-bit encodings?
__________________
Gorgeous, delicious, deculture!
asarian is offline   Reply With Quote
Old 19th December 2018, 13:23   #2093  |  Link
LoRd_MuldeR
Software Developer
 
LoRd_MuldeR's Avatar
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
Quote:
Originally Posted by asarian View Post
That's disappointing. OpenCL works just fine for 8-bit encodings. Is there a particular reason OpenCL can't/doesn't work using 10bit encodings?
Let's have a look at x264 code (current master):
Code:
int validate_parameters( x264_t *h, int b_open )
{
        [...]

#if !HAVE_OPENCL
        x264_log( h, X264_LOG_WARNING, "OpenCL: not compiled with OpenCL support, disabling\n" );
        h->param.b_opencl = 0;
#elif BIT_DEPTH > 8
        x264_log( h, X264_LOG_WARNING, "OpenCL lookahead does not support high bit depth, disabling opencl\n" );
        h->param.b_opencl = 0;
#else
        if( h->param.i_width < 32 || h->param.i_height < 32 )
        {
            x264_log( h, X264_LOG_WARNING, "OpenCL: frame size is too small, disabling opencl\n" );
            h->param.b_opencl = 0;
        }
#endif

        [...]
}
So, no OpenCL support for bit-depths greater than 8-Bit, or for very small frames.

I'd assume that's either because nobody bothered porting the OpenCL code to "high bit-depth", or because GPUs tend to be orders of magnitude slower when doing calculations on data types that they haven't been optimized for, so OpenCL may not actually be worth it at "high bit-depth" (on most GPUs). For example, FP64 (double-precision) math is 24 to 32 times slower than FP32 (single-precision) math on Kepler/Maxwell GPUs (details).
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊

Last edited by LoRd_MuldeR; 19th December 2018 at 13:39.
LoRd_MuldeR is offline   Reply With Quote
Old 19th December 2018, 13:30   #2094  |  Link
asarian
Registered User
 
Join Date: May 2005
Posts: 1,462
^^ That code is pretty self-explanatory, I guess. Thx. Except I would then expect to get the error message for '#elif BIT_DEPTH > 8', and not the one for not having OpenCL at all ('#if !HAVE_OPENCL'), which is the one I got, right?
__________________
Gorgeous, delicious, deculture!
asarian is offline   Reply With Quote
Old 19th December 2018, 13:37   #2095  |  Link
asarian
Registered User
 
Join Date: May 2005
Posts: 1,462
Quote:
Originally Posted by LoRd_MuldeR View Post
So, no OpenCL support for bit-depths greater than 8-Bit, or for very small frames.

I'd assume that's either because nobody ever bothered porting the OpenCL code to "high bit-depth". Or it's because GPUs tend to be orders of magnitude slower when doing calculations on data types that they haven't been optimized for, and therefore OpenCL may not actually be worth it at "high bit-depth" (on most GPUs). Or a combination of both reasons.
Sorry, I had missed that part of your post. Good explanation. Makes sense. Thanks.
__________________
Gorgeous, delicious, deculture!
asarian is offline   Reply With Quote
Old 19th December 2018, 14:02   #2096  |  Link
LoRd_MuldeR
Software Developer
 
LoRd_MuldeR's Avatar
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
Quote:
Originally Posted by asarian View Post
^^ That code is pretty self-explanatory, I guess. Thx. Except I would then expect to get the error message for '#elif BIT_DEPTH > 8', and not the one for not having OpenCL at all ('#if !HAVE_OPENCL'), which is the one I got, right?
Pre-processor macros like BIT_DEPTH or HAVE_OPENCL are set at compile-time, not run-time.

Also, since the "8/10 bits unification", the exact same source code files are compiled twice: once to generate the machine code for "8-Bit" encoding, and once to generate the machine code for "10-Bit" encoding.

Of course, pre-processor macros will be set differently for "8-Bit" and "10-Bit" compilation, so the generated machine code will actually be different for the "8-Bit" and "10-Bit" paths.

Now, it would seem that HAVE_OPENCL simply was not defined at the time when the "10-Bit" version was compiled – which makes some sense considering that we know beforehand that OpenCL is for 8-Bit only.

(The "BIT_DEPTH > 8" check may seem a bit redundant then. But maybe it's not guaranteed that HAVE_OPENCL will always be unset for "BIT_DEPTH > 8" in every possible situation)

[UPDATE]

Indeed, HAVE_OPENCL is not simply defined as "0" or "1". It is actually defined as "(BIT_DEPTH == 8)" when building x264 with OpenCL support enabled; it would presumably be defined as "0" otherwise.

So, it may actually be preferable to change the code to:
Code:
#if !HAVE_OPENCL
#if BIT_DEPTH > 8
        x264_log( h, X264_LOG_WARNING, "OpenCL lookahead does not support high bit depth, disabling opencl\n" );
#else
        x264_log( h, X264_LOG_WARNING, "OpenCL: not compiled with OpenCL support, disabling\n" );
#endif
        h->param.b_opencl = 0;
#else
       [...]
(But that's nitpicking, I suppose)
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊

Last edited by LoRd_MuldeR; 19th December 2018 at 15:08.
LoRd_MuldeR is offline   Reply With Quote
Old 19th December 2018, 14:12   #2097  |  Link
asarian
Registered User
 
Join Date: May 2005
Posts: 1,462
Quote:
Originally Posted by LoRd_MuldeR View Post
Pre-processor macros like BIT_DEPTH or HAVE_OPENCL are set at compile-time, not run-time.
Doh on me!

Quote:
Also, since the "8/10 bits unification", the exact same source code files are compiled twice: once to generate the machine code for "8-Bit" encoding, and once to generate the machine code for "10-Bit" encoding.

Of course, pre-processor macros will be set differently for "8-Bit" and "10-Bit" compilation, so the generated machine code will actually be different for the "8-Bit" and "10-Bit" paths.

Now, it would seem that HAVE_OPENCL simply was not defined at the time when the "10-Bit" version was compiled – which makes some sense, because we know beforehand that OpenCL is for 8-Bit only.

(The "BIT_DEPTH > 8" check may seem a bit redundant then. But maybe it's not guaranteed that HAVE_OPENCL will always be unset for "BIT_DEPTH > 8" in every possible situation)
As usual, thanks for the deep insight.
__________________
Gorgeous, delicious, deculture!
asarian is offline   Reply With Quote
Old 20th December 2018, 02:00   #2098  |  Link
hydra3333
Registered User
 
Join Date: Oct 2009
Location: crow-land
Posts: 540
hydra3333 is offline   Reply With Quote
Old 20th December 2018, 08:24   #2099  |  Link
FranceBB
Broadcast Encoder
 
FranceBB's Avatar
 
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,901
Quote:
Originally Posted by LoRd_MuldeR View Post
I'd assume that's either because nobody bothered porting the OpenCL code to "high bit-depth", or because GPUs tend to be orders of magnitude slower when doing calculations on data types that they haven't been optimized for, so OpenCL may not actually be worth it at "high bit-depth" (on most GPUs). For example, FP64 (double-precision) math is 24 to 32 times slower than FP32 (single-precision) math on Kepler/Maxwell GPUs (details).
Very interesting article indeed.
NVIDIA performs better at FP16 and FP32 (single precision), but worse at FP64 on consumer-grade GPUs, 'cause there aren't as many FP64-capable units as FP32 ones, while AMD consumer-grade GPUs have better FP64 performance due to more FP64-capable units, at the expense of the FP32 and FP16 ones.
However, at the enterprise level, NVIDIA has better performance at both FP32 and FP64 than AMD has.
An interesting thing is that NVIDIA GPUs have FP32-capable units that are also FP16-capable, therefore not wasting space on dedicated FP16 units.
The White Paper at page 12 says: "One new capability that has been added [...] is the ability to process both 16-bit and 32-bit precision instructions and data, as described later in this paper. FP16 operation throughput is up to twice FP32 operation throughput". Page 14: "Using FP16 computation improves performance up to 2x compared to FP32 arithmetic, and similarly FP16 data transfers take less time than FP32 or FP64 transfers."
FranceBB is offline   Reply With Quote
Old 20th December 2018, 10:12   #2100  |  Link
LigH
German doom9/Gleitz SuMo
 
LigH's Avatar
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,779
Just a side note ... the mentioned precision may be sufficient for video processing; but there are applications which would gain a severe speed boost from GPGPU parallelization if it just had the required precision for their demands (like astronomical multi-body simulations, see the Universe Sandbox forums: PhysX had to be rejected, and OpenCL is only partially used).
__________________

New German Gleitz board
MediaFire: x264 | x265 | VPx | AOM | Xvid
LigH is offline   Reply With Quote
Tags
coding, development, x264 dev
