Go Back   Doom9's Forum > Video Encoding > MPEG-4 AVC / H.264

Old 28th March 2019, 21:00   #2121  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,245
Unless I've missed it in the code, it seems that there is no value check on --output-depth; you can put whatever you want! I didn't see any check on i_bitdepth, and I also didn't see anything in validate_parameters in the encoder.c file...
Don't you think you should add a check for the allowed values? And maybe the help could be more specific, instead of just "int value".
__________________
My github.
Old 29th March 2019, 00:09   #2122  |  Link
MasterNobody
Registered User
 
Join Date: Jul 2007
Posts: 551
Quote:
Originally Posted by jpsdr
Unless I've missed it in the code, it seems that there is no value check on --output-depth; you can put whatever you want! I didn't see any check on i_bitdepth, and I also didn't see anything in validate_parameters in the encoder.c file...
Don't you think you should add a check for the allowed values? And maybe the help could be more specific, instead of just "int value".
check
Old 29th March 2019, 11:15   #2123  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,245
Thanks, I had missed it. It was odd that I didn't find any check; it makes more sense like this indeed...
__________________
My github.
Old 18th January 2022, 07:26   #2124  |  Link
LigH
German doom9/Gleitz SuMo
 
LigH's Avatar
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,688
There is a (quite demanding) thread in the VideoHelp forum asking for comprehensive official documentation of x264, similar to the ReadTheDocs pages for x265. Does any exist that I missed? The VideoLAN homepage of x264 doesn't seem to mention any, and the docs directory in the repo is hardly worth mentioning. So I guess the best source of knowledge is the --fullhelp output combined with searching back through the developer mailing list for discussions about each parameter...
__________________

New German Gleitz board
MediaFire: x264 | x265 | VPx | AOM | Xvid
Old 22nd January 2022, 01:22   #2125  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,578
Quote:
Originally Posted by LigH
There is a (quite demanding) thread in the VideoHelp forum asking for comprehensive official documentation of x264, similar to the ReadTheDocs pages for x265. Does any exist that I missed? The VideoLAN homepage of x264 doesn't seem to mention any, and the docs directory in the repo is hardly worth mentioning. So I guess the best source of knowledge is the --fullhelp output combined with searching back through the developer mailing list for discussions about each parameter...
I am pretty confident such a thing doesn't exist, although it should. x265.readthedocs.io is, by a huge margin, the best documentation for an encoder that's existed in the last 20 years. The old Terran Interactive manuals around 1995-2000 were the closest, although there were a lot fewer parameters to document.
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
Old 28th February 2022, 14:31   #2126  |  Link
FranceBB
Broadcast Encoder
 
FranceBB's Avatar
 
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,670
Ok, so, I have a task for you guys: optimizing the quality of a command-line x264 encode, but this time it's for professional use, so we have some constraints.

This is the current Command Line:

Quote:
x264-10b.exe "\\mibcisilonsc\avisynth\Scambio\FILM\4CS00091.avs" --preset medium --profile high422 --level 5.2 --keyint 1 --no-cabac --slices 8 --bitrate 500000 --vbv-maxrate 500000 --vbv-bufsize 500000 --deblock -1:-1 --overscan show --colormatrix bt2020nc --range tv --log-level info --thread-input --transfer arib-std-b67 --colorprim bt2020 --videoformat component --nal-hrd cbr --aud --output-csp i422 --output-depth 10 --output "I:\Scambio\FILM\raw_video.h264"
What you see in the command line above is mandatory; the constraints are:

- Profile High 422 10bit
- Level 5.2
- keyint 1 (i.e All Intra)
- no-cabac
- slices 8
- 500 Mbit/s constant bitrate
- aud (access unit delimiter NAL at the start of each slice access unit)


Input files are UHD HDR PQ 12-bit 4:4:4 Apple ProRes files at 23.976p (1502 Mbit/s), 25p (1600 Mbit/s), 29.970p (1990 Mbit/s), 50p (3318 Mbit/s) or 60p (3981 Mbit/s), which are indexed, brought to 16 bit, frame-rate converted to 50p in all cases, LUT-converted to HLG and chroma-downscaled to 4:2:2 with Avisynth.
The resulting 16-bit 4:2:2 UHD HDR HLG stream is then delivered to x264, which dithers it down to 10-bit planar with Sierra-2-4A error diffusion and encodes it as above.
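For reference, the Sierra-2-4A ("Sierra Lite") error-diffusion kernel distributes each pixel's quantization error as 2/4 to the right neighbour and 1/4 each to the below-left and below pixels. Here is a minimal scalar sketch of a 16-bit to 10-bit dither along these lines (my own illustration, not x264's actual implementation; the function name and layout are hypothetical):

```c
#include <stdint.h>
#include <stdlib.h>

/* Sketch of 16-bit -> 10-bit error-diffusion dithering with the
 * Sierra-2-4A ("Sierra Lite") kernel:
 *        X  2
 *     1  1         (all weights divided by 4)
 * Illustration only -- not x264's real dither code. */
static void dither_16to10(const uint16_t *src, uint16_t *dst, int w, int h)
{
    /* err[x+1] accumulates the error diffused down to row y+1, column x. */
    int *err = calloc((size_t)w + 2, sizeof *err);
    for (int y = 0; y < h; y++) {
        int carry = 0;                      /* error passed to the right */
        for (int x = 0; x < w; x++) {
            int v = src[y * w + x] + carry + err[x + 1];
            err[x + 1] = 0;
            if (v < 0)     v = 0;
            if (v > 65535) v = 65535;
            int q = v >> 6;                 /* keep the top 10 bits */
            int e = v - (q << 6);           /* quantization error, 0..63 */
            carry       = (2 * e) / 4;      /* -> right neighbour */
            err[x]     += e / 4;            /* -> below-left */
            err[x + 1] += e / 4;            /* -> below */
            dst[y * w + x] = (uint16_t)q;
        }
    }
    free(err);
}
```

The point of diffusing rather than truncating is that the 16-bit detail below the 10-bit grid reappears as spatial dither instead of banding.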

A few questions for you:

1) Would it make sense to optimize x264 for quality and use a slower preset like --preset veryslow at such a high bitrate?

2) Would it make sense to perform a two-pass encode, given that it's CBR and All-Intra and at such a high bitrate?

3) Do you think it would make sense to mess with --aq-mode and the like at such a high bitrate?


Keep in mind that the end user will never see those files. They are sent to the playout, where a hardware playback port plays them and delivers the signal through an SDI cable; that signal is then re-encoded live by a hardware H.265 encoder, which resizes the chroma and encodes the final UHD 4:2:0 25 Mbit/s 10-bit satellite feed that the user receives live as .ts.
Old 28th February 2022, 18:10   #2127  |  Link
rwill
Registered User
 
Join Date: Dec 2013
Posts: 314
For this use case 500 Mbit/s is not high, just average. If you care about the quality of your intermediate file, use high-quality settings.

The H.265 hardware encoder at the end of the chain will botch the quality anyway, though.

I don't think x264's rate control uses multipass statistics effectively for CBR, by the way.
Old 1st March 2022, 21:26   #2128  |  Link
Emulgator
Big Bit Savings Now !
 
Emulgator's Avatar
 
Join Date: Feb 2007
Location: close to the wall
Posts: 1,406
1. Although I always use it, I guess veryslow is not needed here.
2. No 2-pass; I don't see a Blu-ray-restriction kind of distribution as the main problem here.
3. I guess no.
I guess you want to make sure that enough bits are spent in any case.
What I did in such cases was simply limit the QP decisions and test on the most complex scene:
release any bitrate restrictions, then decrease qpmax for as long as you see it driving the bitrate up.
Maybe you'll find that going below --qpmax 30 definitely forces the bitrate upwards.
Then step back, i.e. increase qpmax again by a few ticks (--qpmax 36?),
until you feel safe that there is enough headroom.
Then you shouldn't get rate decisions that are too low anymore, so you're hopefully safe from harm.
__________________
"To bypass shortcuts and find suffering...is called QUALity" (Die toten Augen von Friedrichshain)
"Data reduction ? Yep, Sir. We're working on that issue. Synce invntoin uf lingöage..."

Last edited by Emulgator; 1st March 2022 at 21:31.
Old 2nd March 2022, 08:17   #2129  |  Link
stax76
Registered User
 
stax76's Avatar
 
Join Date: Jun 2002
Location: On thin ice
Posts: 6,802
There is this:

http://www.chaneru.com/Roku/HLS/X264_Settings.htm

Used in staxrip for the context (right-click) help.
Old 12th February 2023, 13:06   #2130  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 988
Moved from the 'getting latest' thread: about better performance of MPEG encoders (including all x26x projects) on 'big' and 'large' architectures like AVX2 and AVX-512, by redesigning the program from single-block to multi-block processing.
An I-frames-only example:
I got the C sources from https://github.com/ShiftMediaProject...master/encoder and built them with Visual Studio 2017. Other versions (including jpsdr's) look incompatible with MSVS.
After some profiling I see that a significant amount of time is spent in the intra_satd_x9_4x4() function, and I tracked its call stack up to the walk over all macroblocks:
It is the loop in the encoder.c file: https://github.com/ShiftMediaProject...ncoder.c#L2812 . It walks through all of the frame's macroblocks one by one, by rows and columns (advancing the MB number as mb_xy = i_mb_x + i_mb_y * h->mb.i_mb_width, where i_mb_x is the current x position in the MB array and i_mb_y the current y position).
So the practical 'workunit' size for each pass of this 'macro loop' is only 1 macroblock. For a 16x16 macroblock (say 10-bit processing with 16-bit values per sample), that means the CPU core executing this thread has a workunit of only 16x16 x (2 bytes per sample) = 512 bytes. Too little for a CPU capable of processing workunits of up to kilobytes.

So to make this part of the encoder faster we would need to redesign this 'macro loop' and all the functions it calls downstream to process several macroblocks in a single pass. That is not an easy task, and if not all macroblocks in a 'group pass' can be processed identically it requires more branching (like a fallback to single-macroblock processing when one block's path diverges from the others).

The final 'macro loop' advance at https://github.com/ShiftMediaProject...ncoder.c#L3068 would then no longer be the simple
i_mb_x++; (for progressive encode mode)
+1 advance, but
i_mb_x += num_macroblocks_per_pass;

But redesigning the program for this simple 'internal parallelizing to use SIMD' may take a lot of time.

A more realistic example of a fix: when processing a 16x16 macroblock with partitions down to 4x4, the encoder splits the macroblock into 4x4 blocks and checks some predictors for each 4x4 block. That is the much smaller loop at https://github.com/ShiftMediaProject...analyse.c#L924
Code:
                    for( ; *predict_mode >= 0; predict_mode++ )
                    {
                        int i_satd;
                        int i_mode = *predict_mode;

                        if( h->mb.b_lossless )
                            x264_predict_lossless_4x4( h, p_dst_by, 0, idx, i_mode );
                        else
                            h->predict_4x4[i_mode]( p_dst_by );

                        i_satd = h->pixf.mbcmp[PIXEL_4x4]( p_src_by, FENC_STRIDE, p_dst_by, FDEC_STRIDE );
                        if( i_pred_mode == x264_mb_pred_mode4x4_fix(i_mode) )
                        {
                            i_satd -= lambda * 3;
                            if( i_satd <= 0 )
                            {
                                i_best = i_satd;
                                a->i_predict4x4[idx] = i_mode;
                                break;
                            }
                        }

                        COPY2_IF_LT( i_best, i_satd, a->i_predict4x4[idx], i_mode );
                    }
where h->pixf.mbcmp[PIXEL_4x4]( p_src_by, FENC_STRIDE, p_dst_by, FDEC_STRIDE ); is a call to the single-block SATD (or SAD, depending on options?) of two 4x4 blocks (an assembly function, typically one per SIMD family). The number of loop iterations is typically the number of non-negative entries in the vector pointed to by predict_mode (about 3 or 4).

When running on very old architectures like SSE(2), the two 4x4 16-bit blocks for the SATD calculation take 64 bytes to load; on 32-bit x86 SSE2, with a register file of only 8 128-bit (16-byte) SIMD registers (128 bytes total), that is about half the register file, leaving almost no room for immediate values if you try to load two pairs of blocks. So this implementation is 'internally limited' by the 32-bit SSE2 build target. It is also optimal for speed on that architecture, because each iteration can break on the condition i_satd <= 0 and skip the remaining predictors, saving some time.

On larger architectures it is possible to compute the SATD of more pairs of 4x4 16-bit blocks in a single SIMD pass. So this program block could be rearranged to compute several SATDs per pass, using a new multi-block SATD SIMD function, and the loop changed to process groups of predictors (typically a single pass when there are up to 4 predictors); after the single SIMD call, the vector of SATD values is searched for its minimum (possibly even with the SIMD horizontal-minimum instruction _mm_minpos_epu16() from the SSE4.1 set, if the SATD fits in an unsigned 16-bit value; unfortunately there is no 32-bit version of this nice instruction even in AVX-512). But this new program block would need to be guarded by an 'architecture' if() (e.g. AVX2 and x64 or larger only), which makes the total program text bigger and harder to understand (and debug, and support, and so on).
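To make the proposed restructuring concrete, here is a scalar sketch: score a whole group of candidate predictors first, then reduce to the minimum, instead of the one-at-a-time loop with early exit shown above. The names (sad_4x4, best_predictor_batched) are mine, and the plain SAD stands in for x264's SATD kernel; the cost loop is the part a multi-block SIMD SATD would replace with a single call filling the whole cost vector:

```c
#include <stdint.h>
#include <stdlib.h>

/* Scalar stand-in for h->pixf.mbcmp[PIXEL_4x4] (a plain 4x4 SAD here;
 * in x264 it is a hand-written SATD/SAD kernel per SIMD family). */
static int sad_4x4(const uint8_t *src, int src_stride,
                   const uint8_t *dst, int dst_stride)
{
    int sum = 0;
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            sum += abs(src[y * src_stride + x] - dst[y * dst_stride + x]);
    return sum;
}

/* Batched variant of the predictor loop: compute the cost of ALL
 * candidates first (the loop a multi-block SIMD SATD would replace
 * with one call), then take the minimum -- e.g. via _mm_minpos_epu16()
 * on SSE4.1+.  Trade-off: the early exit on i_satd <= 0 is lost. */
static int best_predictor_batched(const uint8_t *src,
                                  uint8_t preds[][16],
                                  int n_preds, int *best_cost)
{
    int cost[16];
    int best = 0;
    for (int i = 0; i < n_preds; i++)
        cost[i] = sad_4x4(src, 4, preds[i], 4);   /* 4x4 blocks, stride 4 */
    for (int i = 1; i < n_preds; i++)
        if (cost[i] < cost[best])
            best = i;
    if (best_cost)
        *best_cost = cost[best];
    return best;
}
```

Whether the batched form wins in practice depends on how often the original loop's early exit fires; with only 3-4 candidates per block, one wide SIMD pass plus a horizontal minimum is plausibly cheaper than several dependent scalar iterations.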
Old 28th March 2023, 13:00   #2131  |  Link
DTL
Registered User
 
Join Date: Jul 2018
Posts: 988
Based on the current state of development:
- The work of an MPEG coder and of a preprocessing temporal denoiser overlaps considerably, in both motion-vector search and motion-vector usage.
- Current mvtools can take a motion-vector stream either from the system's hardware MPEG-encoding accelerators, via the now-standard DX12-ME API (available since some generation of Win10), or from a full on-CPU search. The motion-vector formats of the hardware MPEG encoder and of mvtools are roughly equivalent. Precision may go down to qpel (currently the only precision supported by the hardware API).
- Multi-generation motion search on naturally noised sources shows a significant improvement in motion-vector quality already at the second generation (examples of the execution structure: https://forum.doom9.org/showthread.p...52#post1984152 ).

It may be interesting to reuse the refined motion vectors from pre-MPEG denoising in the x264 encoder (also offloading some of the MV search to hardware accelerators where present). Maybe someone with good knowledge of x264's motion-vector usage could make a fork and run some quick performance/quality tests?

Last edited by DTL; 28th March 2023 at 13:04.
Old 29th March 2023, 03:15   #2132  |  Link
Selur
Registered User
 
Selur's Avatar
 
Join Date: Oct 2001
Location: Germany
Posts: 7,086
Why the resolution limitation to 16384 since https://code.videolan.org/videolan/x...7c43b418a73b36 ?
One can't encode something like 24800x90 anymore, which should work fine with level 5.2 at 30 fps and works fine with older versions (see the discussion over at VideoHelp).

Cu Selur
__________________
Hybrid here in the forum, homepage
Old 29th March 2023, 09:21   #2133  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,314
Quote:
Originally Posted by Selur
One can't encode something like 24800x90 anymore, which should work fine with level 5.2
This is not accurate. Such a dimension is not supported in any defined level.

From the specification:
Code:
f) PicWidthInMbs <= Sqrt( MaxFS * 8 )
g) FrameHeightInMbs <= Sqrt( MaxFS * 8 )
The highest MaxFS, at Level 6 (and 6.1/6.2), is 139264 macroblocks.
A macroblock is 16x16 pixels, just for reference.

This makes the maximum for any single dimension sqrt(139264 * 8) = 1055 MBs, or 16880 pixels. That's pretty close to the limit chosen, which is a neat power of two just below it (1024 MBs, i.e. 16384 pixels).
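The arithmetic above can be checked mechanically (a sketch; the MaxFS value is the one quoted in this post, and the function names are mine):

```c
/* Constraints f)/g) from the spec: a picture dimension in macroblocks
 * may not exceed sqrt(MaxFS * 8).  Integer square root avoids libm. */
static int max_dim_mbs(int max_fs)
{
    /* largest n with n*n <= max_fs * 8 */
    int n = 0;
    while ((long long)(n + 1) * (n + 1) <= (long long)max_fs * 8)
        n++;
    return n;
}

static int max_dim_pixels(int max_fs)
{
    return max_dim_mbs(max_fs) * 16;   /* macroblocks are 16x16 pixels */
}
```

For MaxFS = 139264 this gives 1055 MBs = 16880 pixels, so x264's cap of 16384 (1024 MBs) sits just below the highest value any level permits.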
__________________
LAV Filters - open source ffmpeg based media splitter and decoders

Last edited by nevcairiel; 29th March 2023 at 14:40.
Old 29th March 2023, 17:40   #2134  |  Link
Selur
Registered User
 
Selur's Avatar
 
Join Date: Oct 2001
Location: Germany
Posts: 7,086
Argh, I totally forgot about PicWidthInMbs and FrameHeightInMbs, which makes my calculation irrelevant since it only took MaxMBPS into account.
Thanks for clearing that up.
__________________
Hybrid here in the forum, homepage
Old 29th March 2023, 21:22   #2135  |  Link
MasterNobody
Registered User
 
Join Date: Jul 2007
Posts: 551
This change was made for security reasons, and because the limit has to be somewhere, it was made high enough to be reasonable and not overflow some intermediate SAD/SSD cost calculations for row of MBs in 32-bit integers (especially for 10-bit output). In special cases, you can always compile x264 without this limitation and use it at your own risk without any warranty.
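As a back-of-envelope check of the overflow concern (my own arithmetic, not taken from the x264 change): with 10-bit samples the per-pixel difference can reach 1023, and at the 16384-pixel width cap a row of 16x16 macroblocks covers 16384 x 16 pixels. A row-wide SAD still fits a 32-bit integer, while a row-wide SSD would not:

```c
#include <stdint.h>

/* Worst-case accumulated cost for one row of 16x16 macroblocks.
 * bits: sample bit depth; width: frame width in pixels. */
static uint64_t worst_row_sad(int bits, int width)
{
    uint64_t max_diff = (1u << bits) - 1;      /* 1023 for 10-bit */
    return max_diff * 16 * (uint64_t)width;    /* 16 pixel rows per MB row */
}

static uint64_t worst_row_ssd(int bits, int width)
{
    uint64_t max_diff = (1u << bits) - 1;
    return max_diff * max_diff * 16 * (uint64_t)width;
}
```

At 10 bits and width 16384 the worst-case row SAD is about 2.7e8 (safe in 32 bits), while the worst-case row SSD is about 2.7e11, far beyond UINT32_MAX, which is consistent with the limit being chosen to protect 32-bit intermediate cost sums.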
Old 3rd May 2023, 16:39   #2136  |  Link
PoeBear
Registered User
 
Join Date: Jan 2017
Posts: 46
Do any of the custom x264 builds have aq-bias-strength enabled? It's been useful for spreading CRF bitrate around in x265, and I was curious whether it exists in x264 too; I came across these patches when searching, but no binaries. Would it be as useful in x264?

https://gist.github.com/noizuy/83ba8...f67de333f90e0d
https://gist.github.com/noizuy/58844...2c9406bc0b1416
Old 17th May 2023, 17:48   #2137  |  Link
jpsdr
Registered User
 
Join Date: Oct 2002
Location: France
Posts: 2,245
Made a new build with the aq-bias-strength patch; check my github.
__________________
My github.

Tags
coding, development, x264 dev
