#2121
Registered User
Join Date: Oct 2002
Location: France
Posts: 2,234
Unless I've missed it in the code, it seems that there is no validity check on --output-depth; you can put whatever value you want! I didn't see any check on i_bitdepth, and I also didn't see anything in validate_parameters() in the encoder.c file...
Don't you think you should add a check on the allowed values? And maybe the help could be more specific, instead of just "int value".
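For illustration, a minimal sketch of the kind of check I mean, placed somewhere in validate_parameters() in encoder.c - the exact spot and the allowed set {8, 10} are my assumptions for a typical "bit depth all" build: Code:

/* hypothetical sketch: reject --output-depth values other than 8 or 10 */
if( h->param.i_bitdepth != 8 && h->param.i_bitdepth != 10 )
{
    x264_log( h, X264_LOG_ERROR, "invalid bit depth: %d (allowed values: 8 or 10)\n",
              h->param.i_bitdepth );
    return -1;
}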
__________________
My github.
#2122
Registered User
Join Date: Jul 2007
Posts: 549
Quote:
#2124
German doom9/Gleitz SuMo
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,659
There is a (quite demanding) thread in the VideoHelp forum asking for comprehensive official documentation of x264, similar to the ReadTheDocs pages for x265. Does any exist that I missed? The VideoLAN homepage of x264 does not seem to mention any, and the docs directory in the repo is hardly worth mentioning. So the best source of knowledge appears to be the --fullhelp output, combined with searching back through the developer mailing list for discussions about every parameter...
#2125
Moderator
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,490
Quote:
#2126
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 2,579
OK, so, I have a task for you guys: optimizing the quality of a command-line x264 encode, but this time it's for professional use, so we have some constraints.
This is the current command line: Quote:
- Profile High 4:2:2 10-bit
- Level 5.2
- keyint 1 (i.e. all intra)
- no-cabac
- slices 8
- 500 Mbit/s constant bitrate
- aud (access unit delimiter NAL at the start of each slice access unit)

Input files are UHD HDR PQ 12-bit 4:4:4 Apple ProRes at 23.976p (1502 Mbit/s), 25p (1600 Mbit/s), 29.97p (1990 Mbit/s), 50p (3318 Mbit/s) or 60p (3981 Mbit/s). They are indexed, brought to 16-bit, frame-rate converted to 50p in all cases, LUT-converted to HLG and chroma-downscaled to 4:2:2 with AviSynth. The resulting 16-bit 4:2:2 UHD HDR HLG stream is then delivered to x264, dithered down by x264 itself to 10-bit planar with Sierra-2-4A error diffusion, and encoded as above.

A few questions for you:
1) Would it make sense to optimize x264 for quality and use a slower preset like --preset veryslow at such a high bitrate?
2) Would it make sense to perform a two-pass encode, given that it's CBR and all intra and at such a high bitrate?
3) Do you think it would make sense to play with --aq-mode and the like at such a high bitrate?

Keep in mind that the end user will never ever see these files. They are sent to the playout, where a hardware playback port plays them and delivers the signal over an SDI cable; that signal is then re-encoded live by a hardware H.265 encoder, which downsamples the chroma and encodes the final UHD 4:2:0 10-bit 25 Mbit/s satellite feed that the user receives live as a .ts.
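For reference, the settings above would map to roughly the following invocation. This is a sketch rather than the exact command line: the VBV buffer size, --nal-hrd cbr and the file names are assumptions for strict CBR signalling (x264's --bitrate is in kbit/s), and the HLG/BT.2020 VUI flags (--colorprim, --transfer, --colormatrix) plus SAR would be set per the delivery spec. Code:

x264 --profile high422 --output-depth 10 --level 5.2 --keyint 1 --no-cabac --slices 8 --aud ^
     --bitrate 500000 --vbv-maxrate 500000 --vbv-bufsize 500000 --nal-hrd cbr ^
     -o intermediate.264 input_16bit_422.avs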
#2127
Registered User
Join Date: Dec 2013
Posts: 289
For this use case 500 Mbit/s is not high, just average. If you care about the quality of your intermediate file, use high-quality settings.
The H.265 hardware encoder at the end of the chain will botch the quality anyway, though. I don't think the x264 rate control uses multi-pass statistics effectively for CBR, by the way.
#2128
Big Bit Savings Now !
Join Date: Feb 2007
Location: close to the wall
Posts: 1,361
1. Although I always use it, I guess veryslow is not needed here.
2. No two-pass; I don't see a Blu-ray-restriction kind of distribution as the main problem here.
3. I guess no.

I guess you want to make sure that enough bits are spent in any case. What I did in such cases was simply limit the QP decisions and test that on the most complex scene: release any bitrate restrictions and decrease qpmax as far as you see it driving the bitrate up. Maybe you see that going below --qpmax 30 definitely forces the bitrate upwards. Then step back, meaning increase qpmax again a few ticks (--qpmax 36?) until you feel safe that you are leaving enough headroom. Now you shouldn't get too-low rate decisions anymore, so hopefully you're safe from harm.
__________________
"To bypass shortcuts and find suffering...is called QUALity" (Die toten Augen von Friedrichshain) "Data reduction ? Yep, Sir. We're working on that issue. Synce invntoin uf lingöage..." Last edited by Emulgator; 1st March 2022 at 21:31. |
#2129
Registered User
Join Date: Jun 2002
Location: On thin ice
Posts: 6,800
There is this:
http://www.chaneru.com/Roku/HLS/X264_Settings.htm
It is used in StaxRip for the context (right-click) help.
__________________
https://github.com/stax76/software-list
https://www.youtube.com/@stax76/playlists
#2130
Registered User
Join Date: Jul 2018
Posts: 899
Moved from the 'getting latest' thread: about getting better performance out of MPEG encoders (including all the x26x projects) on 'big' and 'large' architectures like AVX2 and AVX-512 by redesigning the processing from single-block to multi-block.

An I-frames-only example: I took the C sources from https://github.com/ShiftMediaProject...master/encoder and built them with Visual Studio 2017 (other versions, including jpsdr, do not appear to be compatible with MSVS). After some profiling I see that significant time is spent in the intra_satd_x9_4x4() function, and its call stack leads back to the walk over all macroblocks. It is a loop in the encoder.c file: https://github.com/ShiftMediaProject...ncoder.c#L2812 . It walks through all frame macroblocks one by one, by rows and columns, advancing the MB number as mb_xy = i_mb_x + i_mb_y * h->mb.i_mb_width; where i_mb_x is the current x position in the MB array and i_mb_y the current y position. So the practical 'work unit' size for each pass of this 'macro loop' is only one macroblock. For a 16x16 macroblock (say 10-bit processing, 16-bit values per sample) that is 16x16x(2 bytes per sample) = 512 bytes - too little for a CPU capable of processing work units of several kilobytes.

So to make this part of the encoder faster, this 'macro loop' and all the downstream functions it calls would need to be redesigned to process several macroblocks in a single pass. That is not an easy task, and if not all macroblocks in a 'group pass' are processed identically it needs more branching (such as a fallback to single-macroblock processing when one block diverges from the others). The final 'macro loop' advance at https://github.com/ShiftMediaProject...ncoder.c#L3068 would then no longer be a simple i_mb_x++; (for progressive encode mode) but i_mb_x += num_macroblocks_per_pass;. Redesigning the program for this 'internal parallelism to use SIMD' may take a lot of time.

An example closer to a realistic fix: when processing a 16x16 macroblock with partitions down to 4x4, it splits the macroblock into a 4x4 grid of 4x4 blocks and checks a set of predictors for each 4x4 block. That is the much smaller loop at https://github.com/ShiftMediaProject...analyse.c#L924 Code:
for( ; *predict_mode >= 0; predict_mode++ )
{
    int i_satd;
    int i_mode = *predict_mode;

    if( h->mb.b_lossless )
        x264_predict_lossless_4x4( h, p_dst_by, 0, idx, i_mode );
    else
        h->predict_4x4[i_mode]( p_dst_by );

    i_satd = h->pixf.mbcmp[PIXEL_4x4]( p_src_by, FENC_STRIDE, p_dst_by, FDEC_STRIDE );
    if( i_pred_mode == x264_mb_pred_mode4x4_fix(i_mode) )
    {
        i_satd -= lambda * 3;
        if( i_satd <= 0 )
        {
            i_best = i_satd;
            a->i_predict4x4[idx] = i_mode;
            break;
        }
    }

    COPY2_IF_LT( i_best, i_satd, a->i_predict4x4[idx], i_mode );
}

When running on very old architectures like SSE(2), loading a pair of 4x4 16-bit blocks for a SATD calculation takes 64 bytes, and on 32-bit x86 SSE2, with only eight 128-bit (16-byte) SIMD registers (128 bytes of register file in total), that already occupies about half the register file and leaves close to no space for intermediate values if you try to load two pairs of blocks. So this implementation is 'internally limited' to the 32-bit SSE2 build target. It is optimal for speed on that architecture because at each iteration it can break early on the condition i_satd <= 0, skipping some predictors and saving time.

On larger architectures it is possible to compute the SATD of more pairs of 4x4 16-bit blocks in a single SIMD pass. So this program block could be rearranged to do more SATD computation per pass using a new multi-block SATD SIMD function, and the loop could be changed to process groups of predictors (typically a single pass when there are up to 4 predictors); after the single SIMD call, the vector of SATD values is scanned for the minimum and that predictor is selected (this can also be attempted with the SIMD horizontal-minimum instruction _mm_minpos_epu16() from the SSE 4.1 set, provided the SATD fits in an unsigned 16-bit value - unfortunately there is no 32-bit version of this nice instruction even in the AVX-512 set). But this new program block would need to be guarded by an 'architecture' if() block, e.g. only for AVX2 and x64 or larger, and that makes the total program text bigger and harder to understand (and debug, and support, and so on).
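A rough sketch of just the selection step (not a drop-in x264 patch): assuming some hypothetical multi-block SATD routine has already filled an array of eight 16-bit costs (unused slots padded with 0xFFFF), the minimum and its index can be picked with a single SSE4.1 instruction instead of one scalar compare per predictor. Code:

#include <smmintrin.h> /* SSE4.1 */
#include <stdint.h>

/* Pick the smallest of 8 packed 16-bit SATD costs and its position.
   Costs must fit in unsigned 16 bits for _mm_minpos_epu16 to apply. */
static void find_best_of_8( const uint16_t satd[8], int *best_cost, int *best_idx )
{
    __m128i v = _mm_loadu_si128( (const __m128i *)satd );
    __m128i r = _mm_minpos_epu16( v );  /* bits 0..15 = min value, bits 16..18 = index */
    *best_cost = _mm_extract_epi16( r, 0 );
    *best_idx  = _mm_extract_epi16( r, 1 );
}

Whether this actually wins depends on how often the current scalar loop exits early on i_satd <= 0, as described above.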
#2131
Registered User
Join Date: Jul 2018
Posts: 899
Based on the current state of development:
- The work done by the MPEG coder and by a preprocessing temporal denoiser overlaps quite a bit, in both the search for motion vectors and their use.
- Current mvtools can take a motion vector stream either from the system's hardware MPEG encoding accelerators, via the now-standard DX12-ME API available from some generation of Windows 10, or from a full on-CPU search. The motion vector stream formats of the hardware MPEG encoder and of mvtools are roughly the same. Precision can be down to qpel (currently the only precision supported by the hardware API).
- Multi-generation motion search on natural, noisy sources shows a significant improvement in motion vector quality already at the second generation (examples of the execution structure: https://forum.doom9.org/showthread.p...52#post1984152 ).

It may be interesting to reuse the refined motion vectors from pre-MPEG denoising in the x264 encoder (and also to offload some of the MV search to system hardware accelerators if present). Maybe someone with good knowledge of x264's motion vector handling can make a fork and run some quick performance/quality tests?

Last edited by DTL; 28th March 2023 at 13:04.
#2132
Registered User
Join Date: Oct 2001
Location: Germany
Posts: 7,034
Why the resolution limit of 16384 since https://code.videolan.org/videolan/x...7c43b418a73b36 ?
One can't encode something like 24800x90 anymore, which should work fine with level 5.2 at 30 fps and does work with older versions (see the discussion over at VideoHelp).

Cu Selur
#2133
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,292
Quote:
From the specification: Code:
f) PicWidthInMbs <= Sqrt( MaxFS * 8 )
g) FrameHeightInMbs <= Sqrt( MaxFS * 8 )

A macroblock is 16x16 pixels, just for reference. This makes the maximum for any single dimension sqrt( 139264 * 8 ) = 1055 MBs, or 16880 pixels. Pretty close to the limit chosen, which is a neat power of two just below it (1024 MBs).
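For anyone who wants to check the numbers, a quick stand-alone verification (139264 being the largest MaxFS value in Table A-1 of the spec, for the 6.x levels): Code:

#include <math.h>
#include <stdio.h>

int main( void )
{
    const int max_fs = 139264;                /* largest MaxFS in Table A-1 (levels 6.x) */
    int max_mbs = (int)sqrt( max_fs * 8.0 );  /* per-dimension limit in macroblocks */
    printf( "%d MBs = %d pixels\n", max_mbs, max_mbs * 16 );  /* prints: 1055 MBs = 16880 pixels */
    return 0;
}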
__________________
LAV Filters - open source ffmpeg based media splitter and decoders
Last edited by nevcairiel; 29th March 2023 at 14:40.
#2135
Registered User
Join Date: Jul 2007
Posts: 549
This change was made for security reasons, and since the limit has to be somewhere, it was made high enough to be reasonable while not overflowing some intermediate SAD/SSD cost calculations for a row of MBs in 32-bit integers (especially for 10-bit output). In special cases you can always compile x264 without this limitation and use it at your own risk, without any warranty.
#2136
Registered User
Join Date: Jan 2017
Posts: 46
Do any of the custom x264 builds have aq-bias-strength enabled? It has been useful for spreading CRF bitrate out in x265, and I was curious whether it exists for x264; I came across these patches when searching, but no binaries. Would it be as useful in x264?
https://gist.github.com/noizuy/83ba8...f67de333f90e0d
https://gist.github.com/noizuy/58844...2c9406bc0b1416
Tags
coding, development, x264 dev