Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules. |
#1 | Link |
Lost my old account :(
Join Date: Jul 2017
Posts: 359
|
Nvenc 4:2:2
Hi,
I'm not sure if this is the correct subforum, but as I suspect that it's HEVC-relevant I thought I might as well put it here. I saw that one of the features in the new 5000 series from NVIDIA is actually 4:2:2 support in NVENC https://blogs.nvidia.com/blog/genera...rtx-50-series/ That is tbh big news! As someone in the broadcasting world who deals with large amounts of different 4:2:2 MXF formats, it would be a bit of a game changer if we could get HW support for these. Does anyone know anything more about this? Is it only HEVC flavors (so mostly prosumer formats) or also AVC flavors (AVC-I, XAVC)? Is it decode as well as encode?
I saw that this was a bit of a footnote feature for the latest Intel cards as well: they were going to support the Sony XAVC-H format, which is a bit useless for most people at the moment anyway, as afaik it's an 8K-only format for now. But it's at least a good start in getting pro format support on these GPUs. https://www.phoronix.net/image.php?i...tlemage_8_show
So what do you guys think, what format support can we expect with this 4:2:2 update on NVIDIA cards?
#3 | Link | |
Lost my old account :(
Join Date: Jul 2017
Posts: 359
|
Quote:
But judging from this I guess it's both? "The problem is that editing 4:2:2 can be very CPU-intensive. Blackwell will fix that, and a sample workflow using a Core i9-14900K went from taking 110 minutes to encode to just 10 minutes with a Blackwell GPU." https://www.tomshardware.com/pc-comp...audio-and-more
But I think you are right about HEVC. Unfortunately, as mentioned, it's mostly prosumer cameras that do the HEVC flavors. I would have loved to see XAVC-I MXF support.
#6 | Link |
Lost my old account :(
Join Date: Jul 2017
Posts: 359
|
Some more details:
"In DaVinci Resolve, the H.265 4:2:2 10-bit processing was more than twice as fast as software decoding and exceeded even what we see from Intel Quick Sync." https://www.pugetsystems.com/labs/ar...eation-review/
So HEVC 4:2:2 hw decoding is confirmed and already implemented in Resolve at least. I also saw this in the comments:
"There was a question in the Facebook Premiere Pro group regarding the new NVDEC capabilities with the RTX 50xx cards and Fergus from Adobe answered: 'Adobe has a very close partnership with Nvidia and we've been working with them to support the RTX 50 series GPU. We're particularly excited that the new GPU supports hardware acceleration of 10-bit 4:2:2 media in HEVC and H.264, as these formats provide a great combination of high quality and small file size. We use Nvidia's CUDA SDK (software development kit) to provide support for their products in ours. We've been testing the RTX 50 series cards with a prerelease version of that SDK. Today, Nvidia has made available a final version of that SDK which means we can finalize our support and release it. That will come first in a public beta, then a final release. I don't have a date to share about the timing of that but it is a high priority for us. I'll post to this group when the beta is available.'"
So AVC support, e.g. XAVC/AVC-I, might still be on the table. Let's hope it's on both the decoder and the encoder side.
Last edited by excellentswordfight; 27th January 2025 at 12:46.
#11 | Link |
Registered User
Join Date: Nov 2004
Location: Poland
Posts: 2,869
|
Encoding is not that interesting, but decoding very much is.
The whole post-production world is waiting for 4:2:2 H.264/H.265 decoding, as these formats are used a lot in semi-pro cameras and tax the CPU heavily. Only Intel and the new Macs had support, whereas in reality Nvidia dominates that market. Resolve users are excited.
#13 | Link | |||
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,210
|
Quote:
I managed to get my hands on an NVIDIA RTX 5080 and started experimenting a bit. The idea was to see whether it could produce a compliant AVC Intra Class file, so I started with AVC Intra Class 100: specifically a 1920x1080 H.264 High 4:2:2 Intra Profile, Level 4.1, 114 Mbit/s, 4:2:2, 25i TFF, 10bit planar, BT709 SDR file, similar to what x264 can produce. By looking at the documentation, I came up with the following command line: Quote:
![]() As we can see the bitrate is all over the place, while in a proper Intra Class mode it should be constant without any deviation. Not only that, but if we look at the mediainfo we can see that the number of reference frames is 2 instead of 1 (which x264 can achieve with --ref 1): Quote:
All in all this was a pretty big disappointment. Sure, NVEnc can produce a 4:2:2 10bit H.264 file, but it doesn't seem to be able to produce a valid AVC Intra Class output, no matter which options are used. For those interested, here are the full command line options for NVEnc via FFMpeg (latest master): Code:
Encoder h264_nvenc [NVIDIA NVENC H.264 encoder]:
    General capabilities: dr1 delay hardware
    Threading capabilities: none
    Supported hardware devices: cuda cuda d3d11va d3d11va
    Supported pixel formats: yuv420p nv12 p010le yuv444p p016le yuv444p16le bgr0 bgra rgb0 rgba x2rgb10le x2bgr10le gbrp gbrp16le cuda d3d11
h264_nvenc AVOptions:
  -preset <int> E..V....... Set the encoding preset (from 0 to 18) (default p4)
     default 0 E..V.......
     slow 1 E..V....... hq 2 passes
     medium 2 E..V....... hq 1 pass
     fast 3 E..V....... hp 1 pass
     hp 4 E..V.......
     hq 5 E..V.......
     bd 6 E..V.......
     ll 7 E..V....... low latency
     llhq 8 E..V....... low latency hq
     llhp 9 E..V....... low latency hp
     lossless 10 E..V.......
     losslesshp 11 E..V.......
     p1 12 E..V....... fastest (lowest quality)
     p2 13 E..V....... faster (lower quality)
     p3 14 E..V....... fast (low quality)
     p4 15 E..V....... medium (default)
     p5 16 E..V....... slow (good quality)
     p6 17 E..V....... slower (better quality)
     p7 18 E..V....... slowest (best quality)
  -tune <int> E..V....... Set the encoding tuning info (from 1 to 4) (default hq)
     hq 1 E..V....... High quality
     ll 2 E..V....... Low latency
     ull 3 E..V....... Ultra low latency
     lossless 4 E..V....... Lossless
  -profile <int> E..V....... Set the encoding profile (from 0 to 3) (default main)
     baseline 0 E..V.......
     main 1 E..V.......
     high 2 E..V.......
     high444p 3 E..V.......
  -level <int> E..V....... Set the encoding level restriction (from 0 to 62) (default auto)
     auto 0 E..V.......
     1 10 E..V.......
     1.0 10 E..V.......
     1b 9 E..V.......
     1.0b 9 E..V.......
     1.1 11 E..V.......
     1.2 12 E..V.......
     1.3 13 E..V.......
     2 20 E..V.......
     2.0 20 E..V.......
     2.1 21 E..V.......
     2.2 22 E..V.......
     3 30 E..V.......
     3.0 30 E..V.......
     3.1 31 E..V.......
     3.2 32 E..V.......
     4 40 E..V.......
     4.0 40 E..V.......
     4.1 41 E..V.......
     4.2 42 E..V.......
     5 50 E..V.......
     5.0 50 E..V.......
     5.1 51 E..V.......
     5.2 52 E..V.......
     6.0 60 E..V.......
     6.1 61 E..V.......
     6.2 62 E..V.......
  -rc <int> E..V....... Override the preset rate-control (from -1 to INT_MAX) (default -1)
     constqp 0 E..V....... Constant QP mode
     vbr 1 E..V....... Variable bitrate mode
     cbr 2 E..V....... Constant bitrate mode
     vbr_minqp 8388609 E..V....... Variable bitrate mode with MinQP (deprecated)
     ll_2pass_quality 8388609 E..V....... Multi-pass optimized for image quality (deprecated)
     ll_2pass_size 8388610 E..V....... Multi-pass optimized for constant frame size (deprecated)
     vbr_2pass 8388609 E..V....... Multi-pass variable bitrate mode (deprecated)
     cbr_ld_hq 8388610 E..V....... Constant bitrate low delay high quality mode
     cbr_hq 8388610 E..V....... Constant bitrate high quality mode
     vbr_hq 8388609 E..V....... Variable bitrate high quality mode
  -rc-lookahead <int> E..V....... Number of frames to look ahead for rate-control (from 0 to INT_MAX) (default 0)
  -surfaces <int> E..V....... Number of concurrent surfaces (from 0 to 64) (default 0)
  -cbr <boolean> E..V....... Use cbr encoding mode (default false)
  -2pass <boolean> E..V....... Use 2pass encoding mode (default auto)
  -gpu <int> E..V....... Selects which NVENC capable GPU to use. First GPU is 0, second is 1, and so on. (from -2 to INT_MAX) (default any)
     any -1 E..V....... Pick the first device available
     list -2 E..V....... List the available devices
  -rgb_mode <int> E..V....... Configure how nvenc handles packed RGB input. (from 0 to INT_MAX) (default yuv420)
     yuv420 1 E..V....... Convert to yuv420
     yuv444 2 E..V....... Convert to yuv444
     disabled 0 E..V....... Disables support, throws an error.
  -delay <int> E..V....... Delay frame output by the given amount of frames (from 0 to INT_MAX) (default INT_MAX)
  -no-scenecut <boolean> E..V....... When lookahead is enabled, set this to 1 to disable adaptive I-frame insertion at scene cuts (default false)
  -forced-idr <boolean> E..V....... If forcing keyframes, force them as IDR frames. (default false)
  -b_adapt <boolean> E..V....... When lookahead is enabled, set this to 0 to disable adaptive B-frame decision (default true)
  -spatial-aq <boolean> E..V....... set to 1 to enable Spatial AQ (default false)
  -spatial_aq <boolean> E..V....... set to 1 to enable Spatial AQ (default false)
  -temporal-aq <boolean> E..V....... set to 1 to enable Temporal AQ (default false)
  -temporal_aq <boolean> E..V....... set to 1 to enable Temporal AQ (default false)
  -zerolatency <boolean> E..V....... Set 1 to indicate zero latency operation (no reordering delay) (default false)
  -nonref_p <boolean> E..V....... Set this to 1 to enable automatic insertion of non-reference P-frames (default false)
  -strict_gop <boolean> E..V....... Set 1 to minimize GOP-to-GOP rate fluctuations (default false)
  -aq-strength <int> E..V....... When Spatial AQ is enabled, this field is used to specify AQ strength. AQ strength scale is from 1 (low) - 15 (aggressive) (from 1 to 15) (default 8)
  -cq <float> E..V....... Set target quality level (0 to 51, 0 means automatic) for constant quality mode in VBR rate control (from 0 to 51) (default 0)
  -aud <boolean> E..V....... Use access unit delimiters (default false)
  -bluray-compat <boolean> E..V....... Bluray compatibility workarounds (default false)
  -init_qpP <int> E..V....... Initial QP value for P frame (from -1 to 51) (default -1)
  -init_qpB <int> E..V....... Initial QP value for B frame (from -1 to 51) (default -1)
  -init_qpI <int> E..V....... Initial QP value for I frame (from -1 to 51) (default -1)
  -qp <int> E..V....... Constant quantization parameter rate control method (from -1 to 51) (default -1)
  -qp_cb_offset <int> E..V....... Quantization parameter offset for cb channel (from -12 to 12) (default 0)
  -qp_cr_offset <int> E..V....... Quantization parameter offset for cr channel (from -12 to 12) (default 0)
  -weighted_pred <int> E..V....... Set 1 to enable weighted prediction (from 0 to 1) (default 0)
  -coder <int> E..V....... Coder type (from -1 to 2) (default default)
     default -1 E..V.......
     auto 0 E..V.......
     cabac 1 E..V.......
     cavlc 2 E..V.......
     ac 1 E..V.......
     vlc 2 E..V.......
  -b_ref_mode <int> E..V....... Use B frames as references (from -1 to 2) (default -1)
     disabled 0 E..V....... B frames will not be used for reference
     each 1 E..V....... Each B frame will be used for reference
     middle 2 E..V....... Only (number of B frames)/2 will be used for reference
  -a53cc <boolean> E..V....... Use A53 Closed Captions (if available) (default true)
  -dpb_size <int> E..V....... Specifies the DPB size used for encoding (0 means automatic) (from 0 to INT_MAX) (default 0)
  -multipass <int> E..V....... Set the multipass encoding (from 0 to 2) (default disabled)
     disabled 0 E..V....... Single Pass
     qres 1 E..V....... Two Pass encoding is enabled where first Pass is quarter resolution
     fullres 2 E..V....... Two Pass encoding is enabled where first Pass is full resolution
  -ldkfs <int> E..V....... Low delay key frame scale; Specifies the Scene Change frame size increase allowed in case of single frame VBV and CBR (from 0 to 255) (default 0)
  -extra_sei <boolean> E..V....... Pass on extra SEI data (e.g. a53 cc) to be included in the bitstream (default true)
  -udu_sei <boolean> E..V....... Pass on user data unregistered SEI if available (default false)
  -intra-refresh <boolean> E..V....... Use Periodic Intra Refresh instead of IDR frames (default false)
  -single-slice-intra-refresh <boolean> E..V....... Use single slice intra refresh (default false)
  -max_slice_size <int> E..V....... Maximum encoded slice size in bytes (from 0 to INT_MAX) (default 0)
  -constrained-encoding <boolean> E..V....... Enable constrainedFrame encoding where each slice in the constrained picture is independent of other slices (default false)
  -lookahead_level <int> E..V....... Specifies the lookahead level. Higher level may improve quality at the expense of performance. (from -1 to 15) (default -1)
     auto 15 E..V.......
     0 0 E..V.......
     1 1 E..V.......
     2 2 E..V.......
     3 3 E..V.......
#14 | Link |
Registered User
Join Date: Nov 2004
Location: Poland
Posts: 2,869
|
There is always some deviation. There is no such thing as CBR at a perfectly constant 100 Mbit/s. How would you encode black and white or colour bars frames at 100 Mbit/s with codecs like H.264? Impossible without stuffing bits. It's not that nvenc can't do it. It's just not written to do it, because Nvidia has no real interest in it.
It took a long time before x264 was able to do it, and without the few people who pushed for it, it would probably never have happened. Why bother getting nvenc to do AVC-I if x264 can do it quite fast and works fine?
Last edited by kolak; 14th February 2025 at 22:28.
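To put rough numbers on the stuffing argument, here is a minimal Python sketch. It assumes the round figures from the post (100 Mbit/s, 25 frames per second) and every frame forced to exactly the same size; the coded frame sizes are made up for illustration and do not come from any real encode.

```python
# Illustrative only: how much filler a fixed-frame-size CBR stream needs.
FPS = 25                       # frame rate assumed for the example
TARGET_BITRATE = 100_000_000   # bits per second, the "100 Mbit" figure from the post


def frame_budget_bytes(bitrate_bps: int, fps: int) -> int:
    """Bytes available to each frame when every frame must be the same size."""
    return bitrate_bps // (8 * fps)


def filler_needed(coded_sizes, budget):
    """Return the filler bytes required per frame to reach the fixed budget."""
    padding = []
    for size in coded_sizes:
        if size > budget:
            raise ValueError(f"frame of {size} bytes exceeds the {budget}-byte budget")
        padding.append(budget - size)
    return padding


budget = frame_budget_bytes(TARGET_BITRATE, FPS)

# Hypothetical coded sizes in bytes: a black frame, colour bars, a detailed frame.
coded = [12_000, 85_000, 498_000]
padding = filler_needed(coded, budget)

for size, pad in zip(coded, padding):
    print(f"coded {size:>7} bytes -> {pad:>7} filler bytes ({pad / budget:.1%} of the frame)")
```

With these made-up sizes, almost all of the black frame would be filler, which is the point being made: simple content cannot fill the budget on its own.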
#15 | Link |
Registered User
Join Date: Aug 2024
Posts: 439
|
Nvenc can write zero padding bits, or at least the downstream software (ffmpeg in this case) should be able to do it.
But very strict CBR without zero padding... yeah, that's pretty much impossible. If you have a sample that achieves that, verified in the same way, I'd be very curious about it. (Zero padding bits are probably included in the frame size plot, so if it's zero padding then there's nothing to be curious about.)
I think the nvenc interface provided by ffmpeg doesn't support specifying the number of slices, but the tool made by rigaya can probably do it: https://github.com/rigaya/NVEnc
By the way, x264 usually falls short of the target bitrate when the VBV buffer size is very small; a "1 frame" buffer is an extreme example. Even for a pure random noise source it falls behind by a whopping 10%, and with more everyday content the number can be 30%, 50%, 70%, or really anything. So if the filler (zero padding) is ON, the bitrate you see probably includes a bunch of useless zeros... I can't imagine a situation where the padding is actually useful. Code:
>ffmpeg -lavfi color=gray:s=1920x1080:r=25:d=60,format=yuv420p,noise=allf=t:alls=100 -c:v libx264 -b:v 113664k -maxrate 113664k -bufsize 4547k -x264opts filler=0 Noiz++.264
[libx264 @ 00000276419ac7c0] kb/s:102813.98
Last edited by Z2697; 15th February 2025 at 21:39.
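The numbers in that command and log line can be checked with a few lines of Python. This is only a sketch of the arithmetic: it takes the 113664k target, the 25 fps rate and the reported 102813.98 kb/s from the post, and assumes both bitrates use the same kilobit unit.

```python
# Values copied from the ffmpeg command and the libx264 log line in the post.
TARGET_KBPS = 113664        # -b:v / -maxrate
FPS = 25                    # r=25 in the lavfi source
ACHIEVED_KBPS = 102813.98   # "kb/s" reported by libx264 with filler=0


def one_frame_vbv_kbit(bitrate_kbps: float, fps: float) -> float:
    """VBV buffer size that holds exactly one frame's worth of bits."""
    return bitrate_kbps / fps


def shortfall(target_kbps: float, achieved_kbps: float) -> float:
    """Fraction of the target bitrate that the encoder did not use."""
    return 1.0 - achieved_kbps / target_kbps


buffer_kbit = one_frame_vbv_kbit(TARGET_KBPS, FPS)
missing = shortfall(TARGET_KBPS, ACHIEVED_KBPS)

print(f"one-frame VBV buffer: {buffer_kbit:.2f} kbit")
print(f"shortfall without filler: {missing:.2%}")
```

The one-frame buffer comes out at about 4546.56 kbit, which matches the -bufsize 4547k used above, and the shortfall is roughly 9.5%, in line with the "whopping 10%" remark.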
#16 | Link |
Acid fr0g
Join Date: May 2002
Location: Italy
Posts: 2,913
|
Contact their tech support directly (perhaps you could register as enterprise/broadcasting and get special treatment) instead of writing on the forum.
Usually they are really helpful.
__________________
@turment on Telegram |
#17 | Link | ||||||
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,210
|
Quote:
NVEnc: ![]() x264: ![]() If we open the raw_video.h264 created by x264 with a hex editor, we can easily see that it has been zero-filled: there is a stretch of bitstream followed by a run of zeros, followed by bitstream again, in a repeating pattern (which is normal): ![]() Quote:
10-bit x264 doesn't have AVX512 support and it can't scale beyond vertical resolution / 40 threads, which means that for Full HD footage it can use up to 1080/40 = 27 threads, while for UHD footage it can use up to 2160/40 = 54 threads. Unfortunately this comes at the expense of speed. The UHD XAVC Intra Class 300 files in particular are 50p, and even with a beast like a 56c/112th Intel Xeon we're only getting ~36fps, which is slower than real time. That is a bit of an issue for time-critical things like sports highlights. Quote:
Quote:
Mediainfo of the newly created file with NVEnc: Quote:
Quote:
If I can get it to output with 1 reference frame and with zero filling / padding I might have something in my hands here.
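The thread-cap and real-time figures quoted in this post can be reproduced with a short Python sketch. The "vertical resolution / 40" rule is taken from the post as stated and has not been checked against the x264 source here; the 10-minute clip length is a made-up example.

```python
def max_threads(height: int, rows_per_thread: int = 40) -> int:
    """Thread cap using the 'vertical resolution / 40' rule quoted in the post."""
    return height // rows_per_thread


def realtime_factor(encode_fps: float, source_fps: float) -> float:
    """Values below 1.0 mean the encode runs slower than real time."""
    return encode_fps / source_fps


fhd_threads = max_threads(1080)
uhd_threads = max_threads(2160)
factor = realtime_factor(36, 50)   # ~36 fps achieved on 50p material

# Hypothetical example: wall-clock time needed for a 10-minute 50p highlight clip.
clip_minutes = 10
encode_minutes = clip_minutes / factor

print(f"Full HD: {fhd_threads} threads, UHD: {uhd_threads} threads")
print(f"real-time factor: {factor:.2f}, {clip_minutes} min clip takes {encode_minutes:.1f} min")
```

On those numbers the UHD encode runs at about 0.72x real time, so a 10-minute clip would take close to 14 minutes.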
#18 | Link |
Registered User
Join Date: Aug 2024
Posts: 439
|
My question is, is the "very strict CBR" really all that important?
x264 is wasting bits with zero padding in this case, while nvenc is actually maintaining a frame size close to the target, which means no wasted bits. The obvious choice is to disable the filler in x264, or to ignore the fluctuations in nvenc and just let them average out. The reference frame count shouldn't matter if you are encoding in all-intra mode: intra frames don't reference any other frame. Oh, and the bitrate deviation and macroblock difference don't seem very severe in all-intra mode; I didn't notice that it's the all-intra case we are talking about, sorry. (Not very severe, but still ~10% zero bits.) As for the speed of x264 in this case, I think you can tune down a lot of things to make it much faster without a significant difference in quality: because you are encoding all intra and the frame size is constrained to be basically constant, there's not much room for adjustment through x264 parameters. (Tune things down from a high preset, or only crank up the important ones from a low preset.)
Last edited by Z2697; 18th February 2025 at 14:38.
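One rough way to estimate how much of a raw bitstream is zero padding, without a hex editor, is to count the bytes that sit in long runs of 0x00. The Python sketch below is a crude heuristic and not a bitstream parser: the 16-byte run threshold is an arbitrary assumption, start-code zeros adjacent to a padding run get counted with it, and the sample data is synthetic rather than real H.264.

```python
def zero_run_fraction(data: bytes, min_run: int = 16) -> float:
    """Fraction of bytes that belong to runs of at least min_run consecutive 0x00 bytes."""
    if not data:
        return 0.0
    padded = 0   # bytes counted as padding so far
    run = 0      # length of the current run of zero bytes
    for byte in data:
        if byte == 0:
            run += 1
        else:
            if run >= min_run:
                padded += run
            run = 0
    if run >= min_run:   # account for a run that reaches the end of the data
        padded += run
    return padded / len(data)


# Synthetic stand-in for "bitstream, zeros, bitstream, zeros": not real video data.
chunk = b"\xab" * 100 + b"\x00" * 100
sample = chunk * 3
fraction = zero_run_fraction(sample)
print(f"{fraction:.0%} of the sample sits in long zero runs")
```

To try it on a real file such as the raw_video.h264 mentioned earlier in the thread, read the file in binary mode and pass the bytes to zero_run_fraction; treat the result as an estimate only.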
#19 | Link | |
Lost my old account :(
Join Date: Jul 2017
Posts: 359
|
Quote:
@FranceBB have you checked whether the issues with slices etc. are a limitation in the SDK/API or just not implemented in ffmpeg? I'm going to check with some of the vendors to see if there are any plans, and whether they have had any discussions with NVIDIA.
Edit: XAVC-flavor encoding using nvenc is under development in commercial products at least. So if it's not possible today to configure nvenc to create compliant streams, maybe that work will at least lead to changes in the SDK to expose better control, so it could later find support in FOSS solutions.
Last edited by excellentswordfight; 19th February 2025 at 12:13.
#20 | Link | |||||
Broadcast Encoder
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,210
|
Yes, the use case for that kind of file is a hardware playback port used for playout, the likes of Imagine Versio, so compliance is paramount. Those hardware playback ports are the bread and butter of linear TV, so the last thing you want is for them to crash and make the channel go black (that's potentially thousands of quid wasted very quickly).
Quote:
Quote:
Quote:
Quote:
1) why -refs 1 is being ignored when I pass it in FFMpeg
2) someone to implement enableFillerDataInsertion in FFMpeg, as it's already supported by the NVIDIA SDK
On this second point, one of my colleagues from Sky News talked to Timo Rothenpieler, one of the maintainers of the FFMpeg NVEnc integration, and Timo replied with: Quote:
Once that is out of the way, if we can also get -refs 1 interpreted and populated correctly in the metadata (point 1), I'd say that we achieved our result and we can start seriously testing on hardware playout ports.
Last edited by FranceBB; 19th February 2025 at 21:33.