
View Full Version : What is current status for hardware H.265 encoding.



Yups
7th March 2021, 22:13
I don't see any visual differences. Even compared to the original the difference looks relatively small, which I would expect from such high VMAF scores; the original just looks a bit more detailed. I have to go much lower with the bitrate.

ReinerSchweinlin
7th March 2021, 22:16
Rocketlake-S iGPU is based on Xe architecture:



https://newsroom.intel.com/wp-content/uploads/sites/11/2020/10/Intel-Rocket-Lake-S-Architecture.pdf

Thanx for posting info about the performance of Intel's new iGPUs and their encoder. I appreciate it; there is usually not much of substance on the net about these.

ReinerSchweinlin
18th March 2021, 16:01
I am trying to find solid information on whether the encoding engine of the new UHD Quick Sync encoders in the Tiger Lake i3 is the same as in the Xe variants of the ULV i5 and i7. My thoughts are these:

Given the power draw of CPUs and the possibility of "quite good Xe hardware encodes", it might be worth a shot to switch to a GEN12 Intel iGPU for transcoding a lot of files... Speed is not that important, as long as its efficiency is good. I don't need 300 fps for 1080p, but if I get more than realtime for a few watts, then a solar-powered transcoding rig might be possible.

I'd simply buy a NUC at the moment, but they seem hard to get...

@YUPS
Could you upload a small comparison encode of your Xe encoding at a very low bitrate? Let's say a few seconds of a 720p cartoon at 200 kbit/s, compared to x265 slow/anime/bframes8 or similar? I think an extreme setting like that could reveal the differences best.

RanmaCanada
19th March 2021, 03:57
I am trying to find solid information on whether the encoding engine of the new UHD Quick Sync encoders in the Tiger Lake i3 is the same as in the Xe variants of the ULV i5 and i7. My thoughts are these:

Given the power draw of CPUs and the possibility of "quite good Xe hardware encodes", it might be worth a shot to switch to a GEN7 Intel iGPU for transcoding a lot of files... Speed is not that important, as long as its efficiency is good. I don't need 300 fps for 1080p, but if I get more than realtime for a few watts, then a solar-powered transcoding rig might be possible.

I'd simply buy a NUC at the moment, but they seem hard to get...

@YUPS
Could you upload a small comparison encode of your Xe encoding at a very low bitrate? Let's say a few seconds of a 720p cartoon at 200mbit/s, compared to x265 slow/anime/bframes8 or similar? I think an extreme setting like that could reveal the differences best.

I would say that the encode engine is not the same, as the engine is more than likely baked into the Xe Iris graphics.

Second, 200mbit/s is NOT a low bitrate. Maybe you mean 200kbit/s? As for a solar-powered unit, I would suggest just getting an older i3-8130U laptop. I am currently using one as my Plex/Emby server and it rarely goes above 15 watts in CPU usage. I do not know the full power usage, but it only has a 45 watt adapter.

If an older laptop is not your thang, then you could easily get a 10th gen i3 NUC for well under $400 USD (https://www.amazon.ca/dp/B083GH7KTX/?coliid=I1ZBAYVXNNNLMC&colid=18US8TN9FA5SO&psc=1&ref_=lv_ov_lig_dp_it). Or a 10th gen laptop for a few dollars more. (https://www.amazon.ca/dp/B08H4YTTLP/?coliid=I1X9GXI33BT7E2&colid=18US8TN9FA5SO&psc=1&ref_=lv_ov_lig_dp_it)

Yups
19th March 2021, 13:09
I am trying to find solid information on whether the encoding engine of the new UHD Quick Sync encoders in the Tiger Lake i3 is the same as in the Xe variants of the ULV i5 and i7. My thoughts are these:

Given the power draw of CPUs and the possibility of "quite good Xe hardware encodes", it might be worth a shot to switch to a GEN7 Intel iGPU for transcoding a lot of files... Speed is not that important, as long as its efficiency is good. I don't need 300 fps for 1080p, but if I get more than realtime for a few watts, then a solar-powered transcoding rig might be possible.

I'd simply buy a NUC at the moment, but they seem hard to get...

@YUPS
Could you upload a small comparison encode of your Xe encoding at a very low bitrate? Let's say a few seconds of a 720p cartoon at 200mbit/s, compared to x265 slow/anime/bframes8 or similar? I think an extreme setting like that could reveal the differences best.



Yes I can on the weekend. Do you have a small cartoon sample? I would expect all Xe based iGPUs have the same quicksync encoder, the datasheet (https://d2pgu9s4sfmw1s.cloudfront.net/UAM/Prod/Done/a062E00001Zc09kQAB/bed7d2ba-23ee-44e3-a1bb-b874e7c5f746?Expires=1616155978&Key-Pair-Id=APKAJKRNIMMSNYXST6UA&Signature=cV5g091pMQei4oe2Ne4DzYlB6q80AB47n6Zr3l1JkheoTc2y0D0mM90uxhAm8eJtridkXsSnPmPp8s2IF1Cc4y7hv58JV~qL9lcTls9~EroVt7nf-VuSSG4dbK0UdfJLAHhuiRP9Y59Ma-z5Vbm4sNWyBI-t2Y-q9PlPN3FLn4kmfoUa65NPb4taK-Mt1q2jznr5zMi6wwZQRKZwAOqERaC5BAFqlGSkUXuACToOwhbZ3UDUWCO6jEkVIOdVbqAtzVQvB7zc6UeJXrJo0VDfolWPwi0KFH9zH-G4sBD1D3fY0gkO8IF6xN7caIYDI7WzvciLPbtC8UEVhco5Jj2wFg__) is relatively clear on this.

Whether it is branded UHD or Iris Xe graphics is not relevant when both are Xe based; RKL-S won't get the Xe brand either. Also, you are talking about Gen7, which is Ivy Bridge based.

RanmaCanada
19th March 2021, 19:39
@YUPS the only Creative Commons or open source "modern cartoon" I know of is Sol Levante (https://opencontent.netflix.com/) from Netflix. There are of course the standards like Big Buck Bunny and Tears of Steel, but they are over a decade old at this point and do not represent the current quality of animation, be it CGI or hand drawn. The data sheet you've linked also appears to be broken!

benwaggoner
20th March 2021, 01:15
@YUPS the only Creative Commons or open source "modern cartoon" I know of is Sol Levante (https://opencontent.netflix.com/) from Netflix. There are of course the standards like Big Buck Bunny and Tears of Steel, but they are over a decade old at this point and do not represent the current quality of animation, be it CGI or hand drawn. The data sheet you've linked also appears to be broken!
I don't know that Sol Levante represents "modern animation" so much as "future animation." It's UHD HDR and a serious compression stress test with all the sharp circular lines moving around.

It's the hardest-to-encode publicly available source without grain I can think of. x265 gets serious artifacts with it at some sections, even in 1080p24 at 9 Mbps.

benwaggoner
20th March 2021, 01:20
Just looking at a test encode, with --crf 20.5 and --vbv-maxrate of 900, I get >700 frames with a QP >40. I set up a 25 Mbps peak placebo encode to run overnight to see what's possible here.

Yups
20th March 2021, 03:48
@YUPS the only Creative Commons or open source "modern cartoon" I know of is Sol Levante (https://opencontent.netflix.com/) from Netflix. There are of course the standards like Big Buck Bunny and Tears of Steel, but they are over a decade old at this point and do not represent the current quality of animation, be it CGI or hand drawn. The data sheet you've linked also appears to be broken!


Try this link: https://cdrdv2.intel.com/v1/dl/getContent/631121

If it doesn't work you have to go to Intel ARK. I tried Sol Levante: on both Quick Sync and NVENC there is no hardware decoding, and therefore high CPU utilization when I encode it. It's a 12-bit ProRes 4444 video; maybe that's why. Should I use CRF or bitrate mode for x265?

RanmaCanada
20th March 2021, 06:13
Try this link: https://cdrdv2.intel.com/v1/dl/getContent/631121

If it doesn't work you have to go to Intel ARK. I tried Sol Levante: on both Quick Sync and NVENC there is no hardware decoding, and therefore high CPU utilization when I encode it. It's a 12-bit ProRes 4444 video; maybe that's why. Should I use CRF or bitrate mode for x265?

What I take away from the Intel documentation is that only the Xe graphics have the new advanced features. As for what to try, the poster wanted a 200 kbit encode, so I would just do that, if possible.

Even looking at the processors, the core i3 (https://ark.intel.com/content/www/us/en/ark/products/208920/intel-core-i3-1115g4-processor-6m-cache-up-to-4-10-ghz-with-ipu.html) 11th gen only have UHD graphics, while the core i5 (https://ark.intel.com/content/www/us/en/ark/products/208659/intel-core-i5-1140g7-processor-8m-cache-up-to-4-20-ghz-with-ipu.html)and above have Iris Xe.

Yups
20th March 2021, 11:02
What I take away from the Intel documentation is that only the Xe graphics have the new advanced features. As for what to try, the poster wanted a 200 kbit encode, so I would just do that, if possible.

Even looking at the processors, the core i3 (https://ark.intel.com/content/www/us/en/ark/products/208920/intel-core-i3-1115g4-processor-6m-cache-up-to-4-10-ghz-with-ipu.html) 11th gen only have UHD graphics, while the core i5 (https://ark.intel.com/content/www/us/en/ark/products/208659/intel-core-i5-1140g7-processor-8m-cache-up-to-4-20-ghz-with-ipu.html)and above have Iris Xe.



All Tiger Lake models are based on the Xe architecture; depending on the EU count they get either the Iris Xe branding or just UHD, as on Rocket Lake-S. The datasheet says up to 96 EUs and states no restriction for the lower-EU-count Xe parts, so the FF/media unit should be the same.

i7 Tigerlake= 96 EUs Xe
i5 Tigerlake= 80 EUs Xe
i3 Tigerlake= 48 EUs Xe
Rocketlake-S= 32/24 EUs Xe

200 kbit is not a feasible bitrate for this UHD HDR 10-bit video; the content isn't easy. I think 1500 kbit is a more realistic, very low starting point. What VMAF scores are acceptable for you?
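
For a rough sense of scale, here is a small Python sketch of the average bits available per pixel at the bitrates mentioned. The frame sizes (1280x720 and 3840x2160) and the 24 fps frame rate are assumptions for illustration; only the 200 and 1500 kbit figures come from the thread.

```python
def bits_per_pixel(kbit_per_s, width, height, fps):
    """Average coded bits available per pixel per frame."""
    return kbit_per_s * 1000 / (width * height * fps)

# Assumed frame sizes and frame rate; only the bitrates are from the thread.
cases = {
    "720p @ 200 kbit/s": (200, 1280, 720, 24),
    "UHD  @ 200 kbit/s": (200, 3840, 2160, 24),
    "UHD  @ 1500 kbit/s": (1500, 3840, 2160, 24),
}

for name, args in cases.items():
    print(f"{name}: {bits_per_pixel(*args):.4f} bits/pixel")
```

UHD has nine times as many pixels as 720p, so 200 kbit at UHD leaves one ninth of the per-pixel budget that the same bitrate gives a 720p cartoon. Under these assumptions 1500 kbit at UHD is still slightly below the per-pixel budget of 720p at 200 kbit.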

RanmaCanada
20th March 2021, 23:30
All Tiger Lake models are based on the Xe architecture; depending on the EU count they get either the Iris Xe branding or just UHD, as on Rocket Lake-S. The datasheet says up to 96 EUs and states no restriction for the lower-EU-count Xe parts, so the FF/media unit should be the same.

i7 Tigerlake= 96 EUs Xe
i5 Tigerlake= 80 EUs Xe
i3 Tigerlake= 48 EUs Xe
Rocketlake-S= 32/24 EUs Xe

200 kbit is not a feasible bitrate for this UHD HDR 10-bit video; the content isn't easy. I think 1500 kbit is a more realistic, very low starting point. What VMAF scores are acceptable for you?
Interesting. And I dunno, we need to ask ReinerSchweinlin as they were the ones that wanted the tests in the first place!

Yups
20th March 2021, 23:52
@YUPS
Could you upload a small comparison encode of your Xe encoding at a very low bitrate? Let's say a few seconds of a 720p cartoon at 200mbit/s, compared to x265 slow/anime/bframes8 or similar? I think an extreme setting like that could reveal the differences best.


I have tested Sol Levante on my GPUs and also did one x265 run. I've chosen x265 slow, Main10 10-bit, and a maximum GOP of 150 (for all encoders), with software decoding on all GPUs. The decoding bottleneck on Iris Xe FF is quite big; FF is usually a lot faster than the GPU+FF version.



Sol Levante                      VMAF    speed      bitrate

Quicksync H265 (27.20.100.9316)
HD630 CQP best                   63.83   15 fps     1353 kbit

Iris Xe CQP FF best              71.09   22 fps     1338 kbit
Iris Xe CQP best                 71.70   20 fps     1320 kbit

NVENC H265 (470.05)
GTX1080 CQP best                 60.57   24 fps     1328 kbit

x265 (Staxrip 2.1.8.5)
i7-1165G7 slow                   68.55   1.4 fps    1326 kbit


Link: https://drive.google.com/file/d/12oTKgEr2SaMt9Tnnq7IRl3jcVol1Pmzg/view?usp=sharing


VMAF below 80 is not that great, I would choose a higher bitrate.

ReinerSchweinlin
21st March 2021, 12:47
I would say that the encode engine is not the same, as the engine is more than likely baked into the Xe Iris graphics.

Second, 200mbit/s is NOT a low bitrate. Maybe you mean 200kbit/s?
Thanx for catching the typo, of course kbit/s :)

@YUPS Thanks for taking the effort to compare and make a test sample. The results are very interesting, indeed. Too bad I have sold all my RTX cards for the moment; a GEN20 NVENC encode to compare would be interesting.

The reason why I mentioned a low-bitrate cartoon is that this is my usual workflow for determining the limits and capabilities of an encoder setting... Of course higher bitrates are always better for the result, but seeing what an encoder does when it is really limited gives me a better basis for judgement. I am fully aware that this is somewhat biased and that someone who doesn't care about efficiency, drive space or bandwidth might have other priorities.

The example used really is a tough one :) By cartoon, I meant something like Family Guy (line art style), which is a little easier to judge (for me at least). Modern anime rarely has anything in common with a classic cartoon. But the example still shows that Iris Xe seems quite capable.

I will look for a suitable, free sample :)

Hm, the question remains whether the smaller Tiger Lake models with the iGPUs branded UHD have the exact same encoders. The argument that they are derived from Xe graphics is of course valid, but I can't find 100% evidence that that's the case. IMHO the datasheet leaves room for interpretation (unless I'm missing something; maybe you can point me to the page). I remember cases like the 1650, which officially is Turing generation, but the one thing that wasn't was the NVENC engine, which is Volta, so no B-frame support...

If the smaller and cheaper Tiger Lake had the same encoding engine, that could really make for a very nice low-power encoding/streaming/transcoding rig... If it's a little less powerful than the variants with the higher EU count, that wouldn't harm my use case :)

@RanmaCanada
Thanx for the suggestions about older laptops. I have a bunch of older machines with low-power chips (NUCs, SoCs, laptops, etc.), and these are the current candidates I use for solar-powered use cases... So far, the quality of the hardware encoding engines wasn't on par with x265 for a given bitrate, but with the Xe engines it seems to be becoming competitive (while still not being power hungry)... The GEN20 NVENC isn't too bad either, but I know of no system with, let's say, a 1660 that consumes as little as an Xe-based system promises to.

ReinerSchweinlin
21st March 2021, 13:08
This free video file of a Blender project would be interesting to test at a very low bitrate. I am fully aware that the results won't be what one wants as an end result; it's more of a test of how the encoder deals with very limited conditions.

https://upload.wikimedia.org/wikipedia/commons/a/a9/HERO_-_Blender_Open_Movie-full_movie.webm

Yes, it's already compressed, but that's fine for me and represents a use case for many of us anyway :) This video has a nice mix of "cartoon style" still scenes with a little gradient in the back, defined, sharp lines in the front, scenes with a little more action, some scrolling/panning, some textures, etc... The resolution is fine, but if it's no trouble for you, I'd love to see it in 720p. There is a reason for that: modern encoders seem to be more optimized for higher resolutions, and the use case also calls for smaller bandwidths, which results in smaller resolutions... I found some interesting effects in these scenarios: while the same footage in 4K looks very good at edges in all encoders used, scaling it down to 720p and then viewing it on the same screen again (upscaled while playing) reveals artifacts at sharp edges, which are now much more visible than before, even if the footage wasn't "visually super sharp" and from the viewing distance looks almost identical in terms of "optical resolution"... To deal with this effect at smaller resolutions, we all "know" that lowering the Q factor in constant-Q encodes is a good idea.
So I came up with this test procedure of mine: encoding at a low resolution with a low bitrate reveals "a lot I need to know about an encoder to judge it". Then I trust my eyes :)

Thanx for helping me out!
And thanx @all for commenting, of course.

ReinerSchweinlin
21st March 2021, 13:20
While looking at your encodes, I noticed something:

The hardware encoders were all able to detect the end credits as simple panning from bottom to top and therefore use very little bitrate (as it should be), while the x265 encode uses a LOT more bitrate for this section. I'll try to replicate that and dig a little into the search patterns. I am sure that this has an impact on the overall score, because if almost 1/4 of the clip gets an order of magnitude more bitrate than needed, the demanding rest of the clip is starving...


BTW: Back in "the days", there was "Bitrate Viewer 1.4" for MPEG-2 files... Is there any successor around?

I'm just downloading "Sol Levante". The ProRes version is so big and the download speed so slow (not maxing out my Internet connection at all). Which version did you use? (Or did you wait 16 hours?)

butterw2
21st March 2021, 13:48
Intel 11th Gen Desktop (Rocket Lake S, 14nm), only the i5 and up have Xe graphics (UHD-750 and UHD-730 for the i5-11400).
They have a fraction of the igpu Execution Units of the 10nm mobile Tiger Lake parts, but it is assumed that the media encoder/decoder block is the same (AV1 hw decoder, hevc encoder).
! The i3 and pentium are just rebranded current gen.

The 6core/12threads i5-11500 and i5-11400 are stated to have 65W TDP, and the T parts 35W TDP. They are sub-200$ msrp parts which can be used with the new B560 motherboards (PCIe-4 and DDR4 3200MHz).

ReinerSchweinlin
21st March 2021, 13:59
thanks for adding.

Intel 11th Gen Desktop (Rocket Lake S, 14nm), ...... but it is assumed that the media encoder/decoder block is the same

And that's exactly where clarification would be welcome :)

butterw2
21st March 2021, 14:19
The feature set is confirmed as being the same, whether the result is 100% the same (after the 14nm backport) we will likely only learn when the chips become widely available.

The main takeaway for me was that the new decoder/encoder will not be available for the cheapest chips/motherboards right now.

UHD-750/730 also has HDMI 2.0b support.
https://en.wikipedia.org/wiki/Rocket_Lake#GPU

https://en.wikipedia.org/wiki/Intel_Graphics_Technology#Integrated

Yups
21st March 2021, 17:24
Hm, the question remains whether the smaller Tiger Lake models with the iGPUs branded UHD have the exact same encoders. The argument that they are derived from Xe graphics is of course valid, but I can't find 100% evidence that that's the case. IMHO the datasheet leaves room for interpretation (unless I'm missing something; maybe you can point me to the page). I remember cases like the 1650, which officially is Turing generation, but the one thing that wasn't was the NVENC engine, which is Volta, so no B-frame support...



RKL-S datasheet volume 1 isn't available yet, but I'm not sure you will find anything more concrete in it. The thing is, Intel's media engine version has been tied to the graphics architecture up to now; there is no differentiation like on Nvidia. That's why you most likely won't find anything more concrete in the RKL-S datasheet: there is no need for it. The feature list (https://github.com/intel/media-driver/blob/master/docs/media_features.md#media-features-summary) from Intel's open source driver is identical on Tiger Lake and Rocket Lake.

By the way, this might change in the future: starting with Alder Lake, the various IP blocks can be upgraded at a different cadence regardless of the graphics architecture, so the Xe LP graphics in future generations may get a media or display upgrade.

The arrival of this new display architecture coincides with a general disaggregation of Intel GPUs' architecture version numbering for the different component IP blocks. Going forward it isn't accurate to talk about a platform using INTEL_GEN() anymore since the various IP blocks (graphics, media, display) are moving to independent internal numbering schemes that may have different granularity and move at different cadences; the hardware teams have asked us to start tracking these values separately for "graphics," "media," and "display" such that anywhere that we need to do a numerical comparison on the architecture version, we should need to use an IP-specific version number instead of INTEL_GEN().
https://patchwork.freedesktop.org/series/87886/

Intel 11th Gen Desktop (Rocket Lake S, 14nm), only the i5 and up have Xe graphics (UHD-750 and UHD-730 for the i5-11400).
They have a fraction of the igpu Execution Units of the 10nm mobile Tiger Lake parts, but it is assumed that the media encoder/decoder block is the same (AV1 hw decoder, hevc encoder).
! The i3 and pentium are just rebranded current gen.

The 6core/12threads i5-11500 and i5-11400 are stated to have 65W TDP, and the T parts 35W TDP. They are sub-200$ msrp parts which can be used with the new B560 motherboards (PCIe-4 and DDR4 3200MHz).


The rebranded i3s are Comet Lake and 10th Gen; they are not sold as RKL-S 11th Gen.

Yups
21st March 2021, 17:36
While looking at your encodes, I noticed something:

The hardware encoders were all able to detect the end credits as simple panning from bottom to top and therefore use very little bitrate (as it should be), while the x265 encode uses a LOT more bitrate for this section. I'll try to replicate that and dig a little into the search patterns. I am sure that this has an impact on the overall score, because if almost 1/4 of the clip gets an order of magnitude more bitrate than needed, the demanding rest of the clip is starving...


BTW: Back in "the days", there was "Bitrate Viewer 1.4" for MPEG-2 files... Is there any successor around?

I'm just downloading "Sol Levante". The ProRes version is so big and the download speed so slow (not maxing out my Internet connection at all). Which version did you use? (Or did you wait 16 hours?)


I haven't tried CRF on x265 for this video because it's so slow; the bitrate allocation might differ there. CQP is usually the best choice on my GPUs. Sol Levante is 37.8 GB; the download was fast for me.

Yups
21st March 2021, 21:33
This free video file of a Blender project would be interesting to test at a very low bitrate. I am fully aware that the results won't be what one wants as an end result; it's more of a test of how the encoder deals with very limited conditions.

https://upload.wikimedia.org/wikipedia/commons/a/a9/HERO_-_Blender_Open_Movie-full_movie.webm

Yes, it's already compressed, but that's fine for me and represents a use case for many of us anyway :) This video has a nice mix of "cartoon style" still scenes with a little gradient in the back, defined, sharp lines in the front, scenes with a little more action, some scrolling/panning, some textures, etc... The resolution is fine, but if it's no trouble for you, I'd love to see it in 720p.

There is also a 720p (https://upload.wikimedia.org/wikipedia/commons/transcoded/a/a9/HERO_-_Blender_Open_Movie-full_movie.webm/HERO_-_Blender_Open_Movie-full_movie.webm.720p.vp9.webm) version on this site. CRF mode is a lot better than bitrate mode for this type of content.


HERO - Blender Open Movie        VMAF    speed      bitrate

Quicksync H265 (27.20.100.9316)
Iris Xe CQP FF best              86.56   550 fps    199 kbit
Iris Xe CQP best                 88.59   167 fps    200 kbit

x265 (Staxrip 2.1.9.0)
i7-1165G7 slow                   84.61   26 fps     200 kbit
i7-1165G7 CRF slow               88.18   34 fps     200 kbit

Link: https://drive.google.com/file/d/1zNPEmhvDEvOZBBX9hFnDTSROZgHiLMG4/view?usp=sharing

Yups
23rd March 2021, 16:54
There is also a 720p (https://upload.wikimedia.org/wikipedia/commons/transcoded/a/a9/HERO_-_Blender_Open_Movie-full_movie.webm/HERO_-_Blender_Open_Movie-full_movie.webm.720p.vp9.webm) version on this site.


Some more reference points for this. I used CRF mode only for x264/x265; it's much better than bitrate mode for this test case.



HERO - Blender Open Movie        VMAF    speed      bitrate

Quicksync H265 (27.20.100.9316)
Iris Xe CQP FF best              86.56   550 fps    199 kbit
Iris Xe CQP FF balanced          85.66   850 fps    201 kbit
Iris Xe CQP FF speed             76.25   1550 fps   200 kbit

Iris Xe CQP best                 88.59   167 fps    200 kbit
Iris Xe CQP balanced             87.18   280 fps    201 kbit
Iris Xe CQP speed                86.22   520 fps    200 kbit

x265 (Staxrip 2.1.9.0)
i7-1165G7 CRF slower             90.25   8 fps      200 kbit
i7-1165G7 CRF slow               88.18   34 fps     200 kbit
i7-1165G7 CRF medium             85.72   56 fps     200 kbit
i7-1165G7 CRF very fast          83.36   73 fps     200 kbit

x264 (Staxrip 2.1.9.0)
i7-1165G7 CRF slower             75.37   81 fps     200 kbit



The fixed-function speed preset does not support B-frames on Iris Xe with the current driver; the other presets use 16 B-frames. Offset mostly 2:6:8, on some a bit lower depending on the output bitrate.
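
One way to read the HERO numbers is to ask, for each x265 preset, which hardware preset reaches at least the same VMAF and how much faster it runs. A minimal Python sketch of that comparison follows; the VMAF and fps values are copied from the table in this post, while the helper `fastest_match` and the "at least the same VMAF" rule are just one illustrative choice. It covers a single clip at about 200 kbit, so it is not a general ranking.

```python
# (name, VMAF, fps) copied from the HERO table in this post.
HARDWARE = [
    ("Iris Xe CQP FF best", 86.56, 550),
    ("Iris Xe CQP FF balanced", 85.66, 850),
    ("Iris Xe CQP FF speed", 76.25, 1550),
    ("Iris Xe CQP best", 88.59, 167),
    ("Iris Xe CQP balanced", 87.18, 280),
    ("Iris Xe CQP speed", 86.22, 520),
]
X265 = [
    ("x265 CRF slower", 90.25, 8),
    ("x265 CRF slow", 88.18, 34),
    ("x265 CRF medium", 85.72, 56),
    ("x265 CRF very fast", 83.36, 73),
]

def fastest_match(target_vmaf, candidates):
    """Return the fastest candidate whose VMAF is at least target_vmaf, or None."""
    good_enough = [c for c in candidates if c[1] >= target_vmaf]
    return max(good_enough, key=lambda c: c[2]) if good_enough else None

for name, vmaf, fps in X265:
    match = fastest_match(vmaf, HARDWARE)
    if match is None:
        print(f"{name} ({vmaf}): no hardware preset reaches this VMAF")
    else:
        print(f"{name} ({vmaf}): {match[0]} at {match[2]} fps, "
              f"{match[2] / fps:.1f}x the x265 speed")
```

On these numbers nothing in hardware reaches x265 slower, while x265 slow is matched by Iris Xe CQP best at roughly five times the speed.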

benwaggoner
23rd March 2021, 19:39
I haven't tried CRF on x265 for this video because it's so slow; the bitrate allocation might differ there. CQP is usually the best choice on my GPUs. Sol Levante is 37.8 GB; the download was fast for me.
Sol Levante is my new favorite encoder stress test, as it has a lot of complex content of a type that encoders aren't typically optimized for. It's definitely the most challenging to encode source without film grain I've played with.

I'd expect HW encoders to lie down and cry (aka have a lot of visible artifacts) without using High Tier or raising level above the minimum required for frame size/fps. x265 placebo falls totally apart doing a 1080p24 Level 4.0 encode.

ReinerSchweinlin
24th March 2021, 14:15
The main takeaway for me was that the new decoder/encoder will not be available for the cheapest chips/motherboards right now.

Thanx, that's too bad. Oh well, an i5 would then probably be the way to go.

The fixed-function speed preset does not support B-frames on Iris Xe with the current driver; the other presets use 16 B-frames. Offset mostly 2:6:8, on some a bit lower depending on the output bitrate.
Thank you for testing "HERO". Very interesting. Also thanx for the info about B-frames. Given the speed gain, this really looks like a usable alternative to software encoding for my planned "low power encoding rig". I expected software encoding to be more efficient in terms of quality/bitrate, but given the speeds and power consumption, the new Xe encoders seem "good enough". Inspecting your test encodes visually, they really look good!

Yups
24th March 2021, 20:41
Iris Xe CQP is really good (for a hardware encoder), CBR/VBR are not that good due to some missing features (Lookahead). I wonder if Turing/Ampere can match or beat Iris Xe with its constant rate mode, is there a x265 comparison somewhere, how is it in comparison? Bframes support for fixed function speed preset might come in a later driver because their Linux Media SDK (https://github.com/Intel-Media-SDK/MediaSDK/blob/a636bfae6373b4e40cd800da4b5f3e8d8c97dba4/doc/mediasdk_release_notes.md) enabled it:

HEVC encode

Extended B frames support across all target usage with LowPower on


(LowPower= fixed function)

Tenkei
24th March 2021, 21:28
Iris Xe CQP is really good (for a hardware encoder), CBR/VBR are not that good due to some missing features (Lookahead). I wonder if Turing/Ampere can match or beat Iris Xe with its constant rate mode, is there a x265 comparison somewhere, how is it in comparison? Bframes support for fixed function speed preset might come in a later driver because their Linux Media SDK (https://github.com/Intel-Media-SDK/MediaSDK/blob/a636bfae6373b4e40cd800da4b5f3e8d8c97dba4/doc/mediasdk_release_notes.md) enabled it:

HEVC encode

Extended B frames support across all target usage with LowPower on


(LowPower= fixed function)

Wouldn't 2 pass negate the lack of lookahead?

benwaggoner
25th March 2021, 00:15
While looking at your encodes, I noticed something:

The hardware encoders were all able to detect the end credits as simple panning from bottom to top and therefore use very little bitrate (as it should be), while the x265 encode uses a LOT more bitrate for this section. I'll try to replicate that and dig a little into the search patterns. I am sure that this has an impact on the overall score, because if almost 1/4 of the clip gets an order of magnitude more bitrate than needed, the demanding rest of the clip is starving...
Yes, x265 with default settings spends an inexplicable amount of bits on title cards and scrolling credits. My guess is a defect in the Rate Factor implementation which is way overestimating how low QPs need to be for that kind of content. It's nigh-impossible to tell the difference between credits at CRF 20 and CRF 40.

It's possible it is a better match for x264. With 4x4 CUs, --amp, --rect, and --tskip, x265 can encode details down to a single pixel width, and SAO is quite good at suppressing ringing artifacts for text and line art. Intra-frame prediction with 1/8th pel is also excellent for big pages of text in the same font; basically each repeated letter gets a near-perfect prediction without residual.

benwaggoner
25th March 2021, 00:16
Wouldn't 2 pass negate the lack of lookahead?
Exactly.

Historically, it's more that lookahead partially negated the need for 2 passes, of course ;)

ReinerSchweinlin
25th March 2021, 04:27
Iris Xe CQP is really good (for a hardware encoder), CBR/VBR are not that good due to some missing features (Lookahead). I wonder if Turing/Ampere can match or beat Iris Xe with its constant rate mode, is there a x265 comparison somewhere, how is it in comparison? Bframes support for fixed function speed preset might come in a later driver because their Linux Media SDK (https://github.com/Intel-Media-SDK/MediaSDK/blob/a636bfae6373b4e40cd800da4b5f3e8d8c97dba4/doc/mediasdk_release_notes.md) enabled it:

HEVC encode

Extended B frames support across all target usage with LowPower on


(LowPower= fixed function)
It's really too bad I don't have my RTX cards any more (I sold them to get bigger ones, and then... well, I'm not buying one right now... the remaining AMD cards will do for the moment). The Turing encoder really isn't that bad either, but as far as I remember it wasn't as good as what Xe can do. Lacking a direct comparison, this is only "my feeling"...

2-pass with NVEncC: as far as I remember it's not really 2-pass in the traditional sense of processing the whole file twice; the encoder takes a GOP or other small number of frames and runs them twice internally. The hybrid encoder from MainConcept offloads rate control to the CPU while using Turing's NVENC, but that resulted in much lower speeds (pure NVEncC in my tests: above 150 fps; same input files with the hybrid encoder: around 30 fps).

Maybe someone with a Turing encoder wants to jump in and encode the examples from above?

I could test my AMD encoder, but without B-frames we can predict the outcome :)


Thanx for confirming. I couldn't find a "sane" setting for normal content which also works "as intended" on credit scenes with x265.

rwill
25th March 2021, 22:13
Wouldn't 2 pass negate the lack of lookahead?

It depends on how it's implemented.

Lookahead shows the encoder how things probably will develop
short term wise while 2-pass will show an encoder the average
rate @ quantizer ( rate@CRF ), among other things, over the
whole sequence plus the dirty per frame details.

Lookahead is needed for short term decisions like estimating
the video buffer development over the next N frames. This can
be done up to the end of sequence in the second pass but also
over the next N frames with buffering delay in the first pass.

If it is implemented well, 2-pass can 'almost' negate the lack
of lookahead in a first pass. It depends on how well the
"rate@quantizer" predictors for short term decisions are done.
If one predicts short term frame sizes from rate@quantizer
from the previous pass or rate@distortion_cost from the current
pass or a mixture of both and how good the overall system is...

rate@quantizer correlation might go bad if quantizer differs too
much between passes and rate@distortion_cost might be bad due
to bad rate correlation with distortion_cost.

It depends.

2 pass will always beat 1 pass though, just imagine a sequence
with 5000 frames of "bitrate breaking action" followed by 5000
frames of "solid black". Then imagine what happens if you swap
the display order of the two 5k parts around.
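
The swapped-halves thought experiment can be played through with a toy model. The Python sketch below is only an illustration under made-up assumptions: the per-frame "complexity" values, the budget, and the carry-over rule of the one-pass allocator are hypothetical and are not how x265 or any hardware encoder actually does rate control.

```python
ACTION, BLACK = 100.0, 1.0   # made-up bits a frame needs for full quality

def two_pass(complexity, budget):
    """With first-pass stats for the whole clip, scale every frame by the same factor."""
    scale = min(1.0, budget / sum(complexity))
    return [c * scale for c in complexity]

def one_pass(complexity, budget):
    """No lookahead: a frame may use its average share plus bits saved by earlier frames."""
    share = budget / len(complexity)
    saved, bits = 0.0, []
    for c in complexity:
        available = share + saved
        spent = min(available, c)   # never spend more than the frame needs
        saved = available - spent
        bits.append(spent)
    return bits

def action_bits(complexity, bits):
    """Total bits that ended up on the action frames."""
    return sum(b for c, b in zip(complexity, bits) if c == ACTION)

action_first = [ACTION] * 5000 + [BLACK] * 5000
black_first = [BLACK] * 5000 + [ACTION] * 5000
budget = 30.0 * 10000   # an average of 30 bits per frame, also made up

for name, clip in (("action first", action_first), ("black first", black_first)):
    print(name,
          "| 1-pass action bits:", round(action_bits(clip, one_pass(clip, budget))),
          "| 2-pass action bits:", round(action_bits(clip, two_pass(clip, budget))))
```

In this toy the one-pass allocator puts 150000 bits on the action when it comes first and leaves 145000 bits unspent at the end, but 295000 bits when the black half comes first, spread unevenly over the action frames. The two-pass allocation gives the action about 297030 bits in either order.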

Yups
25th March 2021, 23:36
Yes, x265 with default settings spends an inexplicable amount of bits on title cards and scrolling credits. My guess is a defect in the Rate Factor implementation which is way overestimating how low QPs need to be for that kind of content. It's nigh-impossible to tell the difference between credits at CRF 20 and CRF 40.


x265 bitrate mode is not good for this sample, possibly because of the scrolling credits, which are quite long. I will add CRF scores later this week; the CRF scores are a lot higher.

excellentswordfight
26th March 2021, 09:01
Playing a bit with Quick Sync on Xe: when I try to use ICQ mode with lookahead on an i7-1165G7 with QSVEncC, I get the following message:

"LA-ICQ (Intelligent Const. Quality with Lookahead) mode is not supported on current platform"

Anyone know why? I thought that Xe should have support for most (all?) features?

I also noticed that VBV isn't supported in ICQ mode. Is that a feature missing on the Intel side or on the encoder side? Is it on any roadmap to support it? I'm not a fan of doing encodes that are not VBV compliant with the selected level, because with lookahead I would assume that it should be possible.

Yups
26th March 2021, 16:02
Are you trying Lookahead with H265? Intel does not support Lookahead on H265, only with H264 and it works there. About VBV, this is what I found in the Media SDK:

For variable bitrate control, the MaxKbps parameter specifies the maximum bitrate at which the encoded data enters the Video Buffering Verifier (VBV) buffer. If MaxKbps is equal to zero, the value is calculated from bitrate, frame rate, profile, level, and so on.

There is a max-bitrate option in QSVEnc, I think it works for VBR and LA_VBR (h264).
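
For anyone unsure what "VBV compliant" means in practice, here is a minimal leaky-bucket sketch in Python. It is a simplification and not the Media SDK or QSVEnc implementation: the function name, the parameters and the 90% initial fullness are assumptions for illustration. The buffer is refilled at the maximum bitrate, each frame is removed at its decode time, and a frame that is larger than the current fullness counts as an underflow.

```python
def vbv_underflows(frame_bits, max_kbps, buffer_kbit, fps, initial_fullness=0.9):
    """Return the indices of frames that would underflow a simple VBV model.

    frame_bits       -- encoded size of each frame in bits
    max_kbps         -- rate at which data enters the buffer (kbit/s)
    buffer_kbit      -- buffer capacity (kbit)
    fps              -- frames per second
    initial_fullness -- fraction of the buffer filled before decoding starts
    """
    capacity = buffer_kbit * 1000.0
    fill_per_frame = max_kbps * 1000.0 / fps
    fullness = capacity * initial_fullness
    underflows = []
    for i, bits in enumerate(frame_bits):
        if bits > fullness:
            # The frame has not fully arrived when the decoder needs it.
            underflows.append(i)
            fullness = 0.0
        else:
            fullness -= bits
        # Refill during one frame interval, never beyond the capacity.
        fullness = min(capacity, fullness + fill_per_frame)
    return underflows


# Hypothetical stream: 1000 kbit/s max rate, 1000 kbit buffer, 25 fps,
# with a single oversized frame in the middle.
sizes = [40000] * 5 + [2000000] + [40000] * 5
print(vbv_underflows(sizes, max_kbps=1000, buffer_kbit=1000, fps=25))
```

A rate control without a VBV constraint has nothing that stops it from producing a frame like the oversized one in this example.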

Yups
28th March 2021, 13:13
As promised, here are my x265 CRF results from Sol Levante at extremely low 4k bitrates. The best Iris Xe CQP preset is basically comparable to x265 medium CRF which I believe is the worst I have tested so far (based on the VMAF scores).



Sol Levante               VMAF    speed     bitrate

Quicksync H265 (27.20.100.9316)
Iris Xe CQP best          72.45   20 fps    1378 kbit/s


x265 (Staxrip 2.1.9.0)
i7-1165G7 CRF slow        75.44   1.7 fps   1365 kbit/s
i7-1165G7 CRF medium      72.63   3.8 fps   1378 kbit/s
i7-1165G7 CRF very fast   69.30   5.9 fps   1371 kbit/s

https://drive.google.com/file/d/1GZ6aIfh1jubKkhKuuIIJVV6k51Wr5GS1/view?usp=sharing


main10 10 bit
gop 120
CQP 47_47_50
offset 2_5_10

ReinerSchweinlin
29th March 2021, 10:56
Thanx for all the work!!

Yups
5th April 2021, 01:47
Another x264/x265 CRF vs Iris Xe CQP comparison using this (https://drive.google.com/file/d/1YX1V0SeSkYaq6Ui41vv1wcOatbLnuLSL/view?usp=sharing) sample.



Intel Demo Clip           VMAF    PSNR    SSIM     VQM     speed    bitrate

Quicksync H265
Iris Xe CQP best          91.76   41.85   0.9748   0.790   67 fps   2438 kbit/s


x265 (Staxrip 2.1.9.0)
i7-1165G7 CRF slow        93.20   41.90   0.9754   0.800   8 fps    2430 kbit/s
i7-1165G7 CRF medium      91.05   41.35   0.9744   0.861   19 fps   2440 kbit/s
i7-1165G7 CRF very fast   89.99   40.97   0.9726   0.894   32 fps   2440 kbit/s


x264 (Staxrip 2.1.9.0)
i7-1165G7 CRF slow        89.38   40.21   0.9693   0.974   30 fps   2425 kbit/s
https://drive.google.com/file/d/1onLozSdINzy2xIsHhqq09fgwENHMFYEx/view?usp=sharing


Open gop 120 for all, as for Iris Xe I was using these settings:

CQP 23_24_26
offset 2_5_8
bframes 16


Interestingly Iris Xe wins VQM metric against x265 slow, whereas VMAF/PSNR/SSIM prefer x265 slow over Iris Xe. Personally I prefer VMAF.
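
Since the metrics point in different directions (VQM is lower-is-better, the other three are higher-is-better), a small Python sketch that picks the winner per metric may help when reading the table. The numbers are copied from the table above; the names `RESULTS`, `COLUMN`, `HIGHER_IS_BETTER` and `winner` are just made up for this example.

```python
# (encoder, VMAF, PSNR, SSIM, VQM), values copied from the table above
RESULTS = [
    ("Iris Xe CQP best", 91.76, 41.85, 0.9748, 0.790),
    ("x265 CRF slow", 93.20, 41.90, 0.9754, 0.800),
    ("x265 CRF medium", 91.05, 41.35, 0.9744, 0.861),
    ("x265 CRF very fast", 89.99, 40.97, 0.9726, 0.894),
    ("x264 CRF slow", 89.38, 40.21, 0.9693, 0.974),
]

COLUMN = {"VMAF": 1, "PSNR": 2, "SSIM": 3, "VQM": 4}

# VQM is the only metric here where a lower score means better quality.
HIGHER_IS_BETTER = {"VMAF": True, "PSNR": True, "SSIM": True, "VQM": False}


def winner(metric):
    """Return the encoder with the best score for the given metric."""
    idx = COLUMN[metric]
    pick = max if HIGHER_IS_BETTER[metric] else min
    return pick(RESULTS, key=lambda row: row[idx])[0]


for metric in COLUMN:
    print(f"{metric}: {winner(metric)}")
```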

benwaggoner
5th April 2021, 21:24
Interestingly Iris Xe wins VQM metric against x265 slow, whereas VMAF/PSNR/SSIM prefer x265 slow over Iris Xe. Personally I prefer VMAF.
Can you expand on that? I've not really compared VMAF and VQM in depth myself. Why do you prefer VMAF?

Of course, all frame-scored metrics suffer from the problem of how to extrapolate from a score for a <10 second clip to a clip of meaningful duration that captures the impact of quality variation throughout a video.

Yups
5th April 2021, 22:00
I never use 10 seconds clips for this reason. As for your question I prefer their scoring system over the others, a higher score means better quality unlike with VQM. 100 means highest, 0 means lowest quality, it's simple.

With SSIM tiny differences in the scores could have big visual subjective differences, there is a tiny 0.001 difference between slow and medium in my last example. Apart from the scoring system I have more trust in VMAF, I believe it's closer to subjective quality than the others. However I wouldn't say it's always closest to subjective quality, there are surely cases where VQM or SSIM will do better.

benwaggoner
6th April 2021, 19:05
I never use 10 seconds clips for this reason. As for your question I prefer their scoring system over the others, a higher score means better quality unlike with VQM. 100 means highest, 0 means lowest quality, it's simple.

With SSIM tiny differences in the scores could have big visual subjective differences, there is a tiny 0.001 difference between slow and medium in my last example. Apart from the scoring system I have more trust in VMAF, I believe it's closer to subjective quality than the others. However I wouldn't say it's always closest to subjective quality, there are surely cases where VQM or SSIM will do better.
And VMAF isn't really a "metric" in the traditional sense. It's a ML model to predict subjective scores based on several relatively simple objective metrics. The ML is trained on a variety of test encodes, and Netflix has periodically improved the ML training, and has added a new objective metric at least once. So the same clip measured with 2018 VMAF would have a different score with the 2021 VMAF.

VMAF is also limited by the sorts of content that was subjectively rated in their training set. Early VMAF didn't seem to have tested different adaptive quantization modes, and so VMAF wasn't accurate in predicting subjective quality between AQ approaches. And it rated dark scenes overly high for some reason, perhaps due to training content limitations. VMAF's underlying objective metrics are all luma-only, so everything gets compared as a black-and-white film. Automated tuning based on VMAF would naturally shift bits from chroma to luma more than is psychovisually appropriate.

VMAF scores are also not based on just the encode itself, but are relative to the resolution compared to. For example a 720p encode compared to a 720p source would have a significantly higher VMAF than the exact same stream compared to the source at 1080p.
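
As a sketch of how the comparison resolution can be pinned down, the Python helper below builds an ffmpeg command line that scales both inputs to one stated resolution before handing them to the `libvmaf` filter (first input is the distorted clip, second is the reference). It assumes an ffmpeg build with libvmaf enabled; the helper name and the file names are hypothetical, bicubic scaling is an arbitrary choice, and frame rate or timestamp alignment is not handled here.

```python
def vmaf_command(distorted, reference, width, height):
    """Build an ffmpeg argument list that measures VMAF at width x height."""
    graph = (
        f"[0:v]scale={width}:{height}:flags=bicubic[dist];"
        f"[1:v]scale={width}:{height}:flags=bicubic[ref];"
        "[dist][ref]libvmaf"
    )
    # "-f null -" discards the output video; only the filter's score matters.
    return ["ffmpeg", "-i", distorted, "-i", reference,
            "-lavfi", graph, "-f", "null", "-"]


# Hypothetical file names: a 720p encode scored against the 1080p source.
print(" ".join(vmaf_command("encode_720p.mkv", "source_1080p.mkv", 1920, 1080)))
```

Building the command with 1280 and 720 instead would score the same stream at 720p, which is why the comparison resolution has to be reported along with the number.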

I've not played with the latest VMAF much yet, and I presume it is improved in some ways. Which is good! But anyone giving a VMAF score needs to state what the comparison resolution was and what VMAF version was used.

That said, VMAF is absolutely the least-bad metric we've ever had, and is getting better.

Yups
7th April 2021, 20:55
But anyone giving a VMAF score needs to state what the comparison resolution was and what VMAF version was used.



All my VMAF results are based on VMAF 2.0.0 (model 0.6.1) and resolution is unchanged, original resolution for all.

I found an Iris Xe HEVC/AVC comparison from Intel btw: https://dgpu-docs.intel.com/devices/iris-xe-max-graphics/guides/media.html


https://abload.de/img/dg1_hevc_quality_s_cubvk1s.png

benwaggoner
7th April 2021, 21:34
All my VMAF results are based on VMAF 2.0.0 (model 0.6.1) and resolution is unchanged, original resolution for all.

I found an Iris Xe HEVC/AVC comparison from Intel btw: https://dgpu-docs.intel.com/devices/iris-xe-max-graphics/guides/media.html

The metrics from that article are based on Luma PSNR, which isn't something x265 is optimized for (unless you use --tune psnr). The BDRATE differences in luma PSNR are within the range where subjective quality could be quite different; it just isn't that great a metric. And x265 defaults to a lot of psychovisual optimizations that reduce PSNR in favor of improving subjective quality.

That said, these suggest a generally competent encoder for high speed use (presumably why --preset medium was the top option). If you used x265 with --preset slower --tune psnr, x265 likely would win by a fair margin.

Yups
7th April 2021, 21:51
That being said, Intel didn't use the highest quality preset in this (there is another image with quality preset). But of course x265 slower would win in almost every case unless their Ubuntu FFMPEG environment is better than my Windows QSVEnc environment which I don't think it is.

Yups
8th April 2021, 17:12
I could test a GTX 1660 Super next week, it's already running on the current 7th gen (https://developer.nvidia.com/video-encode-and-decode-gpu-support-matrix-new) Nvenc generation, I'm curious how it compares to Iris Xe. Is there a settings tutorial somewhere? Is it correct that 5 bframes is the maximum number of bframes on Nvenc?

benwaggoner
8th April 2021, 20:14
That being said, Intel didn't use the highest quality preset in this (there is another image with quality preset). But of course x265 slower would win in almost every case unless their Ubuntu FFMPEG environment is better than my Windows QSVEnc environment which I don't think it is.
I'm surprised they didn't use their best setting.

An always-interesting question is where the crossover point in speed/quality is between HW and SW encoders.

A key use of GPU encoders is for game streaming, where even 25% CPU utilization would hurt FPS in a lot of games.

Yups
8th April 2021, 21:38
They did use the best setting in the other chart: average BDRATE computed across 27 standard short sequences generated in both CBR and VBR


https://abload.de/img/dg1_hevc_quality_highb4jpe.png

benwaggoner
8th April 2021, 22:51
They did use the best setting in the other chart: average BDRATE computed across 27 standard short sequences generated in both CBR and VBR
Are the axes mislabeled or am I misreading? I really doubt that x265 efficiency gets worse with slower presets!

Although faster presets do use less psychovisual optimization, and mainly make choices based on SAD, which maps to PSNR better...

Yups
8th April 2021, 23:04
Are the axes mislabeled or am I misreading? I really doubt that x265 efficiency gets worse with slower presets!



Left side of the chart: Bit-rate savings (higher is better).


13.7% bitrate saving for x265 slow over medium and 11.0% higher bitrate required for very fast preset over medium. VME quality is the best Quicksync preset.
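
To put those percentages into numbers, here is a tiny Python sketch. The 2440 kbit/s anchor is only an example value (the x265 medium bitrate from my earlier table), and since BDRATE is an average over many sequences this is not a prediction for any single clip.

```python
def equal_quality_bitrate(anchor_kbps, bd_rate_percent):
    """Bitrate needed for the same quality as the anchor.

    A negative bd_rate_percent is a saving, a positive one means more
    bitrate is required.
    """
    return anchor_kbps * (1 + bd_rate_percent / 100)


ANCHOR_KBPS = 2440  # hypothetical bitrate of the x265 medium anchor

print(f"slow:      {equal_quality_bitrate(ANCHOR_KBPS, -13.7):.0f} kbit/s")
print(f"very fast: {equal_quality_bitrate(ANCHOR_KBPS, 11.0):.0f} kbit/s")
```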

Yups
9th April 2021, 17:03
Earlier than expected I got this:

https://abload.de/img/nvencltk3j.png


I will try CQP+Lookahead 32+bframes 5+quality preset later, if there is any other important setting I should use let me know. Bframes 5 is indeed the maximum on Turing.

Yups
10th April 2021, 00:29
I have finished my first GTX 1660 test with my last (https://forum.doom9.org/showpost.php?p=1939967&postcount=437) video sample. I have tried lots of different settings and this is the best I could find (b-frame ref middle gave me a nice score boost).



Intel Demo Clip 1080p     VMAF    PSNR    SSIM     VQM     speed    bitrate

Quicksync H265
Iris Xe CQP best          91.76   41.85   0.9748   0.790   67 fps   2438 kbit/s

NVENC H265
GTX 1660S CQP best        90.62   41.28   0.9699   0.885   150 fps  2439 kbit/s

x265 (Staxrip 2.1.9.0)
i7-1165G7 CRF slow        93.20   41.90   0.9754   0.800   8 fps    2430 kbit/s
i7-1165G7 CRF medium      91.05   41.35   0.9744   0.861   19 fps   2440 kbit/s
i7-1165G7 CRF very fast   89.99   40.97   0.9726   0.894   32 fps   2440 kbit/s


x264 (Staxrip 2.1.9.0)
i7-1165G7 CRF slow        89.38   40.21   0.9693   0.974   30 fps   2425 kbit/s
https://drive.google.com/file/d/1NzlheKfYJNei6wGJRm3tp3HrnTkn_kaH/view?usp=sharing
https://drive.google.com/file/d/1onLozSdINzy2xIsHhqq09fgwENHMFYEx/view?usp=sharing


Metric scores are a mixed bag: respectable VMAF and PSNR scores, but not that good on the VQM and especially the SSIM metric. In a subjective frame-by-frame comparison it's obvious that detail preservation is a lot worse compared to Iris Xe (VME/GPU) and x265.