Old 7th April 2021, 20:55   #441  |  Link
Yups
Registered User
 
Join Date: Sep 2011
Posts: 362
Quote:
Originally Posted by benwaggoner View Post
But anyone giving a VMAF score needs to state what the comparison resolution was and what VMAF version was used.

All my VMAF results are based on VMAF 2.0.0 (model 0.6.1) and resolution is unchanged, original resolution for all.

I found an Iris Xe HEVC/AVC comparison from Intel btw: https://dgpu-docs.intel.com/devices/...des/media.html


Old 7th April 2021, 21:34   #442  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,752
Quote:
Originally Posted by Yups View Post
All my VMAF results are based on VMAF 2.0.0 (model 0.6.1) and resolution is unchanged, original resolution for all.

I found an Iris Xe HEVC/AVC comparison from Intel btw: https://dgpu-docs.intel.com/devices/...des/media.html
The metrics from that article are based on luma PSNR, which isn't something x265 is optimized for (unless you use --tune psnr). The BDRATE differences in luma PSNR are within the range where subjective quality could be quite different; it just isn't that great a metric. And x265 defaults to a lot of psychovisual optimizations that reduce PSNR in favor of improving subjective quality.

That said, these suggest a generally competent encoder for high speed use (presumably why --preset medium was the top option). If you used x265 with --preset slower --tune psnr, x265 likely would win by a fair margin.
__________________
Ben Waggoner
Principal Video Specialist, Amazon Prime Video

My Compression Book
Old 7th April 2021, 21:51   #443  |  Link
Yups
Registered User
 
Join Date: Sep 2011
Posts: 362
That being said, Intel didn't use the highest quality preset in this (there is another image with quality preset). But of course x265 slower would win in almost every case unless their Ubuntu FFMPEG environment is better than my Windows QSVEnc environment which I don't think it is.
Old 8th April 2021, 17:12   #444  |  Link
Yups
Registered User
 
Join Date: Sep 2011
Posts: 362
I could test a GTX 1660 Super next week; it already uses the current 7th-generation NVENC, and I'm curious how it compares to Iris Xe. Is there a settings tutorial somewhere? Is it correct that 5 bframes is the maximum on NVENC?
Old 8th April 2021, 20:14   #445  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,752
Quote:
Originally Posted by Yups View Post
That being said, Intel didn't use the highest quality preset in this (there is another image with quality preset). But of course x265 slower would win in almost every case unless their Ubuntu FFMPEG environment is better than my Windows QSVEnc environment which I don't think it is.
I'm surprised they didn't use their best setting.

An always-interesting question is where the crossover point in speed/quality is between HW and SW encoders.

A key use of GPU encoders is for game streaming, where even 25% CPU utilization would hurt FPS in a lot of games.
Old 8th April 2021, 21:38   #446  |  Link
Yups
Registered User
 
Join Date: Sep 2011
Posts: 362
They did use the best setting in the other chart: average BDRATE computed across 27 standard short sequences generated in both CBR and VBR


Old 8th April 2021, 22:51   #447  |  Link
benwaggoner
Moderator
 
Join Date: Jan 2006
Location: Portland, OR
Posts: 4,752
Quote:
Originally Posted by Yups View Post
They did use the best setting in the other chart: average BDRATE computed across 27 standard short sequences generated in both CBR and VBR
Are the axes mislabeled or am I misreading? I really doubt that x265 efficiency gets worse with slower presets!

Although faster presets do use less psychovisual optimization, and mainly make choices based on SAD, which maps to PSNR better...
Old 8th April 2021, 23:04   #448  |  Link
Yups
Registered User
 
Join Date: Sep 2011
Posts: 362
Quote:
Originally Posted by benwaggoner View Post
Are the axes mislabeled or am I misreading? I really doubt that x265 efficiency gets worse with slower presets!

Left side of the chart: Bit-rate savings (higher is better).


13.7% bitrate saving for x265 slow over medium and 11.0% higher bitrate required for very fast preset over medium. VME quality is the best Quicksync preset.
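
For readers who have not worked with BDRATE figures: they express the average bitrate difference between two encoders at equal quality. The sketch below is a simplified illustration, not Intel's procedure (which the chart does not spell out). The usual Bjøntegaard method fits a cubic through log-bitrate versus PSNR; this version swaps in piecewise-linear interpolation to stay dependency-free. The function names and the sample points are hypothetical, and the points are synthetic rather than measurements from this thread.

```python
import math


def _interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x is outside the measured range")


def bd_rate(ref, test, steps=1000):
    """Average bitrate difference of `test` vs `ref` at equal quality, in %.

    ref, test: lists of (bitrate, quality) pairs, e.g. (kbit, PSNR in dB).
    A negative result means `test` needs less bitrate for the same quality.
    Simplification: linear interpolation instead of the usual cubic fit.
    """
    ref = sorted(ref, key=lambda p: p[1])
    test = sorted(test, key=lambda p: p[1])
    q_ref = [q for _, q in ref]
    q_test = [q for _, q in test]
    lr_ref = [math.log(r) for r, _ in ref]
    lr_test = [math.log(r) for r, _ in test]

    # Only the quality range covered by both curves can be compared.
    lo = max(q_ref[0], q_test[0])
    hi = min(q_ref[-1], q_test[-1])
    if hi <= lo:
        raise ValueError("quality ranges do not overlap")

    # Midpoint rule: average the log-bitrate gap over the shared range.
    total = 0.0
    for i in range(steps):
        q = lo + (hi - lo) * (i + 0.5) / steps
        total += _interp(q, q_test, lr_test) - _interp(q, q_ref, lr_ref)
    return (math.exp(total / steps) - 1.0) * 100.0


# Synthetic rate/quality points, NOT measurements from this thread.
REF = [(1000, 36.0), (2000, 39.0), (4000, 42.0), (8000, 44.5)]
TEST = [(0.9 * rate, quality) for rate, quality in REF]

if __name__ == "__main__":
    print(f"BD-rate of TEST vs REF: {bd_rate(REF, TEST):+.1f}%")
```

Read this way, "13.7% bitrate saving" corresponds to a BD-rate of about -13.7% against the medium preset, and "11.0% higher bitrate required" to about +11.0%.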
Old 9th April 2021, 17:03   #449  |  Link
Yups
Registered User
 
Join Date: Sep 2011
Posts: 362
Earlier than expected I got this:




I will try CQP+Lookahead 32+bframes 5+quality preset later, if there is any other important setting I should use let me know. Bframes 5 is indeed the maximum on Turing.
Old 10th April 2021, 00:29   #450  |  Link
Yups
Registered User
 
Join Date: Sep 2011
Posts: 362
I have finished my first GTX 1660 test from my last video sample. I have tried lots of different settings and this is the best I could find (b-frame ref middle gave me a nice score boost).

Code:
 
Intel Demo Clip 1080p         VMAF    PSNR    SSIM     VQM     speed    bitrate

Quicksync H265
Iris Xe CQP best              91.76   41.85   0.9748   0.790   67 fps   2438 kbit

NVENC H265
GTX 1660S CQP best            90.62   41.28   0.9699   0.885   150 fps  2439 kbit

x265 (Staxrip 2.1.9.0)
i7-1165G7 CRF slow            93.20   41.90   0.9754   0.800   8 fps    2430 kbit
i7-1165G7 CRF medium          91.05   41.35   0.9744   0.861   19 fps   2440 kbit
i7-1165G7 CRF very fast       89.99   40.97   0.9726   0.894   32 fps   2440 kbit


x264 (Staxrip 2.1.9.0)
i7-1165G7 CRF slow            89.38   40.21   0.9693   0.974   30 fps   2425 kbit
https://drive.google.com/file/d/1Nzl...ew?usp=sharing
https://drive.google.com/file/d/1onL...ew?usp=sharing


Metric scores for the GTX 1660S are a mixed bag: respectable VMAF and PSNR scores, but not that good on VQM and especially SSIM. In a subjective frame-by-frame comparison it's obvious that detail preservation is a lot worse than with Iris Xe (VME/GPU) and x265.
Old 10th April 2021, 15:35   #451  |  Link
Yups
Registered User
 
Join Date: Sep 2011
Posts: 362
Blender Open Movie from here: https://forum.doom9.org/showpost.php...&postcount=423

Code:
    
HERO - Blender Open Movie            VMAF      speed       bitrate

Quicksync H265 (27.20.100.9316)
Iris Xe CQP FF best                  86.56     550 fps     199 kbit
Iris Xe CQP FF balanced              85.66     850 fps     201 kbit
Iris Xe CQP FF speed                 76.25     1550 fps    200 kbit

Iris Xe CQP best                     88.59     167 fps     200 kbit
Iris Xe CQP balanced                 87.18     280 fps     201 kbit
Iris Xe CQP speed                    86.22     520 fps     200 kbit


NVENC H265 (470.14)
GTX 1660S CQP best                   84.52     430 fps     200 kbit
GTX 1660S CQP default                83.55     990 fps     201 kbit
GTX 1660S CQP performance            79.22     1130 fps    200 kbit


x265 (Staxrip 2.1.9.0)
i7-1165G7 CRF slower                 90.25     8 fps       200 kbit
i7-1165G7 CRF slow                   88.18     34 fps      200 kbit
i7-1165G7 CRF medium                 85.72     56 fps      200 kbit
i7-1165G7 CRF very fast              83.36     73 fps      200 kbit


x264 (Staxrip 2.1.9.0)
i7-1165G7 CRF slower                 75.37     81 fps      200 kbit

Disabling b-adapt is better for this video. Turing CQP cannot reach Iris Xe CQP quality; both subjectively and objectively the difference is large.

Turing has two downsides: only 5 bframes versus 16 bframes on Iris Xe, and no equivalent of the GPU mode, which is more flexible than a fully fixed-function solution. However, even the FF mode on Iris Xe looks better. It might look different with CBR vs CBR, which I haven't tried. That said, the H265 CQP results from Turing are really good for a hardware encoder, something like x265 fast-faster with extremely fast encoding times; the CQP quality from Iris Xe is just insane.
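
One way to read the HERO table in terms of the speed/quality crossover between hardware and software encoders is to keep only the rows that no other row beats on both VMAF and fps (a Pareto front). The sketch below does that with the VMAF and speed values copied from the table; bitrate is left out because every row sits at roughly 200 kbit. The function name is hypothetical, and this covers a single clip at a single bitrate, so it is an illustration rather than a general ranking.

```python
from typing import List, Tuple

Result = Tuple[str, float, int]  # (label, VMAF, speed in fps)

# VMAF and speed copied from the HERO table (all rows are ~200 kbit).
RESULTS: List[Result] = [
    ("Iris Xe CQP FF best", 86.56, 550),
    ("Iris Xe CQP FF balanced", 85.66, 850),
    ("Iris Xe CQP FF speed", 76.25, 1550),
    ("Iris Xe CQP best", 88.59, 167),
    ("Iris Xe CQP balanced", 87.18, 280),
    ("Iris Xe CQP speed", 86.22, 520),
    ("GTX 1660S CQP best", 84.52, 430),
    ("GTX 1660S CQP default", 83.55, 990),
    ("GTX 1660S CQP performance", 79.22, 1130),
    ("x265 CRF slower", 90.25, 8),
    ("x265 CRF slow", 88.18, 34),
    ("x265 CRF medium", 85.72, 56),
    ("x265 CRF very fast", 83.36, 73),
    ("x264 CRF slower", 75.37, 81),
]


def pareto_front(results: List[Result]) -> List[Result]:
    """Return the rows that no other row beats on both VMAF and fps."""
    front = []
    for name, vmaf, fps in results:
        dominated = any(
            o_vmaf >= vmaf and o_fps >= fps and (o_vmaf > vmaf or o_fps > fps)
            for _, o_vmaf, o_fps in results
        )
        if not dominated:
            front.append((name, vmaf, fps))
    return sorted(front, key=lambda row: row[2])  # slowest first


if __name__ == "__main__":
    for name, vmaf, fps in pareto_front(RESULTS):
        print(f"{name:28s} VMAF {vmaf:5.2f}  {fps:5d} fps")
```

On these numbers x265 slower stays on the front as the highest-quality option, while x265 slow, medium and very fast are each beaten on both axes by an Iris Xe row.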

Last edited by Yups; 10th April 2021 at 15:47.
Old 10th April 2021, 17:41   #452  |  Link
Tenkei
Registered User
 
Join Date: Jan 2021
Posts: 10
Is there any reason to use CQP instead of ICQ with QuickSync? I've never used it, but it seems that ICQ is the CRF equivalent. Did you try --ctu 64 and --ref X?

Last edited by Tenkei; 10th April 2021 at 17:46.
Old 10th April 2021, 20:03   #453  |  Link
Yups
Registered User
 
Join Date: Sep 2011
Posts: 362
CQP with a custom offset offers higher quality than ICQ; this old Quicksync bitrate method overview is still valid:

Quote:
Constant QP (CQP) provides the most control and best performance. Without question, the best coding efficiency with Intel codecs can be obtained via CQP plus custom content analysis. CQP often has significant performance advantages as well. CQP operates most closely to reference implementations. It is the most direct way to access codec capabilities and measure the effects of encoder parameter/algorithm trade-offs and also is the clearest way to evaluate against other codec algorithm implementations.
https://software.intel.com/content/w...media-sdk.html


For a basic user ICQ is easier to handle: there is just one global setting and that's it. Furthermore, ICQ does not really scale beyond 5 bframes (16 bframes can be worse than 5 at low bitrate), whereas CQP scales really well beyond 5 bframes even at low bitrate. Here I did include both ICQ and CQP: https://forum.doom9.org/showpost.php...&postcount=369

On Iris Xe it automatically uses ctu 64 (Gen 9: ctu 32); this can't be changed at the moment. Tskip and SAO are also enabled on Tigerlake, and I can't disable them. Reference frames are best left on auto: with 16 bframes+bpyramid Intel sets it to 6 reference frames. I've tried 8 reference frames but there is no improvement.
Old 17th April 2021, 23:05   #454  |  Link
Yups
Registered User
 
Join Date: Sep 2011
Posts: 362
CQP best runs ~5% faster with Intel's new driver build 9466 on Iris Xe.

I was searching for CQP improvements and noticed that 14 and 15 bframes offer slightly better scores at a slightly lower bitrate on Iris Xe than the 16 bframes I was using. This was tested on the Intel Demo Clip and the Blender Open Movie at low bitrate.

13 bframes and lower gradually decrease bitrate efficiency; it's a big degradation from 14 to 13 bframes. 14 bframes is a tiny bit better than 15. For better context:


Code:
 
Intel Demo Clip 1080p                VMAF    PSNR    SSIM     VQM     speed    bitrate

Quicksync H265 27.20.100.9466
Iris Xe CQP best 16 bframes          91.76   41.85   0.9748   0.790   73 fps   2438 kbit

Iris Xe CQP best 14 bframes          91.98   41.96   0.9754   0.780   74 fps   2426 kbit

NVENC H265
GTX 1660S CQP best                   90.62   41.28   0.9699   0.885   150 fps  2439 kbit

x265 (Staxrip 2.1.9.0)
i7-1165G7 CRF slow                   93.20   41.90   0.9754   0.800   8 fps    2430 kbit
i7-1165G7 CRF medium                 91.05   41.35   0.9744   0.861   19 fps   2440 kbit
i7-1165G7 CRF very fast              89.99   40.97   0.9726   0.894   32 fps   2440 kbit


x264 (Staxrip 2.1.9.0)
i7-1165G7 CRF slow                   89.38   40.21   0.9693   0.974   30 fps   2425 kbit

Bitrate goes down from 2438 to 2426 and scores go up, it's a clear win.
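
To put a size on that win, the per-metric differences between the two Iris Xe rows can be worked out directly. This is a small sketch using only the numbers from the table; the helper names are hypothetical, and it assumes lower VQM is better, as the ordering of the x265 and x264 rows suggests.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Iris Xe CQP best rows copied from the table (16 vs 14 bframes).
B16 = {"VMAF": 91.76, "PSNR": 41.85, "SSIM": 0.9748, "VQM": 0.790, "kbit": 2438}
B14 = {"VMAF": 91.98, "PSNR": 41.96, "SSIM": 0.9754, "VQM": 0.780, "kbit": 2426}


def deltas(old: dict, new: dict) -> dict:
    """Map each metric to (absolute difference, percent change)."""
    return {k: (new[k] - old[k], percent_change(old[k], new[k])) for k in old}


if __name__ == "__main__":
    for metric, (diff, pct) in deltas(B16, B14).items():
        print(f"{metric:5s} {diff:+.4f} ({pct:+.2f}%)")
```

The bitrate drop works out to about half a percent, so the gain is small in absolute terms, but every metric moves in the right direction at once.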
Old 18th April 2021, 21:51   #455  |  Link
MGarret
Registered User
 
Join Date: Feb 2007
Posts: 18
Quote:
Originally Posted by Yups View Post
CQP with custom offset offers higher quality than ICQ, this old Quicksync bitrate method overview is still valid:


https://software.intel.com/content/w...media-sdk.html


For a basic user ICQ is easier to handle, there is just one global setting and that's it. Furthermore ICQ does not really scale over 5 bframes (16 bframes can be worse than 5 at low bitrate) whereas CQP scales really good beyond 5 bframes even at low bitrate. Here I did include both ICQ and CQP: https://forum.doom9.org/showpost.php...&postcount=369

On Iris Xe it automatically uses ctu 64 (Gen 9 ctu 32), this can't be changed at the moment. Tskip and SAO are also enabled on Tigerlake which I can't disable. Reference frames best leave it auto, with 16 bframes+bpyramid Intel sets it to 6 reference frames, I've tried 8 reference frames but there is no improvement.
What a load of misleading information in that Intel article. CQP is not even a "proper" rate control. Of course it operates closely to reference implementations because, guess what: reference software doesn't even have real rate control algorithms. So, yeah, CQP with some custom software that analyzes the characteristics and complexity of every scene is what huge content providers like Netflix and others use and play around with to squeeze out more optimized encodes and save bandwidth. I see it as an entire pipeline where several tools are used and not just one rate control algorithm. When this kind of algorithm is implemented inside the encoder, it won't be called CQP but something else.
Old 18th April 2021, 22:01   #456  |  Link
Yups
Registered User
 
Join Date: Sep 2011
Posts: 362
Who cares? The point is CQP offers higher quality than ICQ, the Intel article is correct on this.
Old 18th April 2021, 23:57   #457  |  Link
MGarret
Registered User
 
Join Date: Feb 2007
Posts: 18
The point is you obviously don't have any proof for your claim. I just pointed out what I think about some dubious claims from that article.

If you believe that just using fixed qp's is going to create efficient encoding in terms of file size/picture quality... well, I'm not going to persuade you otherwise. We all have our specific needs in regards to using encoders. Keep believing in your vmaf/psnr/ssim numbers.
Old 19th April 2021, 15:49   #458  |  Link
Yups
Registered User
 
Join Date: Sep 2011
Posts: 362
Quote:
Originally Posted by MGarret View Post
The point is you obviously don't have any proof for your claim. I just pointed out what I think about some dubious claims from that article.
What claim? The claim that ICQ is worse than CQP on Iris Xe? Dude, if ICQ offered better results than CQP I wouldn't use CQP. It would be insane if ICQ offered even better results than CQP. The claim that CQP offers the best coding efficiency on Intel isn't dubious; what's dubious are your postings.


Quote:
Originally Posted by MGarret View Post
If you believe that just using fixed qp's is going to create efficient encoding in terms of file size/picture quality... well, I'm not going to persuade you otherwise. We all have our specific needs in regards to using encoders. Keep believing in your vmaf/psnr/ssim numbers.

Nonsense. Subjectively and objectively, CQP is clearly better than ICQ. I've said this more than once and I have uploaded several samples. If you don't agree, prove it!

If you don't believe in vmaf/psnr/ssim numbers, download the samples, or ask me to upload any sample I haven't uploaded yet. I always check my results for subjective quality. I don't force you to believe any numbers.

Bframes scaling with CQP is way better than with ICQ or CBR/VBR on Iris Xe. With CBR/VBR 7 bframes are best, and with ICQ there is no real improvement beyond 5 bframes. With CQP it scales up to 14-16 bframes; this is one reason why ICQ can't reach CQP. This is something you can't know because you are obviously clueless.

About fixed CQP: depending on the bitrate/quality target, different quantization parameters are required. This is no different from any other constant rate factor. I don't understand your problem, to be honest; it's super easy on Intel (easier than on Nvidia).
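
Picking the QP for a given bitrate target can be automated with a simple search, since bitrate normally falls as QP rises. The sketch below is a minimal illustration under stated assumptions: `find_qp` and `toy_bitrate` are hypothetical names, and `toy_bitrate` is a stand-in model (bitrate roughly halving every 6 QP steps, a common rule of thumb) rather than a real encode. In practice the callback would run a short encode at the given QP and return the measured bitrate.

```python
from typing import Callable


def find_qp(target_kbit: float,
            measure_bitrate: Callable[[int], float],
            qp_min: int = 0,
            qp_max: int = 51) -> int:
    """Smallest integer QP whose measured bitrate is <= target_kbit.

    Assumes bitrate decreases monotonically as QP increases.
    The default 0..51 range matches 8-bit H.264/HEVC.
    """
    lo, hi = qp_min, qp_max
    if measure_bitrate(hi) > target_kbit:
        raise ValueError("target not reachable even at the highest QP")
    while lo < hi:
        mid = (lo + hi) // 2
        if measure_bitrate(mid) <= target_kbit:
            hi = mid          # mid is good enough, try a lower QP
        else:
            lo = mid + 1      # still too many bits, raise the QP
    return lo


def toy_bitrate(qp: int, kbit_at_qp22: float = 8000.0) -> float:
    """Toy stand-in for an encode: bitrate halves every 6 QP steps.

    The 8000 kbit anchor at QP 22 is an arbitrary assumption.
    """
    return kbit_at_qp22 * 2 ** (-(qp - 22) / 6)


if __name__ == "__main__":
    print("QP for a 2440 kbit target (toy model):", find_qp(2440, toy_bitrate))
```

Binary search needs at most six probe encodes over the 0..51 range, which is why hitting a target bitrate with fixed QP is not much harder than with a rate factor.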
Old 20th April 2021, 12:51   #459  |  Link
MGarret
Registered User
 
Join Date: Feb 2007
Posts: 18
Quote:
Originally Posted by Yups View Post
What claim? The claim that ICQ is worse than CQP on Iris Xe? Dude if ICQ would offer better results over CQP I wouldn't use CQP. It would be insane if ICQ would offer even better results over CQP. The claim that CQP offers best coding efficiency on Intel isn't dubious, dubious are your postings.





Nonsense. Subjective and objective CQP is clearly better than ICQ, I told this more than once and I have uploaded several samples. If you don't agree prove it!

If you don't believe in vmaf/psnr/ssim numbers download the sample or ask for the upload if I didn't upload any wanted sample, I'm always checking for subjective quality my results. I don't force you to believe any numbers.

Bframes scaling with CQP is way better than with ICQ or CBR/VBR on Iris Xe. CBR/VBR 7 bframes are best and on ICQ there is no real improvement over 5 bframes. With CQP it scales up to 14-16 bframes, this is one reason why ICQ can't reach CQP. This is something you can't know because you are obviously clueless.

About fixed CQP, I mean depending on the bitrate/quality target different quantization parameter are required, this is no different to any other constant rate factor. I don't understand your problem to be honest, I mean it's super easy on Intel (easier than on Nvidia).
Hey chum, did I strike a nerve somehow?

Here, read about etiquette on this forum.
Quote:
4) Be nice to each other and respect the moderator. Profanity and insults will not be tolerated.
I just stated my opinion and you found yourself offended because I somehow negated your findings? Are you that sensitive? I don't even know about your samples because this thread is a couple of years long and I only read it occasionally. I still stand by the common knowledge that CQP is an inefficient and "dumb" type of rate control. I don't know what Intel is doing differently and I don't care. Enjoy your Iris XE or whatever.
Old 20th April 2021, 18:44   #460  |  Link
Tenkei
Registered User
 
Join Date: Jan 2021
Posts: 10
Both of you can be correct. Usually CRF mode means better quality for the same bitrate, but if Intel's ICQ is broken it might reduce too much, like hevc-aq does in x265. We can't really conclude anything without comparing clips of several scenes; one scene is not enough. A hand-picked quantizer might be better for one scene, but it won't be an efficient way to encode an hour-long video.

Also, one frame might look better with CQP and another might look better with ICQ, so it depends on what we are comparing. Do you see the quality difference in the running clip, or just in a single random frame you chose?

Last edited by Tenkei; 20th April 2021 at 18:48.