What is current status for hardware H.265 encoding. [Archive] - Page 4

CruNcher

29th January 2017, 11:18

Yeah, I wonder how this actually behaves across the 970, 980 and 980 Ti :)

Also, CyberLink's HAM decoder seems to be a little faster in this case.

Also, I'm currently becoming really skeptical about Nvidia's multipass (2-pass VBR, aka VBRHQ); I'm now testing it further on both encoders, at least for non-realtime 2nd-generation sources.

JohnLai

29th January 2017, 12:08

Yeah, I wonder how this actually behaves across the 970, 980 and 980 Ti :)

Also, CyberLink's HAM decoder seems to be a little faster in this case.

Also, I'm currently becoming really skeptical about Nvidia's multipass (2-pass VBR, aka VBRHQ); I'm now testing it further on both encoders, at least for non-realtime 2nd-generation sources.

VBR vs VBRHQ?

--vbr 4500 --codec h265 --ref 6 --level 5.1 --lookahead 32
No adaptive quantization being used.

The I-frame:
VBR I-frame = http://i.imgur.com/7pLjtma.png
VBR2 I-frame = http://i.imgur.com/f8LuWXC.png

The 4th frame (P-frame) being chosen:
VBR P-frame = http://i.imgur.com/gTigDZP.png
VBR2 P-frame = http://i.imgur.com/wCjDOiM.png

Edit: crap, I uploaded the wrong I-frame; it's been replaced now.

NikosD

29th January 2017, 15:02

I did read it.
But it is a conjecture by Xaymar (OBS Studio plugin developer).

That option doesn't seem to be 'lookahead' either. Hmm... oh well, no point thinking too much about it.

I just saw a comment from that developer in his OBS Studio forum:

It still is only a fake Two-Pass encoding that only adjusts the qp values for a given macroblock to store information better.

CruNcher

29th January 2017, 15:03

I'm also skeptical about the overall efficiency of such a high lookahead as 32, seeing that the lookahead runs on CUDA as well ;)

The balance with 16 frames is still acceptable, but with 32 it's drifting a lot.

And here I would really like to see and compare AMD's OpenCL implementation on Polaris and especially Vega :)

JohnLai

29th January 2017, 15:34

I just saw a comment from that developer in his OBS Studio forum:

It still is only a fake Two-Pass encoding that only adjusts the qp values for a given macroblock to store information better.

'Storing information better' might be the wrong term, 'storing information efficiently' would be better.

So, in a way, VCE two-pass may operate more or less like NVENC two-pass.
In the NVENC screenshots I provided above, the P-frame has more intra blocks with VBR and fewer intra blocks with VBR2 (probably because NVENC decided flat anime surfaces don't need higher-quality intra blocks; just my conjecture).

I'm also skeptical about the overall efficiency of such a high lookahead as 32, seeing that the lookahead runs on CUDA as well ;)

The balance with 16 frames is still acceptable, but with 32 it's drifting a lot.

From Nvidia samples with SAO support, higher lookahead values such as 32 seem to make the output even worse (to be precise, it gets blurrier and the QP value is higher). 16 seems to be the optimal lookahead for Pascal GPUs. (I compared Pascal HEVC samples with lookahead 8, 16 and 32.)

I am more skeptical about NVENC's low-complexity SAO algorithm and its interaction with higher lookahead values.

Meanwhile, on Maxwell, higher lookahead values don't seem to produce nearly as much blur as on Pascal.

EDIT: grammar correction....

NikosD

29th January 2017, 15:43

'Storing information better' might be the wrong term, 'storing information efficiently' would be better.

So, in a way, VCE two-pass may operate more or less like NVENC two-pass.

But how did AMD manage to implement two-pass encoding (pre-analysis) so flexibly that it can be used with the CQP rate control mode as well as VBR?

I was under the impression that two-pass encoding is used with VBR in order to hit the target bitrate more closely (more efficiently).

Using CQP there is no target bitrate.
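For comparison, this is roughly what "two-pass" means in a software encoder: the first pass measures each frame's complexity, and the second pass splits a fixed bit budget (target bitrate × duration) across the frames. The sketch below is a generic illustration only, not how VCE or NVENC work internally; the function name, the complexity units and the `qcomp`-style exponent (similar in spirit to x264's `--qcomp`) are assumptions. Note that with CQP there is no budget to split.

```python
def allocate_bits(complexities, target_kbps, fps, qcomp=0.6):
    """Second pass of a generic two-pass VBR: split a fixed budget across frames.

    complexities: per-frame cost measured in the first pass (hypothetical units).
    qcomp: exponent between 0 (every frame gets the same bits, CBR-like)
           and 1 (bits proportional to complexity, constant-quality-like).
    """
    if fps <= 0 or not complexities:
        raise ValueError("need a positive frame rate and at least one frame")
    duration_s = len(complexities) / fps
    budget_bits = target_kbps * 1000 * duration_s   # what the target bitrate allows
    weights = [c ** qcomp for c in complexities]
    scale = budget_bits / sum(weights)              # normalize so totals match
    return [w * scale for w in weights]


if __name__ == "__main__":
    first_pass = [100, 400, 100, 400]  # hypothetical: two easy, two hard frames
    bits = allocate_bits(first_pass, target_kbps=1000, fps=4)
    print([round(b) for b in bits], round(sum(bits)))
```

A frame with 4× the first-pass complexity gets about 4^0.6 ≈ 2.3× the bits, and the total always equals the budget, which is exactly the "hit the target bitrate" property mentioned above.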

JohnLai

29th January 2017, 16:23

But how did AMD manage to implement two-pass encoding (pre-analysis) so flexibly that it can be used with the CQP rate control mode as well as VBR?

I was under the impression that two-pass encoding is used with VBR in order to hit the target bitrate more closely (more efficiently).

Using CQP there is no target bitrate.

Well... if you ask me, I would say: isn't the way Xaymar describes AMD's pre-analysis method similar to Nvidia's adaptive quantization? (Fun fact: Nvidia AQ works in CQP mode too.)

But based on your 1st VCE H.264 test with pre-analysis set to Full, every macroblock uses a QP of 27, with no per-macroblock QP variation at all. Thus, I doubt AMD pre-analysis actually works...
Is this sample encoded with VCE CQP method?

In NVENC's case, QP varies per macroblock in every rate control mode (CQP, VBR, VBR2, CBR). Edit: if AQ is activated.
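To make the CQP vs AQ point concrete: with plain CQP every block gets the base QP (the uniform 27 seen in the VCE samples), while adaptive quantization adds a per-block offset. Below is a minimal variance-based AQ sketch in the spirit of x264-style AQ; it is not NVENC's or AMD's actual algorithm, and the function names, the log2 "energy" measure and the strength value are assumptions.

```python
import math


def aq_offsets(block_variances, strength=1.0):
    """Per-block QP offsets from luma variance (generic variance-based AQ).

    Flat blocks (low variance) get negative offsets, i.e. finer quantization;
    busy blocks get positive ones. Offsets are centered on the frame average,
    so the overall QP level stays roughly where the rate control put it.
    """
    energy = [math.log2(v + 1.0) for v in block_variances]
    mean = sum(energy) / len(energy)
    return [strength * (e - mean) for e in energy]


def apply_aq(base_qp, offsets, qp_min=0, qp_max=51):
    """Add offsets to the frame QP and clamp to the 0-51 range of 8-bit H.264/HEVC."""
    return [max(qp_min, min(qp_max, round(base_qp + o))) for o in offsets]


if __name__ == "__main__":
    variances = [0, 15, 255]  # hypothetical: flat sky, mild texture, foliage
    print("CQP, no AQ:", [27] * len(variances))
    print("CQP + AQ:  ", apply_aq(27, aq_offsets(variances)))
```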

NikosD

29th January 2017, 16:29

Yes. If you read my settings, the only differences between the 1st and 2nd tests are the quality parameter and the number of ReF.

The 1st tests use "balanced" quality with a low ReF count, and the 2nd tests use "slow" quality with the maximum ReF of 16.

JohnLai

29th January 2017, 16:51

Yes. If you read my settings, the only differences between the 1st and 2nd tests are the quality parameter and the number of ReF.

The 1st tests use "balanced" quality with a low ReF count, and the 2nd tests use "slow" quality with the maximum ReF of 16.

KK, so based on your original question, "But how did AMD manage to implement two-pass encoding (pre-analysis) so flexibly that it can be used with the CQP rate control mode as well as VBR?", and Xaymar's description of pre-analysis, the answer is: the pre-analysis option doesn't do anything in any of the four samples you provided. All macroblocks (H.264) and CUs (HEVC) use the same QP value of 27. Perhaps the reason is CQP rate control (as its name implies, constant QP).

So... can I have four new samples using VBR rate control, please? (using the Avatar video source)
1) H.264 without pre-analysis
2) H.264 with pre-analysis full
3) HEVC without pre-analysis
4) HEVC with pre-analysis auto

Let's see whether these four samples differ.

*Ignore the reference frames; it has been shown that Polaris VCE only uses a single reference frame no matter how many refs are specified.

NikosD

29th January 2017, 16:57

So... can I have four new samples using VBR rate control, please? (using the Avatar video source)
1) H.264 without pre-analysis
2) H.264 with pre-analysis full
3) HEVC without pre-analysis
4) HEVC with pre-analysis auto

Let's see whether these four samples differ.

OK, I'll do it later today.

*Ignore the reference frames; it has been shown that Polaris VCE only uses a single reference frame no matter how many refs are specified.

OK, but I think MediaInfo says ReF 4 and ReF 16 in the file properties.

JohnLai

29th January 2017, 17:28

OK, I'll do it later today.

:thanks:

OK, but I think MediaInfo says ReF 4 and ReF 16 in the file properties.

Nope, the bitstreams of all four samples indicate that only one preceding frame is used as a reference frame.

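For two of your H.264 samples:
num_ref_idx_l0_default_active_minus1 0
For 4 or 16 references this value should be 3 or 15 (the count minus one) instead of 0.
num_ref_idx_override_flag 0
ref_pic_list_reordering_flag_l0 0
For a slice to use more than that single default reference, the override flag would have to be 1, with num_ref_idx_l0_active_minus1 in the slice header set to 3 or 15; ref_pic_list_reordering_flag_l0 is just a 1-bit flag for reordering the list, not its size.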

I already posted the HEVC bitstream ref at https://forum.doom9.org/showpost.php?p=1794695&postcount=137

EDIT: MediaInfo probably checks max_num_ref_frames (in the SPS) instead.
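A minimal sketch of how the PPS field above can be read back without relying on MediaInfo: strip the emulation-prevention bytes, decode the Exp-Golomb ue(v) fields in order, and return the two default list sizes. Assumptions: the input is a single H.264 PPS NAL unit with the header byte included and the start code removed, and slice groups (FMO) are not handled; extracting the NAL unit from the MKV and checking the slice-header override are not covered. The example payloads are hand-built, not taken from NikosD's files.

```python
def remove_emulation_prevention(payload: bytes) -> bytes:
    """Drop the 0x03 byte an encoder inserts after every 0x00 0x00 pair."""
    out = bytearray()
    zeros = 0
    for b in payload:
        if zeros >= 2 and b == 0x03:
            zeros = 0  # skip the emulation-prevention byte itself
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


class BitReader:
    """MSB-first bit reader with H.264 u(n) and ue(v) helpers."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # position in bits

    def u(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            value = (value << 1) | ((byte >> (7 - self.pos % 8)) & 1)
            self.pos += 1
        return value

    def ue(self) -> int:
        """Unsigned Exp-Golomb: N leading zeros, a 1, then N info bits."""
        leading_zeros = 0
        while self.u(1) == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.u(leading_zeros)


def pps_default_ref_counts(nal: bytes) -> dict:
    """Read the default reference list sizes from an H.264 PPS NAL unit.

    Returns the *_minus1 values; the actual default list size is value + 1.
    """
    if not nal or nal[0] & 0x1F != 8:
        raise ValueError("not a PPS NAL unit (nal_unit_type 8)")
    r = BitReader(remove_emulation_prevention(nal[1:]))
    r.ue()    # pic_parameter_set_id
    r.ue()    # seq_parameter_set_id
    r.u(1)    # entropy_coding_mode_flag
    r.u(1)    # bottom_field_pic_order_in_frame_present_flag
    if r.ue() != 0:  # num_slice_groups_minus1
        raise NotImplementedError("slice groups (FMO) are not handled in this sketch")
    return {
        "num_ref_idx_l0_default_active_minus1": r.ue(),
        "num_ref_idx_l1_default_active_minus1": r.ue(),
    }


if __name__ == "__main__":
    one_ref = bytes([0x68, 0xEE])          # hand-built: l0 minus1 = 0 -> 1 reference
    four_refs = bytes([0x68, 0xE9, 0x20])  # hand-built: l0 minus1 = 3 -> 4 references
    print(pps_default_ref_counts(one_ref))
    print(pps_default_ref_counts(four_refs))
```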

CruNcher

29th January 2017, 18:48

I've now measured ~5 fps less for VBR2 across several UHD tests.

VBR2(HQ)

encoded 10741 frames, 35.87 fps, 23719.33 kbps, 506.69 MB
encode time 0:04:59 / CPU Usage: 75.71%

frame type IDR 41
frame type I 41, avgQP 25.41, total size 8.26 MB
frame type P 10700, avgQP 25.22, total size 498.43 MB

VBR

encoded 10741 frames, 41.08 fps, 23739.44 kbps, 507.12 MB
encode time 0:04:21 / CPU Usage: 81.61%

frame type IDR 41
frame type I 41, avgQP 25.98, total size 7.97 MB
frame type P 10700, avgQP 25.25, total size 499.15 MB
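The ~5 fps figure can be checked against the two summaries above. A quick sketch of the arithmetic; the numbers are copied from the logs and the dictionary layout is just for illustration:

```python
# Values copied from the VBR2(HQ) and VBR summaries above (10741 frames, UHD).
vbr = {"fps": 41.08, "kbps": 23739.44, "seconds": 4 * 60 + 21}
vbr2 = {"fps": 35.87, "kbps": 23719.33, "seconds": 4 * 60 + 59}

fps_delta = vbr2["fps"] - vbr["fps"]                     # absolute speed change
slowdown_pct = 100.0 * (1.0 - vbr2["fps"] / vbr["fps"])  # relative speed loss
extra_seconds = vbr2["seconds"] - vbr["seconds"]         # wall-clock cost
bitrate_diff_pct = 100.0 * (vbr2["kbps"] / vbr["kbps"] - 1.0)

print(f"VBR2 vs VBR: {fps_delta:+.2f} fps ({slowdown_pct:.1f}% slower), "
      f"+{extra_seconds} s, bitrate {bitrate_diff_pct:+.2f}%")
```

So VBR2 costs roughly 13% of throughput here for essentially the same bitrate.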

NikosD

29th January 2017, 21:30

KK, so based on your original question, "But how did AMD manage to implement two-pass encoding (pre-analysis) so flexibly that it can be used with the CQP rate control mode as well as VBR?", and Xaymar's description of pre-analysis, the answer is: the pre-analysis option doesn't do anything in any of the four samples you provided. All macroblocks (H.264) and CUs (HEVC) use the same QP value of 27. Perhaps the reason is CQP rate control (as its name implies, constant QP).

So... can I have four new samples using VBR rate control, please? (using the Avatar video source)
1) H.264 without pre-analysis
2) H.264 with pre-analysis full

OK, so first my H.264 samples.

All of the H.264 HW encoding samples below use the Avatar source and share these parameters:

Rate control: VBR 21000 Kbps

Max bitrate: 40000 Kbps

vbv buffer size: 20000 Kbps

Motion estimation: Full-pel

VBAQ: Enabled
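For readers unfamiliar with the max bitrate / VBV pair: the VBV models the decoder's input buffer, which fills at up to the max bitrate and is drained by each coded frame, and the encoder must keep frames small enough that it never underflows. A minimal leaky-bucket sketch, assuming the 20000 buffer value is in kilobits, a 24 fps source (the Avatar sample's frame rate isn't stated here) and instantaneous frame removal; this is a simplified model, not VCEEncC's rate control, and the function name is hypothetical.

```python
def vbv_simulate(frame_bits, max_kbps, buffer_kbits, fps, initial_fill=0.9):
    """Simplified VBV (leaky bucket) check.

    The buffer starts at initial_fill of its capacity; each frame's bits are
    removed at its decode time, then the buffer refills for one frame interval
    at max_kbps (capped at capacity). Returns (fullness_after_each_frame,
    indices_of_frames_that_underflow).
    """
    capacity = buffer_kbits * 1000.0
    refill = max_kbps * 1000.0 / fps
    fullness = capacity * initial_fill
    history, underflows = [], []
    for i, bits in enumerate(frame_bits):
        if bits > fullness:
            underflows.append(i)  # frame not fully arrived by its decode time
        fullness = max(0.0, fullness - bits)
        fullness = min(capacity, fullness + refill)
        history.append(fullness)
    return history, underflows


if __name__ == "__main__":
    # Hypothetical stream: steady 21000 kbps frames at 24 fps, then a 25 Mbit spike.
    avg_frame = 21000 * 1000 / 24
    frames = [avg_frame] * 48 + [25_000_000]
    _, bad = vbv_simulate(frames, max_kbps=40000, buffer_kbits=20000, fps=24)
    print("underflowing frames:", bad)
```

Any single frame larger than the whole buffer can never arrive in time, which is why the buffer size caps how big an I-frame or scene-change frame may get.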

1) Quality: Balanced

a) Pre-analysis: NONE
https://www.sendspace.com/file/8gbjyp

b) Pre-analysis: FULL
https://www.sendspace.com/file/g9ejib

2) Quality: Slow

a) Pre-analysis: NONE
https://www.sendspace.com/file/a97obr

b) Pre-analysis: FULL
https://www.sendspace.com/file/h565ft

Waiting for your feedback regarding pre-analysis and motion estimation, and whether you see differences between balanced and slow quality for H.264.

JohnLai

30th January 2017, 11:52

A picture is worth a thousand words. Capturing multiple screenshots is a nightmare.

There are 7 frames:
the I-frame, the 1st through 5th P-frames, and then a jump to the 20th P-frame.

Avatar_bal_VBR_Full_mot_est_full_preanalysis_vbaq
http://i.imgur.com/S45MBKW.jpg
http://i.imgur.com/bEvJDxg.jpg
http://i.imgur.com/uq39H9U.jpg
http://i.imgur.com/jmVleor.jpg
http://i.imgur.com/kdxRiyz.jpg
http://i.imgur.com/seOhw4W.jpg
http://i.imgur.com/RpOX7br.jpg

Avatar_bal_VBR_Full_mot_est_NO_preanalysis_vbaq
http://i.imgur.com/rEwSdJ7.jpg
http://i.imgur.com/Wo1mkXF.jpg
http://i.imgur.com/Y2Fc0Nr.jpg
http://i.imgur.com/nw48mhB.jpg
http://i.imgur.com/MQIYeyq.jpg
http://i.imgur.com/x2zSiJI.jpg
http://i.imgur.com/zKl2sJ0.jpg

Avatar_slow_VBR_NO_preanalysis
http://i.imgur.com/vItqBNF.jpg
http://i.imgur.com/2fRFis2.jpg
http://i.imgur.com/TfzPtZ0.jpg
http://i.imgur.com/r0dNdto.jpg
http://i.imgur.com/3HxYXiL.jpg
http://i.imgur.com/CExWjSh.jpg
http://i.imgur.com/UaQvaqw.jpg

Avatar_slow_VBR_preanalysis_full
http://i.imgur.com/Dewk8O5.jpg
http://i.imgur.com/aAL4zRX.jpg
http://i.imgur.com/d3FZLzc.jpg
http://i.imgur.com/Wxnlh9r.jpg
http://i.imgur.com/FMaL6gy.jpg
http://i.imgur.com/h2dTCqO.jpg
http://i.imgur.com/bPhVVMB.jpg

*Phew...done uploading*

Next, from the motion vector and QP comparison, it seems pre-analysis doesn't do anything at all. Changing from Balanced to Slow does make a difference...

EDIT:
20th P-frame from your previous VCE CQP samples:
AvatarAVC.mkv
http://i.imgur.com/IGEZ0vJ.jpg

AvatarAVCslow
http://i.imgur.com/AgiLUvD.jpg

AvatarHEVC
http://i.imgur.com/211j6BQ.jpg

AvatarHEVCslow
http://i.imgur.com/CdStTH7.jpg

So...yeah...as you said:
For HEVC the problem of not being able to test anything other than balanced quality (like fast or slow) still exists, so although I chose Slow in the 2nd test, the output says Balanced
Both HEVC samples have exactly the same QP, CU and motion vectors.
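The comparison done above by eye can also be automated once per-block QP values are exported from a bitstream analyzer. A minimal sketch, assuming each map is a 2D grid of integers (one value per macroblock or CU) in the same layout; the export step, the sample values and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

QpMap = Sequence[Sequence[int]]


def diff_qp_maps(a: QpMap, b: QpMap) -> List[Tuple[int, int, int, int]]:
    """Return (row, col, qp_a, qp_b) for every block whose QP differs."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("QP maps must cover the same block grid")
    return [(y, x, qa, qb)
            for y, (ra, rb) in enumerate(zip(a, b))
            for x, (qa, qb) in enumerate(zip(ra, rb))
            if qa != qb]


def qp_range(m: QpMap) -> Tuple[int, int]:
    """Min and max QP in a map; equal values mean no per-block adaptation."""
    flat = [q for row in m for q in row]
    return min(flat), max(flat)


if __name__ == "__main__":
    no_pa = [[27, 27, 27], [27, 27, 27]]    # hypothetical export, pre-analysis NONE
    full_pa = [[27, 27, 27], [27, 27, 27]]  # hypothetical export, pre-analysis FULL
    print("differing blocks:", len(diff_qp_maps(no_pa, full_pa)))
    print("QP range:", qp_range(full_pa))
```

Zero differing blocks plus a QP range of (27, 27) is the numeric form of "pre-analysis doesn't do anything" in CQP mode.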

NikosD

30th January 2017, 12:41

Next, from the motion vector and QP comparison, it seems pre-analysis doesn't do anything at all. Changing from Balanced to Slow does make a difference...

I can't make much of your screenshots, but do you think motion estimation or VBAQ could be incompatible with pre-analysis and somehow deactivate it?

The runtime info does say pre-analysis full, though.

Or maybe it isn't implemented properly by AMF or rigaya.

JohnLai

30th January 2017, 12:55

I can't make much of your screenshots, but do you think motion estimation or VBAQ could be incompatible with pre-analysis and somehow deactivate it?

The runtime info does say pre-analysis full, though.

Or maybe it isn't implemented properly by AMF or rigaya.

No idea.

Need confirmation: do both Avatar_slow H.264 VBR samples have VBAQ enabled as well?

NikosD

30th January 2017, 13:18

Yes, look at my common settings.

All these parameters are enabled for all 4 samples.

JohnLai

30th January 2017, 14:24

Yes, look at my common settings.

All these parameters are enabled for all 4 samples.

If that's the case, I am disappointed by AMD's VCE/AMF software support. The VCE hardware block is fine (the first to support 64x64 HEVC CTUs), even without B-frames, but AMD seriously needs to do something about its rate control.

Or is pre-analysis not enabled by the AMD driver yet?
Or is there an issue with rigaya's VCEEnc implementation?

NikosD

30th January 2017, 14:53

VCEENC v3.02 is out.

I'll upload some HEVC files this time.

JohnLai

30th January 2017, 15:03

VCEENC v3.02 is out.

I'll upload some HEVC files this time.

So... a fix for the "quality"/Slow setting for HEVC and for H.264 'quarter' pre-analysis? (But didn't you mention your H.264 samples use "full", not "quarter"? Plus, pre-analysis 'none' and 'full' don't change anything in the bitstream.)

CruNcher

1st February 2017, 16:14

VBR

- Better bitrate target hit
- Higher Performance
- Higher Metrics

VBR2(HQ)

- Worse bitrate target hit
- Lower Performance
- Lower Metrics

Really strange; I have yet to find an input that shows a visible improvement that would justify its performance cost.

And on the input I'm testing with, the visual difference is so minimal that no one would ever see it per block; the only thing you will notice is the performance cost. I guess this was optimized entirely for game streaming scenarios.

JohnLai

1st February 2017, 16:56

VBR

- Better bitrate target hit
- Higher Performance
- Higher Metrics

VBR2(HQ)

- Worse bitrate target hit
- Lower Performance
- Lower Metrics

Really strange; I have yet to find an input that shows a visible improvement that would justify its performance cost.

And on the input I'm testing with, the visual difference is so minimal that no one would ever see it per block; the only thing you will notice is the performance cost. I guess this was optimized entirely for game streaming scenarios.

Which is why I often recommend the usage of CQP and standard VBR in conjunction with lookahead and adaptive quantization. :D

Back in the days when lookahead, adaptive GOP and AQ didn't exist yet, 2-pass VBR was useful for optimizing every frame (allocating proper QP values / distributing bits) because of the fixed GOP structure. Imagine this: there is only one I-frame every 300 frames (every 10 seconds for 30 fps video). If there is a scene transition within such a fixed GOP, video quality will suffer because there is no 'proper' high-quality I-frame for the subsequent P-frames to reference.

With lookahead and AQ available, there is no reason to use two-pass for transcoding unless one requires extremely low-latency encoding such as video conferencing, where the delay added by lookahead is not acceptable.

I prefer using NVENC Unrestrainted VBR Constant Quality mode + AQ + Lookahead over CQP.

*Note for newcomers reading this: "two-pass" in hardware encoders works differently than in software encoders.
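To illustrate the fixed-GOP problem described above: a fixed GOP only places an I-frame every keyint frames, while scene-cut detection (which lookahead/adaptive GOP makes possible) also starts a new GOP where the picture changes abruptly. A toy sketch, assuming frames are flat lists of luma samples and using the mean absolute difference against a hypothetical threshold; real encoders use motion-compensated cost estimates, not this.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized luma frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def place_i_frames(frames, keyint=300, cut_threshold=30.0):
    """Indices of frames coded as I-frames.

    An I-frame is forced every `keyint` frames (the fixed-GOP behaviour) and
    additionally wherever a frame differs strongly from its predecessor
    (a crude scene-cut detector); either event starts a new GOP.
    """
    if not frames:
        return []
    i_frames, last_i = [0], 0
    for n in range(1, len(frames)):
        scene_cut = mean_abs_diff(frames[n - 1], frames[n]) > cut_threshold
        if scene_cut or n - last_i >= keyint:
            i_frames.append(n)
            last_i = n
    return i_frames


if __name__ == "__main__":
    dark = [16] * 64     # hypothetical 8x8 "frames"
    bright = [200] * 64
    clip = [dark] * 120 + [bright] * 120  # scene change at frame 120
    print("fixed GOP only:", place_i_frames(clip, cut_threshold=float("inf")))
    print("with scene cut:", place_i_frames(clip))
```

Without the scene-cut check, the first frame of the new scene is coded as a P-frame with no useful reference, which is the quality drop described above.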

edit:
VCEEnc 3.03 is out....
Google translation of the changelog:
[Common]
· Allow VBAQ in vbr / cbr mode on HEVC.

[VCEEnc.auo]
· The HW resizer can now also be used from VCEEnc.auo.
· Fixed Level not being saved correctly when the HEVC encoder is selected.
· Fixed --vbr not working properly for HEVC encoding in VCEEncC.

[VCEEncC]
· The avsw reader supports reading YUV 4:2:0 10-bit (encoding is 8-bit).
· When using avsw reader, colors are shifted depending on resolution.

NikosD

1st February 2017, 19:25

After my latest email exchange with rigaya, I have come to some conclusions regarding VCEEncC and the Polaris HW encoder.

* The quality option -u (slow, balanced, fast) is not working for HEVC. It was fixed only in the runtime/log info; the output is the same for all quality options.

* HW 10-bit HEVC decoding is already supported by LAV Video, PotPlayer and other DXVA decoders, and AMF includes the relevant parameter in its API, but when rigaya actually tries to use it, he gets a "not supported" message.
That's why he added SW decoding of 10-bit HEVC (--avsw).

* REF and pre-analysis don't appear to work as expected.

For all of the above, we really don't know whether it's a driver/API limitation (bug?) or a hardware limitation.

The documentation and runtime info, as with Nvidia and Intel, don't tell you everything.

From now on he will concentrate on bug fixes for VCEEnc, because he thinks he has implemented most of the really useful/necessary parameters of the AMF API.

The last somewhat major bug (?) in VCEEnc, which he is trying to fix, is high CPU usage (as if working in copy-back mode) that VCEEncC v2.00 didn't have.

I will probably try some VBAQ VBR HEVC encodings and upload them to check.

CruNcher

2nd February 2017, 04:36

Which is why I often recommend the usage of CQP and standard VBR in conjunction with lookahead and adaptive quantization. :D

Back in the days when lookahead, adaptive GOP and AQ didn't exist yet, 2-pass VBR was useful for optimizing every frame (allocating proper QP values / distributing bits) because of the fixed GOP structure. Imagine this: there is only one I-frame every 300 frames (every 10 seconds for 30 fps video). If there is a scene transition within such a fixed GOP, video quality will suffer because there is no 'proper' high-quality I-frame for the subsequent P-frames to reference.

With lookahead and AQ available, there is no reason to use two-pass for transcoding unless one requires extremely low-latency encoding such as video conferencing, where the delay added by lookahead is not acceptable.

I prefer using NVENC Unrestrainted VBR Constant Quality mode + AQ + Lookahead over CQP.

*Note for newcomers reading this: "two-pass" in hardware encoders works differently than in software encoders.

edit:
VCEEnc 3.03 is out....
Google translation of the changelog:
[Common]
· Allow VBAQ in vbr / cbr mode on HEVC.

[VCEEnc.auo]
· The HW resizer can now also be used from VCEEnc.auo.
· Fixed Level not being saved correctly when the HEVC encoder is selected.
· Fixed --vbr not working properly for HEVC encoding in VCEEncC.

[VCEEncC]
· The avsw reader supports reading YUV 4:2:0 10-bit (encoding is 8-bit).
· When using avsw reader, colors are shifted depending on resolution.

Hmm, interesting how AQ lowers the overall decoding complexity. But unrestrained doesn't seem like a good idea: I had too many frame drops on the hardware side that way, and again the visual win wasn't worth it.

JohnLai

2nd February 2017, 06:17

Hmm, interesting how AQ lowers the overall decoding complexity. But unrestrained doesn't seem like a good idea: I had too many frame drops on the hardware side that way, and again the visual win wasn't worth it.

LOL, I just realized I used the wrong name. It should be "Unconstrained".

UVBR-CQ encoding speed is around 80-90% of CQP.
targetQuality only works if initialqp is set to 1, with maxBitRate = averageBitRate set to the maximum for the chosen profile level.

Most of the time, I stick with a CQ value of 26.
Another quirk of UVBR-CQ mode is that dark scenes kinda look better compared to CQP. Hmmm...

*Note: lookahead + AQ are enabled.

NikosD

2nd February 2017, 12:07

NikosD, how is the Samsung 10->8-bit retranscode coming along for you, in direct comparison with the Nvidia one I posted at the 2x reduction target? Any watchable results yet? :)

OK, finally with VCEEnc v3.03 I managed to demux/decode 10-bit HEVC streams using --avsw (SW decoding), but unfortunately even the latest version doesn't let me encode HEVC at UHD/4K resolution.

Rigaya hasn't managed to reproduce that bug yet.

So, I tried 4K H.264 encoding with almost everything enabled.

VCEEncC v3.03
VBR 22500, max bitrate/ vbv buffer 50000
VBAQ
Motion estimation/ pre-analysis full
Full range
Level 5.2
Quality best/slow

The final encoding is a little less than 300MB and it's here:
https://www.sendspace.com/file/ytj5jt

And this one is simpler, with most parameters at default:

VCEEncC v3.03
VBR 22500, max bitrate/ vbv buffer 50000
Quality best/slow

So VBAQ, pre-analysis and full range are all set to NONE by default, and motion estimation is quarter-pel by default.

The final encoding with default settings is here:
https://www.sendspace.com/file/0l6pyj

JohnLai

3rd February 2017, 06:39

300MB!? X__X
This is gonna take a while...

Frame 44
Samsung_Journey_H264.mkv
Motion vector : https://k60.imgup.net/MVstandard8c4b.PNG
QP value on top left : https://i76.imgup.net/QP17032.PNG
QP value on bottom left : https://h86.imgup.net/QPA8f9a.PNG

Samsung_Journey_H264_Default.mkv
Motion vector : https://r85.imgup.net/MVdefaultfa42.PNG
QP value on top left : https://x82.imgup.net/QPDefault1dbb.PNG
QP value on bottom left : https://h11.imgup.net/QPD8aa0.PNG

Not much variation in QP.
The strange thing is that Samsung_Journey_H264_Default.mkv seems to use more motion vectors than Samsung_Journey_H264.mkv.

EDIT: Compared to CruNcher's GTX 970 NVENC HEVC:
Motion vector : https://w53.imgup.net/NVENCCRUNCd4c4.PNG --> Using a different analyser because the usual analyser ended with an error after trying to rescale to fit the screen; see for yourself: https://b16.imgup.net/NVENCCRUNC3936.PNG
QP value on everything : https://y85.imgup.net/NVENCCRUNC3d52.PNG Seems like CruNcher is using CQP values of I:21 and P:24 without AQ XD

NikosD

3rd February 2017, 17:34

300MB!? X__X
This is gonna take a while...

VCEEnc v3.04 is out, fixing most, if not all, of the bugs I reported to rigaya.

Later today or early tomorrow, I'm going to upload 4K HEVC encodings of that Samsung Journey 10-bit HEVC 4K sample.

It seems that the most important parameter is -u quality (slow, fast, balanced), which unfortunately still doesn't work for HEVC (fast is a little different, but slow and balanced are exactly the same).

CruNcher

3rd February 2017, 23:33

Yeah that version was still fixed Q :)

Performance: retranscode 33 fps, input 60 fps, playback 60 fps

http://i1.sendpic.org/t/1j/1jwQfj1zCtUxlpmyPdtuTT4dTvz.jpg (http://sendpic.org/view/1/i/v5qTm7CNkmiLr1H284kAUBLzJdy.png)

VBR2

encoded 10783 frames, 33.50 fps, 24717.28 kbps, 530.07 MB
encode time 0:05:21 / CPU Usage: 52.76%

frame type IDR 20
frame type I 20, avgQP 20.65, total size 6.27 MB
frame type P 10763, avgQP 23.74, total size 523.80 MB

VBR

encoded 10783 frames, 38.95 fps, 24799.58 kbps, 531.83 MB
encode time 0:04:36 / CPU Usage: 56.79%

frame type IDR 20
frame type I 20, avgQP 20.55, total size 6.34 MB
frame type P 10763, avgQP 23.76, total size 525.50 MB

CQP

encoded 10783 frames, 38.91 fps, 24807.54 kbps, 532.01 MB
encode time 0:04:37 / CPU Usage: 56.86%

frame type IDR 20
frame type I 20, avgQP 21.00, total size 6.08 MB
frame type P 10763, avgQP 24.00, total size 525.93 MB

CBR

encoded 10783 frames, 38.85 fps, 23587.17 kbps, 505.83 MB
encode time 0:04:37 / CPU Usage: 57.07%

frame type IDR 20
frame type I 20, avgQP 20.45, total size 6.27 MB
frame type P 10763, avgQP 23.82, total size 499.57 MB

CBRHQ

encoded 10783 frames, 33.42 fps, 24915.55 kbps, 534.32 MB
encode time 0:05:22 / CPU Usage: 52.67%

frame type IDR 20
frame type I 20, avgQP 20.80, total size 6.18 MB
frame type P 10763, avgQP 23.53, total size 528.14 MB
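The five runs above are easier to compare relative to one mode. A small sketch that normalizes speed and bitrate to plain VBR; the values are copied from the logs and the dictionary layout is just for illustration:

```python
# (fps, kbps) per rate-control mode, copied from the NVEncC summaries above.
runs = {
    "VBR2": (33.50, 24717.28),
    "VBR": (38.95, 24799.58),
    "CQP": (38.91, 24807.54),
    "CBR": (38.85, 23587.17),
    "CBRHQ": (33.42, 24915.55),
}


def relative_to(base):
    """Speed and bitrate of every run as a percentage of the `base` run."""
    base_fps, base_kbps = runs[base]
    return {mode: (100.0 * fps / base_fps, 100.0 * kbps / base_kbps)
            for mode, (fps, kbps) in runs.items()}


for mode, (speed, rate) in relative_to("VBR").items():
    print(f"{mode:6s} speed {speed:6.1f}%   bitrate {rate:6.1f}%")
```

Both HQ modes come out at roughly 86% of the speed of their non-HQ counterparts.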

JohnLai

4th February 2017, 13:00

https://www.techpowerup.com/230360/first-intel-processor-with-amd-radeon-graphics-within-2017
Hopefully Intel won't replace its QSV with AMD VCE........

NikosD

4th February 2017, 15:07

A bug in VCEEnc with high-bitrate CBR/VBR doesn't let me encode the sample in HEVC.

Waiting for VCEEnc v3.05 (?)...

CruNcher

4th February 2017, 23:57

CBRHQ

encoded 3780 frames, 26.69 fps, 25070.41 kbps, 376.94 MB
encode time 0:02:21 / CPU Usage: 58.62%

frame type IDR 20
frame type I 20, avgQP 20.45, total size 23.50 MB
frame type P 3760, avgQP 22.21, total size 353.45 MB

SSIM
Y:0.985357 (18.343621)
U:0.988216 (19.287139)
V:0.988070 (19.233419)
All:0.986285 (18.628190)

VBRHQ

encoded 3780 frames, 26.54 fps, 25181.62 kbps, 378.62 MB
encode time 0:02:22 / CPU Usage: 57.78%

frame type IDR 20
frame type I 20, avgQP 19.35, total size 26.09 MB
frame type P 3760, avgQP 21.71, total size 352.53 MB

SSIM
Y:0.988181 (19.274309)
U:0.988890 (19.543008)
V:0.988778 (19.499126)
All:0.988399 (19.355007)
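The number in parentheses after each SSIM value is consistent with SSIM expressed in dB, -10·log10(1 - SSIM), and the "All" line matches a 4:1:1 weighting of Y, U and V (their plane areas in 4:2:0). A small sketch that reproduces the CBRHQ figures above; the function names are illustrative, and the weighting is inferred from the printed numbers rather than taken from NVEncC's source.

```python
import math


def ssim_db(ssim: float) -> float:
    """SSIM on a logarithmic scale: +1 dB means about 21% less (1 - SSIM)."""
    return -10.0 * math.log10(1.0 - ssim)


def ssim_all_420(y: float, u: float, v: float) -> float:
    """Combine plane SSIMs weighted by plane area for 4:2:0 (Y is 4x U or V)."""
    return (4.0 * y + u + v) / 6.0


if __name__ == "__main__":
    y, u, v = 0.985357, 0.988216, 0.988070  # CBRHQ run above
    combined = ssim_all_420(y, u, v)
    print(f"Y {ssim_db(y):.4f} dB, All {combined:.6f} ({ssim_db(combined):.4f} dB)")
```

On that scale, CBRHQ's 0.9863 (18.63 dB) vs VBRHQ's 0.9884 (19.36 dB) is a gap of about 0.7 dB, larger than the raw SSIM numbers suggest.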

Highest VBRHQ result so far, with a slight bitrate overflow

VBRHQ

Using hardware decoding via CUVID

encoded 3780 frames, 38.28 fps, 25181.62 kbps, 378.62 MB
encode time 0:01:38 / CPU Usage: 20.83%

frame type IDR 20
frame type I 20, avgQP 19.35, total size 26.09 MB
frame type P 3760, avgQP 21.71, total size 352.53 MB

VBR

Using hardware decoding via CUVID

encoded 3780 frames, 47.82 fps, 25383.44 kbps, 381.65 MB
encode time 0:01:19 / CPU Usage: 24.99%

frame type IDR 20
frame type I 20, avgQP 19.95, total size 25.23 MB
frame type P 3760, avgQP 21.64, total size 356.42 MB

SSIM
Y:0.988352 (19.337383)
U:0.989007 (19.588642)
V:0.988774 (19.497570)
All:0.988531 (19.404814)

1-click H.264 -> Nvidia H.265 1080p 2nd-generation transcoding test, hitting exactly the same bitrate (8000 kbps)

SSIM
Y:0.895877 (9.824539)
U:0.985295 (18.325234)
V:0.979849 (16.956954)
All:0.924775 (11.236397)

And here, manually overriding rigaya's RC decisions, getting a slightly lower bitrate (-500, ~7500 kbps)

SSIM
Y:0.895970 (9.828434)
U:0.985184 (18.292787)
V:0.979744 (16.934516)
All:0.924802 (11.237925)

Now reducing in bigger steps towards the 50% goal (not sure which encoder was used for this H.264 input; it was a random sample pick)

encoded 51445 frames, 162.21 fps, 5617.28 kbps, 1377.97 MB
encode time 0:05:17 / CPU Usage: 24.62%

frame type IDR 206
frame type I 206, avgQP 23.39, total size 14.26 MB
frame type P 51239, avgQP 23.70, total size 1363.70 MB

SSIM
Y:0.895992 (9.829321)
U:0.985270 (18.317935)
V:0.979906 (16.969430)
All:0.924857 (11.241128)

You can see that the SSIM Metric is rising now ;)

NikosD

5th February 2017, 21:24

New NVenc v3.06 adds HW decoding of HEVC/VP8/VP9

CruNcher

5th February 2017, 22:13

The H.264 High Predictive one seems broken: I get strange artifacts on several blocks in the lower water parts compared to --avsw, on my parkjoy test sample that I created back then with x264 lossless. But I guess it's a generic Nvidia decoder problem.

nevcairiel

5th February 2017, 22:21

The H.264 High Predictive one seems broken: I get strange artifacts on several blocks in the lower water parts compared to --avsw, on my parkjoy test sample that I created back then with x264 lossless. But I guess it's a generic Nvidia decoder problem.

x264 used to produce "wrong" lossless files that didn't match the specification properly. ffmpeg/avcodec compensates for that if it detects that a file was encoded with such an x264 version, but other decoders will likely just show corrupted output.

It was fixed a long time ago, but who knows when your file was encoded.

CruNcher

5th February 2017, 22:37

Ahh, that would explain it. BTW, would you please support the Predictive files in LAV CUVID? It falls back to avcodec; not sure if you did that on purpose, to avoid pushing those wrong files through to the CUVID decoder :)

General
Complete name : E:\parkjoy.mp4
Format : MPEG-4
Format profile : JVT
Codec ID : avc1 (isom/avc1)
File size : 785 MiB
Duration : 10 s 0 ms
Overall bit rate : 658 Mb/s
Encoded date : UTC 2010-05-09 14:55:11
Tagged date : UTC 2010-05-09 14:55:11

Video
ID : 1
Format : AVC
Format/Info : Advanced Video Codec
Format profile : High 4:4:4 Predictive@L4.2
Format settings, CABAC : Yes
Format settings, ReFrames : 1 frame
Format settings, GOP : M=1, N=48
Codec ID : avc1
Codec ID/Info : Advanced Video Coding
Duration : 10 s 0 ms
Bit rate : 658 Mb/s
Maximum bit rate : 699 Mb/s
Width : 1 920 pixels
Height : 1 080 pixels
Display aspect ratio : 16:9
Frame rate mode : Constant
Frame rate : 50.000 FPS
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 6.348
Stream size : 785 MiB (100%)
Writing library : x264 core 94 r1583 7608d73
Encoding settings : cabac=1 / ref=1 / deblock=1:-1:-1 / analyse=0x3:0x13 / me=dia / subme=2 / psy=0 / mixed_ref=0 / me_range=16 / chroma_me=1 / trellis=0 / 8x8dct=1 / cqm=0 / deadzone=21,11 / fast_pskip=0 / chroma_qp_offset=0 / threads=3 / sliced_threads=0 / nr=0 / decimate=1 / interlaced=0 / constrained_intra=0 / bframes=0 / weightp=2 / keyint=48 / keyint_min=25 / scenecut=0 / intra_refresh=0 / rc=cqp / mbtree=0 / qp=0
Encoded date : UTC 2010-05-09 14:55:11
Tagged date : UTC 2010-05-09 14:57:12

nevcairiel

6th February 2017, 07:11

Ahh, that would explain it. BTW, would you please support the Predictive files in LAV CUVID? It falls back to avcodec; not sure if you did that on purpose, to avoid pushing those wrong files through to the CUVID decoder :)

CUVID only supports 4:2:0 decoding, not 4:4:4 (or more precisely, it cannot output anything but 4:2:0, so using it for 4:4:4 decoding would obliterate the chroma quality)
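Why 4:2:0-only output hurts 4:4:4 sources: 4:2:0 keeps one chroma sample per 2x2 block, so any chroma detail finer than that is averaged away and cannot be recovered. A minimal sketch using plain 2x2 averaging and nearest-neighbour upsampling (real converters use other filters and chroma siting; the function names are illustrative):

```python
def to_420(chroma):
    """4:4:4 -> 4:2:0: average each 2x2 block of a chroma plane (even sizes assumed)."""
    h, w = len(chroma), len(chroma[0])
    return [[(chroma[y][x] + chroma[y][x + 1] + chroma[y + 1][x] + chroma[y + 1][x + 1]) / 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


def to_444(chroma):
    """4:2:0 -> 4:4:4 by repeating each sample over a 2x2 block (nearest neighbour)."""
    return [[v for v in row for _ in range(2)] for row in chroma for _ in range(2)]


if __name__ == "__main__":
    # A 1-pixel chroma checkerboard, e.g. fine colored text or a thin red line.
    pattern = [[16 if (x + y) % 2 == 0 else 240 for x in range(4)] for y in range(4)]
    restored = to_444(to_420(pattern))
    print(restored[0])  # every sample is now 128.0: the detail is gone
```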

GrandPa

6th February 2017, 09:49

HD 630 1150 MHz
b-frames 4, reference frames 2, Target usage 7
HEVC
CQP= 370 fps
VBR= 290 fps

@YUPS: Would it be possible to post the result of QSVEncC64 --check-features with your Kaby Lake CPU to see the difference with my 'old' Skylake 6700K?

(I'm new to this forum, but have already done tons of H.264 and H.265 encoding with StaxRip and Hybrid. Now I'm interested to see whether H.265 HW encoding quality and efficiency with the HD 630 could catch up a bit. While I found H.264 HW encoding with the HD 530 quite acceptable in quality, there is currently no real alternative to x265 SW encoding for archiving purposes.)

Many thanks for any help

NikosD

7th February 2017, 13:37

Really?

The tree? The leaf?
Facial texture?

Is there any website where you can mouse over to compare two different images?
Other than the dreaded http://screenshotcomparison.com/, which I can't even upload to.

I found a working site (I think) for frame comparison:
https://juxtapose.knightlab.com/

CruNcher

8th February 2017, 02:03

CUVID only supports 4:2:0 decoding, not 4:4:4 (or more precisely, it cannot output anything but 4:2:0, so using it for 4:4:4 decoding would obliterate the chroma quality)

Yeah, kinda strange that you can encode it but not decode it natively; sounds like bandwidth saving, though. I wonder why 4:2:2 isn't supported at all, neither for encoding nor for decoding.

http://i1.sendpic.org/i/hN/hNHDvUAldhhJZF66LR6YWJ625Qu.png

And I wonder what they mean by the 3 asterisks at all; they appear nowhere in the diagram, which makes no sense.

You have 1 and 2, but 3 doesn't seem to be specific, as if it's meant for everything depending on the codec. Or did they mean resolution and forget the asterisks there?

Also, "color lossless" seems confusing: since when is lossless a color format?

I'm not quite there; more exact data on what they benchmarked and how would also be nice.

encoded 51445 frames, 162.21 fps, 5617.28 kbps, 1377.97 MB
encode time 0:05:17 / CPU Usage: 24.62%

frame type IDR 206
frame type I 206, avgQP 23.39, total size 14.26 MB
frame type P 51239, avgQP 23.70, total size 1363.70 MB

http://i1.sendpic.org/i/gb/gbLfj8UzAymwO4owHqXHWrggGMw.png

Dual Pass = VBR2(HQ)

I wonder how much SAO improves Pascal's SSIM results over Maxwell :)

easyfab

11th February 2017, 13:06

@YUPS: Would it be possible to post the result of QSVEncC64 --check-features with your Kaby Lake CPU to see the difference with my 'old' Skylake 6700K?

Here is the check-features output: QSVEncC 2.62 shows API 1.19, but it's actually 1.22, so I don't know if all the features are correct.
10-bit depth shows an x, but I can encode in 10-bit with --profile main10.
Does QSVEncC need an update to show the new API features?

GPU Info Intel HD Graphics 615 (24EU) 300-900MHz [4W] (21.20.16.4589)
Media SDK QuickSyncVideo (hardware encoder) PG, 1st GPU, API v1.22

QSVEncC (x64) 2.62 (r1192) by rigaya, Jan 8 2017 23:11:24 (VC 1900/Win/avx2)
reader: raw, avi, avs, vpy, avqsv [H.264/AVC, HEVC, MPEG2, VC-1, VP8, VP9]
Environment Info
OS : Windows 10 (x64)
CPU: Intel Core m3-7Y30 @ 1.00GHz [TB: 1.61GHz] (2C/4T) <Skylake>
RAM: Used 2127 MB, Total 3976 MB
GPU: Intel HD Graphics 615 (24EU) 300-900MHz [4W] (21.20.16.4589)

Media SDK Version: Hardware API v1.19

Supported Enc features:
Codec: H.264/AVC
CBR VBR AVBR QVBR CQP VQP LA LAHRD ICQ LAICQ VCM
RC mode o o o o o o o o o o o
10bit depth x x x x x x x x x x x
Fixed Func o o o o o o x x x x o
Interlace o o o o o o o o o o x
SceneChange o o o o o o x x o x o
VUI info o o o o o o o o o o o
Trellis o o o o o o o o o o o
Adaptive_I x x x x x x x x x x x
Adaptive_B x x x x x x x x x x x
WeightP x x x x x x x x x x x
WeightB x x x x x x x x x x x
FadeDetect x x x x x x x x x x x
B_Pyramid o o o o o x o x o o o
+Scenechange x x x x x x x x x x x
+ManyBframes o o o o o x x x o x o
PyramQPOffset x x x x x x x x x x x
Ext_BRC o o o o x x x x o x o
MBBRC o o o o x x x x o x o
LA Quality x x x x x x o o x o x
QP Min/Max o o o o o o o o o o o
IntraRefresh x x x x x x x x x x x
No Debloc x x x x x x x x x x x
No GPB x x x x x x x x x x x
Windowed BRC x x x x x x o o x x x
PerMBQP(CQP) x x x x o o x x x x x
DirectBiasAdj x x x x x x x x x x x
MVCostScaling x x x x x x x x x x x

Codec: HEVC
CBR VBR AVBR QVBR CQP VQP LA LAHRD ICQ LAICQ VCM
RC mode o o x x o o x x o x o
10bit depth x x x x x x x x x x x
Fixed Func x x x x x x x x x x x
Interlace x x x x x x x x x x x
SceneChange o o x x o o x x o x o
VUI info o o x x o o x x o x o
Trellis x x x x x x x x x x x
Adaptive_I x x x x x x x x x x x
Adaptive_B x x x x x x x x x x x
WeightP x x x x x x x x x x x
WeightB x x x x x x x x x x x
FadeDetect x x x x x x x x x x x
B_Pyramid x x x x x x x x x x x
+Scenechange x x x x x x x x x x x
+ManyBframes x x x x x x x x x x x
PyramQPOffset x x x x o o x x x x x
Ext_BRC o o x x x x x x o x o
MBBRC o o x x x x x x o x o
LA Quality x x x x x x x x x x x
QP Min/Max x x x x x x x x x x x
IntraRefresh o o x x o o x x o x o
No Debloc o o x x o o x x o x o
No GPB x x x x x x x x x x x
Windowed BRC x x x x x x x x x x x
PerMBQP(CQP) o o x x x x x x o x o
DirectBiasAdj x x x x x x x x x x x
MVCostScaling x x x x x x x x x x x

Codec: MPEG2
CBR VBR AVBR QVBR CQP VQP LA LAHRD ICQ LAICQ VCM
RC mode o o o x o o x x x x x
10bit depth x x x x x x x x x x x
Fixed Func o o o x o o x x x x x
Interlace o o o x o o x x x x x
SceneChange o o o x o o x x x x x
VUI info o o o x o o x x x x x
Trellis o o o x o o x x x x x
Adaptive_I o o o x o o x x x x x
Adaptive_B o o o x o o x x x x x
WeightP o o o x o o x x x x x
WeightB o o o x o o x x x x x
FadeDetect x x x x x x x x x x x
B_Pyramid o o o x o x x x x x x
+Scenechange x x x x x x x x x x x
+ManyBframes o o o x o x x x x x x
PyramQPOffset x x x x x x x x x x x
Ext_BRC o o o x x x x x x x x
MBBRC o o o x x x x x x x x
LA Quality x x x x x x x x x x x
QP Min/Max o o o x o o x x x x x
IntraRefresh o o o x o o x x x x x
No Debloc o o o x o o x x x x x
No GPB x x x x x x x x x x x
Windowed BRC o o o x o o x x x x x
PerMBQP(CQP) x x x x x x x x x x x
DirectBiasAdj o o o x o o x x x x x
MVCostScaling o o o x o o x x x x x

Supported Vpp features:

Resize o
Deinterlace o
Scaling Quality o
Denoise o
Rotate o
Mirror x
Detail Enhancement o
Proc Amp. o
Image Stabilization x
Video Signal Info o
FPS Conversion o
FPS Conversion (Adv.) o
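Reading these o/x matrices by eye is error-prone, so here is a small, hypothetical helper (not part of QSVEncC) that parses one codec block in the layout shown above: a header row of rate-control modes, then one row per feature ending in one flag per mode. It assumes the row label is everything before the last N tokens, which handles multi-word labels like "No Debloc". As noted above, the matrix is only as good as what the tool reports: it marks 10-bit as unsupported even though --profile main10 encodes work here.

```python
SAMPLE = """\
Codec: HEVC
CBR VBR AVBR QVBR CQP VQP LA LAHRD ICQ LAICQ VCM
RC mode o o x x o o x x o x o
10bit depth x x x x x x x x x x x
IntraRefresh o o x x o o x x o x o
No Debloc o o x x o o x x o x o
"""


def parse_feature_matrix(text: str) -> dict:
    """Map feature label -> {rc_mode: supported?} for one codec block."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if lines[0].startswith("Codec:"):
        lines = lines[1:]  # drop the "Codec: ..." title line
    modes = lines[0].split()
    table = {}
    for ln in lines[1:]:
        tokens = ln.split()
        flags = tokens[-len(modes):]              # last N tokens are o/x flags
        label = " ".join(tokens[:-len(modes)])    # the rest is the label
        table[label] = {m: f == "o" for m, f in zip(modes, flags)}
    return table


def modes_supporting(table: dict, feature: str) -> list:
    return [mode for mode, ok in table[feature].items() if ok]


t = parse_feature_matrix(SAMPLE)
print(modes_supporting(t, "RC mode"))      # ['CBR', 'VBR', 'CQP', 'VQP', 'ICQ', 'VCM']
print(modes_supporting(t, "10bit depth"))  # []
```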

CruNcher

11th February 2017, 19:03

Could you do the same 10->8 bit Samsung Journey retranscode, please, and post its result, trying to hit the same bitrate target :)

http://demo-uhd3d.com/fiche.php?cat=uhd&id=91

Using the VBR RC

https://forum.doom9.org/showpost.php?p=1794206&postcount=108
https://forum.doom9.org/showpost.php?p=1795597&postcount=176

Though I'm not quite sure whether the 300 MB target was a little too heavy for Nvidia and for RC overall; it's so restrictive that it doesn't seem well balanced. I would have gone with a slightly larger end result for visual perception/performance, now without SAO, but I guess it's too late to change to something closer to a 25 Mbps hit ;)

As we now have those results for Nvidia (HEVC, no SAO, current SDK/NVEncC status) and AMD (H.264), only the Intel results are missing, either H.264, HEVC or both :)

Also, going slightly higher would need another host, so in some sense not going over 300 MB is even a realistic target ;)
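For reference, here is what the 300 MB cap means as an average bitrate. The sketch takes the clip length from easyfab's QSVEncC logs further down (6642 frames at 60000/1001 fps, about 110.8 s) and assumes "MB" means MiB, which matches those logs: 22039.38 kbps over that duration comes out at the reported 291.13 MB. Container overhead is ignored.

```python
from fractions import Fraction

# Assumed clip length, taken from the QSVEncC logs for Samsung_Journey.
FRAMES = 6642
FPS = Fraction(60000, 1001)      # 59.94 fps
SECONDS = float(FRAMES / FPS)    # ~110.81 s

MIB = 1024 * 1024  # assumption: "MB" in the logs means MiB


def avg_kbps(size_mib: float, seconds: float = SECONDS) -> float:
    """Average bitrate (kbps) that exactly fills size_mib over the clip."""
    return size_mib * MIB * 8 / seconds / 1000


def size_for_kbps(kbps: float, seconds: float = SECONDS) -> float:
    """Stream size (MiB) produced by a given average bitrate."""
    return kbps * 1000 * seconds / 8 / MIB


print(f"300 MiB cap   -> {avg_kbps(300):.0f} kbps average")
print(f"25 Mbps       -> {size_for_kbps(25000):.1f} MiB")
# Cross-check against the HEVC log (22039.38 kbps, reported as 291.13 MB):
print(f"22039.38 kbps -> {size_for_kbps(22039.38):.2f} MiB")
```

In other words, staying under 300 MB caps the average at roughly 22.7 Mbps, while a true 25 Mbps average would land around 330 MB.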

easyfab

11th February 2017, 19:45

Will do once I've downloaded the file; http://demo-uhd3d.com is slow.

What exactly do you want:
HEVC 8-bit @ VBR 22500K with the slower/best setting (or fast/balanced)?
Also AVC and HEVC 10-bit?

CruNcher

11th February 2017, 19:55

Of course, a 10-bit HEVC one-to-one retranscode without the 8-bit conversion step would also be nice :)

As fast as possible. I don't know what speed your system can currently achieve at those settings, but the target would be 1x, so 60 fps :)

And if it has to go slower, it would be around 40 or 30 fps.

Benchmark data with --avsw (software decoding) instead of Quick Sync hardware decoding would also be interesting :)

Reaching 25/30/50/60 fps is usable in some way ;)
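Since the clip runs at 60000/1001 fps (per the input info in the logs below), "1x" is strictly 59.94 fps. A tiny helper to express an encode speed as a real-time factor; the function name is just for illustration:

```python
from fractions import Fraction

SOURCE_FPS = Fraction(60000, 1001)  # the clip's frame rate, ~59.94 fps


def realtime_factor(encode_fps: float) -> float:
    """How many times faster than playback the encode runs (1.0 = 1x)."""
    return encode_fps / float(SOURCE_FPS)


for target in (60, 50, 40, 30, 25):
    print(f"{target} fps -> {realtime_factor(target):.2f}x")
```

So the 40 and 30 fps fallbacks correspond to roughly 0.67x and 0.5x real time.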

easyfab

11th February 2017, 20:08

I don't think I'll be that fast. I only have a small 4W Intel CPU (m3-7Y30), not a 7700K. At 2K it's only 50 fps for HEVC vs 250 fps for AVC, so for 4K ....
Download ETA 13 min

CruNcher

11th February 2017, 20:22

Even more exciting: you have much lower overall latency in that notebook/mobile design, and you're very power-limited at ~10W :)

Only 50 fps? What should I say; the setup above pushes out much more wattage to reach that top ~40 fps result (granted, a lot of that comes from the decoding + conversion) ;)

Though of course such a comparison wouldn't be very fair; in Nvidia's case a comparison with a 1050 notebook would be fairer, but overall the encoder output counts first ;)

Also, don't forget that your HD 615 is slightly restricted by its DDR3 interface.

Having a m3-7Y30 + Nvidia 1050 would be a really nice combination :D

Though personally, I would have gone for:

http://ark.intel.com/products/95442/Intel-Core-i3-7100U-Processor-3M-Cache-2_40-GHz-
http://ark.intel.com/products/95443/Intel-Core-i5-7200U-Processor-3M-Cache-up-to-3_10-GHz

http://www.cpubenchmark.net/compare.php?cmp%5B%5D=2865&cmp%5B%5D=2879&cmp%5B%5D=2864

What is your notebook's overall power draw while decoding/encoding to that target and completing the task (without display)?

easyfab

11th February 2017, 21:19

QSVEncC 2.62 HEVC 10-bit for Samsung_Journey

QSVEncC (x64) 2.62 (r1192) by rigaya, Jan 8 2017 23:11:24 (VC 1900/Win/avx2)
OS Windows 10 (x64)
CPU Info Intel Core m3-7Y30 @ 1.00GHz [TB: 1.61GHz] (2C/4T) <Skylake>
GPU Info Intel HD Graphics 615 (24EU) 300-900MHz [4W] (21.20.16.4589)
Media SDK QuickSyncVideo (hardware encoder) PG, 1st GPU, API v1.22
Async Depth 5 frames
Buffer Memory d3d9, 1 input buffer, 22 work buffer
Input Info avqsv video: HEVC, 3840x2160, 60000/1001 fps
Output HEVC main10 @ Level auto
3840x2160p 1:1 59.940fps (60000/1001fps)
avwriter: hevc => matroska
Target usage 4 - balanced
Encode Mode Bitrate Mode - VBR
Bitrate 24000 kbps
Max Bitrate 30000 kbps
QP Limit min: none, max: none
Trellis Auto
Ref frames 4 frames
Bframes 3 frames, B-pyramid: off
Max GOP Length 600 frames
Scene Change off
Ext. Features PerMBRC

encoded 6642 frames, 12.47 fps, 22039.38 kbps, 291.13 MB
encode time 0:08:52, CPULoad: 2.12%
frame type IDR 1
frame type I 12, total size 3.91 MB
frame type P 1661, total size 211.23 MB
frame type B 4969, total size 76.00 MB

As slow as expected, and a little smaller (291 MB).

The quality doesn't look too bad, but I don't know if it's as good as AMD/Nvidia.
I can try the best preset if needed.

https://www.sendspace.com/file/qbod0d

NikosD

11th February 2017, 21:33

QSVEnc needs an update: it reports your CPU as Skylake, although it shows the correct API version, v1.22.

Your CPU usage is extremely low; ~2% on a 1GHz 2C/4T chip is very impressive.

The bottleneck could be the GPU, though, as it has 24 EUs running at only 0.9GHz.

Someone with a Pascal GPU could try transcoding that clip to 10-bit HEVC at balanced speed/quality, using VBR and the latest NVEnc v3.06 with HW HEVC decoding, to see the difference.

easyfab

11th February 2017, 21:50

QSVEncC 2.62 AVC (Kaby Lake, API 1.22) for Samsung_Journey

QSVEncC (x64) 2.62 (r1192) by rigaya, Jan 8 2017 23:11:24 (VC 1900/Win/avx2)
OS Windows 10 (x64)
CPU Info Intel Core m3-7Y30 @ 1.00GHz [TB: 1.61GHz] (2C/4T) <Skylake>
GPU Info Intel HD Graphics 615 (24EU) 300-900MHz [4W] (21.20.16.4589)
Media SDK QuickSyncVideo (hardware encoder) PG, 1st GPU, API v1.22
Async Depth 6 frames
Buffer Memory d3d9, 1 input buffer, 32 work buffer
Input Info avqsv video: HEVC, 3840x2160, 60000/1001 fps
VPP Enabled ColorFmtConvertion: nv12(10bit) -> nv12
Output H.264/AVC High @ Level 5.2
3840x2160p 1:1 59.940fps (60000/1001fps)
avwriter: h264 => matroska
Target usage 1 - best
Encode Mode Bitrate Mode - VBR
Bitrate 25000 kbps
Max Bitrate 37500 kbps
QP Limit min: none, max: none
Trellis Auto
Ref frames 3 frames
Bframes 3 frames, B-pyramid: on
Max GOP Length 600 frames
Scene Change off

encoded 6642 frames, 33.74 fps, 21907.80 kbps, 289.39 MB
encode time 0:03:17, CPULoad: 27.08%
frame type IDR 12
frame type I 12, total size 3.14 MB
frame type P 1661, total size 228.03 MB
frame type B 4969, total size 58.23 MB

With the best preset.
~2.7x faster than the HEVC balanced preset (33.74 vs 12.47 fps).
CPULoad 27.08%: must be the 10-bit -> 8-bit conversion.
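The speed difference can be read straight off the two logs. Note this is not a like-for-like comparison: the HEVC run is 10-bit main10 at target usage 4 (balanced), while the AVC run is 8-bit at target usage 1 (best) with a 10->8 bit VPP conversion in front.

```python
# Figures copied from the two QSVEncC logs above.
hevc_fps = 12.47  # HEVC main10, target usage 4 (balanced)
avc_fps = 33.74   # AVC High, target usage 1 (best)

speedup = avc_fps / hevc_fps
print(f"AVC/best is {speedup:.2f}x faster than HEVC/balanced")  # 2.71x

# The wall-clock encode times tell the same story: 0:08:52 vs 0:03:17.
hevc_s = 8 * 60 + 52
avc_s = 3 * 60 + 17
print(f"time ratio: {hevc_s / avc_s:.2f}")
```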

https://www.sendspace.com/file/87hvix

easyfab

11th February 2017, 22:23

though personally my decision would have gone towards

http://ark.intel.com/products/95442/Intel-Core-i3-7100U-Processor-3M-Cache-2_40-GHz-
http://ark.intel.com/products/95443/Intel-Core-i5-7200U-Processor-3M-Cache-up-to-3_10-GHz

http://www.cpubenchmark.net/compare.php?cmp%5B%5D=2865&cmp%5B%5D=2879&cmp%5B?

Yep, but I chose a fanless laptop :) and I couldn't find a 7100U or 7200U without a fan.