View Full Version : What is current status for hardware H.265 encoding.


JohnLai
10th September 2016, 15:08
I have been using NVEncC (avcuvid native) in Staxrip which is the fastest alongside QSVEnC (Intel). I don't think there is something wrong.

Hmm.........in that case, I have no further comment. :cool:

aegisofrime
11th September 2016, 04:09
I have a GTX 1070 as well and am wondering how best to tune it for maximum quality while maintaining decent speed.

JohnLai
11th September 2016, 04:18
Use StaxRip's default CQP rate control plus --lookahead 32 and 10-bit HEVC encoding.

Or use a variation of VBR constant quality: select normal VBR (not VBR2, which Nvidia says is designed for low-latency 2-pass), set the maximum bitrate to 17500 kbps (don't worry, it won't actually use 17500 kbps), then add --qp-init 1 --lookahead 32 --aq --vbr-quality 26 with 10-bit encoding. A --vbr-quality of 25 or 26 should be sufficient.

Note: to use VBR constant quality (--vbr-quality), you must use VBR with a max bitrate of 17500 and --qp-init 1.
--lookahead 32 and --aq are optional, but lookahead and adaptive quantization improve quality, so why not?

**VBR2 doesn't work in conjunction with --vbr-quality: it will max out the 17500 kbps bitrate instead of readjusting the bitrate/quality based on --vbr-quality.
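
For anyone scripting this, here is a rough Python sketch that assembles the VBR constant-quality recipe into one NVEncC argument list. Only --qp-init, --lookahead, --aq, --vbr-quality and --output-depth are named in this thread; the spellings --codec, --vbr and --max-bitrate, the executable name, and the file names are assumptions to check against your build's --help. The post gives no target bitrate, so it is left as a required placeholder argument.

```python
def nvencc_vbr_cq_args(src, dst, target_kbps, quality=26, max_kbps=17500):
    """Build an NVEncC argument list for the VBR + --vbr-quality recipe.

    target_kbps is a placeholder: the post does not state a target bitrate.
    """
    return [
        "NVEncC64.exe",                    # assumed executable name
        "--codec", "hevc",                 # assumed option spelling
        "--output-depth", "10",            # 10-bit switch named later in the thread
        "--vbr", str(target_kbps),         # assumed spelling; normal VBR, not VBR2
        "--max-bitrate", str(max_kbps),    # assumed spelling; 17500 kbps ceiling
        "--qp-init", "1",
        "--vbr-quality", str(quality),     # 25 or 26 suggested in the post
        "--lookahead", "32",               # optional
        "--aq",                            # optional
        "-i", src,
        "-o", dst,
    ]


# Hypothetical file names and target bitrate, for illustration only.
args = nvencc_vbr_cq_args("input.mkv", "output.hevc", target_kbps=5000)
print(" ".join(args))
```

Passing the list to subprocess.run avoids quoting trouble with paths that contain spaces.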

RainyDog
12th September 2016, 12:00
Hi John, do you know what I need to do in order to get the new Pascal encoding options to work through StaxRip on my GTX 1060 please?

If I try adding --main 10 or --lookahead 32 in the custom command lines box then it just comes up with an error and won't start encoding.

I expect it's because the latest StaxRip test build doesn't contain the latest rigaya CLI?

Thanks.

JohnLai
12th September 2016, 16:37
Yes. Just replace the Nvencc with the latest build from rigaya http://rigaya34589.blog135.fc2.com/blog-entry-814.html

Don't forget to update your driver too. Minimum driver version is 368.69

Edit: By the way, why did you add "--main 10"? I thought the rigaya NVEncC option should be "--profile main10"?

aegisofrime
12th September 2016, 17:05
Thanks for your reply. I actually tried it, speeds are great but unsurprisingly quality still falls far below x265. Well, can't have the best of both worlds I guess.

Anyway, the switch for 10-bit should be --output-depth 10.

JohnLai
12th September 2016, 17:33
Can't blame me for not knowing :D After all, I don't own a Pascal GPU (only a Maxwell GPU for me).
Can't expect much from hardware encoders.
For example, x265 at CRF 20 with the veryfast preset normally encodes live-action film with average QPs of about 18 for I-frames, 20 for P-frames and 23 for B-frames, depending on the content.

Now, since NVENC doesn't support B-frames for HEVC, you might want to set the NVEncC CQP values to 18 for I-frames and 20 for P-frames. Unfortunately, the file size is going to be quite big.

The general size ratio for each frame type is:
I : 100%, normally non-compressed or slightly compressed
P : 50% of the I-frame size
B : 25% of the I-frame size
Based on the above, let's say we have a GOP (group of pictures) of 240 with an IPBBB pattern (3 B-frames), where an I-frame is inserted only once per 240 frames.
We get:
1 I-frame = 1Mb
59.75 P-frames = 0.5Mb x 59.75 = 29.875Mb
179.25 B-frames = 0.25Mb x 179.25 = 44.8125Mb
Total size = 1Mb + 29.875Mb + 44.8125Mb = 75.6875Mb

Without B-frames:
1 I-frame = 1Mb
239 P-frames = 0.5Mb x 239 = 119.5Mb
Total size = 1Mb + 119.5Mb = 120.5Mb

*Note: there is no such thing as a fractional frame; the above is just a rough calculation.

[(120.5 / 75.6875) - 1] x 100% = 59.2% larger size......
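
The same rough estimate can be written as a short Python sketch. The 100% / 50% / 25% ratios and the 240-frame GOP come from the post; the function and parameter names are made up for illustration. Counting the I-frame in both totals gives roughly 59% growth; leaving it out of the no-B total gives a figure of about 58% instead.

```python
def gop_size(gop_len, b_per_p, i_size=1.0, p_ratio=0.5, b_ratio=0.25):
    """Estimate the size of one GOP, in units of the I-frame size.

    Fractional frame counts are kept on purpose, as in the rough estimate.
    """
    rest = gop_len - 1                 # frames after the single I-frame
    p_frames = rest / (b_per_p + 1)    # one P-frame per run of b_per_p B-frames
    b_frames = rest - p_frames
    return i_size + p_frames * i_size * p_ratio + b_frames * i_size * b_ratio


with_b = gop_size(240, 3)       # IPBBB pattern
without_b = gop_size(240, 0)    # I- and P-frames only
increase = (without_b / with_b - 1) * 100
print(f"{with_b} vs {without_b}: {increase:.1f}% larger without B-frames")
```

Changing p_ratio and b_ratio shows how quickly the penalty shrinks when B-frames are not much smaller than P-frames.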


Edit: Only Intel QSV HEVC kind of follows the 100% / 50% / 25% rule, by using B-frames as P-frames. My findings are here: http://forum.doom9.org/showpost.php?p=1775316&postcount=1474

trip_let
12th September 2016, 20:24
Quick confirmation: so far only Pascal (Nvidia 10 series) and Kaby Lake (Intel 7 series Core) have any kind of total or hybrid hardware 10-bit HEVC encode? For consumer stuff, that is.

NikosD
12th September 2016, 20:26
Polaris also has pure hardware 10-bit HEVC encoding.

Roph
12th September 2016, 20:35
You sure about that? I've only seen 10-bit HEVC decoding through UVD. No mention of encoding with VCE.

Also, this AMD employee seems certain that Polaris actually has a regression in that it does not support B-frames when encoding: https://github.com/GPUOpen-LibrariesAndSDKs/AMF/issues/8

I'm most interested in the 2-pass encoding on Polaris, though as of yet it's still AWOL in their SDK.

JohnLai
13th September 2016, 03:46
Pascal supports 10-bit HEVC encoding fully in hardware. The only functions offloaded to the CUDA cores are lookahead, two-pass rate control and adaptive quantization, and they don't use much of them: around 4-10% for a 4K 8-bit HEVC encode in CQP mode with lookahead on a GTX 970.

The upcoming Kaby Lake is claimed to support 10-bit HEVC encode, but since the product isn't out yet, nobody can be sure.

https://www.youtube.com/watch?v=hvD37UUcdIo&feature=youtu.be&t=154

Polaris should support 10-bit HEVC encoding, unless there is something wrong with my hearing.

About AMD B-frames: it is kind of weird for the Polaris series not to support B-frames for H.264. Since you have Polaris, why not try an H.264 encode with B-frames specified using rigaya's VCEEncC, then use a bitstream analyzer to find out?

Roph
13th September 2016, 06:44
Yeah, I've interacted with Robert a few times on Reddit; he's an AMD marketing guy. He has mentioned VCE and its capabilities a couple of times in his comments, and even mentioned that he made a few of AMD's presentation slides.

Unfortunately, a few developers of third-party software, and an AMD developer who actually works on their VCE Media SDK (see that GitHub issue link), all say that Polaris lacks B-frame support.

Querying the GPU's capabilities via the SDK shows B-frames are not supported. Asking it to encode them anyway results in errors being thrown.

Funnily enough, I read up on VCEEnc recently. It doesn't seem to have been updated yet to support AMD's new Media SDK; the author notes in their blog post that AMD's SDK hasn't been updated since Jan 2015. Nevertheless, I tried forcing B-frames with VCEEnc via StaxRip (StaxRip's VCEEnc is outdated, so I dropped in the latest 2.0 binary), and nope:

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Encoding using VCEEncC 1.03v2 x64
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

F:\tmp\Staxrip\Apps\VCEEncC\VCEEncC64.exe --quality slow --gop-len 120 --bframes 3 --max-bitrate 9000 --vbv-bufsize 9000 --vbr 7090 -i "D:\Work\Encode\VCE\HEVC Test\Media\Source (1080p)_temp\Source (1080p)_new.avs" -o "D:\Work\Encode\VCE\HEVC Test\Media\Source (1080p)_temp\Source (1080p)_new_out.h264"

VCEEnc 2.00 (x64) / Windows 10 (x64)
CPU: AMD FX(tm)-8320 Eight-Core Processor (4C/8T)
GPU: Radeon RX 470 Graphics [Ellesmere 2000MHz (2117.13 (VM))]
Input Info: Avisynth 2.60 yv12->nv12[AVX], 1920x1024p, 24000/1001 fps
Output: H.264/AVC High @ Level 4.1
1920x1024p 23.976fps (24000/1001fps)
Quality: slow
VBR: 7090 kbps, Max 9000 kbps
QP: Min: 0, Max: 51
VBV Bufsize: 9000 kbps
Bframes: 0 frames, b-pyramid: off
Motion Est: Q-pel
Slices: 1
GOP Len: 120 frames
Others: deblock hrd
encoded 1280 frames, 59.59 fps, 7222.98 kbps, 45.97 MB
encode time 0:00:21, CPULoad: 20.44
m_encoder->SetProperty(BPicturesDeltaQP) failed Error:AMF_ALREADY_INITIALIZED
m_encoder->SetProperty(BPicturesPattern) failed Error:AMF_OUT_OF_RANGE
m_encoder->SetProperty(ReferenceBPicturesDeltaQP) failed Error:AMF_ALREADY_INITIALIZED

Start: 06:38:43 AM
End: 06:39:06 AM
Duration: 00:00:22

General
Complete name : D:\Work\Encode\VCE\HEVC Test\Media\Source (1080p)_temp\Source (1080p)_new_out.h264
Format : AVC
Format/Info : Advanced Video Codec
File size : 46.0 MiB

Video
Format : AVC
Format/Info : Advanced Video Codec
Format profile : High@L4.1
Format settings, CABAC : Yes
Format settings, ReFrames : 4 frames
Format settings, GOP : M=1, N=60
Width : 1 920 pixels
Height : 1 024 pixels
Display aspect ratio : 1.85:1
Frame rate : 23.976 (24000/1001) fps
Standard : Component
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Color range : Limited
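
As a sanity check on logs like the one above, the reported average bitrate can be recomputed from the file size, frame count and frame rate. This is only a sketch: it assumes the "MB" in the encoder log means MiB (1024 x 1024 bytes) and that kbps means 1000 bits per second, which is the reading that happens to fit the numbers. The function name is made up.

```python
def average_kbps(size_mib, frames, fps):
    """Average bitrate in kbps from file size (MiB), frame count and frame rate."""
    duration_s = frames / fps
    bits = size_mib * 1024 * 1024 * 8
    return bits / duration_s / 1000


# Values from the log above: 1280 frames, 45.97 MB, 24000/1001 fps.
kbps = average_kbps(45.97, 1280, 24000 / 1001)
print(f"{kbps:.0f} kbps")  # the log reports 7222.98 kbps
```

The small leftover difference comes from the file size being rounded to two decimals in the log.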

JohnLai
13th September 2016, 07:29
Oh darn, what a setback. Not having B-frames for HEVC is one thing; dropping B-frame support for H.264 is another, when the previous iteration of VCE supported B-frames for H.264 just fine.

:(

trip_let
13th September 2016, 20:38
Okay, thanks for all the info.


I just tried HEVC encode on NVEnc via Staxrip on my GTX 960 using the VBR setting and the quality for a given size was still noticeably worse than x264 @ 8 bits, medium preset on the one sample I looked at. I used CRF on x264 with the parameter changed to match the bitrate of the HEVC encode (came out to CRF 22.4). Maybe I used the wrong settings, but the difference wasn't that small. Kind of disappointing other than the blazing speed, seeing as the medium and fast-ish presets in x264 are already plenty fast enough for my needs.

Well, at least there's lower power consumption, and not every system has a 4C8T kind of processor at high clocks.

bladerunner1982
12th January 2017, 13:54
Hi,

can anyone post some hardware encoded Kaby Lake HEVC samples?
What fps do you get at what quality settings?
What platform has the biggest potential to match x265 quality in the near future? Intel, Nvidia or AMD?

Thanks...

CruNcher
14th January 2017, 14:55
Nvidia leads on the discrete side balance-wise, and Intel leads on the integrated CPU side and in quality :)

AMD is a good question. They are trying to fix a lot, and so far it doesn't make a bad impression: comparable to Nvidia's state, but still far away from Intel for the power consumption. That won't change with Ryzen either if AMD doesn't improve its APU encoder significantly.

But the main goal for Nvidia and AMD is still low latency, so I wouldn't expect big steps anyway. AMD is already marketing CPU encoding with their super-efficient internal async communication (so you buy their 8-core Ryzen), so you can see that they themselves are preparing no big jumps at all. The biggest jump was 2-pass and lookahead with Polaris, but if you remember, Nvidia already had this with Maxwell, so not really anything interesting ;)

AMD just made the move to improve its streaming quality efficiency with Polaris and is now pretty much on par with Nvidia.

I don't think we will see many more improvements from Vega and Ryzen alone, though I hope I'm wrong ;)

But Ryzen in combination with Vega, that becomes something to look out for indeed ;)

The heaviest overhead for Nvidia is indeed the two-pass rate control, and its overall visual efficiency is questionable (versus the performance impact), depending on the bitrate target and your actual goal.

ShogoXT
17th January 2017, 23:37
https://github.com/GPUOpen-LibrariesAndSDKs/AMF/releases

AMD just released AMF 1.4. I am eager to finally see how its HEVC encoding stacks up.

Edit: No B-frames and now no 10-bit encoding. I'm starting to feel burned having bought an RX 480. It was one of my reasons for buying this product. I did not find out about these feature regressions until months later.

JohnLai
18th January 2017, 04:03
I hope someone provides a sample with 7 reference frames (Polaris) for me to analyse after this, to verify whether a P-frame can use multiple preceding frames.
Before AMF 1.4 came out, every VCE sample I checked only made use of a single reference frame, just like Nvidia NVENC.

NikosD
18th January 2017, 08:54
I have already contacted rigaya to take a look on the new AMF v1.4, although I'm pretty sure he had already seen that.

The moment he releases his updated VCEENC with HEVC support, I'll post here a few samples with different encoding options and speeds.

JohnLai
18th January 2017, 10:11
I am interested in HEVC_MAX_NUM_REFRAMES, HEVC_RATE_CONTROL_PREANALYSIS_ENABLE and HEVC_ENABLE_VBAQ, as well as the types of supported motion partitions.

Hmmm, the documentation doesn't mention anything about SAO and CU size. Only the bilinear and bicubic hardware resizers.
But based on the A's Video Converter sample made without proper AMF last time, I'm not putting too much hope in it.

But UVD can support AMFVideoDecoderHW_H265_MAIN10. Kind of weird for the encoder section not to mention anything about HEVC 10-bit encoding.
:confused:

*by the way, does Kaby lake hevc encoder support SAO and 64x64 CU yet?*

NikosD
18th January 2017, 11:26
It seems that HEVC encoder is only 8 bit for Polaris.
Maybe the upcoming VEGA can do something better on this.

https://github.com/GPUOpen-LibrariesAndSDKs/AMF/issues/51#issuecomment-269660790

NikosD
18th January 2017, 11:43
For some unknown reason, all big 3 (Intel, Nvidia and AMD) hardware encoders have problem encoding dark / black scene properly. I can see those big dark blocky artifacts on dark scene even with 15Mbps on 1080p. Bright scene is perfect though.....

That was an issue for AMD, which they managed to fix in the 16.9.2 drivers according to this:

https://github.com/GPUOpen-LibrariesAndSDKs/AMF/issues/28#issue-179300712

NikosD
19th January 2017, 22:15
But the main goal for Nvidia and AMD is still low latency, so I wouldn't expect big steps anyway. AMD is already marketing CPU encoding with their super-efficient internal async communication (so you buy their 8-core Ryzen), so you can see that they themselves are preparing no big jumps at all. The biggest jump was 2-pass and lookahead with Polaris, but if you remember, Nvidia already had this with Maxwell, so not really anything interesting ;)



Indeed.

Infinity Fabric (what a terrible marketing name) will, according to AMD, make a difference to the interconnects in Ryzen and Vega (and all subsequent products, actually).

x264 and possibly x265 (although the latter has a lot of AVX2 optimizations that run slower on Ryzen than on Haswell and later) have been demonstrated by AMD as applications that favor Ryzen over Intel HEDT processors.

But Vega probably has, or should have, some new features or better performance in HW encoding that Polaris lacks.

sephirotic
20th January 2017, 02:48
Wrong post.

Pitou
20th January 2017, 21:39
Hello all,

It's been a long time since I came here and I have a little question.

I'm playing with hevc and tried x265 and NVEnc (using ffmpeg) in Archlinux, trying to re-encode a 1080p VC1 movie.

I was reading some of the @JohnLai posts and got some great hints.

However, I would like to use NVEnc as it is faster. I'm using it with an Nvidia 1060 (Pascal) video card.

What would be the absolute best settings to have the best quality and reasonable filesize?

I can try using StaxRip instead of ffmpeg if needed.

One thing is that with ffmpeg I can specify "-hwaccel cuvid -c:v vc1_cuvid", so that the GPU is used to decode the source. I get an incredible speed gain using this.

Can StaxRip do this as well?
Does StaxRip (NVEncC) have more options that will result in better quality than ffmpeg?

Here are 2 sample cmds I tried to encode the video:


ffmpeg -hwaccel cuvid -c:v vc1_cuvid -i movie.mkv -qmin 0 -qmax 20 -preset slow -rc vbr_2pass -rc-lookahead 32 -c:v hevc_nvenc -c:a copy movieout.mkv
ffmpeg -hwaccel cuvid -c:v vc1_cuvid -i movie.mkv -preset medium -profile:v main10 -spatial_aq 1 -rc-lookahead 32 -rc constqp -global_quality 22 -c:v hevc_nvenc -c:a copy movieout.mkv


Thank you.

Pitou!

JohnLai
21st January 2017, 03:49
Hmm, StaxRip + NVEncC. Yes, NVEncC has CUVID + NPP resizers too.

http://forums.guru3d.com/showthread.php?t=411509
I'm too lazy to retype everything, so read that thread from beginning to end. The problem with ffmpeg is its high default quantizer values for I and P frames (the global_quality flag plus a lookahead issue).

Pitou
21st January 2017, 04:32
Thanks very much for the reply, I'll read the entire thread for sure.

In the meantime, I tried StaxRip + nvencc for my VC1 movie. So far I'm decoding it in software because when trying to use cuvid, I'm getting this error:

avcuvid: codec
avcuvid: unable to decode by cuvid.
Failed to open input file.

Is it because NVEncC doesn't support hardware decoding using cuvid for VC1?

Would it be ok for a h264 movie?

Thanks again!

Pitou!

JohnLai
21st January 2017, 04:59
Eh? By rights, Pascal should be able to decode the VC1 codec.
https://developer.nvidia.com/nvidia-video-codec-sdk under the "NVDEC - Hardware-Accelerated Video Decoding" section

Strange indeed.
Well, there are two CUVID modes for the NVEncC decoder: one is "Native" and the other is "CUDA". Have you tried both?

Pardon me, I just read rigaya's NVEncC documentation and it appears the developer hasn't implemented VC-1 decoding support.
http://rigaya34589.blog135.fc2.com/blog-entry-739.html
The developer only enabled CUVID decoding for MPEG1, MPEG2 and H.264/AVC.

Pitou
21st January 2017, 05:08
Yes tried both, but native or avisynth is rather slow. Getting 40fps instead of around 250fps with cuda.

Just tried with a h264 source and it works fine. I'm getting around 250fps.

With ffmpeg, vc1_cuvid works fine to decode VC1 with cuda. I'm getting around 250fps there also.
(ffmpeg -hwaccel cuvid -c:v vc1_cuvid)

Pitou!

JohnLai
21st January 2017, 05:10
How about :
http://i.imgur.com/hwfccMc.jpg

Pitou
21st January 2017, 12:05
I get much better speed now, about 80fps. I'll use that for VC1 sources and cuvid for h264 sources

Thanks for the hint!

Pitou!

JohnLai
21st January 2017, 13:51
Only 80fps with ffmpeg (dxva) for VC1? :confused: I have a feeling ffmpeg auto-selected the wrong GPU for decoding. (Maybe ffmpeg's dxva auto-selected the Intel iGPU; there is a bug with ffmpeg dxva decoding on the Intel iGPU.)
The copy-back operation shouldn't be that taxing.

EDIT:
If you have the Intel integrated GPU enabled (and are using Windows 8/10), you can select "QSVEncC (Intel)" as the decoder.
It turns out the QSVEncC decoder supports MPEG2, H264 and VC1 too.
Note: the Intel hardware decoder is the fastest compared to Nvidia and AMD.

~Using the Intel iGPU decoder to pipe the decoded video to Nvidia NVENC for encoding.~

CruNcher
21st January 2017, 17:22
The TS parser that rigaya uses by default in NVEnc 3.02 drives me crazy.

It fails with:


Complete name : F:\hevc\10bit\fail wip\Samsung_UHD_Ride_on_Board.ts
Format : MPEG-TS
File size : 992 MiB
Duration : 2 min 41 s
Overall bit rate mode : Constant
Overall bit rate : 51.6 Mb/s

Video
ID : 257 (0x101)
Menu ID : 1 (0x1)
Format : HEVC
Format/Info : High Efficiency Video Coding
Format profile : Main 10@L5.1@High
Codec ID : 36
Duration : 2 min 40 s
Width : 3 840 pixels
Height : 2 160 pixels
Display aspect ratio : 16:9
Frame rate : 59.940 (60000/1001) FPS
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 10 bits
Writing library : ATEME Titan KFE 3.6.2 (4.6.1.9)

Audio
ID : 258 (0x102)
Menu ID : 1 (0x1)
Format : AAC
Format/Info : Advanced Audio Codec
Format version : Version 4
Format profile : LC
Muxing mode : ADTS
Codec ID : 15
Duration : 2 min 40 s
Channel(s) : 2 channels
Channel positions : Front: L R
Sampling rate : 48.0 kHz
Frame rate : 46.875 FPS (1024 spf)
Compression mode : Lossy


Result on the mux side:


avout: failed to write header for output file: Invalid argument
[mp4 @ 000000000902d680] sample rate not set


Also, ProRes seems not to be supported at all.


Let's see if it works on 3.05 now by default :)

Uhh, it seems rigaya has updated from Video Codec SDK 7.01 to 7.1.9.

It seems 375.63 has stopped working and 375.95 is now the minimum.

A glimpse of 8.0 coming?


Failed to create instance of nvEncodeAPI, please consider updating your GPU driver.

1) Enhancements to H.264 motion estimation only mode:
a) Ability to select specific motion vector partitions and intra mode enable/disable for motion estimation only mode.
b) Performance enhancement for stereo mode motion-estimation.
2) Streamlined the nomenclature of rate control modes.
3) Quality improvement for H.264 Temporal Adaptive Quantization(TAQ).

Nothing really interesting for HEVC though; officially there are only H.264 and MVC improvements overall, and nothing about bug fixes for HEVC encoding or general improvements.

The most important improvements were already introduced with 7.01.

JohnLai
21st January 2017, 18:20
[@CruNcher]
Muxer error: it seems like there is a problem with the audio. With ffmpeg, -ar 48000 would be used. In the case of NVEncC, maybe --audio-samplerate 48000?
NVENC SDK 7.1 requires NVIDIA Windows display driver 375.95 or newer.
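
A small Python sketch of the driver check implied here, using only the version numbers quoted in this thread (375.63 fails, 375.95 is the stated minimum for SDK 7.1). The helper names are made up, and how you obtain the installed driver version is left out.

```python
def parse_version(text):
    """Turn a driver version string such as '375.95' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def driver_ok(installed, minimum):
    """True if the installed driver is at least the required minimum."""
    return parse_version(installed) >= parse_version(minimum)


SDK_7_1_MINIMUM = "375.95"  # minimum Windows driver quoted above

for version in ("375.63", "375.95"):
    status = "ok" if driver_ok(version, SDK_7_1_MINIMUM) else "too old"
    print(version, status)
```

Comparing integer tuples rather than floats or raw strings keeps the check correct if a version ever has a differently sized minor part.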

CruNcher
21st January 2017, 18:33
Every muxer shows the same problem with TS input in rigaya's NVEnc. It is very unreliable standalone, and ProRes decoding fails completely.

JohnLai
21st January 2017, 18:46
Well....you could contact rigaya or create a ticket https://github.com/rigaya/NVEnc/issues about the parser issue.

CruNcher
21st January 2017, 19:37
I'll do it when I'm sure 3.05 still has the same issue. It looks like it's copying the new ffmpeg library components to the 3.02 binary, but I don't know how both work together internally. Before I can test 3.05 I need to change the driver, and I'm very picky about driver changes overall (especially Nvidia drivers, since some strategy updates), even if going from 375.63 up to 375.95 looks like a very small change. It is currently being forced by the API change that seems to have happened with the 7.1 SDK.

I want to produce some test files first and check whether there really are no noticeable bitstream differences on the HEVC side of things, plus some test files for H.264 as well, looking at the overall improvements there.

So far I have found one visual issue on Nvidia's encoder side, in the form of NVENC. I'm overall not happy with its visual retranscoding outcome for the performance, especially not versus H.264 on the CPU, but overall it doesn't do badly for UHD: almost 50 fps realtime at reasonable bitrates, though I mostly hit 30 fps on GM204 :)

If that result has improved with the driver update, it would be really interesting :D


Hmm, it seems to work without --audio-copy, so the direct copy transfer seems to fail. Interestingly, this is shown when aborting the execution:

[mp4 @ 00000000091fd680] sample rate not set
avout: failed to write header for output file: Invalid argument
Ignoring attempt to set invalid timebase 1/0 for st:1

encoded 0 frames, 0.00 fps, -nan(ind) kbps, 0.00 MB
encode time 0:00:03 / CPU Usage: 75.59%

@JohnLai

You could be right; it seems to be rather a muxer issue with that ADTS track, but then practically every muxer fails, geez.

With the default audio transcoding overhead (~0.5x realtime) and the internal 10->8-bit conversion:

[26.2%] 2572 frames: 30.78 fps, 24654.49 kb/s, remain 0:03:55


https://devtalk.nvidia.com/default/topic/987496/video-technologies/nvenc-diagram-correction/

Hehe, yeah. Funny though that the encoder is as powerful as GM206's, but the decoder is hybrid and pretty damn weak on the performance GTX 970.

So some files the encoder creates can't even be played back flawlessly on most GM204 cards, but GM206 can play them :D

And the part about decoding support in the matrix doesn't fit either, based on the assumption that NVCUVID has now been transformed into NVDEC ;)

Normally it should be shown as supported in the NVDEC matrix as well, even if it depends more heavily on the CPU than on GM206 ;)


Note: For Video Codec SDK 7.0, NVCUVID has been renamed to NVDECODE API.

JohnLai
22nd January 2017, 04:15
[@CruNcher]
30fps HEVC encoding seems about right given the CPU decoding. Can't expect high-quality visual transcoding from the GM204 fixed-function encoder.
The encoder itself could handle 8-bit HEVC 4K at 60fps just fine. Then again, not even a Core i7-3770K at 4.5GHz (nor my i5-3570K at 4.2GHz) can decode the source 10-bit HEVC 4K60 50Mbps without dropping frames.
Very ironic indeed.

However, there is a bug with color range for nvidia hevc encoding https://devtalk.nvidia.com/default/topic/958132/video-technologies/nvenc-hevc-with-full-range-colors-/
This particular bug doesn't happen with H264.

CruNcher
22nd January 2017, 11:59
I already find it visually very pleasing for the performance it achieves, and you shouldn't forget this is not yet a fully GPU-developed video codec; that thing is currently working only in Nvidia's lab ;)
That Intel would beat it at even lower power in every one of their high-end notebooks at around 35W is a really masterful architectural achievement, one that even AMD has a hard time reaching yet with their APU integration and Fusion idea (HSA) ;)

Im throwing currently around 150W out of the Gen2 System for that 30 fps result as much as for the Decoding actually @ flawless non dropping 60 fps @ around 70W :)

It will be interesting to see what i can achieve with VP9 and AV1 at those 150W against Kyrion and Titan ;)

https://youtu.be/BaiPRAPOnjA?t=184

JohnLai
22nd January 2017, 13:03
VCEEnc 3.00 by rigaya is out!
Polaris HEVC encoding is supported. Only 8bit encoding.

EDIT:
Eh? It appears rigaya-san bought a second-hand RX 460 just to ensure everything works.
Talk about dedication....:eek:

NikosD
22nd January 2017, 13:09
I've just read his email he sent me.

You are fast!

He also wrote this:

Details:
https://github.com/rigaya/VCEEnc/releases/tag/3.00

"Please note that VCEEnc still lacks some features compared to QSVEnc/NVEnc, like HW HEVC decode or sw decoding by libavcodec."

I'll try to post some encodings later today.

CruNcher
22nd January 2017, 13:44
Please try to do the best retranscode you can of Samsung's Journey of Colors in 8-bit with a target of 25 Mbps :)

I'm going to release my test result of it shortly in the Decoder Evaluation thread

encoded 6646 frames, 32.25 fps, 22602.53 kbps, 298.75 MB
encode time 0:03:26 / CPU Usage: 72.36%

frame type IDR 31
frame type I 31, avgQP 23.48, total size 15.90 MB
frame type P 6615, avgQP 28.70, total size 282.85 MB


Format : MPEG-4
Format profile : Base Media / Version 2
Codec ID : mp42 (isom/iso2/mp41)
File size : 299 MiB
Duration : 1 min 50 s
Overall bit rate : 22.6 Mb/s
Writing application : NVEncC (x64) 3.02

Video
ID : 1
Format : HEVC
Format/Info : High Efficiency Video Coding
Format profile : Main@L5@High
Codec ID : hev1
Codec ID/Info : High Efficiency Video Coding
Duration : 1 min 50 s
Bit rate : 22.6 Mb/s
Width : 3 840 pixels
Height : 2 160 pixels
Display aspect ratio : 16:9
Frame rate mode : Constant
Frame rate : 59.940 (60000/1001) FPS
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.045
Stream size : 299 MiB (100%)


after testing the new driver result and comparing it
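
The figures in the NVEncC log and the MediaInfo dump above can be cross-checked with a little arithmetic. A minimal Python sketch, assuming the logged 22602.53 kbps is the average over the whole clip and that the "MB" in the log means MiB (both are assumptions, not stated in the post):

```python
# Cross-check of the figures reported in the log and MediaInfo output above.
frames = 6646                   # "encoded 6646 frames"
bitrate_kbps = 22602.53         # average bitrate from the NVEncC log
source_fps = 60000 / 1001       # 59.940 fps per MediaInfo
width, height = 3840, 2160      # frame size per MediaInfo
encode_seconds = 3 * 60 + 26    # "encode time 0:03:26"

# Playback duration of the clip, derived from frame count and frame rate.
duration_s = frames / source_fps

# Stream size implied by the average bitrate (bits -> bytes -> MiB).
size_mib = bitrate_kbps * 1000 / 8 * duration_s / 2**20

# MediaInfo's "Bits/(Pixel*Frame)" is bitrate divided by pixels per second.
bits_per_pixel_frame = bitrate_kbps * 1000 / (width * height * source_fps)

# Encoding speed implied by frame count and wall-clock encode time.
encode_fps = frames / encode_seconds

print(f"duration      : {duration_s:.1f} s")
print(f"stream size   : {size_mib:.2f} MiB")
print(f"bits/(px*frm) : {bits_per_pixel_frame:.3f}")
print(f"encode speed  : {encode_fps:.2f} fps")
```

The results land close to the reported 1 min 50 s, 298.75 MB, 0.045 and 32.25 fps. The small difference in encode speed is expected, since the logged encode time is only given to the second.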

JohnLai
22nd January 2017, 13:46
I've just read his email he sent me.

You are fast!

He also wrote this:

Details:
https://github.com/rigaya/VCEEnc/releases/tag/3.00

"Please note that VCEEnc still lacks some features compared to QSVEnc/NVEnc, like HW HEVC decode or sw decoding by libavcodec."

I'll try to post some encodings later today.

Prepare to be saddened, for MikhailAMD said the Polaris VCE encoder only supports 8bit HEVC encoding.......T_T.....

EDIT:
One can use a DirectShow filter in conjunction with LAV Filters DXVA copy-back to get hardware-accelerated decoding and pipe the result to VCEEncC.

EDIT2:
Don't forget to use "--quality slow" and "--pre-analysis auto". You know, because last time it turned out Intel's TU (target usage) has different min/max rectangle partitions for each setting. I worry that AMD may have the same issue.

Pitou
22nd January 2017, 14:31
Only 80fps with ffmpeg (dxva) for VC1? I've got a feeling ffmpeg auto-selects the wrong GPU for decoding. (Maybe ffmpeg dxva auto-selects the Intel iGPU; there is a bug with ffmpeg dxva decoding on Intel iGPUs.)
The copy-back operation shouldn't be that taxing.

EDIT:
If you have the Intel integrated GPU enabled (and are using Windows 8/10)....you can select "QSVEncC (Intel)" as decoder.
It turns out the QSVEncC decoder supports MPEG2, H264 and VC1 too.
Note: The Intel hardware decoder is the fastest compared to Nvidia and AMD.

~Using Intel IGPU decoder to pipe the decoded video to Nvidia Nvenc for encoding.~

I only have a Q9450 CPU, which is very old and doesn't have acceleration. That's why I need my Nvidia card to do all the work. But apparently, NVEncC doesn't decode VC1 on the GPU. Only FFmpeg does.

Pitou!

CruNcher
22nd January 2017, 14:51
Prepare to be saddened, for MikhailAMD said the Polaris VCE encoder only supports 8bit HEVC encoding.......T_T.....
I wonder why everyone is so surprised about it. It was clear from day 1 that RX 470/480 (Polaris) is just designed to beat GTX 970 (GM204) and be able to compete with GTX 1060 (GP106) on the mass mainstream market ;)

And AMD is lagging behind on UVD/VCE; they have only just started to catch up :)

Mikhail has an interesting background as a PMTS :)

https://www.youtube.com/watch?v=6GncOvnc_S0

MTUCI not MSU ;)

JohnLai
22nd January 2017, 15:24
I only have a Q9450 CPU, which is very old and doesn't have acceleration. That's why I need my Nvidia card to do all the work. But apparently, NVEncC doesn't decode VC1 on the GPU. Only FFmpeg does.

Pitou!

I just dropped an email to rigaya asking about the missing VC1, HEVC and VP9 NVDEC (cuvid) support. Don't get your hopes up.

I wonder why everyone is so surprised about it. It was clear from day 1 that RX 470/480 (Polaris) is just designed to beat GTX 970 (GM204) and be able to compete with GTX 1060 (GP106) on the mass mainstream market ;)


Because almost all reddit users at /AMD actually believe Polaris has 10bit HEVC encoding support. ~.~......

NikosD
22nd January 2017, 15:27
Please try to do the best transcode you can of Samsung's Journey of Colors in 8 bit with a target of 25 Mbps :)


I'm ready to start testing new VCEEnc.

Give me a link for the above clip.

I just dropped an email to rigaya asking about the missing VC1, HEVC and VP9 NVDEC (cuvid) support. Don't get your hopes up.


@JohnLai

I strongly suggest you open a new thread here on Doom9 about GPU/HW encoding of H.264/H.265 for all vendors (AMD, Nvidia, Intel).

You are the most suitable guy to do that, since you dedicate a lot of hours to testing, reading about and evaluating mostly Nvidia HW encoders, but you are also clearly interested in all HW encoders.

You are very helpful and detailed when suggesting solutions to users, and you will get a lot of feedback because I think HW encoding is a trend right now.

That thread is clearly missing from doom9.

Go on and we will all contribute to that, starting from my results of Polaris HEVC encoding :)


Going to release my test result for it shortly in the Decoder Evaluation thread


No, don't post such info in the Decoder Evaluation thread.
It would be much better to do it in the new thread regarding HW encoding that JohnLai will open soon :)

JohnLai
22nd January 2017, 15:37
@JohnLai

I strongly suggest you open a new thread here on Doom9 about GPU/HW encoding of H.264/H.265 for all vendors (AMD, Nvidia, Intel).



No way....ain't my specialty.:scared:

Speaking of H264....it seems only Intel iGPUs (Haswell onwards) and AMD VCE (except Polaris) support b-pyramid. Nvidia has never supported b-pyramid. I wonder why.

NikosD
22nd January 2017, 15:41
No way....ain't my specialty.:scared:


It is exactly your specialty, at least from all the users writing here.

And what do you mean by specialty?

We aren't exactly professionals, we only want to help users.

Come on!

You are the best on HW encoding and you don't have to know everything.

Just make a start and all the rest will follow.

mariush
22nd January 2017, 15:44
This VCEEnc 3.0 actually works on my Windows 7 machine, with my XFX RX 470 card. A's Video Converter didn't show the HEVC option.
So if you want me to do some tests, paste the command line and a link to the sample input file (if any particular one is desired) and I can run them for you guys.

I did a test encode of a 1280x536, 5 Mbps VBR, ~1 min video and it encoded it at around 135 fps to 1600 kbps ... that was with the absolute minimum command line parameters needed to get it to work, something like -c hevc --avcee (or something like that) -i file.mp4 -o file.hevc
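
For anyone wanting to repeat a test like this, the options mentioned so far in the thread can be gathered into one command line. The Python sketch below is only that: the executable name `VCEEncC` and the helper `build_vceenc_args` are assumptions for illustration, the flags are copied from the posts above and have not been checked against VCEEncC's own help output, and the reader option the post could not recall is left out rather than guessed.

```python
import shlex

def build_vceenc_args(src, dst, exe="VCEEncC"):
    """Assemble an argument list from flags quoted in this thread (hypothetical helper)."""
    return [
        exe,                       # assumed executable name
        "-c", "hevc",              # codec selection, as recalled in the post above
        "--quality", "slow",       # suggested earlier in the thread
        "--pre-analysis", "auto",  # suggested earlier in the thread
        "-i", src,                 # input file
        "-o", dst,                 # output file
    ]

args = build_vceenc_args("file.mp4", "file.hevc")
print(shlex.join(args))  # printable form for pasting into a console
```

Keeping the arguments as a list means the same value can be handed to `subprocess.run(args)` without worrying about quoting file names that contain spaces.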