What is current status for hardware H.265 encoding. [Archive] - Page 3

NikosD

22nd January 2017, 15:45

Don't post anything here until JohnLai opens a thread regarding HW encoding

CruNcher

22nd January 2017, 15:48

http://demo-uhd3d.com/fiche.php?cat=uhd&id=91

It's a relatively old Ateme encoder result ;)

JohnLai

22nd January 2017, 16:15

Don't post anything here until JohnLai opens a thread regarding HW encoding

Sorry, I can't.

By the way, why does rigaya explicitly mention --vbaq (H.264 only)? AMF 1.4 simply states "By default, disable VBAQ." Should it be possible to enable it for HEVC, or is this just an oversight by rigaya?

It appears Long Term Reference picture support is not implemented too.
EDIT: Assuming AMD VCE HEVC P-frames can refer to multiple preceding frames, LTR support is even more crucial.

NikosD

22nd January 2017, 17:12

http://demo-uhd3d.com/fiche.php?cat=uhd&id=91

It's a relatively old Ateme encoder result ;)

Downloading...

Sorry, I can't.

I really, really, really can't understand you...

By the way, why does rigaya explicitly mention --vbaq (H.264 only)? AMF 1.4 simply states "By default, disable VBAQ." Should it be possible to enable it for HEVC, or is this just an oversight by rigaya?

It appears Long Term Reference picture support is not implemented too.
EDIT: Assuming AMD VCE HEVC P-frames can refer to multiple preceding frames, LTR support is even more crucial.

I tried VBAQ with HEVC and get a reply:
"VBAQ is not supported with HEVC encoding, disabled."

Yes, I think the first version of VCEEnc supporting AMF has some small bugs and is missing a few things.

While I'm downloading the 10bit HEVC file, I did some tests with this source:
ftp://helpedia.com/pub/multimedia/testvideos/x264/2012%20-%2001%20-%20QuickSync%20vs%20UVD%202.2%20vs%20VP4/9.Ducks.Take.Off.1080p30fpsRef5-108Mbps.mkv

For all my tests I used --cqp 25 along with --pre-analysis auto.

I only changed -u (quality), using the three options fast, balanced and slow.

The results:
(All HEVC-encoded samples are half the size of the original H.264 file; you can judge the quality for yourselves.)

Ducks_HEVC_CQP25_fast.mkv
https://www.sendspace.com/file/9djrh3

DX9: List of adapters:
0: Device ID: 67DF [Radeon (TM) RX 470 Graphics]
DX9 : Chosen Device 0: Device ID: 67DF [Radeon (TM) RX 470 Graphics]
VCEEnc 3.00 (x64) / Windows 10 (x64)
CPU: Intel Core i5-2400 @ 3.10GHz [TB: 3.20GHz] (4C/4T)
GPU: \\.\DISPLAY1 [Ellesmere 1300MHz (2236.10)]
Input Info: avcodec video: H.264/AVC, 1920x1080, 30000/1001 fps
Output: H.265/HEVC main @ Level 4.1
1920x1080p 1:1 29.970fps (30000/1001fps)
avwriter: hevc => matroska
Quality: balanced
CQP: I:25, P:25
VBV Bufsize: 20000 kbps
Bframes: 0 frames
Motion Est: Q-pel
Slices: 1
GOP Len: 300 frames
Others: deblock hrd pre-analysis:auto

encoded 500 frames, 62.77 fps, 57915.24 kbps, 115.18 MB
encode time 0:00:08, CPULoad: 26.13%
frame type IDR 2
frame type I 2, total size 0.60 MB
frame type P 498, total size 114.58 MB
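The kbps and MB figures in these logs can be cross-checked against each other. A minimal Python sketch, assuming the logs' "MB" means MiB (2^20 bytes) and "kbps" means 1000 bit/s; `average_kbps` is a hypothetical helper, not part of VCEEnc:

```python
from fractions import Fraction

BYTES_PER_MIB = 1024 * 1024  # assumption: the encoder logs report MB as MiB


def average_kbps(size_mib: float, frames: int, fps: Fraction) -> float:
    """Average bitrate in kbit/s (1 kbit = 1000 bits) for a stream of `frames` frames."""
    duration_s = frames / fps  # exact Fraction, e.g. 500 * 1001/30000 s
    bits = size_mib * BYTES_PER_MIB * 8
    return bits / float(duration_s) / 1000


# "fast" run above: 500 frames at 30000/1001 fps, 115.18 MB reported
print(f"{average_kbps(115.18, 500, Fraction(30000, 1001)):.2f} kbps")
```

This gives about 57914 kbps, within the rounding of the 115.18 MB figure from the logged 57915.24 kbps, which supports the MiB reading; with decimal megabytes the result would come out roughly 5% lower.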

Ducks_HEVC_CQP25_balanced.mkv
https://www.sendspace.com/file/yfii43

DX9: List of adapters:
0: Device ID: 67DF [Radeon (TM) RX 470 Graphics]
DX9 : Chosen Device 0: Device ID: 67DF [Radeon (TM) RX 470 Graphics]
VCEEnc 3.00 (x64) / Windows 10 (x64)
CPU: Intel Core i5-2400 @ 3.10GHz [TB: 3.30GHz] (4C/4T)
GPU: \\.\DISPLAY1 [Ellesmere 1300MHz (2236.10)]
Input Info: avcodec video: H.264/AVC, 1920x1080, 30000/1001 fps
Output: H.265/HEVC main @ Level 4.1
1920x1080p 1:1 29.970fps (30000/1001fps)
avwriter: hevc => matroska
Quality: balanced
CQP: I:25, P:25
VBV Bufsize: 20000 kbps
Bframes: 0 frames
Motion Est: Q-pel
Slices: 1
GOP Len: 300 frames
Others: deblock hrd pre-analysis:auto

encoded 500 frames, 55.80 fps, 58366.81 kbps, 116.08 MB
encode time 0:00:09, CPULoad: 25.84%
frame type IDR 2
frame type I 2, total size 0.60 MB
frame type P 498, total size 115.48 MB

Ducks_HEVC_CQP25_quality.mkv
https://www.sendspace.com/file/cteioc

DX9: List of adapters:
0: Device ID: 67DF [Radeon (TM) RX 470 Graphics]
DX9 : Chosen Device 0: Device ID: 67DF [Radeon (TM) RX 470 Graphics]
VCEEnc 3.00 (x64) / Windows 10 (x64)
CPU: Intel Core i5-2400 @ 3.10GHz [TB: 3.30GHz] (4C/4T)
GPU: \\.\DISPLAY1 [Ellesmere 1300MHz (2236.10)]
Input Info: avcodec video: H.264/AVC, 1920x1080, 30000/1001 fps
Output: H.265/HEVC main @ Level 4.1
1920x1080p 1:1 29.970fps (30000/1001fps)
avwriter: hevc => matroska
Quality: balanced
CQP: I:25, P:25
VBV Bufsize: 20000 kbps
Bframes: 0 frames
Motion Est: Q-pel
Slices: 1
GOP Len: 300 frames
Others: deblock hrd pre-analysis:auto

During the encoding of all three, GPU utilisation and GPU memory controller load were 0% and the GPU clock stayed at a low 751 MHz, but the GPU memory clock was at its maximum of 2000 MHz.

GPU core only power consumption was about 7.7W to 12.3W

As you can see, Quality always says balanced, which is probably a bug.
Also, the size of balanced and slow samples is exactly the same.

Using other 1080p H.264 files as a source (~45 Mbps), I managed a speed of ~100 fps for HEVC encoding.

Bframes reported are always 0, no matter what.

JohnLai

22nd January 2017, 18:35

Darn, I forgot to send a request to rigaya about adding AMF_VIDEO_ENCODER_HEVC_MAX_NUM_REFRAMES support after enumerating AMFCaps interface of AMF_VIDEO_ENCODER_HEVC_CAP_MAX_REFERENCE_FRAMES.

Anyway, currently checking Ducks_HEVC_CQP25_quality.mkv.
Intra PU sizes
4x4 8x8 16x16 32x32

Inter PU sizes
8x8
8x16
16x8
16x16
16x32
32x16
32x32
32x64
64x32
64x64
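The inter PU list above can be checked against HEVC's partition modes. A small Python sketch, assuming the listed sizes are width x height and that CU sizes run from 8x8 to 64x64 (my assumption, not read from the stream); the helper names are hypothetical. It shows the observed set is exactly the symmetric 2Nx2N/2NxN/Nx2N partitions of 16x16 to 64x64 CUs plus 8x8, with no asymmetric (AMP) sizes and no 8x4/4x8:

```python
def symmetric_pus(cu_sizes):
    """2Nx2N, 2NxN and Nx2N partitions as (width, height) for each square CU size."""
    pus = set()
    for s in cu_sizes:
        pus |= {(s, s), (s, s // 2), (s // 2, s)}
    return pus


def amp_pus(cu_sizes):
    """Asymmetric partitions (2NxnU/2NxnD/nLx2N/nRx2N) for CU sizes of 16 and up.

    Simplification: AMP is only allowed above the minimum CU size, assumed to be 8 here.
    """
    pus = set()
    for s in cu_sizes:
        if s >= 16:
            q = s // 4
            pus |= {(s, q), (s, s - q), (q, s), (s - q, s)}
    return pus


observed = {(8, 8), (8, 16), (16, 8), (16, 16), (16, 32),
            (32, 16), (32, 32), (32, 64), (64, 32), (64, 64)}

print(observed == (symmetric_pus([16, 32, 64]) | {(8, 8)}))  # True: symmetric modes only
print(observed & amp_pus([16, 32, 64]))                       # set(): no AMP sizes
print(observed & (symmetric_pus([8]) - {(8, 8)}))             # set(): no 8x4 / 4x8
```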

Hmm......
VCE "quality" sample from nikosd
max_transform_hierarchy_depth_inter 4
max_transform_hierarchy_depth_intra 4
transform_skip_enabled_flag 0
cu_qp_delta_enabled_flag 0
pps_loop_filter_across_slices_enabled_flag 0

NVENC using GTX 970
max_transform_hierarchy_depth_inter 3
max_transform_hierarchy_depth_intra 0
transform_skip_enabled_flag 1
cu_qp_delta_enabled_flag 1
pps_loop_filter_across_slices_enabled_flag 1

Well, as usual, no SAO for Polaris.

QSV TU1 Skylake
max_transform_hierarchy_depth_inter 2
max_transform_hierarchy_depth_intra 2
transform_skip_enabled_flag 0
cu_qp_delta_enabled_flag 1
pps_loop_filter_across_slices_enabled_flag 0

NikosD

22nd January 2017, 18:37

I will send him a thorough email with various small bugs I have found out.

Tell me which features from AMF to add.

JohnLai

22nd January 2017, 18:56

I will send him a thorough email with various small bugs I have found out.

Tell me which features from AMF to add.

LTR, REF, HRD conformance, and an option to enable VBAQ (the SDK says it is disabled by default; that doesn't mean it can't be enabled, or is there a serious bug with it?)

HEVC_DE_BLOCKING_FILTER_DISABLE: should this one be set to true or false if I wanna keep deblocking active?

Judging from the AMF SDK, that's about it.

CruNcher

22nd January 2017, 19:36

NVEnc 3.02 (x64), using NVENC API v7.0
OS Version Windows 7 (x64)
CPU Intel Core i5-2400 @ 3.10GHz [TB: 3.30GHz] (4C/4T)
GPU #0: GeForce GTX 970 (13 EU) @ 1266 MHz (375.63)
Input Buffers CUDA, 32 frames
Input Info avsw: hevc(yv12(10bit))->nv12 [SSE2], 3840x2160, 60000/1001 fps
Vpp Filters copyHtoD
Output Info H.265/HEVC main @ Level auto
3840x2160p 1:1 59.940fps (60000/1001fps)
avwriter: hevc => mp4
Rate Control VBR2
Bitrate 25000 kbps (Max: 30000 kbps)
Initial QP I:20 P:23 B:25
VBV buf size auto
Lookahead on, 16 frames, Adaptive I, B Insert
GOP length 600 frames
B frames 0 frames
Ref frames 3 frames, LTR: on
AQ off
MV Quality Q-pel
CU max / min 32 / 8

encoded 6646 frames, 30.14 fps, 22650.44 kbps, 299.38 MB
encode time 0:03:40 / CPU Usage: 72.37%

frame type IDR 31
frame type I 31, avgQP 23.52, total size 15.46 MB
frame type P 6615, avgQP 28.61, total size 283.93 MB

For the highest playback quality, use a renderer with a real-time debanding option (madVR/MPDotNet/mpv) or push it through a TV post-processing decoder pipeline (Sony/Samsung etc.) :)

https://www.sendspace.com/file/declpg

NikosD

22nd January 2017, 19:38

LTR, REF, HRD conformance, and an option to enable VBAQ (the SDK says it is disabled by default; that doesn't mean it can't be enabled, or is there a serious bug with it?)

HEVC_DE_BLOCKING_FILTER_DISABLE: should this one be set to true or false if I wanna keep deblocking active?

Judging from the AMF SDK, that's about it.

In the end, I read both PDFs for AVC and HEVC from the AMF docs, added all the small bugs I had found, and sent a huge email to rigaya!

Let's see what he could manage to add and fix.

thanks!

NikosD

22nd January 2017, 21:36

Anyway, currently checking Ducks_HEVC_CQP25_quality.mkv.
Intra PU sizes
4x4 8x8 16x16 32x32

Inter PU sizes
8x8
8x16
16x8
16x16
16x32
32x16
32x32
32x64
64x32
64x64

Hmm......
VCE "quality" sample from nikosd
max_transform_hierarchy_depth_inter 4
max_transform_hierarchy_depth_intra 4
transform_skip_enabled_flag 0
cu_qp_delta_enabled_flag 0
pps_loop_filter_across_slices_enabled_flag 0

Two of the bugs reported to rigaya were:

1) HW decoding in VCEEnc v3.0 doesn't work like in v2.0: it uses ~25% of a 4C/4T CPU, which means one core is at 100%, while v2.0 does HW decoding with ~2% CPU.

But v3.0 is slightly faster than v2.0

2) The Quality reported by the runtime info is always at balanced no matter what.
I mean even if I choose fast or slow (quality), the Quality info line says "Balanced"

I thought it was cosmetic, but with H.264 encoding it reports Quality as fast, balanced or slow, and the speed varies between presets a lot more than with H.265 encoding.

So, hold your horses about the "quality" sample until rigaya replies what's really going on.

CruNcher

22nd January 2017, 23:37

375.95 installed; checking for result differences.

So indeed, only unification of things and no change on the HEVC side, as the release notes also stated.
VBR 2-pass is now called VBR High Quality, following Nvidia's internal naming convention, and so on ;)

NVEnc 3.02 (x64), using NVENC API v7.0
OS Version Windows 7 (x64)
CPU Intel Core i5-2400 @ 3.10GHz [TB: 3.30GHz] (4C/4T)
GPU #0: GeForce GTX 970 (13 EU) @ 1266 MHz (375.63)
Input Buffers CUDA, 32 frames
Input Info avsw: hevc(yv12(10bit))->nv12 [SSE2], 3840x2160, 60000/1001 fps
Vpp Filters copyHtoD
Output Info H.265/HEVC main @ Level auto
3840x2160p 1:1 59.940fps (60000/1001fps)
avwriter: hevc => mp4
Rate Control VBR2
Bitrate 25000 kbps (Max: 30000 kbps)
Initial QP I:20 P:23 B:25
VBV buf size auto
Lookahead on, 16 frames, Adaptive I, B Insert
GOP length 600 frames
B frames 0 frames
Ref frames 3 frames, LTR: on
AQ off
MV Quality Q-pel
CU max / min 32 / 8

encoded 6646 frames, 30.14 fps, 22650.44 kbps, 299.38 MB
encode time 0:03:40 / CPU Usage: 72.37%

frame type IDR 31
frame type I 31, avgQP 23.52, total size 15.46 MB
frame type P 6615, avgQP 28.61, total size 283.93 MB

NVEnc 3.02 (x64), using NVENC API v7.0
OS Version Windows 7 (x64)
CPU Intel Core i5-2400 @ 3.10GHz [TB: 3.30GHz] (4C/4T)
GPU #0: GeForce GTX 970 (13 EU) @ 1266 MHz (375.95)
Input Buffers CUDA, 32 frames
Input Info avsw: hevc(yv12(10bit))->nv12 [SSE2], 3840x2160, 60000/1001 fps
Vpp Filters copyHtoD
Output Info H.265/HEVC main @ Level auto
3840x2160p 1:1 59.940fps (60000/1001fps)
avwriter: hevc => mp4
Rate Control VBR2
Bitrate 25000 kbps (Max: 30000 kbps)
Initial QP I:20 P:23 B:25
VBV buf size auto
Lookahead on, 16 frames, Adaptive I, B Insert
GOP length 600 frames
B frames 0 frames
Ref frames 3 frames, LTR: on
AQ off
MV Quality Q-pel
CU max / min 32 / 8

encoded 6646 frames, 29.87 fps, 22650.44 kbps, 299.38 MB
encode time 0:03:42 / CPU Usage: 70.54%

frame type IDR 31
frame type I 31, avgQP 23.52, total size 15.46 MB
frame type P 6615, avgQP 28.61, total size 283.93 MB

NVEnc 3.05 (x64), using NVENC API v7.1
OS Version Windows 7 (x64)
CPU Intel Core i5-2400 @ 3.10GHz [TB: 3.30GHz] (4C/4T)
GPU #0: GeForce GTX 970 (13 EU) @ 1266 MHz (375.95)
Input Buffers CUDA, 32 frames
Input Info avsw: hevc(yv12(10bit))->nv12 [SSE2], 3840x2160, 60000/1001 fps
Vpp Filters copyHtoD
Output Info H.265/HEVC main @ Level auto
3840x2160p 1:1 59.940fps (60000/1001fps)
avwriter: hevc => mp4
Rate Control VBRHQ
Bitrate 25000 kbps (Max: 30000 kbps)
Initial QP I:20 P:23 B:25
VBV buf size auto
Lookahead on, 16 frames, Adaptive I, B Insert
GOP length 600 frames
B frames 0 frames
Ref frames 3 frames, LTR: on
AQ off
MV Quality Q-pel
CU max / min 32 / 8

encoded 6646 frames, 30.27 fps, 22650.44 kbps, 299.38 MB
encode time 0:03:39 / CPU Usage: 72.38%

frame type IDR 31
frame type I 31, avgQP 23.52, total size 15.46 MB
frame type P 6615, avgQP 28.61, total size 283.93 MB

btw

[mpegts @ 00000000003779a0] start time for stream 1 is not set in estimate_timings_from_pts
[mpegts @ 00000000003779a0] Could not find codec parameters for stream 1 (Audio: aac ([15][0][0][0] / 0x000F), 0 channels, fltp): unspecified sample rate
Consider increasing the value for the 'analyzeduration' and 'probesize' options

So it's not really rigaya's fault.

NikosD

23rd January 2017, 15:14

OK, I got some interesting replies from rigaya.

He will add REF in the next version and probably LTR some time later.

Regarding HRD, it is always enabled - no need to disable it

Deblocking filter always enabled, otherwise bad video quality.

VBAQ for HEVC always disabled, otherwise bad video quality.

Regarding the HEVC -u quality options always showing "balanced", he had no clue but will investigate.

It could be a bug or limitation.

NikosD

23rd January 2017, 15:20

I also told him to add:

HEVC tier (main, high)
Full range color (H.264 only)
HW HEVC decoding

He could probably add them later.

JohnLai

23rd January 2017, 16:05

Gotcha, NikosD.
Once rigaya-san adds the ref + ltr support, we can finally know if multiple reference frames are used.
I thought VBAQ was an acronym for Variable Bitrate Adaptive Quantization; clearly it is not... since the developer said it produces bad quality.

Hmm, reading through the AMF SDK source code..... where is AMD's promised two-pass encoding? There is nothing about two-pass in the AMF SDK.

CruNcher

23rd January 2017, 20:39

NikosD, how is the Samsung 10->8-bit re-transcode coming along for you in a direct comparison vs the Nvidia one I posted at the 2x reduction target? Any watchable results yet? :)

I'm currently trying to get some results out of ffmpeg, but that is tricky with that parsing issue, not as easy as I thought. I wonder if that .ts is corrupt overall, but LAV Splitter has zero issues with it, mpv and others show no issues either, and adjusting the probesize doesn't fix it. Very weird.

I haven't gotten any NVENC encoding result out of ffmpeg yet; it behaves crazily with that input overall.

And it tells me it can't find any NVENC device at all. I tried different pixel formats and other things, but it acts totally weird; -gpu list detects the card correctly. Pretty frustrating.

NikosD

23rd January 2017, 20:44

I'll wait a little for some basic fixes of rigaya regarding VCEENC before I try that sample.

You can see the results of HEVC encoding of the source and the samples I posted.

CruNcher

23rd January 2017, 21:17

@NikosD
OK, so only ducks, jellyfish and some desktop results to compare directly for now; not ideal, but better than nothing ;)

Let's hope some Intel user with Skylake/Kaby Lake joins in; then we can throw results at each other and compare, though obviously neither of us would stand any chance vs Quick Sync overall ;)

What the ????

[graph 0 input from stream 0:0 @ 0000000000357140] Setting 'video_size' to value '3840x2160'
[graph 0 input from stream 0:0 @ 0000000000357140] Setting 'pix_fmt' to value '72'
[graph 0 input from stream 0:0 @ 0000000000357140] Setting 'time_base' to value '1/90000'
[graph 0 input from stream 0:0 @ 0000000000357140] Setting 'pixel_aspect' to value '1/1'
[graph 0 input from stream 0:0 @ 0000000000357140] Setting 'sws_param' to value 'flags=2'
[graph 0 input from stream 0:0 @ 0000000000357140] Setting 'frame_rate' to value '60000/1001'
[graph 0 input from stream 0:0 @ 0000000000357140] w:3840 h:2160 pixfmt:yuv420p10le tb:1/90000 fr:60000/1001 sar:1/1 sws_param:flags=2
[format @ 0000000000358620] compat: called with args=[yuv420p|nv12|p010le|yuv444p|yuv444p16le|bgr0|rgb0|cuda]
[format @ 0000000000358620] Setting 'pix_fmts' to value 'yuv420p|nv12|p010le|yuv444p|yuv444p16le|bgr0|rgb0|cuda'
[auto_scaler_0 @ 0000000000358b80] Setting 'flags' to value 'bicubic'
[auto_scaler_0 @ 0000000000358b80] w:iw h:ih flags:'bicubic' interl:0
[format @ 0000000000358620] auto-inserting filter 'auto_scaler_0' between the filter 'Parsed_null_0' and the filter 'format'
[AVFilterGraph @ 0000000002f661e0] query_formats: 4 queried, 2 merged, 1 already done, 0 delayed
[auto_scaler_0 @ 0000000000358b80] picking p010le out of 7 ref:yuv420p10le alpha:0
[auto_scaler_0 @ 0000000000358b80] w:3840 h:2160 fmt:yuv420p10le sar:1/1 -> w:3840 h:2160 fmt:p010le sar:1/1 flags:0x4
[hevc_nvenc @ 0000000003269020] Loaded Nvenc version 7.1
[hevc_nvenc @ 0000000003269020] Nvenc initialized successfully
[hevc_nvenc @ 0000000003269020] 1 CUDA capable devices found
[hevc_nvenc @ 0000000003269020] [ GPU #0 - < GeForce GTX 970 > has Compute SM 5.2 ]
[hevc_nvenc @ 0000000003269020] 10 bit encode not supported
[hevc_nvenc @ 0000000003269020] No NVENC capable devices found
[hevc_nvenc @ 0000000003269020] Nvenc unloaded
Stream mapping:
Stream #0:0 -> #0:0 (hevc (native) -> hevc (hevc_nvenc))
Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height
[AVIOContext @ 00000000003bf520] Statistics: 0 seeks, 0 writeouts
[AVIOContext @ 00000000006b90a0] Statistics: 29381488 bytes read, 8 seeks

So it thinks I want to do an actual 10-bit encode and cancels?

OK, got it: ffmpeg's picky parser ;)

The first result doesn't give the same quality impression as the rigaya NVEncC output I posted above. I need to tweak it option-wise to reach the same level first; -preset slow alone doesn't seem to get there.

easyfab

24th January 2017, 20:53

For information: new Media Server Studio 2017 R2 for Intel QSV

https://software.intel.com/en-us/forums/intel-media-sdk/topic/708917

JohnLai

25th January 2017, 16:25

http://rigaya34589.blog135.fc2.com/blog-entry-891.html
VCEEnc 3.01 is out.
Google translate version of changelog:
Added features and fixed bugs.
[Common]
· Check VCE capabilities at runtime and validate the parameters.
· Added option to specify the reference distance. (--ref <int>)
· Added option to specify the number of LTR frames. (--ltr <int>)
· Added H.264 Level 5.2.
· Added the AMF version to the version information.

[VCEEncC]
· Fixed spelling errors etc. in the help.
· Added option to check the capabilities of VCE. (--check-features)
The HEVC capabilities cannot be displayed correctly.
Is this ...?

· Added HW decoding of HEVC (8 bit).
· Removed wmv3 HW decoding, since it does not work properly.

NikosD

25th January 2017, 16:30

I know.

I've already exchanged a few emails with rigaya :)

JohnLai

25th January 2017, 16:32

I know.

I've already exchanged a few emails with rigaya :)

:devil:
Awaiting hevc samples.

One with 6 refs
One with 6 refs + 6 LTR

NikosD

25th January 2017, 16:33

Still a few critical bugs.

I'll try though.

JohnLai

25th January 2017, 16:37

Still a few critical bugs.

I'll try though.

Don't use --pre-analysis
For some reason, this option makes my system blue-screen when encoding H.264 on an R7 260X.

NikosD

25th January 2017, 16:39

--pre-analysis works for me, there are other bugs.

BTW, pre-analysis is the two-pass encoding you mentioned in a previous post.

JohnLai

25th January 2017, 16:42

--pre-analysis works for me, there are other bugs.

BTW, pre-analysis is the two-pass encoding you mentioned in a previous post.

That is two-pass? .....kinda disappointing ....judging from h264 vbr video quality.....

By the way, it turns out there is also a bug report on the AMF GitHub about the blue screen with pre-analysis enabled.
https://github.com/GPUOpen-LibrariesAndSDKs/AMF/issues/62

NikosD

25th January 2017, 16:47

That is two-pass? .....kinda disappointing ....judging from h264 vbr video quality.....

I think that's what it says here:
https://github.com/GPUOpen-LibrariesAndSDKs/AMF/issues/2

Have you tried CQP with pre-analysis?

JohnLai

25th January 2017, 16:54

I think that's what it says here:
https://github.com/GPUOpen-LibrariesAndSDKs/AMF/issues/2

Have you tried CQP with pre-analysis?

Nope. I only tested 4 times with VBR mode. (5000kbps and 2500kbps, pre-analysis 'full' and 'none')
But it blue-screened randomly, out of nowhere.
Turning pre-analysis on is kinda not worth it with random system blue screens before it even starts encoding.

By the way, are you sure pre-analysis option is really two-pass?
Cause from the video quality output....it doesn't seem so.

NikosD

25th January 2017, 17:11

Did you read the link above ?

My impression from reading that closed ticket is that it is two-pass encoding.

I don't have any other source for that info.

JohnLai

25th January 2017, 17:17

Did you read the link above ?

My impression from reading that closed ticket is that it is two-pass encoding.

I don't have any other source for that info.

I did read it.
But it is a conjecture by Xaymar (OBS studio plugin developer)

That option doesn't seem to be 'lookahead' either. Hmm....oh well, no point thinking too much about it.

NikosD

25th January 2017, 17:23

But AMD didn't reply to contradict him.

I'll ask rigaya if he knows something more.

CruNcher

25th January 2017, 20:27

Hmm, I wonder why FFmpeg NVENC behaves so differently overall from rigaya's NVEncC encoder. I have to say I like rigaya's decisions more overall, even though it doesn't hit the bitrate as exactly as FFmpeg NVENC does in every configuration.

NVENCC

Format : MPEG-4
Format profile : Base Media / Version 2
Codec ID : mp42 (isom/iso2/mp41)
File size : 299 MiB
Duration : 1 min 50 s
Overall bit rate : 22.7 Mb/s
Writing application : NVEncC (x64) 3.02

Video
ID : 1
Format : HEVC
Format/Info : High Efficiency Video Coding
Format profile : Main@L5@High
Codec ID : hev1
Codec ID/Info : High Efficiency Video Coding
Duration : 1 min 50 s
Bit rate : 22.7 Mb/s
Width : 3 840 pixels
Height : 2 160 pixels
Display aspect ratio : 16:9
Frame rate mode : Constant
Frame rate : 59.940 (60000/1001) FPS
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.046
Stream size : 299 MiB (100%)

FFMPEG NVENC

Format : MPEG-4
Format profile : Base Media
Codec ID : isom (isom/iso2/mp41)
File size : 302 MiB
Duration : 1 min 50 s
Overall bit rate : 22.8 Mb/s
Writing application : Lavf57.62.100

Video
ID : 1
Format : HEVC
Format/Info : High Efficiency Video Coding
Format profile : Main@L5@High
Codec ID : hev1
Codec ID/Info : High Efficiency Video Coding
Duration : 1 min 50 s
Bit rate : 22.8 Mb/s
Width : 3 840 pixels
Height : 2 160 pixels
Display aspect ratio : 16:9
Frame rate mode : Constant
Frame rate : 59.940 (60000/1001) FPS
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.046
Stream size : 302 MiB (100%)

The target for rigaya's NVEncC was 25 Mbps and the target for FFmpeg NVENC was 23 Mbps. FFmpeg hit its target pretty much perfectly; rigaya's missed it by a lot ;)

Rigaya's is over 2 Mbps off (22.7 Mb/s vs 25 Mbps).

So with rigaya's current setup, you'd better account for that offset by default.
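The offset can be put in numbers. A rough Python sketch; `compensated_target` is a hypothetical helper, and it assumes NVEncC undershoots by the same ratio at any target, which the posts above don't establish:

```python
def undershoot(target_kbps: float, achieved_kbps: float) -> float:
    """Fraction by which the achieved bitrate falls short of the target."""
    return (target_kbps - achieved_kbps) / target_kbps


def compensated_target(desired_kbps: float, target_kbps: float, achieved_kbps: float) -> float:
    """Target to request so the output lands near `desired_kbps`.

    Assumes the achieved/target ratio stays constant across targets (unverified).
    """
    return desired_kbps * target_kbps / achieved_kbps


# NVEncC VBR2 run above: 25000 kbps requested, 22650.44 kbps achieved
print(f"undershoot: {undershoot(25000, 22650.44):.1%}")
print(f"request ~{compensated_target(25000, 25000, 22650.44):.0f} kbps to get 25 Mbps")
```

By that estimate the run fell about 9.4% short, so requesting roughly 27.6 Mbps would be the first thing to try for a true 25 Mbps result.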

NikosD

25th January 2017, 21:28

NikosD, how is the Samsung 10->8-bit re-transcode coming along for you in a direct comparison vs the Nvidia one I posted at the 2x reduction target? Any watchable results yet? :)

I tried all of StaxRip's source filters, but they can't demux that 10-bit HEVC file, and VCEEncC can't demux it either.

Find me a way to demux it in order to test it.

:devil:
Awaiting hevc samples.

One with 6 refs
One with 6 refs + 6 LTR

This is for you and everyone else who wants to see the Polaris encoder in action.

With this source:
https://www.sendspace.com/file/zg01zb

I got these encoding results:

To cut the original bit rate by more than half, I used CQP 27 in all tests.

1) H.264 (VCE) Polaris encoder (VCEEncC v3.01)

1st Test H.264

Encoding options:

Quality -> Balanced
Ref -> 6
LTR -> 2 (That's the MAX for both H.264 & H.265)
VBAQ -> ON
Pre-analysis -> Full
Profile -> High

Everything else set to default.

Speed -> ~100fps

Result:
https://www.sendspace.com/file/0fbm6y

2nd Test H.264

Encoding options:

Quality -> Slow
Ref -> 16

Everything else, same as 1st test.

Speed -> 55 fps

Result:
https://www.sendspace.com/file/breejv

2) H.265 (HEVC) Polaris encoder

1st Test H.265

Encoding options:

Quality -> Balanced
Ref -> 6
LTR -> 2 (That's the MAX for both H.264 & H.265)
Pre-analysis -> Auto

Everything else set to default.

Speed -> ~91fps

Result:
https://www.sendspace.com/file/dejr6x

2nd Test H.265

Encoding options:

Quality -> Slow
Ref -> 16

Everything else, same as 1st test.

Speed -> 91 fps

Result:
https://www.sendspace.com/file/kh28qr

For HEVC, the problem of not being able to test anything other than balanced quality (like fast or slow) still exists, so although I chose Slow in the 2nd test, the output says Balanced.

All of the encoded samples look great to me.

Waiting for your feedback on my samples and your quality comparisons on the above source using Nvidia, Intel or older AMD cards.
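To make runs like these repeatable, the option sets can be generated from one place. A Python sketch that only builds and prints a command line; the executable name `VCEEncC64.exe` and the `-c`, `-i` and `-o` flags are my assumptions, while `-u`, `--cqp`, `--ref`, `--ltr`, `--vbaq` and `--pre-analysis` are the option names mentioned in this thread. The two checks encode observations from the posts: LTR tops out at 2, and VBAQ is rejected for HEVC. Profile is left out.

```python
MAX_LTR = 2  # observed maximum for both H.264 and H.265 on Polaris (per this thread)


def vceencc_args(codec, quality, ref, ltr, cqp=27, vbaq=False, pre_analysis=None,
                 src="input.mkv", dst="output.mkv"):
    """Build an argument list for one test run (not executed here)."""
    if ltr > MAX_LTR:
        raise ValueError(f"ltr={ltr} exceeds the observed maximum of {MAX_LTR}")
    if vbaq and codec != "h264":
        raise ValueError("VBAQ is reported as unsupported for HEVC")
    args = ["VCEEncC64.exe", "-c", codec, "-u", quality,   # exe name and -c are assumptions
            "--cqp", str(cqp), "--ref", str(ref), "--ltr", str(ltr)]
    if vbaq:
        args.append("--vbaq")
    if pre_analysis is not None:
        args += ["--pre-analysis", pre_analysis]
    return args + ["-i", src, "-o", dst]                    # -i/-o are assumptions


# 1st H.265 test above: balanced, ref 6, LTR 2, pre-analysis auto, CQP 27
print(" ".join(vceencc_args("hevc", "balanced", 6, 2, pre_analysis="auto")))
```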

JohnLai

26th January 2017, 12:35

Source frame number 18 (a P-frame) is selected for PSNR and MSSIM analysis. Frame 0 is an I-frame, so I skipped it, and B-frames aren't suitable for comparison because Polaris HEVC encoding only produces I- and P-frames, so this is a P-frame-only comparison.
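For reference, PSNR for one plane is 10·log10(MAX²/MSE). A minimal Python sketch over flattened 8-bit sample lists; the tool JohnLai used isn't named, so this is not a reproduction of his numbers, and extracting planes from the decoded frames is left out:

```python
import math


def psnr(ref, dist, max_value=255):
    """PSNR in dB between two equal-length sequences of samples (one plane)."""
    if len(ref) != len(dist) or not ref:
        raise ValueError("planes must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(ref, dist)) / len(ref)
    if mse == 0:
        return math.inf  # identical planes
    return 10 * math.log10(max_value ** 2 / mse)


# Every sample off by 1, so MSE = 1 and PSNR = 10*log10(255^2), about 48.13 dB
print(round(psnr([16, 128, 235, 64], [17, 127, 236, 65]), 2))
```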

1st Test H.264
PSNR
Y 15.7386
Cb 10.7270
Cr 10.7073

MSSIM
Y 0.0826
Cb 0.0422
Cr 0.0423

2nd Test H.264
PSNR
Y 15.7369
Cb 10.7270
Cr 10.7071

MSSIM
Y 0.0826
Cb 0.0421
Cr 0.0424

1st Test H.265
PSNR
Y 15.7008
Cb 10.7266
Cr 10.7066

MSSIM
Y 0.0822
Cb 0.0422
Cr 0.0424

2nd Test H.265
PSNR
Y 15.7008
Cb 10.7266
Cr 10.7066

MSSIM
Y 0.0822
Cb 0.0422
Cr 0.0424

1st Test H.264
MAX 0.0961 MSSIM / 28.5 PSNR
MIN 0.0418 MSSIM / 10.7 PSNR

2nd Test H.264
MAX 0.0961 MSSIM / 28.4 PSNR
MIN 0.0419 MSSIM / 10.7 PSNR

1st Test H.265
MAX 0.0958 MSSIM / 25.8 PSNR
MIN 0.0416 MSSIM / 10.7 PSNR

2nd Test H.265
MAX 0.0958 MSSIM / 25.8 PSNR
MIN 0.0416 MSSIM / 10.7 PSNR

Hmm... so I transcoded the source video into H.264 and HEVC using the GTX 970 in my current PC.
GTX 970 LOOKAHEAD 32 H264 CQP 20 23 25 AQ (not using temporal AQ, just normal spatial AQ) File size: 147668 KB (probably due to B-frame usage)
PSNR
Y 15.6819
Cb 10.7267
Cr 10.7057

MSSIM
Y 0.0817
Cb 0.0424
Cr 0.0425

MAX 0.0953 MSSIM / 25.8 PSNR
MIN 0.0416 MSSIM / 10.7 PSNR

GTX 970 LOOKAHEAD 32 HEVC CQP 20 23 25 AQ File size : 193856KB
PSNR
Y 15.6819
Cb 10.7267
Cr 10.7057

MSSIM
Y 0.0817
Cb 0.0424
Cr 0.0425

MAX 0.0953 MSSIM / 25.8 PSNR
MIN 0.0416 MSSIM / 10.7 PSNR

GTX 970 LOOKAHEAD 32 HEVC CQP 29 29 29 AQ File size 79748KB
PSNR
Y 15.6818
Cb 10.7255
Cr 10.7057

MSSIM
Y 0.0819
Cb 0.0424
Cr 0.0424
MAX 0.0955 MSSIM / 25.8 PSNR
MIN 0.0417 MSSIM / 10.7 PSNR

NikosD

26th January 2017, 13:01

What are the sizes of the output files using CQP 20 23 25 ?

JohnLai

26th January 2017, 13:31

What are the sizes of the output files using CQP 20 23 25 ?

Up there....in the red color.....

NikosD

26th January 2017, 13:45

OK... your low-CQP H.264 file is double the size of mine and your HEVC file is almost triple, so the comparison is far from equal.

Your HEVC file is larger than the original AVC source (!)

When viewing the Polaris and GTX 970 files, which do you think looks better?

But of course at the same file size.

JohnLai

26th January 2017, 14:15

OK... your low-CQP H.264 file is double the size of mine and your HEVC file is almost triple, so the comparison is far from equal.

Your HEVC file is larger than the original AVC source (!)

When viewing the Polaris and GTX 970 files, which do you think looks better?

But of course at the same file size.

Comparing Nvidia HEVC CQP I 29 : P 29 with VCE? In motion, definitely Nvidia HEVC output.

In general, Nvidia HEVC output is gonna be larger than AMD HEVC for sure.
This is due to more I-frame insertion + adaptive GOP (from Lookahead) + AQ + limited maximum CU 32x32.

By the way, regarding VCE HEVC reference frame usage: it also uses only 1 preceding frame as a reference, similar to NVENC HEVC. Your H.264 samples are the same: each P-frame only makes use of 1 reference frame.
NumNegativePics 1
NumPositivePics 0
NumDeltaPocs 1
UsedByCurrPicS0 1
UsedByCurrPicS1
DeltaPocS0 -1
DeltaPocS1
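Those fields describe the short-term reference picture set: each DeltaPocS0 entry is an offset to an earlier picture, and UsedByCurrPicS0 marks whether the current picture actually references it. A small Python sketch (the helper name is hypothetical) that turns them into referenced POCs; with the values above, every P-frame points only at the frame right before it:

```python
def referenced_pocs(curr_poc, delta_poc_s0, used_s0, delta_poc_s1=(), used_s1=()):
    """POCs the current picture references, from short-term RPS deltas and used flags."""
    refs = [curr_poc + d for d, used in zip(delta_poc_s0, used_s0) if used]
    refs += [curr_poc + d for d, used in zip(delta_poc_s1, used_s1) if used]
    return refs


# Values from the dump: NumNegativePics 1, DeltaPocS0 -1, UsedByCurrPicS0 1, no S1 entries
print(referenced_pocs(18, [-1], [1]))  # [17]

# A hypothetical 3-reference P-frame would list three negative deltas instead
print(referenced_pocs(18, [-1, -2, -3], [1, 1, 1]))  # [17, 16, 15]
```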

EDIT: All of the VCE samples use GOP = 600; only 1 I-frame is inserted every 600 frames........

NikosD

26th January 2017, 14:31

Of all the options rigaya has implemented, the only ones I haven't used are the B-frame ones, which I think are useless for the Polaris encoder.

But there are at least two parameters of AMF that I don't know if they could be useful and rigaya hasn't implemented them so far: (for AVC and HEVC)

1) headers insertion spacing

2) IDR period

Can you upload the small HEVC sample to sendspace?

JohnLai

26th January 2017, 14:42

Of all the options rigaya has implemented, the only ones I haven't used are the B-frame ones, which I think are useless for the Polaris encoder.

But there are at least two parameters of AMF that I don't know if they could be useful and rigaya hasn't implemented them so far: (for AVC and HEVC)

1) headers insertion spacing

2) IDR period

Can you upload the small HEVC sample to sendspace?

Header = no idea
IDR period = "--gop-len" is available in VCEEnc, but what we actually need is adaptive GOP length. I doubt AMD is gonna implement it anytime soon.

Sample? This is gonna take a long time over a 50kb/s ADSL upload....
EDIT: wow.... 30 minutes to upload.....
EDIT2: Hmm? I just noticed the NVEncC lookahead also decides a fixed 600-frame GOP is optimal for the source material?
EDIT3: Interesting..... Nvidia's adaptive quantization is something to be feared: even if I set CQP to I29 and P29..... it actually varies the QP for each CU, 25 being the lowest and 32 the highest. You can check the CU QP values in the bitstream of the video I'll upload later. (52% uploaded... unless it disconnects again....)
EDIT4: -.-.... For all the VCE samples...... every CU uses CU QP 27........

NikosD

26th January 2017, 14:46

I think you mean 50 KByte/s, which is ~400 Kbit/s.

It's not that bad!
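The unit question matters a lot for the upload time. A quick Python check, assuming the uploaded file is the 77.87 MB (MiB) HEVC encode JohnLai reports a bit later, with K = 1024 for bytes and 1000 for bits: 50 KByte/s is 409.6 kbit/s, and at that rate the file takes under half an hour, in line with the "30 minutes" EDIT, whereas 50 kbit/s would take hours.

```python
def upload_seconds(size_mib: float, rate_bit_s: float) -> float:
    """Seconds to transfer size_mib MiB at rate_bit_s bits per second."""
    return size_mib * 1024 * 1024 * 8 / rate_bit_s


KBYTE_PER_S = 50 * 1024 * 8  # 50 KiB/s expressed in bit/s (409,600)
KBIT_PER_S = 50 * 1000       # 50 kbit/s expressed in bit/s

size = 77.87  # assumption: the sample later uploaded to sendspace
print(f"50 KByte/s: {upload_seconds(size, KBYTE_PER_S) / 60:.1f} min")
print(f"50 kbit/s:  {upload_seconds(size, KBIT_PER_S) / 3600:.1f} h")
```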

JohnLai

26th January 2017, 15:25

I think you mean 50 KByte/s, which is ~400 Kbit/s.

It's not that bad!

Finally.... https://www.sendspace.com/file/mghgaq

NikosD

26th January 2017, 15:40

Thank you.

My eyes can't tell a difference between your sample and mine.

I mean no difference at all.

So, my next question is:

What is your encoding speed (FPS) for that small HEVC file?

JohnLai

26th January 2017, 15:52

Thank you.

My eyes can't tell a difference between your sample and mine.

I mean no difference at all.

So, my next question is:

What is your encoding speed (FPS) for that small HEVC file?

You can't tell the difference when it is in motion. Pause it at a certain frame... and you will notice it.

This is the speed.
encoded 1998 frames, 144.57 fps, 19596.57 kbps, 77.87 MB

NikosD

26th January 2017, 15:53

No, I can't tell any difference even in still images.

JohnLai

26th January 2017, 16:38

No, I can't tell any difference even in still images.

Really?

The tree? The leaf?
Facial texture?

Is there any website where you can mouse over to compare two different images?
Other than the dreaded http://screenshotcomparison.com/, which I can't even upload to.

EDIT:
Since almost all image hosts tend to auto-convert PNG files....
AvatarHEVCslow_track1_und.mkv_snapshot_00.03_[2017.01.26_23.24.51]
https://www.sendspace.com/file/0rffgz

6.Avatar-1080p60fps NVENC CQP I29P29 LOOKAHEAD 32 AQ.mp4_snapshot_00.03_[2017.01.26_23.25.04]
https://www.sendspace.com/file/6vg0ah

Yups

26th January 2017, 18:46

With this source:
https://www.sendspace.com/file/zg01zb

HD 630 1150 MHz
b-frames 4, reference frames 2, Target usage 7
HEVC
CQP= 370 fps
VBR= 290 fps

JohnLai

26th January 2017, 18:53

HD 630 1150 MHz
b-frames 4, reference frames 2, Target usage 7
HEVC
CQP= 370 fps
VBR= 290 fps

Dear Yups,
Could you provide an Intel QSV 10-bit HEVC sample (b-frames 16, ref 5, TU1, scenechange) with that source video?

I wanna check the bitstream......:)

Edit: Just use either VBR or ICQ to make sure the size is around 75-90 MB.

Yups

28th January 2017, 10:52

B-frames 4 is the limit for HEVC, otherwise you run into issues.

CruNcher

28th January 2017, 11:08

Scene cuts can cause interesting fluctuations on the decoder side between NVEncC and FFmpeg's NVENC at the same average bitrate target, but I have no idea yet where this difference in their bitrate decisions originates.

Yups, could you please benchmark the bitstream I posted here on your Kaby Lake decoder (in a rather clean system state, with a minimum of 2 runs)?

https://www.sendspace.com/file/declpg

Nvidia's GM204 CUDA decoder core

http://i1.sendpic.org/t/aC/aCT2XKCGnjRfhCQiPdTBwxU2U7p.jpg (http://sendpic.org/view/1/i/sq7WBQbNha5WxmPuzdYcoPvPd1M.png)

NVENCC (stream above)

Format : MPEG-4
Format profile : Base Media / Version 2
Codec ID : mp42 (isom/iso2/mp41)
File size : 299 MiB
Duration : 1 min 50 s
Overall bit rate : 22.7 Mb/s
Writing application : NVEncC (x64) 3.02

Video
ID : 1
Format : HEVC
Format/Info : High Efficiency Video Coding
Format profile : Main@L5@High
Codec ID : hev1
Codec ID/Info : High Efficiency Video Coding
Duration : 1 min 50 s
Bit rate : 22.7 Mb/s
Width : 3 840 pixels
Height : 2 160 pixels
Display aspect ratio : 16:9
Frame rate mode : Constant
Frame rate : 59.940 (60000/1001) FPS
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.046
Stream size : 299 MiB (100%)

SSIM
Y:0.942599 (12.410791)
U:0.967868 (14.930601)
V:0.964379 (14.482894)
All:0.950440 (13.048714)

Encode Speed = ~30 FPS (CPU/GPU)

FFMPEG NVENC (WIP)

Format : MPEG-4
Format profile : Base Media
Codec ID : isom (isom/iso2/mp41)
File size : 298 MiB
Duration : 1 min 50 s
Overall bit rate : 22.6 Mb/s
Writing application : Lavf57.62.100

Video
ID : 1
Format : HEVC
Format/Info : High Efficiency Video Coding
Format profile : Main@L5@High
Codec ID : hev1
Codec ID/Info : High Efficiency Video Coding
Duration : 1 min 50 s
Bit rate : 22.6 Mb/s
Width : 3 840 pixels
Height : 2 160 pixels
Display aspect ratio : 16:9
Frame rate mode : Constant
Frame rate : 59.940 (60000/1001) FPS
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.045
Stream size : 298 MiB (100%)

SSIM
Y:0.940291 (12.239596)
U:0.966039 (14.690139)
V:0.961858 (14.185987)
All:0.948177 (12.854755)

Encode Speed = ~33 FPS (CPU/GPU)
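The numbers in parentheses after each SSIM value appear to be the decibel form that FFmpeg's ssim filter prints, -10·log10(1 - SSIM); the posted pairs are consistent with that. A minimal Python sketch of the conversion (my reconstruction, not taken from the posts):

```python
import math


def ssim_to_db(ssim: float) -> float:
    """Express an SSIM score in dB as -10*log10(1 - SSIM); a perfect match is infinite."""
    if ssim >= 1.0:
        return math.inf
    return -10 * math.log10(1 - ssim)


nvencc_all, ffmpeg_all = 0.950440, 0.948177  # "All" lines of the two runs above
print(f"NVEncC {ssim_to_db(nvencc_all):.3f} dB, FFmpeg {ssim_to_db(ffmpeg_all):.3f} dB")
print(f"gap {ssim_to_db(nvencc_all) - ssim_to_db(ffmpeg_all):.2f} dB")
```

On this scale the All-plane gap between the two encodes works out to roughly 0.19 dB in NVEncC's favour; the log form stretches differences between SSIM values close to 1, which makes small gaps easier to read.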

JohnLai

28th January 2017, 18:36

Scene cuts can cause interesting fluctuations on the decoder side between NVEncC and FFmpeg's NVENC at the same average bitrate target, but I have no idea yet where this difference in their bitrate decisions originates.

Nvidia's GM204 CUDA decoder core

http://i1.sendpic.org/t/aC/aCT2XKCGnjRfhCQiPdTBwxU2U7p.jpg (http://sendpic.org/view/1/i/sq7WBQbNha5WxmPuzdYcoPvPd1M.png)

Hmm.... seems like the GTX 970's hybrid decoding nature is the bottleneck.
http://i.imgur.com/FHozcid.png
Frame 2012 is the I-frame in the screenshot you produced
Checking the Coded Picture Buffer graph.... I guess the CPB limit for our GTX 970's hybrid decoding is 30 000 000; any more than that and it stutters~~~~