View Full Version : What is current status for hardware H.265 encoding.
NikosD
22nd January 2017, 15:45
Don't post anything here until JohnLai opens a thread regarding HW encoding :P
CruNcher
22nd January 2017, 15:48
http://demo-uhd3d.com/fiche.php?cat=uhd&id=91
It's a relatively old Ateme Encoder result ;)
JohnLai
22nd January 2017, 16:15
Don't post anything here until JohnLai opens a thread regarding HW encoding :P
Sorry, I can't.
By the way, why does rigaya explicitly mark --vbaq as H.264 only? AMF 1.4 simply states that VBAQ is disabled by default. Shouldn't it be possible to enable it for HEVC, or is this just an oversight by rigaya?
It appears Long Term Reference picture support is not implemented either.
EDIT: If AMD VCE HEVC P-frames can refer to multiple preceding frames, LTR support is even more crucial.
NikosD
22nd January 2017, 17:12
http://demo-uhd3d.com/fiche.php?cat=uhd&id=91
It's a relatively old Ateme Encoder result ;)
Downloading...
Sorry, I can't.
I really, really, really can't understand you...
By the way, why does rigaya explicitly mark --vbaq as H.264 only? AMF 1.4 simply states that VBAQ is disabled by default. Shouldn't it be possible to enable it for HEVC, or is this just an oversight by rigaya?
It appears Long Term Reference picture support is not implemented either.
EDIT: If AMD VCE HEVC P-frames can refer to multiple preceding frames, LTR support is even more crucial.
I tried VBAQ with HEVC and got this reply:
"VBAQ is not supported with HEVC encoding, disabled."
Yes, I think the first version of VCEEnc supporting AMF has some small bugs and is missing a few things.
While I'm downloading the 10bit HEVC file, I did some tests with this source:
ftp://helpedia.com/pub/multimedia/testvideos/x264/2012%20-%2001%20-%20QuickSync%20vs%20UVD%202.2%20vs%20VP4/9.Ducks.Take.Off.1080p30fpsRef5-108Mbps.mkv
For all my tests I used --cqp 25 along with --pre-analysis auto.
I only changed -u (quality), using its three options: fast, balanced and slow.
The results:
(All HEVC-encoded samples are half the size of the original H.264 file; you can judge the quality for yourselves.)
Ducks_HEVC_CQP25_fast.mkv
https://www.sendspace.com/file/9djrh3
DX9: List of adapters:
0: Device ID: 67DF [Radeon (TM) RX 470 Graphics]
DX9 : Chosen Device 0: Device ID: 67DF [Radeon (TM) RX 470 Graphics]
VCEEnc 3.00 (x64) / Windows 10 (x64)
CPU: Intel Core i5-2400 @ 3.10GHz [TB: 3.20GHz] (4C/4T)
GPU: \\.\DISPLAY1 [Ellesmere 1300MHz (2236.10)]
Input Info: avcodec video: H.264/AVC, 1920x1080, 30000/1001 fps
Output: H.265/HEVC main @ Level 4.1
1920x1080p 1:1 29.970fps (30000/1001fps)
avwriter: hevc => matroska
Quality: balanced
CQP: I:25, P:25
VBV Bufsize: 20000 kbps
Bframes: 0 frames
Motion Est: Q-pel
Slices: 1
GOP Len: 300 frames
Others: deblock hrd pre-analysis:auto
encoded 500 frames, 62.77 fps, 57915.24 kbps, 115.18 MB
encode time 0:00:08, CPULoad: 26.13%
frame type IDR 2
frame type I 2, total size 0.60 MB
frame type P 498, total size 114.58 MB
Ducks_HEVC_CQP25_balanced.mkv
https://www.sendspace.com/file/yfii43
DX9: List of adapters:
0: Device ID: 67DF [Radeon (TM) RX 470 Graphics]
DX9 : Chosen Device 0: Device ID: 67DF [Radeon (TM) RX 470 Graphics]
VCEEnc 3.00 (x64) / Windows 10 (x64)
CPU: Intel Core i5-2400 @ 3.10GHz [TB: 3.30GHz] (4C/4T)
GPU: \\.\DISPLAY1 [Ellesmere 1300MHz (2236.10)]
Input Info: avcodec video: H.264/AVC, 1920x1080, 30000/1001 fps
Output: H.265/HEVC main @ Level 4.1
1920x1080p 1:1 29.970fps (30000/1001fps)
avwriter: hevc => matroska
Quality: balanced
CQP: I:25, P:25
VBV Bufsize: 20000 kbps
Bframes: 0 frames
Motion Est: Q-pel
Slices: 1
GOP Len: 300 frames
Others: deblock hrd pre-analysis:auto
encoded 500 frames, 55.80 fps, 58366.81 kbps, 116.08 MB
encode time 0:00:09, CPULoad: 25.84%
frame type IDR 2
frame type I 2, total size 0.60 MB
frame type P 498, total size 115.48 MB
Ducks_HEVC_CQP25_quality.mkv
https://www.sendspace.com/file/cteioc
DX9: List of adapters:
0: Device ID: 67DF [Radeon (TM) RX 470 Graphics]
DX9 : Chosen Device 0: Device ID: 67DF [Radeon (TM) RX 470 Graphics]
VCEEnc 3.00 (x64) / Windows 10 (x64)
CPU: Intel Core i5-2400 @ 3.10GHz [TB: 3.30GHz] (4C/4T)
GPU: \\.\DISPLAY1 [Ellesmere 1300MHz (2236.10)]
Input Info: avcodec video: H.264/AVC, 1920x1080, 30000/1001 fps
Output: H.265/HEVC main @ Level 4.1
1920x1080p 1:1 29.970fps (30000/1001fps)
avwriter: hevc => matroska
Quality: balanced
CQP: I:25, P:25
VBV Bufsize: 20000 kbps
Bframes: 0 frames
Motion Est: Q-pel
Slices: 1
GOP Len: 300 frames
Others: deblock hrd pre-analysis:auto
During all three encodes, GPU utilisation and GPU memory controller load were 0%; the GPU clock stayed at a low 751 MHz, but GPU memory speed was at its maximum of 2000 MHz.
GPU core-only power consumption was about 7.7 W to 12.3 W.
As you can see, Quality always says balanced, which is probably a bug.
Also, the balanced and slow samples are exactly the same size.
Using other 1080p H.264 files (~45 Mbps) as a source, I reached ~100 fps for HEVC encoding.
Bframes reported are always 0, no matter what.
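The kbps figure in the VCEEncC logs above can be cross-checked from the frame count, frame rate and file size printed in the same log. A minimal sketch, assuming that "MB" in the log means MiB (1024 * 1024 bytes) and that kbps means 1000 bits per second; the function name is made up for illustration:

```python
from fractions import Fraction

def average_kbps(frames, fps, size_mib):
    """Average bitrate in kbit/s for a clip of `frames` frames at `fps`.

    Assumption: the size printed in the log is in MiB (1024 * 1024 bytes)
    and kbps means 1000 bits per second.
    """
    duration_s = frames / fps                # clip length in seconds
    bits = size_mib * 1024 * 1024 * 8        # payload size in bits
    return float(bits / duration_s / 1000)

ntsc_30 = Fraction(30000, 1001)              # 29.970 fps, as in the logs

# "fast" run: encoded 500 frames, 57915.24 kbps, 115.18 MB
fast = average_kbps(500, ntsc_30, Fraction("115.18"))
# "balanced" run: encoded 500 frames, 58366.81 kbps, 116.08 MB
balanced = average_kbps(500, ntsc_30, Fraction("116.08"))

print(f"fast:     {fast:.0f} kbps")
print(f"balanced: {balanced:.0f} kbps")
```

Both values land within a few kbps of the logged ones; the remainder is explained by the MB figure being rounded to two decimals.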
JohnLai
22nd January 2017, 18:35
Darn, I forgot to send a request to rigaya about adding AMF_VIDEO_ENCODER_HEVC_MAX_NUM_REFRAMES support after enumerating AMFCaps interface of AMF_VIDEO_ENCODER_HEVC_CAP_MAX_REFERENCE_FRAMES.
Anyway, currently checking Ducks_HEVC_CQP25_quality.mkv.
Intra PU sizes
4x4 8x8 16x16 32x32
Inter PU sizes
8x8
8x16
16x8
16x16
16x32
32x16
32x32
32x64
64x32
64x64
Hmm......
VCE "quality" sample from nikosd
max_transform_hierarchy_depth_inter 4
max_transform_hierarchy_depth_intra 4
transform_skip_enabled_flag 0
cu_qp_delta_enabled_flag 0
pps_loop_filter_across_slices_enabled_flag 0
NvenC using GTX970
max_transform_hierarchy_depth_inter 3
max_transform_hierarchy_depth_intra 0
transform_skip_enabled_flag 1
cu_qp_delta_enabled_flag 1
pps_loop_filter_across_slices_enabled_flag 1
Well, as usual, no SAO for Polaris.
QSV TU1 Skylake
max_transform_hierarchy_depth_inter 2
max_transform_hierarchy_depth_intra 2
transform_skip_enabled_flag 0
cu_qp_delta_enabled_flag 1
pps_loop_filter_across_slices_enabled_flag 0
NikosD
22nd January 2017, 18:37
I will send him a thorough email about the various small bugs I have found.
Tell me which AMF features I should ask him to add.
JohnLai
22nd January 2017, 18:56
I will send him a thorough email about the various small bugs I have found.
Tell me which AMF features I should ask him to add.
LTR, REF, HRD conformance, and an option to enable VBAQ (the SDK says it is disabled by default; that doesn't mean it can't be enabled, or is there a serious bug with it?)
HEVC_DE_BLOCKING_FILTER_DISABLE: should this one be set to true or false if I want to keep deblocking active?
Judging from the AMF SDK, that's about it.
CruNcher
22nd January 2017, 19:36
NVEnc 3.02 (x64), using NVENC API v7.0
OS Version Windows 7 (x64)
CPU Intel Core i5-2400 @ 3.10GHz [TB: 3.30GHz] (4C/4T)
GPU #0: GeForce GTX 970 (13 EU) @ 1266 MHz (375.63)
Input Buffers CUDA, 32 frames
Input Info avsw: hevc(yv12(10bit))->nv12 [SSE2], 3840x2160, 60000/1001 fps
Vpp Filters copyHtoD
Output Info H.265/HEVC main @ Level auto
3840x2160p 1:1 59.940fps (60000/1001fps)
avwriter: hevc => mp4
Rate Control VBR2
Bitrate 25000 kbps (Max: 30000 kbps)
Initial QP I:20 P:23 B:25
VBV buf size auto
Lookahead on, 16 frames, Adaptive I, B Insert
GOP length 600 frames
B frames 0 frames
Ref frames 3 frames, LTR: on
AQ off
MV Quality Q-pel
CU max / min 32 / 8
encoded 6646 frames, 30.14 fps, 22650.44 kbps, 299.38 MB
encode time 0:03:40 / CPU Usage: 72.37%
frame type IDR 31
frame type I 31, avgQP 23.52, total size 15.46 MB
frame type P 6615, avgQP 28.61, total size 283.93 MB
For the highest playback quality, use a renderer with a realtime debanding option (MadVR/MPDotNet/MPv) or push it through a TV post-processing decoder pipeline (Sony/Samsung etc.) :)
https://www.sendspace.com/file/declpg
NikosD
22nd January 2017, 19:38
LTR, REF, HRD conformance, and an option to enable VBAQ (the SDK says it is disabled by default; that doesn't mean it can't be enabled, or is there a serious bug with it?)
HEVC_DE_BLOCKING_FILTER_DISABLE: should this one be set to true or false if I want to keep deblocking active?
Judging from the AMF SDK, that's about it.
In the end I read both the AVC and HEVC PDFs from the AMF docs, added all the small bugs I had found, and sent a huge email to rigaya!
Let's see what he could manage to add and fix.
thanks!
NikosD
22nd January 2017, 21:36
Anyway, currently checking Ducks_HEVC_CQP25_quality.mkv.
Intra PU sizes
4x4 8x8 16x16 32x32
Inter PU sizes
8x8
8x16
16x8
16x16
16x32
32x16
32x32
32x64
64x32
64x64
Hmm......
VCE "quality" sample from nikosd
max_transform_hierarchy_depth_inter 4
max_transform_hierarchy_depth_intra 4
transform_skip_enabled_flag 0
cu_qp_delta_enabled_flag 0
pps_loop_filter_across_slices_enabled_flag 0
Two of the bugs reported to rigaya were:
1) HW decoding in VCE v3.0 does not work the way it did in v2.0: it uses ~25% of a 4C/4T CPU, which means one core is at 100%, while v2.0 does HW decoding at ~2% CPU.
That said, v3.0 is slightly faster than v2.0.
2) The Quality reported in the runtime info is always balanced, no matter what.
I mean that even if I choose fast or slow, the Quality info line says "Balanced".
I thought it was cosmetic, but when I tried H.264 encoding it reported Quality fast, balanced or slow, and the speed varied much more between presets than it does with H.265 encoding.
So, hold your horses about the "quality" sample until rigaya explains what's really going on.
CruNcher
22nd January 2017, 23:37
375.95 installed, checking for differences in the results.
So it is indeed only a unification of things, with no change on the HEVC side, just as the release notes stated.
VBR 2-pass is now named after Nvidia's internal naming convention, VBR High Quality, and so on ;)
NVEnc 3.02 (x64), using NVENC API v7.0
OS Version Windows 7 (x64)
CPU Intel Core i5-2400 @ 3.10GHz [TB: 3.30GHz] (4C/4T)
GPU #0: GeForce GTX 970 (13 EU) @ 1266 MHz (375.63)
Input Buffers CUDA, 32 frames
Input Info avsw: hevc(yv12(10bit))->nv12 [SSE2], 3840x2160, 60000/1001 fps
Vpp Filters copyHtoD
Output Info H.265/HEVC main @ Level auto
3840x2160p 1:1 59.940fps (60000/1001fps)
avwriter: hevc => mp4
Rate Control VBR2
Bitrate 25000 kbps (Max: 30000 kbps)
Initial QP I:20 P:23 B:25
VBV buf size auto
Lookahead on, 16 frames, Adaptive I, B Insert
GOP length 600 frames
B frames 0 frames
Ref frames 3 frames, LTR: on
AQ off
MV Quality Q-pel
CU max / min 32 / 8
encoded 6646 frames, 30.14 fps, 22650.44 kbps, 299.38 MB
encode time 0:03:40 / CPU Usage: 72.37%
frame type IDR 31
frame type I 31, avgQP 23.52, total size 15.46 MB
frame type P 6615, avgQP 28.61, total size 283.93 MB
NVEnc 3.02 (x64), using NVENC API v7.0
OS Version Windows 7 (x64)
CPU Intel Core i5-2400 @ 3.10GHz [TB: 3.30GHz] (4C/4T)
GPU #0: GeForce GTX 970 (13 EU) @ 1266 MHz (375.95)
Input Buffers CUDA, 32 frames
Input Info avsw: hevc(yv12(10bit))->nv12 [SSE2], 3840x2160, 60000/1001 fps
Vpp Filters copyHtoD
Output Info H.265/HEVC main @ Level auto
3840x2160p 1:1 59.940fps (60000/1001fps)
avwriter: hevc => mp4
Rate Control VBR2
Bitrate 25000 kbps (Max: 30000 kbps)
Initial QP I:20 P:23 B:25
VBV buf size auto
Lookahead on, 16 frames, Adaptive I, B Insert
GOP length 600 frames
B frames 0 frames
Ref frames 3 frames, LTR: on
AQ off
MV Quality Q-pel
CU max / min 32 / 8
encoded 6646 frames, 29.87 fps, 22650.44 kbps, 299.38 MB
encode time 0:03:42 / CPU Usage: 70.54%
frame type IDR 31
frame type I 31, avgQP 23.52, total size 15.46 MB
frame type P 6615, avgQP 28.61, total size 283.93 MB
NVEnc 3.05 (x64), using NVENC API v7.1
OS Version Windows 7 (x64)
CPU Intel Core i5-2400 @ 3.10GHz [TB: 3.30GHz] (4C/4T)
GPU #0: GeForce GTX 970 (13 EU) @ 1266 MHz (375.95)
Input Buffers CUDA, 32 frames
Input Info avsw: hevc(yv12(10bit))->nv12 [SSE2], 3840x2160, 60000/1001 fps
Vpp Filters copyHtoD
Output Info H.265/HEVC main @ Level auto
3840x2160p 1:1 59.940fps (60000/1001fps)
avwriter: hevc => mp4
Rate Control VBRHQ
Bitrate 25000 kbps (Max: 30000 kbps)
Initial QP I:20 P:23 B:25
VBV buf size auto
Lookahead on, 16 frames, Adaptive I, B Insert
GOP length 600 frames
B frames 0 frames
Ref frames 3 frames, LTR: on
AQ off
MV Quality Q-pel
CU max / min 32 / 8
encoded 6646 frames, 30.27 fps, 22650.44 kbps, 299.38 MB
encode time 0:03:39 / CPU Usage: 72.38%
frame type IDR 31
frame type I 31, avgQP 23.52, total size 15.46 MB
frame type P 6615, avgQP 28.61, total size 283.93 MB
btw
[mpegts @ 00000000003779a0] start time for stream 1 is not set in estimate_timings_from_pts
[mpegts @ 00000000003779a0] Could not find codec parameters for stream 1 (Audio: aac ([15][0][0][0] / 0x000F), 0 channels, fltp): unspecified sample rate
Consider increasing the value for the 'analyzeduration' and 'probesize' options
So it's not really rigaya's fault.
NikosD
23rd January 2017, 15:14
OK, I got some interesting replies from rigaya.
He will add REF in the next version and will probably add LTR some time later.
Regarding HRD, it is always enabled - no need to disable it
Deblocking filter always enabled, otherwise bad video quality.
VBAQ for HEVC always disabled, otherwise bad video quality.
As for the HEVC -u quality option always showing "balanced", he had no clue, but he will investigate.
It could be a bug or limitation.
NikosD
23rd January 2017, 15:20
I also told him to add:
HEVC tier (main, high)
Full range color (H.264 only)
HW HEVC decoding
He could probably add them later.
JohnLai
23rd January 2017, 16:05
Gotcha, NikosD.
Once rigaya-san adds the ref + ltr support, we can finally find out whether multiple reference frames are used.
I thought VBAQ was an acronym for Variable Bitrate Adaptive Quantization; clearly it is not.... since the developer said it produces bad quality.
Hmm, reading through the AMF SDK source code..... where is the two-pass encoding AMD promised? There is nothing about two-pass in the AMF SDK.
CruNcher
23rd January 2017, 20:39
NikosD, how is the Samsung 10->8bit retranscode coming along for you, in direct comparison with the Nvidia one I posted at the 2x reduction target? Any watchable results yet? :)
I'm currently trying to get a result out of ffmpeg, but with that parsing issue it is trickier than I thought. I wonder whether that .ts is corrupt overall, but LAV Splitter has zero issues with it, MPV and others show no issues either, and adjusting the probesize doesn't fix it. Very weird.
I haven't got any nvenc encoding result out of ffmpeg yet; it behaves crazily with that input overall.
It also tells me it can't find any nvenc device at all. I tried different pixel formats and other things, but it acts totally weird, even though -gpu list detects the card correctly. Pretty frustrating.
NikosD
23rd January 2017, 20:44
I'll wait a little for some basic fixes of rigaya regarding VCEENC before I try that sample.
You can see the results of HEVC encoding of the source and the samples I posted.
CruNcher
23rd January 2017, 21:17
@NikosD
OK, so only ducks, jellyfish and some desktop results to compare directly for now. Not ideal, but better than nothing ;)
Let's hope some Intel user with Skylake/Kaby Lake joins in; then we can throw each other's results around and compare, though obviously neither of us would have any chance against Quick Sync overall ;)
What the ????
[graph 0 input from stream 0:0 @ 0000000000357140] Setting 'video_size' to value '3840x2160'
[graph 0 input from stream 0:0 @ 0000000000357140] Setting 'pix_fmt' to value '72'
[graph 0 input from stream 0:0 @ 0000000000357140] Setting 'time_base' to value '1/90000'
[graph 0 input from stream 0:0 @ 0000000000357140] Setting 'pixel_aspect' to value '1/1'
[graph 0 input from stream 0:0 @ 0000000000357140] Setting 'sws_param' to value 'flags=2'
[graph 0 input from stream 0:0 @ 0000000000357140] Setting 'frame_rate' to value '60000/1001'
[graph 0 input from stream 0:0 @ 0000000000357140] w:3840 h:2160 pixfmt:yuv420p10le tb:1/90000 fr:60000/1001 sar:1/1 sws_param:flags=2
[format @ 0000000000358620] compat: called with args=[yuv420p|nv12|p010le|yuv444p|yuv444p16le|bgr0|rgb0|cuda]
[format @ 0000000000358620] Setting 'pix_fmts' to value 'yuv420p|nv12|p010le|yuv444p|yuv444p16le|bgr0|rgb0|cuda'
[auto_scaler_0 @ 0000000000358b80] Setting 'flags' to value 'bicubic'
[auto_scaler_0 @ 0000000000358b80] w:iw h:ih flags:'bicubic' interl:0
[format @ 0000000000358620] auto-inserting filter 'auto_scaler_0' between the filter 'Parsed_null_0' and the filter 'format'
[AVFilterGraph @ 0000000002f661e0] query_formats: 4 queried, 2 merged, 1 already done, 0 delayed
[auto_scaler_0 @ 0000000000358b80] picking p010le out of 7 ref:yuv420p10le alpha:0
[auto_scaler_0 @ 0000000000358b80] w:3840 h:2160 fmt:yuv420p10le sar:1/1 -> w:3840 h:2160 fmt:p010le sar:1/1 flags:0x4
[hevc_nvenc @ 0000000003269020] Loaded Nvenc version 7.1
[hevc_nvenc @ 0000000003269020] Nvenc initialized successfully
[hevc_nvenc @ 0000000003269020] 1 CUDA capable devices found
[hevc_nvenc @ 0000000003269020] [ GPU #0 - < GeForce GTX 970 > has Compute SM 5.2 ]
[hevc_nvenc @ 0000000003269020] 10 bit encode not supported
[hevc_nvenc @ 0000000003269020] No NVENC capable devices found
[hevc_nvenc @ 0000000003269020] Nvenc unloaded
Stream mapping:
Stream #0:0 -> #0:0 (hevc (native) -> hevc (hevc_nvenc))
Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height
[AVIOContext @ 00000000003bf520] Statistics: 0 seeks, 0 writeouts
[AVIOContext @ 00000000006b90a0] Statistics: 29381488 bytes read, 8 seeks
So it thinks I want to do an actual 10-bit encode and aborts?
OK, got it: ffmpeg's picky parser ;)
The first result doesn't give the same impression of quality as rigaya's NVEncC output that I posted above. I need to tweak the options to bring it to the same level first; -preset slow alone doesn't seem to get there.
easyfab
24th January 2017, 20:53
For information, the new Media Server Studio 2017 R2 for Intel QSV:
https://software.intel.com/en-us/forums/intel-media-sdk/topic/708917
JohnLai
25th January 2017, 16:25
http://rigaya34589.blog135.fc2.com/blog-entry-891.html
VCEEnc 3.01 is out.
Google translate version of changelog:
Added features and fixed bugs.
[Common]
- Check the VCE capabilities at run time and validate the parameters.
- Added an option to specify the reference distance. (--ref <int>)
- Added an option to specify the number of LTR frames. (--ltr <int>)
- Added H.264 Level 5.2.
- Added the AMF version to the version information.
[VCEEncC]
- Fixed spelling errors etc. in the help.
- Added an option to check the VCE capabilities. (--check-features)
The HEVC capabilities cannot be displayed properly.
Is this ...?
- Added HW decoding of HEVC (8 bit).
- Removed HW decoding of wmv3, since it does not work properly.
NikosD
25th January 2017, 16:30
I know.
I've already exchanged a few emails with rigaya :)
JohnLai
25th January 2017, 16:32
I know.
I've already exchanged a few emails with rigaya :)
:devil:
Awaiting hevc samples.
One with 6 refs
One with 6 refs + 6 LTR
NikosD
25th January 2017, 16:33
Still a few critical bugs.
I'll try though.
JohnLai
25th January 2017, 16:37
Still a few critical bugs.
I'll try though.
Don't use --pre-analysis
For some reason, this option makes my system blue-screen during H264 encoding on an R7 260X.
NikosD
25th January 2017, 16:39
--pre-analysis works for me, there are other bugs.
BTW, pre-analysis is the two-pass encoding you mentioned in a previous post.
JohnLai
25th January 2017, 16:42
--pre-analysis works for me, there are other bugs.
BTW, pre-analysis is the two-pass encoding you mentioned in a previous post.
That is two-pass? .....kinda disappointing ....judging from h264 vbr video quality.....
By the way, it turns out there is also a bug report on the AMF GitHub about the blue screen with pre-analysis activated.
https://github.com/GPUOpen-LibrariesAndSDKs/AMF/issues/62
NikosD
25th January 2017, 16:47
That is two-pass? .....kinda disappointing ....judging from h264 vbr video quality.....
I think that's what it says here:
https://github.com/GPUOpen-LibrariesAndSDKs/AMF/issues/2
Have you tried CQP with pre-analysis?
JohnLai
25th January 2017, 16:54
I think that's what it says here:
https://github.com/GPUOpen-LibrariesAndSDKs/AMF/issues/2
Have you tried CQP with pre-analysis?
Nope. I only tested 4 times with VBR mode. (5000kbps and 2500kbps, pre-analysis 'full' and 'none')
But it randomly blue-screens out of nowhere.
It's kinda not worth turning pre-analysis on when the system randomly blue-screens before it even starts encoding.
By the way, are you sure the pre-analysis option is really two-pass?
Cause from the video quality of the output.... it doesn't seem so.
NikosD
25th January 2017, 17:11
Did you read the link above ?
My impression from reading that closed ticket is that it is two-pass encoding.
I don't have any other source for that info.
JohnLai
25th January 2017, 17:17
Did you read the link above ?
My impression from reading that closed ticket is that it is two-pass encoding.
I don't have any other source for that info.
I did read it.
But it is a conjecture by Xaymar (OBS studio plugin developer)
That option doesn't seem to be 'lookahead' either. Hmm....oh well, no point thinking too much about it.
NikosD
25th January 2017, 17:23
But AMD didn't tell him otherwise.
I'll ask rigaya if he knows something more.
CruNcher
25th January 2017, 20:27
Hmm, I wonder why FFMPEG NVENC behaves so differently overall from rigaya's NVEncC encoder. I have to say I like rigaya's decisions more overall, even though it doesn't hit the bitrate as exactly as FFMPEG NVENC does, however it is configured.
NVENCC
Format : MPEG-4
Format profile : Base Media / Version 2
Codec ID : mp42 (isom/iso2/mp41)
File size : 299 MiB
Duration : 1 min 50 s
Overall bit rate : 22.7 Mb/s
Writing application : NVEncC (x64) 3.02
Video
ID : 1
Format : HEVC
Format/Info : High Efficiency Video Coding
Format profile : Main@L5@High
Codec ID : hev1
Codec ID/Info : High Efficiency Video Coding
Duration : 1 min 50 s
Bit rate : 22.7 Mb/s
Width : 3 840 pixels
Height : 2 160 pixels
Display aspect ratio : 16:9
Frame rate mode : Constant
Frame rate : 59.940 (60000/1001) FPS
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.046
Stream size : 299 MiB (100%)
FFMPEG NVENC
Format : MPEG-4
Format profile : Base Media
Codec ID : isom (isom/iso2/mp41)
File size : 302 MiB
Duration : 1 min 50 s
Overall bit rate : 22.8 Mb/s
Writing application : Lavf57.62.100
Video
ID : 1
Format : HEVC
Format/Info : High Efficiency Video Coding
Format profile : Main@L5@High
Codec ID : hev1
Codec ID/Info : High Efficiency Video Coding
Duration : 1 min 50 s
Bit rate : 22.8 Mb/s
Width : 3 840 pixels
Height : 2 160 pixels
Display aspect ratio : 16:9
Frame rate mode : Constant
Frame rate : 59.940 (60000/1001) FPS
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.046
Stream size : 302 MiB (100%)
The target for rigaya's NVEncC was 25 Mbps and the target for FFMPEG NVENC was 23 Mbps. FFMPEG hit it almost perfectly; rigaya missed it by a lot ;)
Rigaya is more than 2 Mbps off (22.7 vs 25 Mbps).
So with rigaya's current setup you'd better plan for that offset by default.
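To put numbers on that offset, the miss can be expressed both in kbps and as a share of the target. A small sketch using only figures already posted in this thread (25000 kbps target and 22650.44 kbps result from the NVEncC log; 23 Mbps target and 22.8 Mb/s from MediaInfo for ffmpeg); the helper name is hypothetical:

```python
def target_miss(target_kbps, achieved_kbps):
    """Return (absolute miss in kbps, miss as a percentage of the target)."""
    miss = target_kbps - achieved_kbps
    return miss, 100.0 * miss / target_kbps

# NVEncC: 25000 kbps target, 22650.44 kbps reported in its log
nvencc_miss, nvencc_pct = target_miss(25000, 22650.44)
# ffmpeg: 23 Mbps target, 22.8 Mb/s per MediaInfo (only one decimal is known)
ffmpeg_miss, ffmpeg_pct = target_miss(23000, 22800)

print(f"NVEncC: {nvencc_miss:.0f} kbps under target ({nvencc_pct:.1f}%)")
print(f"ffmpeg: {ffmpeg_miss:.0f} kbps under target ({ffmpeg_pct:.1f}%)")
```

This is one clip with one set of options, so the percentage is a data point for this sample rather than a general correction factor.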
NikosD
25th January 2017, 21:28
NikosD, how is the Samsung 10->8bit retranscode coming along for you, in direct comparison with the Nvidia one I posted at the 2x reduction target? Any watchable results yet? :)
I tried all the source options in StaxRip, but none of them can demux that 10-bit HEVC file, and VCEEncC can't demux it either.
Find me a way to demux it so I can test it.
:devil:
Awaiting hevc samples.
One with 6 refs
One with 6 refs + 6 LTR
For you and everyone else who wants to see Polaris encoder in action.
With this source:
https://www.sendspace.com/file/zg01zb
I got these encoding results:
In order to achieve more than half of the original bit rate I used CQP 27 in all tests
1) H.264 (VCE) Polaris encoder (VCEEncC v3.01)
1st Test H.264
Encoding options:
Quality -> Balanced
Ref -> 6
LTR -> 2 (That's the MAX for both H.264 & H.265)
VBAQ -> ON
Pre-analysis -> Full
Profile -> High
Everything else set to default.
Speed -> ~100fps
Result:
https://www.sendspace.com/file/0fbm6y
2nd Test H.264
Encoding options:
Quality -> Slow
Ref -> 16
Everything else, same as 1st test.
Speed -> 55 fps
Result:
https://www.sendspace.com/file/breejv
2) H.265 (HEVC) Polaris encoder
1st Test H.265
Encoding options:
Quality -> Balanced
Ref -> 6
LTR -> 2 (That's the MAX for both H.264 & H.265)
Pre-analysis -> Auto
Everything else set to default.
Speed -> ~91fps
Result:
https://www.sendspace.com/file/dejr6x
2nd Test H.265
Encoding options:
Quality -> Slow
Ref -> 16
Everything else, same as 1st test.
Speed -> 91 fps
Result:
https://www.sendspace.com/file/kh28qr
For HEVC, the problem of not being able to test anything other than balanced quality (such as fast or slow) still exists, so although I chose Slow in the 2nd test, the output says Balanced.
All of the encoded samples look great to me.
Waiting for your feedback on my samples, and for your own quality comparisons on the above source using Nvidia, Intel or older AMD cards.
JohnLai
26th January 2017, 12:35
Source frame number 18 (a P-frame) was selected for PSNR and MSSIM analysis. Frame 0 is an I-frame, so I skipped it, and B-frames aren't suitable for comparison because Polaris HEVC encoding only produces I- and P-frames, so this is a P-frame-only comparison.
1st Test H.264
PSNR
Y 15.7386
Cb 10.7270
Cr 10.7073
MSSIM
Y 0.0826
Cb 0.0422
Cr 0.0423
2nd Test H.264
PSNR
Y 15.7369
Cb 10.7270
Cr 10.7071
MSSIM
Y 0.0826
Cb 0.0421
Cr 0.0424
1st Test H.265
PSNR
Y 15.7008
Cb 10.7266
Cr 10.7066
MSSIM
Y 0.0822
Cb 0.0422
Cr 0.0424
2nd Test H.265
PSNR
Y 15.7008
Cb 10.7266
Cr 10.7066
MSSIM
Y 0.0822
Cb 0.0422
Cr 0.0424
1st Test H.264
MAX 0.0961 MSSIM / 28.5 PSNR
MIN 0.0418 MSSIM / 10.7 PSNR
2nd Test H.264
MAX 0.0961 MSSIM / 28.4 PSNR
MIN 0.0419 MSSIM / 10.7 PSNR
1st Test H.265
MAX 0.0958 MSSIM / 25.8 PSNR
MIN 0.0416 MSSIM / 10.7 PSNR
2nd Test H.265
MAX 0.0958 MSSIM / 25.8 PSNR
MIN 0.0416 MSSIM / 10.7 PSNR
Hmm... so I transcoded the source video into H264 and HEVC using the GTX 970 in my current PC.
GTX 970 LOOKAHEAD 32 H264 CQP 20 23 25 AQ (not using temporal AQ, it's just normal spatial AQ) File size : 147668 KB (Probably due to B-frame usage)
PSNR
Y 15.6819
Cb 10.7267
Cr 10.7057
MSSIM
Y 0.0817
Cb 0.0424
Cr 0.0425
MAX 0.0953 MSSIM / 25.8 PSNR
MIN 0.0416 MSSIM / 10.7 PSNR
GTX 970 LOOKAHEAD 32 HEVC CQP 20 23 25 AQ File size : 193856KB
PSNR
Y 15.6819
Cb 10.7267
Cr 10.7057
MSSIM
Y 0.0817
Cb 0.0424
Cr 0.0425
MAX 0.0953 MSSIM / 25.8 PSNR
MIN 0.0416 MSSIM / 10.7 PSNR
GTX 970 LOOKAHEAD 32 HEVC CQP 29 29 29 AQ File size 79748KB
PSNR
Y 15.6818
Cb 10.7255
Cr 10.7057
MSSIM
Y 0.0819
Cb 0.0424
Cr 0.0424
MAX 0.0955 MSSIM / 25.8 PSNR
MIN 0.0417 MSSIM / 10.7 PSNR
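For readers who want to relate the PSNR figures above to actual sample errors, here is a minimal pure-Python sketch of the standard 8-bit PSNR formula and its inverse. It is not the tool used for the measurements in this post (that tool is not named); the function names are illustrative only.

```python
import math

def psnr(reference, distorted, peak=255):
    """PSNR in dB between two equally sized sample sequences (8-bit by default)."""
    if len(reference) != len(distorted):
        raise ValueError("planes must have the same number of samples")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf                      # identical planes
    return 10 * math.log10(peak ** 2 / mse)

def rms_error_for_psnr(psnr_db, peak=255):
    """Invert the formula: RMS sample error that corresponds to a PSNR value."""
    return peak / 10 ** (psnr_db / 20)

# Toy plane: two of four samples are off by one level
toy = psnr([100, 100, 100, 100], [101, 99, 100, 100])
# Luma PSNR reported for the 1st H.264 test
rms_y = rms_error_for_psnr(15.7386)

print(f"toy PSNR: {toy:.2f} dB")
print(f"RMS error at 15.7386 dB: {rms_y:.1f} levels out of 255")
```

A luma PSNR near 15.7 dB corresponds to an RMS error of roughly 42 levels out of 255, which is very large for samples that look fine to the eye. One possible explanation, and it is only a guess that nothing in the thread confirms, is that the compared frames or pixel formats were not aligned, so the relative differences between encoders are more informative here than the absolute values.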
NikosD
26th January 2017, 13:01
What are the sizes of the output files using CQP 20 23 25 ?
JohnLai
26th January 2017, 13:31
What are the sizes of the output files using CQP 20 23 25 ?
Up there....in the red color.....
NikosD
26th January 2017, 13:45
OK... Your low-CQP H.264 file is double the size of mine and your HEVC file is almost triple, so the comparison is far from equal.
Your HEVC file is larger than the original AVC source (!)
When viewing the Polaris and GTX 970 files, which do you think looks better?
But of course at the same file size.
JohnLai
26th January 2017, 14:15
OK... Your low-CQP H.264 file is double the size of mine and your HEVC file is almost triple, so the comparison is far from equal.
Your HEVC file is larger than the original AVC source (!)
When viewing the Polaris and GTX 970 files, which do you think looks better?
But of course at the same file size.
Comparing Nvidia HEVC CQP I 29 : P 29 with VCE? In motion, definitely Nvidia HEVC output.
In general, Nvidia HEVC is going to be larger than AMD HEVC for sure.
This is due to more I-frame insertion + adaptive GOP (from Lookahead) + AQ + the maximum CU size being limited to 32x32.
By the way, VCE HEVC reference frame usage: it also uses just 1 preceding frame as a reference, similar to NVENC HEVC. Your H264 samples are the same; each P-frame only makes use of 1 reference frame.
NumNegativePics 1
NumPositivePics 0
NumDeltaPocs 1
UsedByCurrPicS0 1
UsedByCurrPicS1
DeltaPocS0 -1
DeltaPocS1
EDIT: All of the VCE samples use GOP = 600, so only 1 I-frame is inserted every 600 frames........
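The analyzer fields quoted above describe the short-term reference picture set of a P-frame. As a rough illustration of how to read them, the sketch below turns the DeltaPocS0 offsets and UsedByCurrPicS0 flags into the picture order counts (POCs) that a frame may reference. The function is hypothetical and only models the "negative" (preceding) pictures, since NumPositivePics is 0 in these samples.

```python
def referenced_pocs(current_poc, delta_poc_s0, used_by_curr_s0):
    """POCs of the preceding pictures a frame may actually reference.

    delta_poc_s0: negative POC offsets as shown by the analyzer (e.g. [-1]).
    used_by_curr_s0: matching 0/1 flags; only flagged pictures are used by
    the current picture, the others are merely kept in the buffer.
    """
    return [current_poc + delta
            for delta, used in zip(delta_poc_s0, used_by_curr_s0)
            if used]

# Values from the VCE sample: one preceding picture, one frame back
vce_refs = referenced_pocs(18, [-1], [1])

# Hypothetical multi-reference case, NOT taken from any sample in this thread
multi_refs = referenced_pocs(18, [-1, -2, -4], [1, 1, 0])

print(vce_refs)    # [17]
print(multi_refs)  # [17, 16]
```

With a single entry of -1, every P-frame can only look at the frame immediately before it, which is what the observation about 1 reference frame amounts to.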
NikosD
26th January 2017, 14:31
Of all the options rigaya has implemented, the only ones I haven't used are those for B-frames, which I think are useless for the Polaris encoder.
But there are at least two AMF parameters (for AVC and HEVC) that rigaya hasn't implemented so far, and I don't know whether they could be useful:
1) headers insertion spacing
2) IDR period
Can you upload the small HEVC sample to sendspace?
JohnLai
26th January 2017, 14:42
Of all the options rigaya has implemented, the only ones I haven't used are those for B-frames, which I think are useless for the Polaris encoder.
But there are at least two AMF parameters (for AVC and HEVC) that rigaya hasn't implemented so far, and I don't know whether they could be useful:
1) headers insertion spacing
2) IDR period
Can you upload the small HEVC sample to sendspace?
Header = no idea
IDR period = "--gop-len" available in VCEenc, but what we actually need is adaptive gop length. Doubt AMD gonna implement it anytime soon.
Sample? This gonna take a long time using 50kb/s ADSL upload....
EDIT: wow....30 minutes to upload.....
EDIT2: Hmm? I just noticed nvencc lookahead also decides fixed 600 gop is optimal for the source material?
EDIT3: Interesting.....nvidia adaptive quantization is something to be feared, even if I set CQP of I29 and P29.....it actually varies the QP for each CU, 25 being the lowest and 32 being the highest. You can check CU QP value in the bitstream of the video I uploaded later. (52% uploading...unless if it disconnected again....)
EDIT4 : -.-....For all the VCE samples......every CU is using CU QP 27........
NikosD
26th January 2017, 14:46
I think you mean 50 KByte/s, which is ~512 Kbit/s.
It's not that bad!
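A quick back-of-the-envelope check of the upload time, using the 79748 KB file size quoted earlier for the small HEVC sample and the 50 KByte/s rate discussed here. The helper names are made up for illustration, and the rate is treated as a constant payload rate, which is an assumption.

```python
def upload_minutes(size_kbyte, rate_kbyte_per_s):
    """Minutes needed to push size_kbyte at a steady rate_kbyte_per_s."""
    return size_kbyte / rate_kbyte_per_s / 60

def kbyte_to_kbit(rate_kbyte_per_s):
    """Convert a byte-based rate to a bit-based one (8 bits per byte)."""
    return rate_kbyte_per_s * 8

minutes = upload_minutes(79748, 50)
payload_kbit = kbyte_to_kbit(50)

print(f"upload time: {minutes:.1f} min")
print(f"payload rate: {payload_kbit} kbit/s")
```

That gives about 26.6 minutes, in line with the roughly 30 minutes mentioned above. Converted directly, 50 KByte/s is 400 kbit/s of payload, somewhat under the ~512 Kbit/s quoted, which reads more like the nominal line rate before protocol overhead (again an assumption about the line, not something stated in the thread).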
JohnLai
26th January 2017, 15:25
I think you mean 50 KByte/s, which is ~512 Kbit/s.
It's not that bad!
Finally.... https://www.sendspace.com/file/mghgaq
NikosD
26th January 2017, 15:40
Thank you.
My eyes can't tell a difference between your sample and mine.
I mean no difference at all.
So, my next question is:
What is your encoding speed (FPS) for that HEVC small file ?
JohnLai
26th January 2017, 15:52
Thank you.
My eyes can't tell a difference between your sample and mine.
I mean no difference at all.
So, my next question is:
What is your encoding speed (FPS) for that HEVC small file ?
You can't tell the difference when it is in motion. Pause it at certain frames and you will notice it.
This is the speed.
encoded 1998 frames, 144.57 fps, 19596.57 kbps, 77.87 MB
NikosD
26th January 2017, 15:53
No, I can't tell any difference even in still images.
JohnLai
26th January 2017, 16:38
No, I can't tell any difference even in still images.
Really?
The tree? The leaf?
Facial texture?
Is there any website where you can mouse over to compare two different images?
Other than the dreaded http://screenshotcomparison.com/ , I can't even upload to that site.
EDIT:
Since almost all image hosters tend to autoconvert png file....
AvatarHEVCslow_track1_und.mkv_snapshot_00.03_[2017.01.26_23.24.51]
https://www.sendspace.com/file/0rffgz
6.Avatar-1080p60fps NVENC CQP I29P29 LOOKAHEAD 32 AQ.mp4_snapshot_00.03_[2017.01.26_23.25.04]
https://www.sendspace.com/file/6vg0ah
Yups
26th January 2017, 18:46
With this source:
https://www.sendspace.com/file/zg01zb
HD 630 1150 Mhz
b-frames 4, reference frames 2, Target usage 7
HEVC
CQP= 370 fps
VBR= 290 fps
JohnLai
26th January 2017, 18:53
HD 630 1150 Mhz
b-frames 4, reference frames 2, Target usage 7
HEVC
CQP= 370 fps
VBR= 290 fps
Dear Yups,
Could you provide an Intel QSV 10-bit HEVC sample (16 B-frames, ref 5, TU1, scene change detection) from that source video?
I wanna check the bitstream......:)
Edit: Just use either VBR or ICQ to ensure the size is around 75 - 90 MB.
Yups
28th January 2017, 10:52
B-frames 4 is the limit for HEVC, otherwise you run into issues.
CruNcher
28th January 2017, 11:08
Scene cuts can cause interesting fluctuations on the decoder side between NVEncC and FFMPEG's NVENC at the same average bitrate target, but I have no idea yet where this difference in their bitrate decisions originates.
Yups, could you please benchmark the bitstream I posted here on your Kaby Lake decoder (with the system in a reasonably clean state and a minimum of 2 runs)?
https://www.sendspace.com/file/declpg
Nvidias GM204 Cuda Decoder Core
http://i1.sendpic.org/t/aC/aCT2XKCGnjRfhCQiPdTBwxU2U7p.jpg (http://sendpic.org/view/1/i/sq7WBQbNha5WxmPuzdYcoPvPd1M.png)
NVENCC (stream above)
Format : MPEG-4
Format profile : Base Media / Version 2
Codec ID : mp42 (isom/iso2/mp41)
File size : 299 MiB
Duration : 1 min 50 s
Overall bit rate : 22.7 Mb/s
Writing application : NVEncC (x64) 3.02
Video
ID : 1
Format : HEVC
Format/Info : High Efficiency Video Coding
Format profile : Main@L5@High
Codec ID : hev1
Codec ID/Info : High Efficiency Video Coding
Duration : 1 min 50 s
Bit rate : 22.7 Mb/s
Width : 3 840 pixels
Height : 2 160 pixels
Display aspect ratio : 16:9
Frame rate mode : Constant
Frame rate : 59.940 (60000/1001) FPS
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.046
Stream size : 299 MiB (100%)
SSIM
Y:0.942599 (12.410791)
U:0.967868 (14.930601)
V:0.964379 (14.482894)
All:0.950440 (13.048714)
Encode Speed = ~30 FPS (CPU/GPU)
FFMPEG NVENC (WIP)
Format : MPEG-4
Format profile : Base Media
Codec ID : isom (isom/iso2/mp41)
File size : 298 MiB
Duration : 1 min 50 s
Overall bit rate : 22.6 Mb/s
Writing application : Lavf57.62.100
Video
ID : 1
Format : HEVC
Format/Info : High Efficiency Video Coding
Format profile : Main@L5@High
Codec ID : hev1
Codec ID/Info : High Efficiency Video Coding
Duration : 1 min 50 s
Bit rate : 22.6 Mb/s
Width : 3 840 pixels
Height : 2 160 pixels
Display aspect ratio : 16:9
Frame rate mode : Constant
Frame rate : 59.940 (60000/1001) FPS
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.045
Stream size : 298 MiB (100%)
SSIM
Y:0.940291 (12.239596)
U:0.966039 (14.690139)
V:0.961858 (14.185987)
All:0.948177 (12.854755)
Encode Speed = ~33 FPS (CPU/GPU)
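The SSIM lines above can be reproduced from each other. The bracketed values are consistent with SSIM expressed in dB as -10 * log10(1 - SSIM), and "All" is consistent with a mean weighted by plane size for 4:2:0 video, (4*Y + U + V) / 6. That matches the output format of FFmpeg's ssim filter, although the tool is not named in the post, so treat the formulas as an inference checked against the posted numbers. Function names are illustrative.

```python
import math

def ssim_to_db(ssim):
    """Express an SSIM score in dB: higher is better, 1.0 maps to infinity."""
    return -10 * math.log10(1 - ssim)

def ssim_all_420(y, u, v):
    """Plane-size-weighted mean for 4:2:0: luma has 4x the samples of each chroma plane."""
    return (4 * y + u + v) / 6

# NVEncC stream: Y:0.942599 (12.410791) U:0.967868 V:0.964379 All:0.950440 (13.048714)
nvencc_y_db = ssim_to_db(0.942599)
nvencc_all = ssim_all_420(0.942599, 0.967868, 0.964379)

# ffmpeg stream: Y:0.940291 U:0.966039 V:0.961858 All:0.948177
ffmpeg_all = ssim_all_420(0.940291, 0.966039, 0.961858)

print(f"NVEncC Y: {nvencc_y_db:.3f} dB, All: {nvencc_all:.6f}")
print(f"ffmpeg All: {ffmpeg_all:.6f}")
```

On the dB scale the gap between the two encodes (about 13.05 vs 12.85 dB for "All") is easier to read than the raw difference of 0.002 in SSIM.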
JohnLai
28th January 2017, 18:36
Scene cuts can cause interesting fluctuations on the decoder side between NVEncC and FFMPEG's NVENC at the same average bitrate target, but I have no idea yet where this difference in their bitrate decisions originates.
Nvidias GM204 Cuda Decoder Core
http://i1.sendpic.org/t/aC/aCT2XKCGnjRfhCQiPdTBwxU2U7p.jpg (http://sendpic.org/view/1/i/sq7WBQbNha5WxmPuzdYcoPvPd1M.png)
Hmm.... seems like the hybrid nature of GTX970 decoding is the bottleneck.
http://i.imgur.com/FHozcid.png
Frame 2012 is the I-frame in the screenshot you produced.
Checking the Coded Picture Buffer graph.... I guess the CPB limit for our GTX970 hybrid decoding is 30 000 000; any more than that, stutter~~~~