View Full Version : What is current status for hardware H.265 encoding.



CruNcher
4th March 2017, 19:00
The more CUDA power you have, the more tasks you can route there, such as the 10-bit decoding overhead. The ASIC mostly matters for power efficiency, so it makes the most sense for cards with little shader power or for mobile, depending on the target audience ;)

Nvidia did this in the past on many levels. I remember motion-adaptive deinterlacing being enabled on only one specific piece of hardware, and then, by accident, inside a beta driver for every shader count. It was dauntingly slow and unusable on the lower-shader cards, though, so it was economic nonsense to keep it, given the possible support overhead that would follow. Later it became the default, as shader power rose on the more consumer-targeted cards ;)

I'm pretty sure Nvidia also tests its future ASIC optimizations inside CUDA and has a pretty nice, efficient conversion workflow from CUDA (GPU) -> ASIC :)

The encoder itself is still updated via CUDA additions. Just as OpenCL can be used for x264's lookahead, Nvidia uses CUDA for its lookahead, additionally for its AQ and 2-pass, and most probably for the coming weight-b as well :)

It's fully up to Nvidia's business decisions what they enable where, and what makes sense for which platform and for the predicted system resources of the target audience (nowadays gathered by Nvidia via direct telemetry from their target customers) ;)

So if Nvidia wants to, they could enable CUDA-accelerated 10-bit VP9 decoding/encoding on the GP102 without needing the ASIC at all. It's purely a business and resource-control decision: how much sense it makes with the shader count available for certain targets and customer usage scenarios, and given the competition's advances, decisions and progress.

CruNcher
5th March 2017, 13:16
@NikosD

Nvidia's HEVC encoder already performs better than Nvidia's AVC encoder. Result-wise, HEVC is often sharper overall, especially vs. Nvidia AVC with B-frames.

But the metrics disagree with me: both PSNR and SSIM would rate the Nvidia AVC + B-frames results as better overall.

I can't agree.

There are 2 main factors immediately visible: higher sharpness (detail retention) and better banding avoidance at the same target hit, without even touching AQ.

The better banding avoidance surprised me a little, as I expected to see that mainly from 10-bit. But no: in Nvidia's case their overall HEVC tuning does it better by default than their AVC + B-frames, and this is already heavily visible at 8-bit.

And you end up at pretty much the same encoding speed: in this case slow (AVC) vs. fast (HEVC) gave almost the same overall performance, with Nvidia HEVC leading visually even though PSNR/SSIM lean towards Nvidia AVC by a small margin.


Though surprisingly, with Rigaya's encoder tuning the better default banding avoidance gets lost pretty much completely, and he achieves a higher PSNR/SSIM.

The highest speed I have reached so far with the GTX 970 (GM204) on the i5-2400:

1080p, 8 Mbit/s = 235 FPS, ~9.4x @ 25 FPS
4K DCI, 20 Mbit/s = 57 FPS, ~0.93x @ 59.940 FPS (almost identical to AVC slow with B-frames)
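
For reference, the "x" figures read as real-time factors, i.e. encode speed divided by the source frame rate. A minimal sketch of that arithmetic, assuming this interpretation (the function name is made up for illustration):

```python
def realtime_factor(encode_fps: float, source_fps: float) -> float:
    """Return how many times faster than real time an encode runs."""
    if source_fps <= 0:
        raise ValueError("source_fps must be positive")
    return encode_fps / source_fps


# (label, measured encode FPS, source frame rate), as quoted in the post
runs = [
    ("1080p, 8 Mbit/s", 235.0, 25.0),
    ("4K DCI, 20 Mbit/s", 57.0, 59.94),
]

for label, encode_fps, source_fps in runs:
    factor = realtime_factor(encode_fps, source_fps)
    print(f"{label}: {factor:.2f}x real time")
```

The 1080p case gives exactly 9.4x. For the 4K case, 57 / 59.94 works out to about 0.95x, slightly above the ~0.93x quoted, so the measured rate was presumably a little under 57 FPS or the figure was rounded differently.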


I never saw failures in Rigaya's, FFmpeg's or Nvidia's own sample encoder like the ones your AMD result produced on the Journey sample with the Polaris encoder, no matter the bitrate.

I'm pretty unsure about Rigaya's work, though. He seems to follow a specific goal that might be problematic, and he seems too fixed on something. I really hope he doesn't just blindly tune based on metrics.

Though I couldn't test temporal AQ with AVC + B-frames yet; it could have a rather big impact overall.

NikosD
8th March 2017, 13:20
I have exchanged a few emails today with the developer of OBS Studio (xaymar) regarding AMF (AMD's API) and Polaris HW.

He confirmed these issues:

1) Only HEVC content is affected by Pre-Analysis for now and only with CBR/VBR.
You can only really measure the impact using PSNR and SSIM, since its visual impact is pretty much non-existent except for a few test scenes.

2) VBAQ also only works on CBR/VBR and the effects are pretty much non-existent, at least perceived effects.

3) HEVC does not really have Quality Presets. You can see a slight difference between Speed and Balanced/Quality, that is all.

But he disagrees that you can't set REF frames on H.264 encoding, which he has already tested.

He told me that you just need to make sure that you are within either H264 or H265 limits.

Well, I don't know about the last one, but on the first three issues we all agree.

NikosD
8th March 2017, 15:37
Using the latest VCEEnc v3.05v2 as a benchmark tool, which leverages the AMF API for HW decoding/encoding, I did some tests today on the HW H.264 decoder/encoder of a Polaris RX 470 card.

It seems that the HW H.264 encoder doesn't change its encoding speed by using different rate modes (CBR/VBR, CQP) or different bitrate for CBR/VBR (maximum bitrate for VBR/CBR H.264 encoding is 100Mbps and above 700Mbps for CQP)

The one and only parameter that affects HW encoding speed is quality.

So, for 1080p H.264 clip the encoding speed is:

fast -> ~140 fps

balanced -> ~ 100 fps

slow -> ~ 55 fps

The HW H.264 decoder can keep up with the fast quality setting(~140 fps at 1080p) with clips at 60Mbps.

Beyond that limit, the HW H.264 decoder slows down, reaching ~95 fps on a 1080p 100 Mbps H.264 clip.

My Core i5 2400 can reach 1080p at 140fps when bitrate is 20Mbps.

So, although the Polaris HW H.264 decoder seems a little slow, it's still 3 times faster than Core i5 2400.
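
For anyone repeating this kind of throughput test with another tool, a minimal timing sketch follows. It is an assumption-laden harness, not what VCEEnc does internally: `measure_fps` is a made-up helper, the frame count has to be known in advance, and wall-clock time includes process start-up, so short clips will read low.

```python
import subprocess
import sys
import time
from typing import Sequence


def measure_fps(command: Sequence[str], frame_count: int) -> float:
    """Run `command` to completion and return frames processed per second."""
    if frame_count <= 0:
        raise ValueError("frame_count must be positive")
    start = time.perf_counter()
    subprocess.run(
        list(command),
        check=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    elapsed = time.perf_counter() - start
    return frame_count / elapsed


if __name__ == "__main__":
    # Placeholder command so the sketch runs anywhere. Swap in a real
    # decode-only run, e.g. ["ffmpeg", "-i", "clip.mp4", "-f", "null", "-"],
    # where clip.mp4 is a hypothetical test file with a known frame count.
    demo_command = [sys.executable, "-c", "pass"]
    print(f"{measure_fps(demo_command, frame_count=1000):.1f} fps")
```

Running the same clip at several bitrates and comparing the results is enough to reproduce the "keeps up until 60 Mbps, then drops" pattern described above.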

Can you test your Maxwell 2nd gen HW H.264 decoder/encoder that way ?

Similar HEVC tests are not feasible at the moment, as the quality parameter for HEVC is still not working (it has the same speed for all settings slow/balanced/fast).

Doing the same test with HW H.265 decoder, it seems that it has a very similar performance to HW H.264 decoder and it's a tad faster.

The HW H.265 decoder can keep up with the fast quality setting(~140 fps at 1080p) of H.264 encoder with clips at 60Mbps.

Beyond that limit, the HW H.265 decoder slows down, reaching ~108 fps on a 1080p 110 Mbps H.265 clip.

So it's faster in higher bitrates than HW H.264 decoder.

Benchmark tests were done based on Jellyfish (H.264 & H.265) and Birds (H.264) bitrate samples.

JohnLai
8th March 2017, 18:25
But he disagrees that you can't set REF frames on H.264 encoding, which he has already tested.

He told me that you just need to make sure that you are within either H264 or H265 limits.


So, one Polaris VCE HEVC or H264 1080p sample with 3 reference frames? Smaller size and shorter duration please....:D
Surely this is within the 'limits' of level 4.1?

Note: for AMD r7 260x, VCE H264 B-frame can make use of 2 reference frames just fine. However, setting B-frame to 0 = each P-frame only makes use of one preceding frame as reference.

Since Polaris VCE doesn't have B-frame support for H264....well....:(

NikosD
9th March 2017, 17:20
Note: for AMD r7 260x, VCE H264 B-frame can make use of 2 reference frames just fine. However, setting B-frame to 0 = each P-frame only makes use of one preceding frame as reference.

Since Polaris VCE doesn't have B-frame support for H264....well....:(

I have sent you a PM, but probably you didn't see it.

The reply regarding this issue was:


I-Frames may only reference the last I-Frame if in Intra-Refresh/Slice mode
P-Frames may reference the previous I-Frame or P-Frame
P-Frames may not reference the next I-Frame or P-Frame
P-Frames can reference I-Frames and P-Frames further in the past (Reference Frame range)
B-Frames must reference the next P- or I-Frame
B-Frames must reference the previous P- or I-Frame

So you can end up with the following: IPBBBPBBBPBBB.
Your Reference Frame group restarts with the I-Frame and slowly builds up to 16 frames, until it exceeds 16 frames and one of the P or B frames is dropped out.
This continues on until the next I-Frame, where it restarts again.

I have verified this behavior myself on R9 285, R9 390 and RX 480 using ffprobe and Elecard StreamEye.
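
The window behaviour described in that reply can be sketched as a small simulation. This is only an illustration of the description, not of the VCE firmware: the reply does not say which frame is dropped when the window is full, so dropping the oldest one is an assumption here, and the function name is made up.

```python
from collections import deque

MAX_REFS = 16  # window size quoted in the reply


def simulate_reference_window(frame_types: str, max_refs: int = MAX_REFS):
    """Return the window occupancy after each coded frame.

    Assumption: when the window is full, the oldest frame is dropped.
    """
    window = deque()
    sizes = []
    for index, frame_type in enumerate(frame_types):
        if frame_type == "I":
            window.clear()  # the reference group restarts at every I-frame
        elif frame_type not in ("P", "B"):
            raise ValueError(f"unknown frame type: {frame_type!r}")
        if len(window) == max_refs:
            window.popleft()  # assumed eviction policy: oldest first
        window.append((index, frame_type))
        sizes.append(len(window))
    return sizes


if __name__ == "__main__":
    stream = "I" + "PBBB" * 5 + "IPBBB"
    for frame_type, size in zip(stream, simulate_reference_window(stream)):
        print(frame_type, size)
```

With the IPBBBPBBB... pattern, the occupancy climbs to 16, stays there, and falls back to 1 at the next I-frame. Note that this only models how many frames are available, not how many a given frame actually references.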

JohnLai
9th March 2017, 18:58
I have sent you a PM, but probably you didn't see it.

The reply regarding this issue was:

I saw it. No time to reply yet.

P-Frames can reference I-Frames and P-Frames further in the past (Reference Frame range)

The problem: the decoded picture buffer shown in Elecard says otherwise for the Polaris HEVC sample. One can set 5 or 16 reference frames for VCE, but the decoded picture buffer clearly has only ONE preceding frame in use as a reference to display the current frame.

Your Reference Frame group restarts with the I-Frame and slowly builds up to 16 frames, until it exceeds 16 frames and one of the P or B frames is dropped out.
For this one.....it is something like the Nvidia HEVC sample. It has a lot of previously decoded frames stored in the DPB, but only makes use of one preceding frame as a reference to display the current frame.

NikosD
9th March 2017, 19:12
The problem = decoded picture buffer in elecard shown otherwise for Polaris HEVC sample. One can set 5 or 16 reference frames for VCE, but decoded picture buffer clearly has only ONE preceding frame as reference in use to display current frame.


He was clearly insisting on H.264 encoding, not HEVC.

Have you analyzed VCE H.264 samples for REF frames ?

NikosD
10th March 2017, 05:50
One can set 5 or 16 reference frames for VCE, but decoded picture buffer clearly has only ONE preceding frame as reference in use to display current frame.


I got his final reply, you were right about that from the beginning.


The AVC encoded content by default only references Index 0 (the last possible reference frame).
This is likely by design and hasn't been improved on, the VCE firmware has not seen many modifications since ATI days.

Same thing for HEVC encoded content, though the HEVC core is rather buggy at the moment.

JohnLai
10th March 2017, 09:19
He was clearly insisting on H.264 encoding, not HEVC.

Have you analyzed VCE H.264 samples for REF frames ?

Based on your last Polaris VCE H264 samples, there is the same issue for P-frames in those samples. One ref only.

Bonaire and Hawaii VCE H264 support B-frames, where each B-frame can properly make use of 2 preceding frames as references.

Nvidia NVENC H264 B-frames can also refer to up to 4 preceding I/P/B frames. However, if NVENC is set not to use B-frames, then its P-frames only make use of 1 preceding P-frame as a reference.

Only Intel QSV reference frame actually somehow works for both H264 (B-frame can refer to I, P and B, meanwhile P-frame can only refer to another preceding P-frame) and HEVC (But it is GBP, not exactly IPB).

There is no perfect fixed function encoder :( . Intel, AMD and Nvidia fixed function encoders omit a lot of 'features'. Speed/quality tradeoff.
Stick with software encoders for the best quality per bitrate plus flexibility.

NikosD
10th March 2017, 09:43
There is no perfect fixed function encoder :( . Intel, AMD and Nvidia fixed function encoders omit a lot of 'features'. Speed/quality tradeoff.
Stick with software encoders for the best quality per bitrate plus flexibility.

This is not an option for H.265 encoding.

The x265 SW encoder, although heavily optimized, is still extremely slow.

x264 nowadays with multicore and high frequency processors is OK.

But still, HW transcoders provide full HW transcoding with 0% CPU utilization and good enough results regarding visual quality and bitrate.

CruNcher
11th March 2017, 08:09
0% is a little extreme even if you wouldn't count copy related overhead (transcoding with very low sub 1% CPU utilization) ;)

Soon new results for both Nvidia and Intel on both of their newest Cores available.

I decided to get a combined mobile x86 platform to do further testing, though not as power-limited overall as easyfab's (35W/45W target, with a max of roughly 120W, I expect).

Before Raven Ridge arrives.

Should be enough Power to reach the 4K 60 fps target compared to my 170W Desktop output currently ;)

So, as I said, Kaby Lake + 1050 Ti (GP107), both with the newest VPU on the market and pretty much the newest GPU cores as well ;)

And then later a mobile Raven Ridge system for comparison, which I'm much, much more excited to see overall. I hope AMD will have solved most of the encoder problems on the software side by then ;)

Though the loss of the N17P-G1 is heavy compared to a 1060 you can calculate ~60% shader efficiency loss that balance feels wrong for just 10/12 bit VP9/HEVC Decoding/Encoding.

only with really good code optimization you end at the exact 50% loss

this would also impact and hit the encoder overall in terms of its cuda performance parts by a not so negligible performance amount.

Which also pretty much is the Performance of my GTX 970 Desktop GM204 that 50% loss should be currently what i drive @ 170W out of my higher Maxwell Shader count on the Encoder side.

Which shows once again that Pascal is nothing more than a Paxwell ;)

CruNcher
19th March 2017, 12:02
@easyfab
could you please post your current dxvachecker output

it is identical to this right ?

http://www.notebookcheck.com/fileadmin/Notebooks/MSI/CX72-7QL/dxva_kaby.png

easyfab
19th March 2017, 13:28
@CruNcher

Here is mine: http://pastebin.com/sfx4z608

It's the same with some name modification ( newer version ? )

easyfab
19th March 2017, 13:52
I also tried some AVC encodes with VAAPI.
And with this (https://github.com/01org/intel-vaapi-driver/pull/74) it becomes really good. I gain more than 1 dB of SSIM (16 -> 17). I need to test a little more, but it could be better than QSV. VAAPI HEVC is not so good for the moment.
I also tried VP9 HW encoding with libyami, but I couldn't set the bitrate correctly. I will wait until libav gets it in (https://lists.libav.org/pipermail/libav-devel/2017-March/082884.html) to run some more tests.
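
To put the SSIM dB figures into perspective, here is a small conversion sketch. It assumes the dB values follow the convention used by x264 and FFmpeg's ssim filter, -10 * log10(1 - SSIM); the post does not say which tool produced them, and the helper names are made up.

```python
import math


def ssim_to_db(ssim: float) -> float:
    """Convert a linear SSIM score (0 <= ssim < 1) to decibels."""
    if not 0.0 <= ssim < 1.0:
        raise ValueError("ssim must be in [0, 1)")
    return -10.0 * math.log10(1.0 - ssim)


def db_to_ssim(db: float) -> float:
    """Convert an SSIM value in decibels back to the linear score."""
    return 1.0 - 10.0 ** (-db / 10.0)


if __name__ == "__main__":
    for db in (16.0, 17.0):
        print(f"{db:.0f} dB -> SSIM {db_to_ssim(db):.4f}")
```

Under that assumption, going from 16 dB to 17 dB means moving from roughly 0.975 to roughly 0.980 in linear SSIM, which is a noticeable step at this quality level.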

CruNcher
19th March 2017, 13:52
yes its from the Ultrabook CPU line introduction end of 2016 by reviewers little older also before

Added "VP9_VLD_Intel" as an alternative name of Decoder Device

though Ultrabook CPUs seem to be rather rare in flexible mobiles that combine at least a Thunderbolt 3 connection with an Nvidia GPU

or i dont see it in the list im studying because the products are ridiculously higher priced ;)

when we go lower then 120W Max Mobile Platform with still pretty nice and sufficient overall CPU omph ;)

Though 120W max is still more efficient than the PS4 Pro overall, though also far away in price *grml* ;)

above that you go into Game Marketing Branding area and then it starts to become really expensive for shit you don't really want ;)

my current favors are

i5-7300HQ
i5-7200U
i5-7300U

:)

The Lenovo Yoga 720-13IKB was my favorite though with 1050 ti practically for a very small uptake now in addition for the HQ series it seems a very bad overall decision now even with that higher tdp on the CPU side and max of 120W TSP is still very nice.

a 7200U/7300U featuring the GTX 1050/TI would be nice though :)

Though investments for all those current OEM seems to high i wonder if Microsoft could pull it of based on their SurfaceBook Designs they should have the know how needed to bring it in, though it would be more expensive again, way more expensive ;)

Though maybe i go total insane and build the entire thing based on the new i7-7500U 2in1s and the whole GTX 1050 TI GPU chain sideways with Thunderbolt 3 and transfer it into a 3in1 that way with ~20W higher efficiency then a i7-7700HQ series CPU for the Core Unit ;)

http://www.notebookcheck.com/fileadmin/Notebooks/MSI/CX72-7QL/hevc10.png

The cost for that efficiency would still somewhat acceptable and i would have some future headroom for some years (if everything goes well) with that system though very high overall investment to amortize i would surely get somewhere in the range of a current Surfacebook for the complete setup (never paid or even thought to pay so much for a x86 build) :D

Though not really fair for a future Raven Ridge AM4 compare far away from AMDs Target overall, though would be interesting to see how their 4 Cores will holdup vs the overall efficiency.

Definitely, the small lead they had built up with their Carrizo UVD decision (or should we rather say wrong time-to-market planning) has faded away entirely by now.

The i5/i7-7x00U will become very popular; it will be hard for AMD to bite through without a good overall balance and pricing on the mobile side again.

easyfab
19th March 2017, 19:09
Just for fun a preview of HW VP9 Encode ( with experimental VP9_vaapi ). speed 40fps on my little hd graphics 615. I thought it would be worse

https://www.sendspace.com/file/qees6f

CruNcher
19th March 2017, 20:54
Must be a premiere: the first VP9 bitstream out of a GPU-accelerated encoder :)

I guess quality and stability won't be up to what we showcased here so far on the HEVC side, and you on TU7?

PS: Oh, it looks partly even more stable than NikosD's current VCE result, but it's obviously being worked on ;)

easyfab
19th March 2017, 21:45
And another experimental HW VP9_vaapi encode, but with the -bf 3 option ( 3 B-frames or whatever it's called for VP9 )
IMO it's a little better with it than without. I will check with SSIM/PSNR later to confirm.

https://www.sendspace.com/file/ei5z1y

CruNcher
20th March 2017, 00:23
This is also a very nice, efficient combination overall

Lenovo created a very sane configuration here, but it didn't show up in my list after the Yoga, so it's surely not that cheap, and most probably they will have eradicated the Thunderbolt port :D

https://www.youtube.com/watch?v=kK84987AOwg

It will be super interesting to see if AMD really can get up to that efficiency with Zen now :)

Pretty awesome results on that channel if you think about the max power constraints.

Nvidia's DCE is so freaking efficient

Some pretty nice low latency recordings FBC most probably going directly into NVENC :D

https://www.youtube.com/watch?v=mwiY2OOz1sY

Predator,Aorus,Rog, MSI Gaming and now we have the Legion Brand.

luigizaninoni
22nd March 2017, 18:59
excuse me, where can I find an explanation of the various options of intel h265 hardware encoder (kaby lake) ? I am using qsvencc via staxrip, but there are over 70 parameters and I really can't understand what most of them do

JohnLai
30th April 2017, 18:14
Hmm....seems like AMF 1.4.2 was released a few days ago.....
https://github.com/GPUOpen-LibrariesAndSDKs/AMF/commit/c7f29fec2326a253d319bb569cbc183b24504df9

NikosD
30th April 2017, 18:25
I've been following this link since the beginning:
https://github.com/GPUOpen-LibrariesAndSDKs/AMF/issues

With almost every new driver, AMD updates the AMF runtime API with fixes, but unfortunately this has not been the case lately.

They have allocated a lot of resources (developers) to VEGA drivers and RyZen and so AMF is low priority nowadays for AMD.

JohnLai
30th April 2017, 18:39
I'm following this link since the beginning:
https://github.com/GPUOpen-LibrariesAndSDKs/AMF/issues

In almost every new driver, AMD updates with fixes the AMF runtime API, but unfortunately this is not the case lately.

They have allocated a lot of resources (developers) to VEGA drivers and RyZen and so AMF is low priority nowadays for AMD.

Same case with Nvidia.
Been waiting for SDK8.0 with these two interesting features:
•High-bit-depth (10/12-bit) decoding (VP9/HEVC)
•Weighted Prediction
Said to be enabled for 378.66 (driver released on 2017.2.14).
No news till now. The only official statement is " SDK 8.0 will be released shortly. "

Also reported the weird chrominance and luminance full range/limited range bug caused by nvdec + nvenc transcoding since last year, but still not yet fixed.

Meanwhile, Intel.....well.....totally no news on new features. Where is lookahead algorithm for QSV hevc? It would be great if Intel adds adaptive quantization.:(

nevcairiel
30th April 2017, 21:32
NVIDIA GTC is next week, maybe they'll use the chance to release the new SDK.

JohnLai
8th May 2017, 18:31
NVIDIA GTC is next week, maybe they'll use the chance to release the new SDK.

Well, you are right.
Video Codec SDK 8.0 was released just now......


Meanwhile....@NikosD,I leave it to you for informing rigaya about the update. :p

NikosD
8th May 2017, 18:33
Meanwhile....@NikosD,I leave it to you for informing rigaya about the update. :p

Don't worry!

He knows everything before us ;)

JohnLai
9th May 2017, 15:27
Huh...seems like spatial AQ for NVENC HEVC has been enabled at the driver (378.66) level.
Playing around with rigaya nvencc --aq-strength 1 to 15 and driver aq default (using rate control = CQP I20:P23)

Setting     minqp  maxqp  size (KB)
Driver AQ   12     27     29247
AQ1         19     23     28315
AQ4         15     25     28193
AQ8         12     27     29247
AQ12         9     29     34994
AQ14         7     30     41234
AQ15         7     31     45051
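
The trend in these numbers is easier to see as QP spread and size change relative to the driver default. A minimal sketch using only the figures posted above (the `summarize` helper is made up for illustration):

```python
# (label, min QP, max QP, output size in KB), copied from the post
RESULTS = [
    ("Driver AQ", 12, 27, 29247),
    ("AQ1", 19, 23, 28315),
    ("AQ4", 15, 25, 28193),
    ("AQ8", 12, 27, 29247),
    ("AQ12", 9, 29, 34994),
    ("AQ14", 7, 30, 41234),
    ("AQ15", 7, 31, 45051),
]


def summarize(results, baseline_label="Driver AQ"):
    """Return (label, QP spread, size change in % vs. the baseline row)."""
    baseline_size = next(
        size for label, _, _, size in results if label == baseline_label
    )
    rows = []
    for label, min_qp, max_qp, size_kb in results:
        qp_spread = max_qp - min_qp
        size_change_pct = 100.0 * (size_kb - baseline_size) / baseline_size
        rows.append((label, qp_spread, size_change_pct))
    return rows


if __name__ == "__main__":
    for label, spread, change in summarize(RESULTS):
        print(f"{label:<10} QP spread {spread:>2}  size {change:+6.1f}%")
```

The AQ8 row is identical to the driver default row, which suggests (but does not prove) that the driver default corresponds to strength 8. Above that, the QP spread widens and the file size at fixed CQP grows quickly.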

EDIT 30-05-2017: Apparently, the weighted prediction feature uses CUDA cores. I wonder why Nvidia makes this feature exclusive to Pascal. If it is CUDA-core based, it could be backported to the Maxwell series or even Kepler.

hajj_3
25th September 2017, 09:47
https://i.imgur.com/RPDt9iA.png

check out the small print for the new 8th gen intel coffee lake cpu's announced today. Those big figures on the slide are when comparing to 4000 series intel chips not 7000 series chips. Ridiculously misleading. Would be nice to know if there are any encoding/decoding performance/quality improvements in these 8th gen chips but no info has been released so far to my knowledge.

nevcairiel
25th September 2017, 09:55
Please don't post enormous images like that, it blows up the entire forum layout.

hajj_3
25th September 2017, 10:02
Please don't post enormous images like that, it blows up the entire forum layout.

doom9 should upgrade the forum software so that it automatically scales images to the size of the post with the ability to enlarge.

WhatZit
26th September 2017, 00:15
Those big figures on the slide are when comparing to 4000 series intel chips not 7000 series chips. Ridiculously misleading.

4th Generation Intel is the median that most people own, and most of those "most people" are due to think about an upgrade right about now. For example, I average a 4-5 year upgrade cycle, so "now" is perfect timing for my 3rd Gen gear.

What Intel are desperate to do is market an Intel solution for the 4th Gen owners rather than have them explore an AMD solution.

Would be nice to know if there are any encoding/decoding performance/quality improvements in these 8th gen chips but no info has been released so far to my knowledge.

I'd LOVE to see a QSV MAIN10 vs MAIN HEVC quality shootout, but I can't find anything even remotely like it online. Guess I'll be seeing for myself once I upgrade in a couple of months.

NikosD
27th September 2017, 20:14
Would be nice to know if there are any encoding/decoding performance/quality improvements in these 8th gen chips but no info has been released so far to my knowledge.

According to anandtech.com and various sources, the iGPU of "8th" generation Core is the same as the iGPU of the 7th generation Core.

The only difference is clock speed.

ShogoXT
18th February 2018, 02:10
Hi everyone, sorry for the bit of an old bump, but I feel this thread is still relevant today.

Live x264 encoding works right now, but I feel like because of how complex x265, vp9, and av1 will become, hardware encoders will surely take over vs expensive stream pcs in terms of "adequate" quality vs speed. I was pleasantly surprised to hear HEVC Nvenc was actually nearly equal to x264 on medium.

Now for my question. I've been playing with stream settings in OBS, messing with custom x264 commands and such. Now I have been testing out NVENC more.

Does anyone know what 2 pass does exactly? Is it valuable for live encoding purposes? For Twitch it doesn't need to be a super-low-latency setting (a stream usually reaches the viewer after 15-20 seconds); I assume those presets are meant for Nvidia Shield and such.

I always thought the asic encoders weren't that great at self analysis as they have more strict limitations, so most of the time you wanted to run it without that option, and the second encoder was meant for dual streaming.

The only relevant post I could find was this:
https://obsproject.com/forum/threads/nvenc-in-0-14-1.46986/page-3#post-211863

Thanks

Selur
18th February 2018, 21:01
Does anyone know what 2 pass does exactly?
It encodes each frame twice, was what it said in one of the NVEnc SDK pdfs iirc. :)
I doubt that it's useful for live encoding, but 'usually hits view at 15-20 seconds' might be enough to use this.

IgorC
18th February 2018, 21:08
I was pleasantly surprised to hear HEVC Nvenc was actually nearly equal to x264 on medium.
Thanks
The Intel H.265 encoder at a speed of 60 fps is on par with x264 Placebo.
And that's actually their old H.265 encoder. Their new one should be even better.

http://www.compression.ru/video/codec_comparison/hevc_2016/

http://www.compression.ru/video/codec_comparison/hevc_2016/figures/graph3.png

nevcairiel
18th February 2018, 22:34
Note that the Intel Media Server Studio (MSS) HEVC Encoder is not what you get when you do "QuickSync" encoding. It's a separate, commercial product you have to buy (it's one of the features missing from the free Community edition of Intel Media Server Studio), and it's expensive.

benwaggoner
19th February 2018, 01:46
As codecs have gotten more complex, they've become less suitable for ASIC and GPU acceleration. So many coding options to choose between means latency between main and HW memory becomes a huge slowdown. And CPU's look a lot more like DSP these days with many cores and SIMD functionality like AVX2 and beyond.

Fixed-function encoders need to tape out well before they hit market, and psychovisual tuning of more complex standards takes a long time and makes a huge difference. So no fixed-function solution will be able to approach quality of the software encoders available by the time they are available in products.

Broadcast-grade encoders have increasingly moved to CPU and software defined solutions over the last decade.

I can see FPGA accelerated potentially being competitive, but we haven't seen any competitive real-world implementations yet.

IgorC
20th February 2018, 23:18
Note that the Intel Media Server Studio (MSS) HEVC Encoder is not what you get when you do "QuickSync" Encoding, its a separate and commercial product you have to buy (its one of the features missing from the free Community-edition of Intel Media Server Studio) - and its expensive.

Yes, it's most likely cheaper to buy 16-32 core CPU and use x265 at that point.

hajj_3
21st February 2018, 00:55
Note that the Intel Media Server Studio (MSS) HEVC Encoder is not what you get when you do "QuickSync" Encoding, its a separate and commercial product you have to buy (its one of the features missing from the free Community-edition of Intel Media Server Studio) - and its expensive.

Does their commercial product have better compression than their QuickSync, then? Are there any reviews comparing them, or public specifications of the differences? Would be nice to know.

easyfab
21st February 2018, 10:50
Last time I tried, h264_qsv was better than hevc_qsv and 2x faster.

h264_qsv has more options available ( look-ahead, B-pyramid .... )

I see that with latest drivers b-pyramid and weighted-b frame is in. I will try to do some tests to see the improvements.

nevcairiel
21st February 2018, 10:57
Does their commercial product have better compression than their quicksync then? Are there are reviews comparing them or public specifications of the differences, would be nice to know.

They're entirely unrelated products, so yes, much better. Why would anyone buy it otherwise? :)
The Intel MSS Encoder is basically a software encoder with hardware acceleration, while QuickSync is a full hardware encoder. Full hardware is faster, but as outlined in various posts above, also quite limited in quality.

hajj_3
5th April 2018, 13:53
Nvidia 8.1 SDK is out, supports b-frames in h264 and other improvements: https://developer.nvidia.com/nvidia-video-codec-sdk

shades
17th April 2018, 00:35
As codecs have gotten more complex, they've become less suitable for ASIC and GPU acceleration. So many coding options to choose between means latency between main and HW memory becomes a huge slowdown. And CPU's look a lot more like DSP these days with many cores and SIMD functionality like AVX2 and beyond.

Fixed-function encoders need to tape out well before they hit market, and psychovisual tuning of more complex standards takes a long time and makes a huge difference. So no fixed-function solution will be able to approach quality of the software encoders available by the time they are available in products.

Broadcast-grade encoders have increasingly moved to CPU and software defined solutions over the last decade.

I can see FPGA accelerated potentially being competitive, but we haven't seen any competitive real-world implementations yet.

What about stuff like this?
http://www.advantech.com/products/pci-express-cards/sub_half-length_pcie_card
And there are cheap x264 cards on eBay

Has anyone had any luck (reasonable quality output) with this sort of stuff?

foxyshadis
17th April 2018, 04:10
What about stuff like this?
http://www.advantech.com/products/pci-express-cards/sub_half-length_pcie_card
And there are cheap x264 cards on eBay

Has anyone had any luck (reasonable qualtiy output) with this sort of stuff?

Broadcast encoding is never particularly efficient, but they get around that by throwing tons of bitrate at the problem. When they don't, quality suffers enormously. Those cards are basically NvEnc on steroids, using simple searches and no RDO with extremely fast memory -- and a few bottlenecks turned into inflexible hardware paths -- to get better-than-CPU speed, rather than being hyper-efficiently designed around the nuances of the spec. RDO in particular is still a parallel-killer, since the best decision depends on the exact encoding of the last best decision.
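
The serial dependency in RDO can be illustrated with a toy mode decision. This is a sketch under invented assumptions, not how any real encoder is structured: each block picks the candidate minimizing D + lambda * R, and the rate of a candidate is made cheaper when it repeats the previous block's mode, standing in for context-adaptive coding. All numbers and names are hypothetical.

```python
SAME_MODE_DISCOUNT = 2  # hypothetical bits saved when repeating the last mode


def rd_cost(distortion: float, rate: float, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate


def choose_modes(blocks, lam: float):
    """Pick a mode per block; each choice depends on the previous choice."""
    previous_mode = None
    chosen = []
    for candidates in blocks:
        best_mode, best_cost = None, float("inf")
        for mode, distortion, base_rate in candidates:
            rate = base_rate
            if mode == previous_mode:
                rate -= SAME_MODE_DISCOUNT  # rate depends on the last decision
            cost = rd_cost(distortion, rate, lam)
            if cost < best_cost:
                best_mode, best_cost = mode, cost
        chosen.append(best_mode)
        previous_mode = best_mode  # must be final before the next block starts
    return chosen


if __name__ == "__main__":
    second_block = [("intra", 10, 8), ("inter", 11, 8)]
    print(choose_modes([[("intra", 10, 8), ("inter", 12, 4)], second_block], 1.0))
    print(choose_modes([[("intra", 10, 4), ("inter", 12, 4)], second_block], 1.0))
```

The second block is identical in both runs, yet its best mode changes with the first block's outcome, so the two blocks cannot be decided independently in parallel without giving up some accuracy.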

When predictable speed is all that matters, hardware will always be key, but there's a low ceiling for the maximum quality you can get out of an FPGA/ASIC without reducing it to near-CPU speed.

Intel MSS is a different beast; it's essentially a good software encoder with lots of knobs to twiddle with a few hardware-accelerated paths, but it gets most of its speed gain by being paired with insanely fast eDRAM on Iris Pro models. Otherwise, it's just SW encoder speed to go with SW encoder quality. Nvidia or AMD could dedicate their GDDR5 to it to crunch a lot harder for the same fps, but they haven't seemed willing to so far; cheap and good enough is good enough.

foxyshadis
17th April 2018, 04:12
Note that the Intel Media Server Studio (MSS) HEVC Encoder is not what you get when you do "QuickSync" Encoding, its a separate and commercial product you have to buy (its one of the features missing from the free Community-edition of Intel Media Server Studio) - and its expensive.

That changed last week, Intel released their full HEVC GPU module in the Windows 2018 R1 community release. They haven't yet done it in the Linux version for some reason.

shades
18th April 2018, 02:27
That changed last week, Intel released their full HEVC GPU module in the Windows 2018 R1 community release. They haven't yet done it in the Linux version for some reason.

I just downloaded the Linux linked version from the Intel site if you're interested.

MediaServerStudioEssentials2018R1.tar.gz

shades
18th April 2018, 02:35
Broadcast encoding is never particularly efficient, but they get around that by throwing tons of bitrate at the problem. When they don't, quality suffers enormously. Those cards are basically NvEnc on steroids, using simple searches and no RDO with extremely fast memory -- and a few bottlenecks turned into inflexible hardware paths -- to get better-than-CPU speed, rather than being hyper-efficiently designed around the nuances of the spec. RDO in particular is still a parallel-killer, since the best decision depends on the exact encoding of the last best decision.

When predictable speed is all that matters, hardware will always be key, but there's a low ceiling for the maximum quality you can get out of an FPGA/ASIC without reducing it to near-CPU speed.

Intel MSS is a different beast; it's essentially a good software encoder with lots of knobs to twiddle with a few hardware-accelerated paths, but it gets most of its speed gain by being paired with insanely fast eDRAM on Iris Pro models. Otherwise, it's just SW encoder speed to go with SW encoder quality. Nvidia or AMD could dedicate their GDDR5 to it to crunch a lot harder for the same fps, but they haven't seemed willing to so far; cheap and good enough is good enough.


VEGA-3310
4K HEVC Broadcast Video Encoding/ Decoding / Transcoding Card
35W seems a lot less than nVidia card, still, I haven't tried one. Here's a blurb from the Datasheet.

"The technology behind VEGA-3310 can do the same task in under 35W, and VEGA-3310 can also support
up to 4Kp120 high frame rate for next generation sports broadcasts and 360 degree VR applications..
This card feature a simple-to-use API and example code for FFmpeg and GStreamer multimedia frameworks to streamline product development and integration into existing applications."

xabregas
19th April 2018, 13:45
The biggest improvements might be the new amazon hevc CRAP when comparing to their old H264.

Congrats to all the people that are working on HEVC. So many years to finally sell something that's worse in every way than the previous stuff. Instead of improving...

Yeah, yeah, you save bandwidth at the cost of our eyes. Same quality my ass. It's a macroblock and banding feast in every official BS stream I put my eyes on.

Even some UHD blurays suffer....

shades
22nd April 2018, 23:47
The biggest improvements might be the new amazon hevc CRAP when comparing to their old H264.

Congrats to all the people that are working in HEVC. So many years to finnaly sell something thats worst in every way against previous stuff. Instead of improving...

Yeah Yeah You save bandwith at the cost of our eyes. Same quality my ass. Its a macroblock and banding feast in every official BS stream i put my eyes on.

Even some UHD blurays suffer....

You must be doing something different from me. I can see a clear video improvement at the same bitrate using HandBrake and x265 over x264. Mind you, it's using software encoding, not hardware.
This is using a BluRay rip of media, which is already encoded. So, If I rip say 50 gig, then recode it down to 3 gig, x265 wins every single time hands down. Recoding to x264 at 3 gig size looks terrible in comparison.
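
For a sense of scale, the file sizes mentioned can be turned into average bitrates. The running time is not given in the post, so the 2-hour duration below is purely an assumption, and the result covers the whole file (video plus audio and container overhead).

```python
def average_bitrate_mbps(size_gb: float, duration_s: float) -> float:
    """Average total bitrate in Mbit/s for a file of `size_gb` decimal GB."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    # 1 GB = 8000 Mbit when using decimal units
    return size_gb * 8000.0 / duration_s


if __name__ == "__main__":
    ASSUMED_DURATION_S = 2 * 60 * 60  # hypothetical 2-hour film
    for size_gb in (50.0, 3.0):
        rate = average_bitrate_mbps(size_gb, ASSUMED_DURATION_S)
        print(f"{size_gb:.0f} GB over 2 h is about {rate:.2f} Mbit/s")
```

Under that assumption the source sits around 55 Mbit/s and the 3 GB target around 3.3 Mbit/s, a range where x265's efficiency advantage over x264 tends to be most visible.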