Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion.

4th February 2015, 16:26 #281
NikosD
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,661
Thanks for the prompt and detailed reply.
__________________
Win 10 x64 (18362.388) - Core i3-9100F - nVidia 1660 (436.15)
HEVC decoding benchmarks
H.264 DXVA Benchmarks for all
10th February 2015, 15:41 #282
Neet009
Registered User
Join Date: Jan 2014
Posts: 21
CPU: Core 2 Quad Q9550
GPU: GTX 960 (driver 347.25)
OS: Win7 x64

I did some tests with LAV 0.63.0.75-git:
Astra clips (4K 10-bit HEVC), Copy-Back direct mode
DXVAChecker x64, decode, P010 out
76~78 fps, 15% CPU
DXVAChecker x64, decode, NV12 out
92~93 fps, 14% CPU

DXVAChecker x86, decode, P010 out
67~69 fps, 15% CPU
DXVAChecker x86, decode, NV12 out
86~88 fps, 14% CPU

Ducks-2160p@50fps-4Mbps, Copy-Back direct mode
DXVAChecker x64, decode
152~154 fps, 15% CPU

DXVAChecker x86, decode
153~154 fps, 16% CPU

It seems the 10-bit results are slower than nevcairiel's test. What is likely to cause the difference? Is my CPU too old, or is it missing some instruction sets?

Last edited by Neet009; 10th February 2015 at 15:44.
10th February 2015, 16:49 #283
cyberbeing
Broadband Junkie
Join Date: Oct 2005
Posts: 1,859
Quote:
Originally Posted by Neet009
It seems the 10-bit results are slower than nevcairiel's test. What is likely to cause the difference? Is my CPU too old, or is it missing some instruction sets?
Assuming nevcairiel's i7-4770K results still hold true, it's probably both your CPU and memory subsystem being too slow, especially if you see a similar performance difference between P010 and NV12 output when software-decoding 10-bit H.264 video. P010 output requires transferring 16-bit (padded) data, while NV12 is only 8-bit data. Using P010 output doubles the bandwidth requirements: the amount of data transferred across system memory, and to/from your GPU over PCI-E. It's not abnormal for the overhead of outputting P010 directly to be larger than the CPU overhead of converting to NV12, especially on an older system.
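A back-of-the-envelope sketch of the bandwidth point above, assuming plain 4:2:0 layouts (a full-resolution luma plane plus interleaved chroma at half width and half height) and ignoring stride padding and the extra copies a real pipeline makes. The frame rates roughly match Neet009's x64 results; the helper names are hypothetical.

```python
def frame_bytes(width, height, bytes_per_sample):
    """Bytes in one 4:2:0 frame: full-resolution luma plane plus an
    interleaved Cb/Cr plane at half width and half height."""
    luma_samples = width * height
    chroma_samples = (width // 2) * (height // 2) * 2  # one Cb + one Cr per 2x2 block
    return (luma_samples + chroma_samples) * bytes_per_sample


def transfer_rate_gb_s(width, height, bytes_per_sample, fps):
    """GB/s needed to write every decoded frame into system memory once."""
    return frame_bytes(width, height, bytes_per_sample) * fps / 1e9


W, H = 3840, 2160  # 4K UHD
# NV12: 8-bit samples (1 byte); P010: 10-bit samples padded to 16 bits (2 bytes).
for name, bytes_per_sample, fps in (("NV12", 1, 92), ("P010", 2, 77)):
    mb = frame_bytes(W, H, bytes_per_sample) / 1e6
    rate = transfer_rate_gb_s(W, H, bytes_per_sample, fps)
    print(f"{name}: {mb:.1f} MB per frame, {rate:.2f} GB/s at {fps} fps")
# NV12: 12.4 MB per frame, 1.14 GB/s at 92 fps
# P010: 24.9 MB per frame, 1.92 GB/s at 77 fps
```

Even while delivering fewer frames, the P010 path moves roughly 1.7x the bytes per second in this estimate; actual memory traffic is higher, since each frame is also read back from the GPU.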

Last edited by cyberbeing; 10th February 2015 at 16:53.
10th February 2015, 21:44 #284
nevcairiel
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 9,796
Note that I'm now on an i7-5930K with DDR4 memory, so my memory performance is quite good. Core 2 is quite old, and memory may be much more of a bottleneck there than on newer CPUs.

What's interesting is the difference between NV12 and P010. LAV has to read the full P010 frame from the GPU in either case; in the NV12 case it's just converted (in processor registers) before being written to system memory. That clearly points to system memory being the bottleneck here, since the only major difference is the amount of data written into system memory.
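A minimal scalar sketch of the conversion step described above. It assumes the usual P010 layout, where each 10-bit sample sits in the high bits of a 16-bit word with the low 6 bits zero, and it reduces to 8 bits by truncation (keeping the top byte). LAV's actual code uses SIMD instructions and is not reproduced here; the function names are hypothetical.

```python
def pack_p010(value10):
    """Store a 10-bit sample the way P010 does: in the high 10 bits of a 16-bit word."""
    if not 0 <= value10 <= 1023:
        raise ValueError("10-bit sample out of range")
    return value10 << 6


def p010_to_8bit(words):
    """Reduce P010 words to 8-bit samples by keeping the high byte.

    This truncates the two least-significant bits of each 10-bit value;
    a real converter might round or dither instead.
    """
    return bytes(word >> 8 for word in words)


samples_10bit = [64, 512, 940, 1023]  # limited-range black, mid-grey, white, peak
p010_words = [pack_p010(v) for v in samples_10bit]
samples_8bit = p010_to_8bit(p010_words)

print(list(samples_8bit))  # [16, 128, 235, 255]
# P010 needs 2 bytes per sample; the 8-bit result needs 1 byte per sample.
print(f"read {len(p010_words) * 2} bytes, wrote {len(samples_8bit)} bytes")
```

The read side is identical in both cases; only the write to system memory shrinks by half, which is the difference the post attributes the NV12/P010 gap to.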
__________________
LAV Filters - open source ffmpeg based media splitter and decoders

Last edited by nevcairiel; 10th February 2015 at 23:53.
11th February 2015, 15:40 #285
Neet009
Registered User
Join Date: Jan 2014
Posts: 21
OK, so it seems system memory can be the bottleneck on some older systems when using 4K 10-bit HW decoding. Thank you for explaining.

Last edited by Neet009; 11th February 2015 at 15:42.
11th February 2015, 16:53 #286
NikosD
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,661
Quote:
Originally Posted by Neet009
CPU: Core 2 Quad Q9550
GPU: GTX 960 (driver 347.25)
OS: Win7 x64
Could you test the clip below on your GTX 960, using the latest DXVA Checker 3.3.2 x64 in DXVA native and "decode" mode with the latest LAV x64 0.63.0.75?

http://xhmikosr.1f0.de/samples/2160p...x264.CRF25.mkv


It's a 2160p H.264 video with a huge bitrate.

Last edited by NikosD; 11th February 2015 at 17:09.
11th February 2015, 18:50 #287
Neet009
Registered User
Join Date: Jan 2014
Posts: 21
Quote:
Originally Posted by NikosD
Could you test the clip below on your GTX 960, using the latest DXVA Checker 3.3.2 x64 in DXVA native and "decode" mode with the latest LAV x64 0.63.0.75?
OK!
DucksTakeOff_2160p50.x264.CRF25
DXVAChecker 3.3.2 x64, DXVA native, decode
78.5 fps, CPU 2% (average of 5 runs)

DXVAChecker 3.3.2 x64, DXVA copy-back direct, decode
77.9 fps, CPU 9%

Last edited by Neet009; 11th February 2015 at 19:15.
11th February 2015, 23:12 #288
NikosD
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,661
Quote:
Originally Posted by Neet009
OK!
DucksTakeOff_2160p50.x264.CRF25
DXVAChecker 3.3.2 x64, DXVA native, decode
78.5 fps, CPU 2% (average of 5 runs)

DXVAChecker 3.3.2 x64, DXVA copy-back direct, decode
77.9 fps, CPU 9%
Thanks! I thought so!

It seems that NVIDIA uses the same H.264 decoder in the 960 as in the previous VP6 models.

They didn't improve H.264 performance, so it's still a 4K@80fps H.264 decoder.

Just for comparison, the iGPU of my Core i7-4790 (HD 4600, GT2) at 1.5GHz gets these results:

DucksTakeOff_2160p50.x264.CRF25
DXVAChecker 3.3.2 x64, DXVA native, decode
200 fps, CPU 1% (average of 5 runs)
14th February 2015, 11:21 #289
NikosD
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,661
Quote:
Originally Posted by Neet009
CPU: Core 2 Quad Q9550
GPU: GTX 960 (driver 347.25)
OS: Win7 x64
Quote:
Originally Posted by nevcairiel
NVIDIA GTX 960 Benchmarks
I encoded a 2160p50fps HEVC Main10 file with an average bitrate of 128Mbps, close to UHD Blu-ray.

The clip is here:
http://www.filedropper.com/crowdrun2...main10-127mbps

I also added, just for testing reasons, a 300Mbps clip here:
http://www.filedropper.com/crowdrun2...cmain10300mbps

I would like to see some results from an NVIDIA 960 using LAV x64 v0.64 DXVA-CBD, in both decode and playback modes.

I'm interested in MIN/AVG/MAX fps.

Using my Core i7-4790 the results are:

1st clip (130Mbps)
Decode: 20/38/42
Playback (1280x720): 12/33/35

2nd clip (300Mbps)
Decode: 3/27/30
Playback (1280x720): 0/24/28
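DXVA Checker's exact method for MIN/AVG/MAX is not described in the thread. One plausible way to derive such figures, sketched here under that assumption, is to count completed frames in consecutive one-second windows (giving min and max) and to divide the number of frame intervals by the elapsed time (giving the average). With windowed counting, a stall longer than one window produces a 0 fps minimum; the trace in the example is made up.

```python
import math


def fps_stats(timestamps, window=1.0):
    """Return (min, avg, max) fps from sorted frame-completion times in seconds.

    Assumed method (hypothetical, not DXVA Checker's documented one):
    min/max come from frame counts in consecutive fixed windows, and the
    average is frame intervals divided by total elapsed time. The final
    window may be only partly covered, which biases its count low.
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two frames")
    start, end = timestamps[0], timestamps[-1]
    n_windows = max(1, math.ceil((end - start) / window))
    counts = [0] * n_windows
    for t in timestamps:
        index = min(int((t - start) / window), n_windows - 1)
        counts[index] += 1
    rates = [count / window for count in counts]
    average = (len(timestamps) - 1) / (end - start)
    return min(rates), average, max(rates)


# Made-up trace: 2 s at 40 fps, a ~1.5 s stall, then 1 s at 40 fps.
trace = [i / 40 for i in range(80)] + [3.5 + i / 40 for i in range(40)]
low, avg, high = fps_stats(trace)
print(f"MIN/AVG/MAX: {low:.0f}/{avg:.0f}/{high:.0f}")  # MIN/AVG/MAX: 0/27/40
```

A single stall like this drags the minimum to zero while barely moving the average, which is why a minimum alone says little about overall decode speed.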

Last edited by NikosD; 14th February 2015 at 12:46. Reason: Added a 300Mbps clip, just for testing
14th February 2015, 12:05 #290
nevcairiel
Registered Developer
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 9,796
Decode (P010): 94/100/104
Decode (NV12): 94/100/105
Playback (NV12, 720p): 36/89/101

I'm not sure when the playback min value occurred; it seemed to run through the video very fast at all times. Maybe it was right at the start, while it was still loading or something. I wouldn't put too much value on it.
14th February 2015, 12:48 #291
NikosD
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,661
Quote:
Originally Posted by nevcairiel
Decode (P010): 94/100/104
Decode (NV12): 94/100/105
Playback (NV12, 720p): 36/89/101

I'm not sure when the playback min value occurred; it seemed to run through the video very fast at all times. Maybe it was right at the start, while it was still loading or something. I wouldn't put too much value on it.
Just added a 300Mbps test clip.

It's either what you say, or the clip has some spikes of huge bitrate.

The simple bitrate viewer I have isn't compatible with HEVC, so I can't check it.
14th February 2015, 16:28 #292
Neet009
Registered User
Join Date: Jan 2014
Posts: 21
My results:
(GTX 960, Q9550 3.4GHz, DDR3-1333 16GB)
LAV x64 v0.64 DXVA-CBD

1st clip (130Mbps)
Decode (P010): 73/80/83
Decode (NV12): 84/92/95
Playback (1280x720): 54/85/91

2nd clip (300Mbps)
Decode (P010): 60/78/82
Decode (NV12): 64/79/84
Playback (1280x720): 39/75/84
17th February 2015, 20:10 #293
P.J
🎸
Join Date: Jun 2008
Posts: 513
LAV 0.64.0 got much better, but the only problem is the glitches:
the video doesn't play smoothly even though the CPU/GPU are only at ~50%.
18th February 2015, 10:03 #294
NikosD
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,661
For anyone interested in testing, I encoded an 8K (7680x4320) HEVC clip at 60fps in both 10-bit and 8-bit versions.

Of course, you can't hardware-accelerate decoding of those clips even with a 960 card, due to their high resolution.

You need a very fast CPU.

The clips are just 1 second long and very small.

Clip 1 - HEVC 4320p60fps - 10bit
http://www.filedropper.com/crysis3-7...1sec-hevc10bit

Clip 2 - HEVC 4320p60fps - 8bit
http://www.filedropper.com/crysis3-7...1mbps1sec-hevc


Tests

DXVA Checker x64 3.3.2 - LAV x64 v0.64 - Core i7 4790

Clip 1
Decode: 2/16/17

Clip 2
Decode: 10/21/22


Quote:
Originally Posted by nevcairiel
Note that I'm now on an i7-5930K with DDR4 memory, so my memory performance is quite good. Core 2 is quite old, and memory may be much more of a bottleneck there than on newer CPUs.
I'm sure you're going to need a faster processor, like a Core i7-5960X, for real-time decoding of those 8K HEVC clips.
18th February 2015, 10:41 #295
foxyshadis
angel of death
Join Date: Nov 2004
Location: Lost
Posts: 9,413
What was the CPU utilization like? Pegging all 8 threads? I wonder how much of that is just CABAC processing at 140Mbps.
__________________
There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order.
18th February 2015, 12:45 #296
NikosD
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,661
It certainly uses all 8 threads, and average CPU utilization is ~80% with 16 threads set in the LAV Video properties, as in the benchmark tests above.

Looking at decoding performance, the 4K HEVC 10-bit ~130Mbps clip gets more than double the fps (38 fps vs 16 fps) of the 8K HEVC 10-bit ~140Mbps clip.

So resolution plays the most important role in decoding performance.
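A rough normalization of the two results, using figures quoted in this thread: average decode fps on the Core i7-4790, approximate bitrates of 130 and 140 Mbps, and source frame rates of 50 fps (4K clip) and 60 fps (8K clip). It computes decoded pixels per second and, relevant to the CABAC question, compressed bits per frame and bitstream throughput. The clips differ in content and encoder settings, so treat the ratios as indicative only; the function name is hypothetical.

```python
def throughput(width, height, fps_decoded, bitrate_bps, fps_source):
    """Return (Mpixel/s decoded, Mbit per frame, Mbit/s of bitstream decoded)."""
    mpixels_per_s = width * height * fps_decoded / 1e6
    mbit_per_frame = bitrate_bps / fps_source / 1e6
    mbit_per_s = mbit_per_frame * fps_decoded
    return mpixels_per_s, mbit_per_frame, mbit_per_s


# (width, height, avg decode fps, approx. bitrate in bit/s, source fps)
clips = {
    "4K HEVC 10-bit": (3840, 2160, 38, 130e6, 50),
    "8K HEVC 10-bit": (7680, 4320, 16, 140e6, 60),
}
for name, params in clips.items():
    px, per_frame, bits = throughput(*params)
    print(f"{name}: {px:.1f} Mpixel/s, {per_frame:.2f} Mbit/frame, {bits:.1f} Mbit/s")
# 4K HEVC 10-bit: 315.2 Mpixel/s, 2.60 Mbit/frame, 98.8 Mbit/s
# 8K HEVC 10-bit: 530.8 Mpixel/s, 2.33 Mbit/frame, 37.3 Mbit/s
```

With these inputs, the pixel count quadruples from 4K to 8K while the average decode rate drops by a factor of about 2.4, and the compressed bits per frame are similar for both clips.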
21st February 2015, 05:29 #297
Aleksoid1978
Registered User
Join Date: Apr 2008
Location: Russia, Vladivostok
Posts: 2,239
Hi all.
I bought a GTX 960, tested LAV Video Decoder on 10-bit HEVC, then modified MPC-BE's video decoder. And... DXVA HEVC 10-bit works perfectly, and EVR Custom accepts P010 DXVA input.
Here is a test build - http://aleksoid.voserver.net/MPC-BE/...VC_4K_10Bit.7z - it's only a test build, and only the 10-bit HEVC DXVA decoder works in it.

P.S. Only MPC-BE's EVR Custom can accept P010 DXVA.
__________________
I7 2600K@4.2 /Asrock P67 Extreme4 Gen 3 /Kingston HyperX 8Gb 1866 (4x2) Kit /OCZ Vertex 3 256Gb /Gigabyte GTX 960 /BenQ EW2430 /LG 47LM620T /Yamaha RX-V471 + NS-555 + NS-C444 + NS-333 + YST-SW215
21st February 2015, 07:17 #298
NikosD
Registered User
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,661
Quote:
Originally Posted by Aleksoid1978

P.S. Only MPC-BE's EVR Custom can accept P010 DXVA.
So, does that mean you can use DXVA native with 10-bit HEVC?

I asked nevcairiel about EVR-CP but he didn't answer.

What's the difference between MPC-HC EVR custom and MPC-BE EVR custom?
21st February 2015, 07:24 #299
Aleksoid1978
Registered User
Join Date: Apr 2008
Location: Russia, Vladivostok
Posts: 2,239
Quote:
Originally Posted by NikosD
So, does that mean you can use DXVA native with 10-bit HEVC?

What's the difference between MPC-HC EVR custom and MPC-BE EVR custom?
Yes, native DXVA HEVC 10-bit, but only in EVR Custom; vanilla EVR shows only a white screen.

As for MPC-HC: it uses a hook and drops P010 input for EVR.
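For context on what "accepting P010 input" refers to at the media-type level: in DirectShow and Media Foundation, uncompressed video subtypes such as NV12 and P010 are identified by GUIDs derived from a FOURCC code. The sketch below packs a FOURCC the way the Windows `MAKEFOURCC` macro does (first character in the lowest byte) and forms the standard FOURCC-based subtype GUID. How MPC-BE's or MPC-HC's renderer actually checks these types is not shown and not implied; the function names are hypothetical.

```python
def fourcc(code):
    """Pack a 4-character ASCII code like MAKEFOURCC: first char in the lowest byte."""
    if len(code) != 4 or not code.isascii():
        raise ValueError("FOURCC must be exactly 4 ASCII characters")
    a, b, c, d = (ord(ch) for ch in code)
    return a | (b << 8) | (c << 16) | (d << 24)


def fourcc_subtype_guid(code):
    """FOURCC-based subtype GUID pattern: XXXXXXXX-0000-0010-8000-00AA00389B71."""
    return f"{fourcc(code):08X}-0000-0010-8000-00AA00389B71"


for name in ("NV12", "P010"):
    print(name, fourcc_subtype_guid(name))
# NV12 3231564E-0000-0010-8000-00AA00389B71
# P010 30313050-0000-0010-8000-00AA00389B71
```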
21st February 2015, 07:42 #300
wanezhiling
Registered User
Join Date: Apr 2011
Posts: 1,169


Any proof?
Like a DXVA Checker trace log.

All times are GMT +1.