6th July 2019, 17:37   #56800
el Filou
DXVA copyback benchmarks

Quote:
Originally Posted by huhn
you don't have to test HEVC, and HDR doesn't help here anyway; h264 should be good enough.
Unfortunately the Radeon 7870 doesn't even support 4K H264, so it's useless to test. The real limitations of copyback decoding only start to become a problem with 10-bit 4K, because it takes up 8 times the bandwidth of 8-bit FHD (4 times the pixels, and each 10-bit sample is stored in 16 bits instead of 8). At lower resolutions, using copyback or native has no impact on which madVR settings I am able to use. With 4K 10-bit it does.
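The 8x figure can be checked with a quick sketch. It assumes the copied-back surfaces are NV12 for 8-bit and P010 for 10-bit (both 4:2:0, i.e. 1.5 samples per pixel); the NV12 part is my assumption, and `frame_bytes` is just an illustrative helper.

```python
def frame_bytes(width: int, height: int, bytes_per_sample: int) -> int:
    """Size of one 4:2:0 frame: 1 luma sample per pixel plus chroma at quarter resolution."""
    samples = width * height * 3 // 2  # 1.5 samples per pixel
    return samples * bytes_per_sample

# ASSUMPTION: 8-bit surfaces are NV12 (1 byte/sample), 10-bit are P010 (2 bytes/sample)
fhd_8bit = frame_bytes(1920, 1080, 1)
uhd_10bit = frame_bytes(3840, 2160, 2)

ratio = uhd_10bit / fhd_8bit
print(fhd_8bit, uhd_10bit, ratio)
```

At the same frame rate, the bytes moved per second scale by the same ratio.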

So I've done some benchmarks on my HTPC...

Notes:
- 'GPU' and 'video' numbers are the reported frequency multiplied by the usage % (e.g. 1290 MHz at 50% load = 'GPU 645')
- The CPU and GPU usage counters of DXVA Checker are completely wrong; I don't know how it computes them. The GPU counter may report only shader usage, in which case it could be right but not useful, but the CPU usage is always wrong. I used HWMonitor, which gives the same values as other monitoring tools.
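For anyone who wants to reproduce the 'GPU' and 'video' figures from their own monitoring logs, the calculation in the first note is simply the following (the function name is made up for illustration):

```python
def effective_mhz(clock_mhz: float, usage_percent: float) -> float:
    """Combine reported clock and usage % into one load figure, as in the notes."""
    return clock_mhz * usage_percent / 100.0

# Example from the notes: 1290 MHz at 50% load
print(effective_mhz(1290, 50))  # 645.0
```

This makes runs comparable even when the driver picks a different clock for each one.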

CPU @ 3500 MHz (FSB 333), RAM verified dual channel

1. 4K HEVC 10-bit. Best of 5 passes with DXVA Checker decode/playback, best of 3 runs with madVR. Playback at 1920x1080. madVR settings: scale chroma separately; no compromise on HDR quality; SSIM2D downscale; clip pre-measured for HDR; no black bars detection.

RAM @ 666:

Decode: 63,0 fps, CPU 65, GPU 1006, Bus 24
Playback: 34,9 fps, CPU 81, GPU 731, Bus 21
madVR: 439 dropped frames, avg 50,16 ms, max 78,17 ms, GPU 1772, CPU 95

RAM @ 800:

Decode: 66,8 fps, CPU 60, GPU 1017, Bus 25
Playback: 34,5 fps, CPU 84, GPU 656, Bus 21
madVR: 315 dropped frames, avg 45,68 ms, max 63,78 ms, GPU 1772, CPU 90

For reference, with Native:

Decode: 178,5 fps, CPU 46, GPU 1642, video 1467
Playback: 177,0 fps, CPU 54, GPU 1785, video 1467
madVR: 0 dropped frames, avg 34,38 ms, max 38,46 ms, GPU 1613, CPU 68

The difference in dropped frames and max render times under madVR from just 20% faster RAM is massive (439 vs 315 dropped frames, 78,17 vs 63,78 ms max).
With DXVA Checker, the CPU is not fully loaded during decode, and decode is only 6% faster with 20% faster RAM. Software/platform inefficiency?
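One way to read the madVR numbers above is against the per-frame time budget. The sketch below assumes the clip is 23.976 fps, which is not stated in the post; the measurements (no drops at 34 ms average, drops at 45 ms and up) are consistent with that but don't prove it.

```python
# ASSUMPTION: the test clip runs at 24000/1001 fps (not stated in the post)
CLIP_FPS = 24000 / 1001

frame_interval_ms = 1000 / CLIP_FPS  # time available to render each frame

# Average madVR render times from test 1
avg_render_ms = {
    "copyback, RAM @ 666": 50.16,
    "copyback, RAM @ 800": 45.68,
    "native": 34.38,
}

for name, ms in avg_render_ms.items():
    verdict = "over budget" if ms > frame_interval_ms else "within budget"
    print(f"{name}: {ms:.2f} ms vs {frame_interval_ms:.2f} ms -> {verdict}")
```

Under that assumption both copyback runs average above the frame interval, which would explain why faster RAM reduces the drops but doesn't eliminate them.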

2. Same test but with madVR 'light' settings: compromise on HDR quality checked; Bicubic downscaling instead of SSIM2D

copyback: avg 16,5 ms, max 25,04 ms, GPU 1136, CPU 78
native: avg 14,93 ms, max 17,78 ms, GPU 592, CPU 25

With native, max render time is about 29% lower (copyback's is 40% higher), with roughly half the GPU load and a third of the CPU load. Massive performance impact.
I understand why the CPU would be loaded if it has to wait for frames to be read/written from/to system RAM, but why more GPU load? Can't the GPU render a frame it has received from the renderer while the next queued frame from the decoder is transferred over the PCIe bus and back?
A single 4K P010 frame is 25 MB; at PCIe 2 x16 (8 GB/s theoretical per direction) it should take 3,125 ms one way, 6,25 ms round-trip, just for the time over the bus. If the rendering has stalls, that could explain the difference of a few ms between copyback & native even with very high end GPUs.
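The bus-time estimate can be worked through as follows. It uses the theoretical PCIe 2.0 rate of 500 MB/s per lane per direction, so real transfers will be slower; the names are illustrative only.

```python
# One 3840x2160 P010 frame: 1.5 samples per pixel (4:2:0), 2 bytes per sample
FRAME_BYTES = 3840 * 2160 * 3 // 2 * 2

# Theoretical PCIe 2.0 throughput: 500 MB/s per lane per direction, 16 lanes.
# ASSUMPTION: no protocol overhead or contention, so this is a best case.
PCIE2_X16_BYTES_PER_S = 16 * 500e6

def transfer_ms(nbytes: float, bytes_per_s: float) -> float:
    """Time in milliseconds to move nbytes at the given rate."""
    return nbytes / bytes_per_s * 1000

one_way_ms = transfer_ms(FRAME_BYTES, PCIE2_X16_BYTES_PER_S)
round_trip_ms = 2 * one_way_ms
print(f"{FRAME_BYTES / 1e6:.1f} MB, {one_way_ms:.2f} ms one way, {round_trip_ms:.2f} ms round trip")
```

The exact frame size is 24.9 MB, so this comes out slightly under the 3,125 ms I got from rounding to 25 MB.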

3. A lighter test comparing the Jellyfish clip at 1080p HEVC, same bitrate, in 8-bit and 10-bit:

decode 8-bit: 266,5 fps, CPU 54, GPU 1797, video 1430, bus 11
decode 10-bit: 210,7 fps, CPU 60, GPU 1797, video 996, bus 22
playback 8-bit: 240,2 fps, CPU 75, GPU 1797, video 1141, bus 15
playback 10-bit: 181,9 fps, CPU 75, GPU 1743, video 852, bus 20

10-bit decode shows exactly twice the bus usage of 8-bit (22 vs 11), as expected.
The 1080p 10-bit decode speed (210,7 fps) doesn't scale to 4x the speed of the 4K clip (4 x 66,8 = 267 fps).
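The scaling comparison can also be expressed as pixel throughput. Keep in mind the two clips are different files, so this is only a rough sketch and the helper name is hypothetical.

```python
def megapixels_per_second(width: int, height: int, fps: float) -> float:
    """Decoded pixel throughput in millions of pixels per second."""
    return width * height * fps / 1e6

# Copyback decode results from the post (CPU @ 3500, RAM @ 800)
uhd = megapixels_per_second(3840, 2160, 66.8)   # 4K HEVC 10-bit clip
fhd = megapixels_per_second(1920, 1080, 210.7)  # Jellyfish 1080p 10-bit

ideal_fhd_fps = 4 * 66.8   # what perfect 4x scaling from the 4K result would give
efficiency = fhd / uhd     # fraction of the 4K pixel rate reached at 1080p

print(f"ideal {ideal_fhd_fps:.1f} fps, actual 210.7 fps, efficiency {efficiency:.3f}")
```

So at 1080p the copyback path moves fewer pixels per second than at 4K, which points to some per-frame overhead that doesn't shrink with resolution.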

for reference, 10-bit native: 299,2 fps, CPU 18, GPU 1613, video 1415, bus 2

4. Just out of curiosity I underclocked the CPU to 2100 MHz (FSB 200), to be able to test a wider range of RAM speeds:

(Jellyfish 10-bit DXVA Checker decode):

RAM @ 400: 131,6 fps (native 268,4), CPU 76, GPU 1589, video 989, bus 13
RAM @ 533: 139,6 fps (native 275,7), CPU 70, GPU 1642, video 909, bus 14
RAM @ 666: 148,8 fps (native 281,1), CPU 67, GPU 1642, video 798, bus 15
RAM @ 800: 146,0 fps (native 281,8), CPU 68, GPU 1428, video 766, bus 15

for reference, CPU @ 3500 & RAM @ 800: 210,7 fps, CPU 60, GPU 1797, video 996, bus 22

With the same RAM speed (800) but a 66% faster CPU: 40-45% more fps.
With the same (slow) CPU speed but 66% faster RAM (400 to 666): 13% more fps.
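To put the two comparisons on the same scale, here is a small sketch that divides the fps gain by the clock gain. The "sensitivity" figure is my own ad hoc measure, not something the tools report.

```python
def gain_percent(old: float, new: float) -> float:
    """Relative increase from old to new, in percent."""
    return (new / old - 1) * 100

# CPU 2100 -> 3500 MHz, both at RAM @ 800 (copyback decode fps from the post)
cpu_clock_gain = gain_percent(2100, 3500)
cpu_fps_gain = gain_percent(146.0, 210.7)

# RAM 400 -> 666, both at CPU 2100 MHz
ram_clock_gain = gain_percent(400, 666)
ram_fps_gain = gain_percent(131.6, 148.8)

# Ad hoc "sensitivity": share of the clock gain that shows up as fps gain
cpu_sensitivity = cpu_fps_gain / cpu_clock_gain
ram_sensitivity = ram_fps_gain / ram_clock_gain

print(f"CPU: +{cpu_clock_gain:.0f}% clock -> +{cpu_fps_gain:.0f}% fps ({cpu_sensitivity:.2f})")
print(f"RAM: +{ram_clock_gain:.0f}% clock -> +{ram_fps_gain:.0f}% fps ({ram_sensitivity:.2f})")
```

On this platform copyback decode follows CPU clock roughly three times more closely than RAM clock.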
__________________
HTPC: Windows 10 22H2, MediaPortal 1, LAV Filters/ReClock/madVR. DVB-C TV, Panasonic GT60, Denon 2310, Core 2 Duo E7400 oc'd, GeForce 1050 Ti 536.40