Old 16th November 2011, 21:59   #1  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
H.264 DXVA Benchmarks: QuickSync vs UVD 2.2 vs VP4 vs VP5

Latest update with LAV Video x64 0.64 in DXVA native and pure decode mode, using the latest ASICs like VP7 from the Nvidia GTX 960 and QuickSync 3 from Haswell.

Also added AMD Polaris RX 470 results and a single result for the Pascal GTX 1060 VP8 decoder.

All Intel CPUs from Haswell to Kaby Lake have exactly the same 4K HW H.264 decoder; they differ only in clock speed.

Take a look here:
http://forum.doom9.org/showthread.ph...50#post1712350


I recently flashed my Radeon 5750 with the 6750 BIOS.

The two cards use the same UVD2.2.

But it seems that the 6750 BIOS on a 5750 can lead to a UVD2.2 overclock.

The default 5750 BIOS puts UVD2.2 in standard UVD mode at Core/Memory = 400MHz / 900MHz
The 6750 BIOS on a 5750 card puts UVD2.2 in 3D mode at Core/Memory = 710MHz / 1160MHz

So I have two UVD2.2 systems benchmarked:
one plain UVD2.2 (400/900) and one UVD2.2 OC (710/1160)
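
To put the two clock pairs in proportion, here is a minimal Python sketch that computes the relative clock increase. The names `STOCK_UVD`, `FLASHED_UVD` and `uplift_percent` are made up for this illustration; only the MHz values come from the post.

```python
# Clock pairs (core MHz, memory MHz) as quoted in the post.
STOCK_UVD = (400, 900)
FLASHED_UVD = (710, 1160)


def uplift_percent(before: float, after: float) -> float:
    """Relative increase of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0


core_gain = uplift_percent(STOCK_UVD[0], FLASHED_UVD[0])
mem_gain = uplift_percent(STOCK_UVD[1], FLASHED_UVD[1])
print(f"core: +{core_gain:.1f}%  memory: +{mem_gain:.1f}%")
```

These are clock ratios only. How much of that shows up as decoding speed is what the benchmark lists measure.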

I did my tests with the new DXVA checker x86 v2.7.0 http://bluesky23.yukishigure.com/en/index.html

Two systems tested:

1) My signature system:
Win 7 x64 SP1 - C2D@2.83 GHz - Radeon (6)750 - Catalyst 12.1 preview,

RAM configuration for AMD system: (mostly for DXVA-CB comparisons)

4GB (2 x 2GB) of DDR2 at FSB: DRAM = 1:1
Speed = 4-4-4-12@566 MHz (283x2)

2) Intel/ Nvidia system:
Win 7 SP1 x86 - Core i5-2400 (3.1GHz) - Geforce GT 440 (DDR5) - Nvidia beta 290.53 - Intel HD 2000 - Intel drivers v.2622

RAM configuration for Intel/ Nvidia system: (mostly for DXVA-CB comparisons, QuickSync decoder)

4GB (2 x 2GB) of DDR3 at FSB: DRAM = 1:5
Speed = 9-9-9-24@1338 MHz (669x2)

The decoders used are:

CoreAVC 3.0.1 (both modes - DXVA native, NVCUVID)

LAV Video 0.47 (in all modes - DXVA2 native, DXVA2 copy-back, NVCUVID, QS)

MS DS/MFT

FFDShow v4322 (QS)


For VC-1/WMV3 I used the AMD Playback Decoder MFT, and for CPU results I used the built-in WMVideo Decoder DMO (because it is faster than LAV's slow VC-1/WMV decoder).


I used five Reference H.264 files from here:
http://forum.doom9.org/showthread.php?t=159486

and I added five new reference files.

You can find every sample posted (from 1 to 10) here:

ftp://helpedia.com/pub/multimedia/x264/testvideos/


6.Avatar-1080p60fpsRef4-44.9Mbps

7.Vortexx_1088p24fpsRef3-109Mpbs

8.Birds_1080p24fpsRef4-112Mbps

9.Ducks.Take.Off.1080p30fpsRef5-108Mbps

10.Crowd.Run.1080p25Ref4-116Mbps


Also I used VC-1 and WMV3 files from here:
http://forum.doom9.org/showthread.php?t=156660


For CPU results (Core 2 Duo - Core i5) I used LAV Video 0.47.

Every benchmark mode used EVR renderer.


The results:

First is the Video Processor - QuickSync (QS), UVD2.2, VP4, CPU etc

Second is the decoder - MS DS (Microsoft's DirectShow), MS MFT (Microsoft's Media Foundation), LAV Video etc

Third is the decoder's mode - Native DXVA, Copy-Back (CB) DXVA, Quicksync (QS) etc
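
For anyone who wants to process the lists below with a script, here is a rough Python sketch of a parser for one result line. It assumes the three numbers are min/avg/max fps (the footnote and the "Only Average result" remark suggest this, but the post does not spell it out). The `Result` type and `parse_result` function are hypothetical helpers, not part of DXVA Checker.

```python
import re
from typing import NamedTuple


class Result(NamedTuple):
    processor: str  # e.g. "QS", "VP4", "UVD2.2 OC", "CPU"
    decoder: str    # decoder plus mode, e.g. "LAV NATIVE", "MS DS"
    fps_min: int
    fps_avg: int
    fps_max: int


# Processor labels used in the lists; "UVD2.2 OC" must be tried before "UVD2.2".
PROCESSORS = ("UVD2.2 OC", "UVD2.2", "VP5", "VP4", "QS", "CPU")

# Optional rank prefix ("1. "), then the label, then min/avg/max.
LINE = re.compile(r"^(?:\d+\.\s+)?(.+?)\s+(\d+)/(\d+)/(\d+)")


def parse_result(line: str) -> Result:
    match = LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a result line: {line!r}")
    label = match.group(1)
    for proc in PROCESSORS:
        if label.startswith(proc):
            decoder = label[len(proc):].strip()
            fps = [int(value) for value in match.group(2, 3, 4)]
            return Result(proc, decoder, *fps)
    raise ValueError(f"unknown video processor in: {line!r}")


print(parse_result("1. QS MS DS 401/401/401"))
print(parse_result("UVD2.2 OC MS DS 62/76/78 *"))
```

Lines with a single value, such as the "Only Average result" entry, are deliberately rejected by this sketch.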


H.264


1. Twinpeaks-30fps


1. QS MS DS 401/401/401

QS MS MFT 390/395/400

QS CoreAVC 368/375/383

QS LAV NATIVE 366/374/375

CPU Core i5@3.1 253/264/274

QS LAV QS 200/201/202

QS FFDShow 158/161/163

VP5 LAV CUDA 130/139/143

VP5 MS DS 133/138/141

QS LAV CB 131/137/140

VP5 MS MFT 128/137/141

VP5 CoreCUDA 84/89/93

CPU C2D@2.83 73/85/96

VP4 LAV CUDA 80/84/88

VP4 MS MFT 80/84/88

VP4 LAV CB 81/84/87

VP4 MS DS 80/84/87

VP4 LAV NATIVE 77/79/82

UVD2.2 OC MS MFT 73/77/88

UVD2.2 OC LAV NATIVE 75/77/83

UVD2.2 OC MS DS 76/77/80

VP4 CoreCUDA 62/65/66

UVD2.2 LAV NATIVE 57/58/62

UVD2.2 OC LAV CB 56/57/59

UVD2.2 MS DS 52/57/66

UVD2.2 MS MFT 51/57/65

VP4 CoreAVC 55/56/57

UVD2.2 LAV CB 48/53/55

UVD2.2 OC CoreAVC 46/53/56

UVD2.2 CoreAVC 44/51/55


2. Samsung-30fps



1. QS MS MFT 234/271/341

QS MS DS 224/266/333

QS CoreAVC 229/263/321

QS LAV NATIVE 219/259/325

QS LAV QS 134/162/190

QS FFDShow 136/150/160

CPU Core i5@3.1 99/132/197

VP5 LAV CUDA 82/115/129

VP5 CoreCUDA 89/107/123

QS LAV CB 86/105/128

VP5 MS DS 93/105/121

UVD2.2 OC MS MFT 52/62/75

UVD2.2 OC LAV NATIVE 55/62/71

UVD2.2 OC MS DS 50/62/72

UVD2.2 OC LAV CB 53/55/58

VP4 MS MFT 34/55/91

VP4 LAV CB 34/55/84

VP4 LAV CUDA 31/55/90

VP4 LAV NATIVE 35/54/83

VP4 MS DS 33/54/82

VP4 CoreCUDA 32/51/77

UVD2.2 OC CoreAVC 39/49/57

CPU C2D@2.83 34/49/81

UVD2.2 MS MFT 35/46/62

UVD2.2 LAV CB 35/46/56

UVD2.2 LAV NATIVE 35/45/54

UVD2.2 MS DS 32/45/56

VP4 CoreAVC 27/41/62

UVD2.2 CoreAVC 30/38/44



3. Basket-60fps

DXVA checker 2.8.0b3 used for DXVA-CB, QS, NVCUVID and CPU.



1. QS CoreAVC 461/504/550

QS MS MFT 455/502/567

QS MS DS 455/502/553

QS LAV NATIVE 439/483/536

CPU Core i5@3.1 265/286/317

QS LAV QS 200/203/209

QS FFDShow 160/151/172

VP5 CoreCUDA 137/149/164

VP5 MS DS 118/133/149

QS LAV CB 109/115/120

VP4 CoreCUDA 77/84/106

VP4 LAV CB 75/82/99

VP4 MS MFT 74/82/107

CPU C2D@2.83 72/82/103

VP4 MS DS 75/81/89

VP4 LAV CUDA 73/81/99

VP4 LAV NATIVE 71/81/103

UVD2.2 OC LAV NATIVE 74/76/77

UVD2.2 OC MS MFT 70/76/82

UVD2.2 OC MS DS 62/76/78 *

UVD2.2 OC CoreAVC 56/57/58

UVD2.2 LAV NATIVE 55/57/59

UVD2.2 LAV CB 54/57/58

UVD2.2 MS MFT 53/57/65

VP4 CoreAVC 54/57/66

UVD2.2 MS DS 41/57/59 *

UVD2.2 CoreAVC 42/44/50



4. Girls-60fps



1. QS MS DS 410/423/436

QS MS MFT 410/420/456

QS CoreAVC 401/414/430

QS LAV NATIVE 397/412/430

CPU Core i5@3.1 198/209/234

QS LAV QS 193/200/203

QS FFDShow 160/166/171

VP5 LAV CUDA 141/143/146

VP5 CoreCUDA 110/128/144

VP5 MS DS 114/122/133

QS LAV CB 93/97/109

VP4 LAV CB 74/76/80

VP4 LAV NATIVE 74/76/79

VP4 LAV CUDA 74/76/78

VP4 MS MFT 73/76/79

UVD2.2 OC MS MFT 72/76/81

UVD2.2 OC MS DS 72/76/77 *

VP4 MS DS 73/75/77

UVD2.2 OC LAV NATIVE 71/75/77

VP4 CoreCUDA 67/73/81

CPU C2D@2.83 63/70/85

UVD2.2 OC LAV CB 51/59/60

UVD2.2 MS MFT 52/57/62

UVD2.2 LAV NATIVE 54/56/59

UVD2.2 LAV CB 54/56/58

UVD2.2 OC CoreAVC 55/56/57

VP4 CoreAVC 54/56/58

UVD2.2 MS DS 50/56/59 *

UVD2.2 CoreAVC 42/45/50


5. Cat-60fps

No MFT splitter for M2TS files



1. QS CoreAVC 394/402/411

QS MS DS 383/400/412

QS LAV NATIVE 372/381/387

QS LAV QS 190/194/197

CPU Core i5@3.1 157/187/206

QS FFDShow 156/161/166

VP5 LAV CUDA 138/141/147

VP5 CoreCUDA 134/138/145

QS LAV CB 117/118/125

VP5 MS DS 78/98/106

UVD2.2 OC MS DS 69/74/76

VP4 CoreCUDA 68/74/81

UVD2.2 OC LAV NATIVE 70/73/74

VP4 LAV CUDA 68/72/78

VP4 LAV CB 67/72/78

VP4 MS DS 68/71/79

VP4 LAV NATIVE 65/71/77

CPU C2D@2.83 54/62/71

UVD2.2 OC LAV CB 57/58/59

UVD2.2 OC CoreAVC 54/55/56

UVD2.2 LAV NATIVE 54/55/56

UVD2.2 LAV CB 53/55/57

UVD2.2 MS DS 50/55/58

VP4 CoreAVC 48/50/56

UVD2.2 CoreAVC 42/44/46



6. Avatar-60fps

MS MFT crashes DXVA Checker all versions, all platforms


1. QS MS DS 328/345/367

QS CoreAVC 322/330/336

QS LAV NATIVE 322/329/337

QS LAV QS 189/193/195

QS FFDShow 155/162/166

CPU Core i5@3.1 143/159/175

QS LAV CB 108/114/122

VP4 MS DS 65/76/84

VP4 LAV CUDA 64/76/82

VP4 LAV CB 64/76/82

VP4 LAV NATIVE 67/73/80

UVD2.2 OC LAV NATIVE 68/70/73

UVD2.2 OC MS DS 68/70/72 *

UVD2.2 OC LAV CB 55/57/58

UVD2.2 OC CoreAVC 53/55/57

UVD2.2 LAV CB 48/53/57

UVD2.2 LAV NATIVE 49/52/54

CPU C2D@2.83 44/52/63

UVD2.2 MS DS 39/52/55 *

VP4 CoreAVC 46/51/57

UVD2.2 CoreAVC 40/42/44



7. Vortex-24fps



1. QS FFDShow 122/125/129

QS LAV QS 119/121/124

QS MS DS 119/120/120

QS MS MFT 119/120/120

QS LAV NATIVE 118/120/122

QS CoreAVC 118/119/122

QS LAV CB 112/117/121

VP5 LAV CUDA 71/73/76

VP5 CoreCUDA 72/73/76

VP5 MS DS 71/73/75

VP5 MS MFT 72/73/76

CPU Core i5@3.1 56/59/61

UVD2.2 OC LAV CB 35/37/44

UVD2.2 OC MS MFT 34/37/42

UVD2.2 OC LAV NATIVE 36/36/39

UVD2.2 OC MS DS 36/36/37

UVD2.2 OC CoreAVC 28/29/31

UVD2.2 LAV NATIVE 26/28/28

UVD2.2 LAV CB 25/27/32

UVD2.2 MS DS 25/26/29

UVD2.2 MS MFT 24/26/34

CPU C2D@2.83 19/24/27

VP4 LAV CUDA 21/22/26

VP4 LAV CB 21/22/25

UVD2.2 CoreAVC 21/22/25

VP4 MS MFT 21/22/24

VP4 CoreCUDA 21/22/24

VP4 LAV NATIVE 20/22/24

VP4 MS DS 19/22/24

VP4 CoreAVC 17/18/21



8. Birds-24fps



1. QS MS MFT 110/118/133

QS CoreAVC 110/117/129

QS FFDShow 112/115/120

QS LAV NATIVE 109/114/122

QS LAV QS 108/114/122

QS MS DS 108/113/122

QS LAV CB 78/83/88

VP5 CoreCUDA 71/77/87

VP5 LAV CUDA 59/69/77

VP5 MS DS 59/63/71

CPU Core i5@3.1 50/54/59

UVD2.2 OC MS MFT 32/38/47

UVD2.2 OC LAV NATIVE 36/37/43

UVD2.2 OC LAV CB 34/37/41

UVD2.2 OC MS DS 35/37/39

UVD2.2 OC CoreAVC 28/30/34

UVD2.2 LAV NATIVE 26/27/34

UVD2.2 LAV CB 25/27/35

UVD2.2 MS DS 25/27/35

UVD2.2 MS MFT 23/27/33

UVD2.2 CoreAVC 21/23/27

VP4 CoreCUDA 20/22/29

VP4 MS MFT 19/22/30

VP4 LAV CUDA 10/22/35 (A lot of breaks)

VP4 LAV CB 19/21/29

VP4 LAV NATIVE 19/21/28

VP4 MS DS 18/21/24

CPU C2D@2.83 18/21/23

VP4 CoreAVC 15/18/25



9. Ducks -30fps



1. QS CoreAVC 119/136/147

QS MS MFT 117/135/152

QS LAV NATIVE 118/134/144

QS LAV QS 116/130/139

QS FFDShow 129/129/132

QS MS DS 123/125/126 *

VP5 CoreCUDA 74/83/95

VP5 LAV CUDA 71/82/92

QS LAV CB 67/75/91

CPU Core i5@3.1 56/63/71

VP5 MS DS 48/57/84

UVD2.2 OC LAV NATIVE 36/41/48

UVD2.2 OC MS MFT 36/41/48

UVD2.2 OC LAV CB 35/41/48

UVD2.2 OC MS DS 30/39/43 *

UVD2.2 OC CoreAVC 29/33/37

UVD2.2 LAV CB 25/30/39

UVD2.2 MS MFT 24/30/38

UVD2.2 LAV NATIVE 26/29/35

UVD2.2 MS DS 21/27/32 *

VP4 LAV CUDA 15/26/39

UVD2.2 CoreAVC 22/25/29

VP4 MS MFT 21/25/34

VP4 CoreCUDA 21/25/31

VP4 LAV CB 20/25/34

VP4 LAV NATIVE 20/25/30

CPU C2D@2.83 20/24/29

VP4 MS DS 19/24/31 *

VP4 CoreAVC 18/21/27





10. Crowd Run-25fps



1. QS FFDShow 118/118/118

QS MS DS 112 (Only Average result)

QS MS MFT 109/109/110

QS CoreAVC 109/109/109

QS LAV QS 109/109/109

QS LAV NATIVE 107/107/107

QS LAV CB 71/72/73

VP5 LAV CUDA 68/70/72

VP5 CoreCUDA 68/69/70

VP5 MS DS 59/59/59

CPU Core i5@3.1 52/53/54

UVD2.2 OC LAV CB 32/33/37

UVD2.2 OC LAV NATIVE 32/33/33

UVD2.2 OC MS MFT 31/33/35

UVD2.2 OC MS DS 25/31/35

UVD2.2 OC CoreAVC 26/27/28

UVD2.2 LAV CB 23/24/27

UVD2.2 MS MFT 23/24/25

UVD2.2 LAV NATIVE 23/24/24

UVD2.2 MS DS 16/22/24

VP4 LAV CUDA 20/21/23

VP4 MS MFT 20/21/23

VP4 LAV CB 20/21/23

VP4 CoreCUDA 20/21/22

CPU C2D@2.83 19/21/22

VP4 LAV NATIVE 20/20/22

UVD2.2 CoreAVC 20/20/21

VP4 MS DS 18/20/22 *

VP4 CoreAVC 17/18/19


* The MS DS decoder shows a lot of artifacts at the beginning of decoding, resulting in a low min value and probably a lower average value.


VC-1/WMV3


A) VC-1 - Devil May Cry 1080/60p-40Mbps


1. QS LAV QS 214/218/222

QS FFDShow 158/167/170

UVD2.2 OC AMD MFT 87/88/89

UVD2.2 OC LAV NATIVE 87/88/89

VP4 LAV CUVID 80/80/84

VP4 LAV CB 79/80/84

VP4 LAV NATIVE 76/80/84

CPU Core i5@3.1 68/76/96

UVD2.2 AMD MFT 65/66/67

UVD2.2 LAV NATIVE 64/66/69

UVD2.2 OC LAV CB 51/57/59

UVD2.2 LAV CB 48/55/56

CPU C2D@2.83 41/44/47



B) WMV3 - MP10 Digital Life 1080/24p-10Mbps


1. QS LAV QS 242/247/252

QS FFDShow QS 170/172/175

CPU Core i5@3.1 89/102/110

UVD2.2 OC AMD MFT 87/90/91

UVD2.2 OC LAV NATIVE 87/90/91

VP4 LAV CUVID 84/89/101

VP4 LAV CB 82/83/86

VP4 LAV NATIVE 81/83/84

UVD2.2 AMD MFT 68/68/69

UVD2.2 LAV NATIVE 68/68/69

CPU C2D@2.83 56/63/68

UVD2.2 OC LAV CB 57/58/59

UVD2.2 LAV CB 55/57/59


Comments:

1) The performance of the QuickSync HW is beyond any competition in native DXVA mode, with every decoder used (MS DS/MFT, CoreAVC, LAV NATIVE).

Copy-back mode using Intel's MSDK QuickSync decoder v0.28 (FFDShow, LAV QS) is heavily multi-threaded and optimized, but for some reason it is a lot slower than QS decoder v0.20 (by more than 20%) and a lot slower than DXVA2 native on the above system configuration for high frame rate (60fps) and/or low bitrate clips (clips 1 to 6).

But it's very good, and sometimes faster than native DXVA, for high bitrate, low frame rate clips (clips 7 to 10).

The performance of LAV DXVA2 copy-back is simply awful. It's slower than VP5 most of the time!

For laptop users, or for people who want their CPU and GPU load as low as possible during playback, native DXVA2 decoders (MS DS/MFT, CoreAVC, LAV NATIVE) are BY FAR the most efficient decoders.

2) VP5 is about 2 times faster than VP4 in the "easy" low bitrate clips 1 to 6. But it's more than 3 times faster in the "difficult" high bitrate clips 7 to 10, as it is built for 4K x 2K decoding. For those huge bandwidth clips it closes the gap with QuickSync, but the distance is still obvious.

3) VP4 has a lot of problems at huge bandwidths from clip 7 to 10, just like the 5750 UVD2.2, although the latter is a little faster. It seems absolutely reasonable for Nvidia to go for VP5 in order to support 4K x 2K and large bandwidths.
UVD2.2 OC can easily play every clip from 1 to 10!

4) UVD2.2 OC is about 35% - 42% faster than the 5750 UVD2.2 in H.264 and 32% faster in VC-1/WMV3, and it's also faster than VP4 and the Core 2 Duo in bandwidth-heavy files starting from clip 7.
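
As a rough illustration of how such a percentage can be derived, the Python sketch below compares average fps of stock UVD2.2 and UVD2.2 OC. It uses the LAV NATIVE rows of clips 7 to 10 as input. The post does not say which rows the 35% - 42% figure is based on, so that choice is an assumption, and `speedup_percent` is a name made up for this example.

```python
def speedup_percent(base_fps: float, oc_fps: float) -> float:
    """How much faster the overclocked result is, in percent."""
    return (oc_fps / base_fps - 1.0) * 100.0


# Average fps (middle value) of the LAV NATIVE rows: (UVD2.2, UVD2.2 OC)
lav_native_avg = {
    "7. Vortex": (28, 36),
    "8. Birds": (27, 37),
    "9. Ducks": (29, 41),
    "10. Crowd Run": (24, 33),
}

for clip, (base, oc) in lav_native_avg.items():
    print(f"{clip}: +{speedup_percent(base, oc):.1f}%")
```

Other decoder rows give somewhat different percentages, so the output depends on which rows are compared.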
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1)
HEVC decoding benchmarks
H.264 DXVA Benchmarks for all

Last edited by NikosD; 1st March 2017 at 12:12. Reason: Added results of Polaris RX 470 and one result of Pascal GTX 1060
Old 16th November 2011, 22:58   #2  |  Link
pirlouy
_
 
Join Date: May 2008
Location: France
Posts: 692
Interesting results.
If I'm not wrong, after flashing your BIOS, all samples can be played without dropped frames (fps always > sample fps).
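
The "fps always > sample fps" rule can be written down as a small check. This is only a sketch in Python: `plays_in_realtime` is a hypothetical helper, the min fps values are taken from the LAV NATIVE rows of the first post, and a decode benchmark above the clip frame rate is no guarantee of zero dropped frames in a real player with a renderer attached.

```python
def plays_in_realtime(min_fps: int, clip_fps: int) -> bool:
    """True when even the slowest measured decode rate exceeds the clip frame rate."""
    return min_fps > clip_fps


# label -> (min fps from the LAV NATIVE rows, clip frame rate)
checks = {
    "4. Girls, UVD2.2": (54, 60),
    "4. Girls, UVD2.2 OC": (71, 60),
    "9. Ducks, UVD2.2": (26, 30),
    "9. Ducks, UVD2.2 OC": (36, 30),
}

for label, (min_fps, clip_fps) in checks.items():
    verdict = "ok" if plays_in_realtime(min_fps, clip_fps) else "below clip fps"
    print(f"{label}: {verdict}")
```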

Not the kind of stuff I'd try with my 5770 though. :-)
Old 31st December 2011, 12:32   #3  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Only VP5 results are missing...

I would like to see some results especially for clips from 7 to 10.
Old 2nd January 2012, 11:53   #4  |  Link
wanezhiling
Registered User
 
Join Date: Apr 2011
Posts: 1,184
clip 3 4 5 6... I doubt the result.

With PotPlayer DXVA, AMD never reaches 60fps, even on an HD6990... nVidia and Intel reach 60 easily~
Old 2nd January 2012, 12:03   #5  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Quote:
Originally Posted by wanezhiling View Post
clip 3 4 5 6... I doubt the result.

With PotPlayer DXVA, AMD never reaches 60fps, even on an HD6990... nVidia and Intel reach 60 easily~
Don't!

Flashing a 5750 with the 6750 BIOS puts UVD 2.2 in a "special" PowerPlay mode with 3D clocks (core: 710 MHz / memory: 1160 MHz - Sapphire Vapor-X edition).

So UVD2.2 gets overclocked, which is only possible in this strange situation (a 5750 flashed with the 6750 BIOS).

When I tried to edit my 5750 BIOS so that UVD mode runs at 3D clocks like on the flashed 5750, it worked but with no performance advantage.

The only solution seems to be a flashed 5750 card.

I don't have a real 6750 card to check.

And with VP4 going to 93% - 97% utilization during playback in PotPlayer DXVA for the 5th and 6th clips, that is not "easily"!

Last edited by NikosD; 2nd January 2012 at 12:07.
Old 2nd January 2012, 14:56   #6  |  Link
wanezhiling
Registered User
 
Join Date: Apr 2011
Posts: 1,184
http://www.gokuai.com/f/2u7UmrSwOj6ehL26
http://www.gokuai.com/f/S9yFt7d55975qK38
NikosD, here are two 1080p60fps samples, please just use "your (6)750 + PotPlayer DXVA + Fraps" to test them.

Last edited by wanezhiling; 2nd January 2012 at 15:04.
Old 2nd January 2012, 15:40   #7  |  Link
mariush
Registered User
 
Join Date: Dec 2008
Posts: 589
For those that have problems using Rapidshare or Hotfile or can't download large files without disconnections, I'm still hosting these files and the previous ones here:

ftp://helpedia.com/pub/multimedia/x264/testvideos/

Resume supported, can use several download threads to speed things up etc...
Old 2nd January 2012, 17:12   #8  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Quote:
Originally Posted by wanezhiling View Post
http://www.gokuai.com/f/2u7UmrSwOj6ehL26
http://www.gokuai.com/f/S9yFt7d55975qK38
NikosD, here are two 1080p60fps samples, please just use "your (6)750 + PotPlayer DXVA + Fraps" to test them.
1) Using PotPlayer you don't need Fraps.
PotPlayer has internal statistics and file information activated by pressing Tab key.

2) Your second file, "SNSD_-_Tell_Me_Your_Wish_(Genie)", is EXACTLY the same file as "Girls", the 4th clip of my collection. It took me 30 minutes to download the same clip I had.

PLEASE PAY ATTENTION to what I write in my first post, read the whole text, download the files and CHECK THINGS YOURSELF.

3) Your first file is a little strange.
Although it reaches bitrates up to 188Mbps, it is not that difficult after all.

But it crashes MS MFT decoder in DXVA checker.

5750 MS DS 54/56/60

6750 MS DS 72/75/79

As you can see it's completely playable under (6)750

4) You should look at the performance of the (6)750 not only in clips 3 to 6, but also in clips 7 to 10, which the 5750 UVD and VP4 cannot play in realtime.

Those are the more difficult clips that show what the (6)750 can do and set it apart from the 5750 UVD and VP4.

Maybe AMD graphics card owners should make some "noise" about my accidental finding and about how AMD "handles" UVD2.2/UVD3 in their cards.

Take a look here:
http://www.agile-news.com/news-32346...on-2160p!.html

How is it possible for UVD3 to accelerate 4K x 2K video files with 238Mbps bitrate and 14% CPU usage and the same card cannot play 1080p60 ???
Old 3rd January 2012, 01:33   #9  |  Link
wanezhiling
Registered User
 
Join Date: Apr 2011
Posts: 1,184
Fraps is useful, I don't like PotPlayer's OSD info, it has many mistakes..

"It took me 30 minutes to download the same clip I had." --> My fault
Cuz it really shocked me when I glanced at your (6)750's performance...
I never reach 60fps with "HD6850 + PotPlayer DXVA" just like 5750, so maybe I should flash my BIOS to test this again..

6750 MS DS 72/75/79 --> What I can say is congratulations~

In fact, I have made this comparison (PureVideo vs QuickSync vs UVD) several times on my forum, and I got the same results as you except for the (6)750.
http://forum.doom9.org/showthread.ph...68#post1546068 This is VP5 DXVA 2160P from my forum, you must remember it.

btw, I hate English! As a remote eastern person, I can understand every word you say, but..

Last edited by wanezhiling; 3rd January 2012 at 05:01.
Old 3rd January 2012, 09:09   #10  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
For me - 5750 UVD2.2 - it's not working even though DXVA checker says about the Device decoders:

"ModeH264_VLD_NoFGT: DXVA2, 720x480 / 1280x720 / 1920x1080 / 3840x2160"
"ModeH264_VLD_NoFGT_Flash: DXVA2, 720x480 / 1280x720 / 1920x1080 / 3840x2160"

So from the side of hardware and driver, UVD 2.2 is capable of 4K x 2K and if CoreAVC DXVA is capable of 4K x 2K, as it is clearly seen by your screenshots, then the problem must be an artificial restriction in everything else but VP5 inside the code of CoreAVC DXVA.
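
The two decoder-device lines quoted above have a regular shape (mode name, API, list of resolutions), so checking them for a given resolution is easy to script. The Python sketch below assumes exactly the text format shown in this post. `parse_decoder_mode` is a hypothetical helper and not a DXVA Checker interface.

```python
from typing import List, Tuple


def parse_decoder_mode(line: str) -> Tuple[str, str, List[Tuple[int, int]]]:
    """Split 'Mode: API, WxH / WxH / ...' into its three parts."""
    name, rest = line.split(":", 1)
    api, _, resolution_part = rest.partition(",")
    resolutions = []
    for item in resolution_part.split("/"):
        width, height = item.strip().split("x")
        resolutions.append((int(width), int(height)))
    return name.strip(), api.strip(), resolutions


line = "ModeH264_VLD_NoFGT: DXVA2, 720x480 / 1280x720 / 1920x1080 / 3840x2160"
mode, api, resolutions = parse_decoder_mode(line)
print(mode, api, (3840, 2160) in resolutions)
```

A listed resolution only means the driver advertises it. As this post describes, a given decoder filter may still fail to use it.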

DXVA checker reports for clips beyond 1080p and CoreAVC DXVA the following:

"ModeUnknown (NV12): DXVA1 (VMR)"

and of course it's not working.

Quote:
Originally Posted by wanezhiling View Post
In fact, I have made this comparison (PureVideo vs QuickSync vs UVD) several times on my forum, and I got the same results as you except for the (6)750.
Which is your forum ?

Quote:
Originally Posted by wanezhiling View Post
This is VP5 DXVA 2160P from my forum, you must remember it.
Do you have access on VP5 hardware or could someone else from your forum post some benchmark results for clips 1 to 10 or at least 7 to 10 ?

Quote:
Originally Posted by wanezhiling View Post
btw, I hate English! As a remote eastern person, I can understand every word you say, but..
No problem at all!
Old 3rd January 2012, 09:40   #11  |  Link
nm
Registered User
 
Join Date: Mar 2005
Location: Finland
Posts: 2,641
Quote:
Originally Posted by NikosD View Post
Take a look here:
http://www.agile-news.com/news-32346...on-2160p!.html

How is it possible for UVD3 to accelerate 4K x 2K video files with 238Mbps bitrate and 14% CPU usage and the same card cannot play 1080p60 ???
Weren't you guys talking about the 6750, which isn't the same card at all? The 6750 has UVD 2.2, but the agile-news article is about the 6570, which has UVD 3.
Old 3rd January 2012, 11:36   #12  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Quote:
Originally Posted by nm View Post
Weren't you guys talking about the 6750, which isn't the same card at all? The 6750 has UVD 2.2, but the agile-news article is about the 6570, which has UVD 3.
I was referring to a previous comment of Wanezhiling

Quote:
Originally Posted by wanezhiling View Post
With PotPlayer DXVA, AMD never reaches 60fps, even on an HD6990... nVidia and Intel reach 60 easily~
It is true BTW.

Take whatever 5xxx or 6xxx card and try to run a demanding 1080p60fps clip in any player you want.
I don't have a real 6xxx card but take a look here when renq tried the first 3 clips:

http://forum.doom9.org/showthread.ph...78#post1489378

I repost renq's UVD3 results here:

DIVX ver is 9.01.21
Arcsoft ver 2.27.319.108
FFDShow DXVA ver is 3800

All post-processing turned OFF.

1. Clip
MS MFT - 48/57/85
DIVX DXVA - 48/58/85
Arcsoft - 45/57/69
Ffdshow - 44/57/70

2. Clip
DIVX - 38/48/75
MS DTV/DVD - 31/48/86
Arcsoft - Wouldn't play
FFDSHow - Same

3. Clip
DIVX - 51/57/84
MS DTV/DVD - 48/57/79
Arcsoft - 53/57/64
ffdshow - 54/57/71

So when you see the original UVD2.2 performance of the 5750 and the UVD3 performance posted above, which is the same as the 5750's, how is it possible for the same UVD3 card to perform like here:

http://www.agile-news.com/news-32346...on-2160p!.html

In fact, I can't even play a simple clip beyond 1080p, like 2048 x 1200, in DXVA mode with any combination of codec/player I have tried.

What I'm trying to say is that I'm looking for a suitable H.264 DXVA codec and player (DXVA checker is fine for me) to check the 4K x 2K performance of the "original" UVD2.2 and my "Frankenstein" UVD2.2+, and for other users to check UVD3 and VP5.

I'm also trying to say that the performance of UVD2.2 and UVD3 is buried by AMD through unsuitable BIOS/drivers.

I would really like to know the codec and player used by http://www.agile-news.com to play a 4K x 2K video clip in DXVA mode with UVD3.
Old 3rd January 2012, 18:38   #13  |  Link
wanezhiling
Registered User
 
Join Date: Apr 2011
Posts: 1,184
Quote:
Which is your forum ?
A Chinese PotPlayer fan's forum, I'm not administrator,just a moderator.
I totally agree with your interesting findings: QS's amazing speed, VP4's poor performance at huge bandwidths (I only have Duck takes off 1080p30p), AMD never reaching 60fps except for your (6)750, and so on.
Because I tested hd3850 - m hd4650 - hd5770 - hd6850, 8600gt - 9300m gs - gt240 - gts450, I5 2300(hd2000)

Quote:
Do you have access on VP5 hardware or could someone else from your forum post some benchmark results for clips 1 to 10 or at least 7 to 10 ?
I contacted the GeForce 410M's owner, but he says he has deleted the samples (screenshots) which I gave him..
And I have to tell you one thing: we can't download clips 1 to 10, we can't even browse those download pages because of national policy...sigh...
Fortunately, the 410M's owner promised me to re-download 2 samples (Duck takes off 1080p30p and clip 4 "Girls"). Tomorrow I will post his benchmark results here.


I see you are always looking for a way to make AMD decode 4K x 2K. On my notebook's m hd4650, DXVA checker also says it supports 3840x2160... but even on the HD6850 I have never seen it decode 2160p, I have only seen BSODs....
Quote:
So from the side of hardware and driver, UVD 2.2 is capable of 4K x 2K and if CoreAVC DXVA is capable of 4K x 2K, as it is clearly seen by your screenshots, then the problem must be an artificial restriction in everything else but VP5 inside the code of CoreAVC DXVA.
No, VP5 can't decode 4K x 2K with CoreAVC DXVA. CoreAVC CUDA is OK, you can see that in the screenshots. In fact, besides CoreAVC CUDA, VP5 can decode 4K x 2K with PotPlayer's own DXVA (note: old versions, at least before 2011.11), TMT5, and the MainConcept (Broadcast) AVC/H.264 decoder.
Failure lists: PowerDVD11 DXVA, ffdshow DXVA, MPC-HC DXVA, CoreAVC DXVA, MS DTV-DVD, LAV CUVID

Last edited by wanezhiling; 3rd January 2012 at 19:02.
Old 3rd January 2012, 19:29   #14  |  Link
nm
Registered User
 
Join Date: Mar 2005
Location: Finland
Posts: 2,641
Quote:
Originally Posted by NikosD View Post
I was referring to a previous comment of Wanezhiling
Ah, I see now. Thanks for the full explanation.
Old 3rd January 2012, 19:57   #15  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Quote:
Originally Posted by wanezhiling View Post
And I have to tell you one thing: we can't download clips 1 to 10, we can't even browse those download pages because of national policy...sigh...
Sorry to hear that... I've heard in the news about the Chinese government's restrictions on Internet use, and you confirm the bad news.
Maybe you could try the link below:

ftp://helpedia.com/pub/multimedia/x264/testvideos/

It has all 10 clips.

Quote:
Originally Posted by wanezhiling View Post
No, VP5 can't decode 4K x 2K with CoreAVC DXVA. CoreAVC CUDA is OK, you can see that in the screenshots. In fact, besides CoreAVC CUDA, VP5 can decode 4K x 2K with PotPlayer's own DXVA (note: old versions, at least before 2011.11), TMT5, and the MainConcept (Broadcast) AVC/H.264 decoder.
Failure lists: PowerDVD11 DXVA, ffdshow DXVA, MPC-HC DXVA, CoreAVC DXVA, MS DTV-DVD, LAV CUVID
Thanks for the info.
Old 4th January 2012, 06:07   #16  |  Link
wanezhiling
Registered User
 
Join Date: Apr 2011
Posts: 1,184
Quote:
Maybe you could try the link below:

ftp://helpedia.com/pub/multimedia/x264/testvideos/
OK, I'll try it @30k/s.. So the complete result should be here tomorrow.. I hate the ISP!!

Quote:
Thanks for the info.
It doesn't matter.


OK, these are the VP5 benchmarks for Clip4 and Clip9.
I5 2430M
nVidia 410M
Driver: ForceWare 267.54/Win7 64
4G DDR3



4. Girls-60fps

VP5 CoreCUDA 110/128/144
VP5 CUVID 141/143/146
VP5 MS DS 114/122/133



9. Ducks -30fps

VP5 CoreCUDA 74/83/95
VP5 CUVID 71/82/92
VP5 MS DS 48/57/84


Absolutely, VP5 is much faster than VP4. Think about one thing: the 410M is such a "weak" graphics card, imagine if the GTX560ti had VP5

Last edited by wanezhiling; 4th January 2012 at 08:18.
Old 4th January 2012, 07:47   #17  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,336
Quote:
Originally Posted by wanezhiling View Post
Think about one thing: the 410M is such a "weak" graphics card, imagine if the GTX560ti had VP5
The speed of the card is of little importance for progressive video.
The video decoder is separate from the 3D engine, and always runs at the same speed. It's not faster on faster cards.
Assuming there is enough memory bandwidth, every card using VP5 should run at the same speed.

For example (from VP4):
A GT430 and a GTX570 have the same video decoding performance, despite their massive differences in 3D power.

Anyhow, NVIDIA said that VP5 would be double the speed of VP4 approximately.
Sadly, that's still way below the Intel decoder. And IVB, which is coming in 3-4 months, will once again increase that performance.
Now all Intel has to do is improve their drivers and iron out the last remaining issues.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders

Last edited by nevcairiel; 4th January 2012 at 07:52.
Old 4th January 2012, 09:40   #18  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Quote:
Originally Posted by nevcairiel View Post
The speed of the card is of little importance for progressive video.
The video decoder is separate from the 3D engine, and always runs at the same speed. It's not faster on faster cards.
Assuming there is enough memory bandwidth, every card using VP5 should run at the same speed.
This may be true for the video decoder hardware inside Nvidia and ATI cards (with the exception of my Frankenstein (6)750 card), but it's not true for Intel video hardware.

For ATI and the 5xxx series I know that the UVD2.2 clocks are 400 MHz for the video processor and 900 MHz for memory.

For AMD and the 6xxx series (including 67xx cards) I assume that UVD2.2/UVD3 is clocked at the same 400/900 MHz, judging from the UVD3 performance I see.

On my GT440 I see that VP4 is clocked at the max 3D clocks of the card.
I know nothing about other VP4 cards, but I believe nevcairiel when he says that they all run at the same speed (frequency).

So for progressive video - not interlaced - the pure video decoding performance will be the same for every AMD/Nvidia graphics card using the same video processor, assuming there are no tricks from AMD/Nvidia regarding BIOS/drivers for specific models and cards (favoring specific models over others by clocking the video processor at different speeds, for example).

For interlaced video, the deinterlacing is executed by the GPU shaders, or computational units, or whatever name you call them.
So the performance of the GPU in general, not the video processor alone, does matter for interlaced video.

GPU performance also matters for video processing like de-noise, de-blocking, edge enhancement etc.
On AMD cards, these post-processing filters and many more are executed by the GPU shaders at hardware/driver level.
Of course, those kinds of video filters and many others can be executed in software on the CPU, for example by FFDShow, if you have a weak graphics card and a powerful processor.

For Intel, the QuickSync video engine runs at the same speed as the GPU inside the processor, which varies from 650 MHz for the low-end processors up to 1350 MHz for the upper-class processors working in GPU turbo mode.
For example, my Core i5-2400 runs QS from 850 MHz to 1100 MHz, while the Core i7-2600K runs the same QS from 850 MHz to 1350 MHz.
So for Intel hardware the choice of CPU/GPU makes some difference in the performance of the video decoder.
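
To see how far apart those clocks are, here is a small Python sketch that only divides the MHz figures quoted in this post. `clock_ratio` and the dictionary are illustrative names, and nothing here says that decoding speed scales linearly with clock.

```python
# GPU clock ranges (base MHz, turbo MHz) quoted in the post above.
QS_CLOCKS_MHZ = {
    "Core i5-2400": (850, 1100),
    "Core i7-2600K": (850, 1350),
}
UVD22_STOCK_MHZ = 400  # fixed UVD2.2 core clock quoted for the 5xxx series


def clock_ratio(clock_mhz: float, reference_mhz: float) -> float:
    return clock_mhz / reference_mhz


i5_turbo = QS_CLOCKS_MHZ["Core i5-2400"][1]
i7_turbo = QS_CLOCKS_MHZ["Core i7-2600K"][1]
print(f"i7-2600K vs i5-2400 at turbo: {clock_ratio(i7_turbo, i5_turbo):.2f}x")

for cpu, (base, turbo) in QS_CLOCKS_MHZ.items():
    low = clock_ratio(base, UVD22_STOCK_MHZ)
    high = clock_ratio(turbo, UVD22_STOCK_MHZ)
    print(f"{cpu} QS vs stock UVD2.2: {low:.2f}x to {high:.2f}x")
```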

Quote:
Originally Posted by nevcairiel View Post
Anyhow, NVIDIA said that VP5 would be double the speed of VP4 approximately.
Sadly, that's still way below the Intel decoder. And IVB, which is coming in 3-4 months, will once again increase that performance.
From wanezhiling's figures it seems that for "easy" clips - meaning clips with low bitrate like 4. Girls - VP5 has less than double the performance of VP4.
But for "heavy" clips with huge bitrates like 9. Ducks, VP5 has more than 3 times the performance of VP4, which is great.
VP5 seems to be a very well balanced video processor, pushing performance where it is needed - toward the large bitrates required by 4K x 2K and by specially encoded clips like 7 to 10 in my collection.
And for those clips - 7 to 10 - the performance of VP5 is close to 1st generation QS.

Of course Ivy will extend the gap even more in order to support multiple 4K x 2K streams simultaneously and 4K x 4K (square resolution)
__________________
Win 10 x64 (19042.572) - Core i5-2400 - Radeon RX 470 (20.10.1)
HEVC decoding benchmarks
H.264 DXVA Benchmarks for all

Last edited by NikosD; 4th January 2012 at 10:40.
Old 4th January 2012, 10:07   #19  |  Link
wanezhiling
Registered User
 
Join Date: Apr 2011
Posts: 1,184
nev, NikosD, I really appreciate you correcting my mistaken opinion.

And I have always wondered about the reason for this... Since ForceWare 285.62, it has become worse and worse...

btw, can you tell me the principle of Intel's amazing speed(more than several times the performance of nVidia/AMD)? Thx.
Old 4th January 2012, 11:32   #20  |  Link
NikosD
Registered User
 
Join Date: Aug 2010
Location: Athens, Greece
Posts: 2,901
Quote:
Originally Posted by wanezhiling View Post
btw, can you tell me the principle of Intel's amazing speed(more than several times the performance of nVidia/AMD)? Thx.
This is a very technical question and a hard one to answer in detail.

Maybe Egur (Eric Gur) at http://forum.doom9.org/showthread.ph...38#post1523738 can answer this with the help of "inside" Intel information.

First of all, Intel's MFX engine (QS) has an obvious advantage over UVDx and VPx: a higher clock speed (frequency).

MFX works in the range of 850 MHz - 1100 MHz for most of the desktop processors.
AMD UVD2.2 works at 400 MHz (!) with no dynamic frequency change, which means that whether you play or benchmark a clip (full load), UVD2.2 always runs at 400 MHz.

UVD3, VP4 and, I'm sure, VP5 change their working frequency dynamically depending on the load.
VP4 reaches 820MHz during benchmarking (full load).
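
A quick back-of-the-envelope check of how much of the gap frequency alone could explain. This is only a sketch: it uses the peak clocks quoted above and assumes, purely for the sake of argument, that decoding throughput scales linearly with clock. The dictionary and function names are made up for this example.

```python
# Peak clocks in MHz as quoted in this post.
PEAK_CLOCK_MHZ = {
    "MFX": 1100,     # upper end of the 850-1100 MHz range
    "VP4": 820,      # under full load (benchmarking)
    "UVD2.2": 400,   # fixed, no dynamic clocking
}


def clock_ratio(fast, slow, clocks=PEAK_CLOCK_MHZ):
    """Upper bound on the speedup if throughput scaled linearly with clock."""
    return clocks[fast] / clocks[slow]


if __name__ == "__main__":
    for other in ("UVD2.2", "VP4"):
        print(f"MFX vs {other}: {clock_ratio('MFX', other):.2f}x")
    # MFX vs UVD2.2: 2.75x
    # MFX vs VP4: 1.34x
```

So under that assumption the clock accounts for at most 2.75x against UVD2.2 and about 1.34x against VP4. Any measured gap larger than that has to come from somewhere else.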

Of course it's not only about frequency. For example, because it is integrated inside a very fast processor, MFX can take advantage of very fast access to memory/caches.

In general, I could say that your question is of the same kind as asking why Sandy Bridge is faster than Bulldozer, or why AMD's 79xx series graphics cards (Tahiti architecture) are faster than Nvidia's 5xx series (GF110 architecture).

Of course Sandy Bridge, Bulldozer, Southern Islands and GF110 are extremely complicated chips inside, monsters with billions of transistors. In the end the CPUs execute the same kind of x86 instructions (more or less) and the GPUs the same kind of D3D, OpenGL and OpenCL instructions, but each architecture does it in a very different way.

Video processors like VP4/5, UVD2.2/3 and QuickSync (to be more exact, the video decoding/encoding engine of QuickSync is called the MFX engine, for Multi-Format Codec) are far simpler processors than a modern CPU or GPU.

They use fixed-function logic, not general-purpose logic like today's CPUs and GPUs, and they take the form of an ASIC.

They actually do just one thing, but they do it extremely fast and with very low power consumption compared to a CPU, considering the low frequency and low number of transistors that video processors use.

That thing is decoding Video compression algorithms like MPEG-2, MPEG-4 ASP, MPEG-4 AVC, VC-1.

If you study those algorithms you will see that most of the resources needed to decode them go to the Inverse Discrete Cosine Transform (iDCT), which is a mathematical transformation.

So if we go deeper, the performance of the MFX engine, as of every fast video decoder, comes down to how quickly it performs the iDCT, but of course this is a very simplified view of video processor performance.
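
To give an idea of the arithmetic involved, here is a minimal sketch of the textbook 8x8 2D DCT and its inverse in plain Python. It is the naive floating-point form, written for readability only: hardware decoders use fast factorized fixed-point versions, and H.264 in particular uses an integer approximation of the transform rather than this exact formula. The function names are my own.

```python
import math

N = 8  # block size used by MPEG-2 style codecs


def _scale(k):
    """Orthonormal scaling factor for frequency index k."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def _basis(pos, freq):
    """Cosine basis value for sample position pos and frequency freq."""
    return math.cos((2 * pos + 1) * freq * math.pi / (2 * N))


def dct_2d(block):
    """Forward 2D DCT-II of an N x N block (what the encoder does)."""
    return [[_scale(u) * _scale(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                         for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct_2d(coeffs):
    """Inverse 2D DCT of an N x N coefficient block (what the decoder does)."""
    return [[sum(_scale(u) * _scale(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


if __name__ == "__main__":
    block = [[(x * N + y) % 256 for y in range(N)] for x in range(N)]
    restored = idct_2d(dct_2d(block))
    worst = max(abs(restored[x][y] - block[x][y]) for x in range(N) for y in range(N))
    print(f"max round-trip error: {worst:.2e}")
```

In this naive form every 8x8 block costs 64 x 64 = 4096 multiply-adds, and a 1080p frame contains tens of thousands of such blocks. That is why this step is a natural candidate for dedicated fixed-function logic.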

For example, compared to VP4, VP5 increased performance on huge-bitrate video clips much more than on low-bitrate ones.
That has to do with internal changes to memory/cache access, wider buses etc.

We need hardware experts and specialized knowledge to go deeper from here, I think!

Last edited by NikosD; 4th January 2012 at 13:51.