Old 19th September 2008, 08:41   #241  |  Link
qyqgpower
Registered User
 
Join Date: Jul 2005
Posts: 108
@blubberbirne
You are running Vista 64-bit, so maybe nvcuvid.dll (is it a 32-bit runtime?) should go in the SysWOW64 folder.
Old 19th September 2008, 08:58   #242  |  Link
roozhou
Registered User
 
Join Date: Apr 2008
Posts: 1,181
Great work!
Will there eventually be a standalone dshow filter that can be used for real-time playback? That would be an alternative to DXVA.
Old 19th September 2008, 10:18   #243  |  Link
squid_80
Registered User
 
Join Date: Dec 2004
Location: Melbourne, AU
Posts: 1,963
Quote:
Originally Posted by roozhou View Post
Great work!
Will there eventually be a standalone dshow filter that can be used for real-time playback? That would be an alternative to DXVA.
What would be the point? If your card supports this, it already supports DXVA.
Old 19th September 2008, 10:43   #244  |  Link
Quark.Fusion
Registered User
 
Join Date: Jun 2008
Posts: 177
DXVA doesn't support post-processing. BTW, resizing and sharpening can also be done on the GPU.

The biggest benefit would come if someone combined decoding, lanczos/blackman/spline resizing and FFT3DGPU in one package, to avoid transferring the image back and forth. (Like this)

Last edited by Quark.Fusion; 19th September 2008 at 10:48.
Old 19th September 2008, 10:54   #245  |  Link
NanoBot
Registered User
 
Join Date: Sep 2003
Posts: 209
Hi,

since Sagekilla asked for a comparison between DGAVCIndex and DGAVCIndexNV on a quad-core CPU, I ran a few tests.

My PC is a Q6600 @ 3.4 GHz with 2 GB of DDR2-800 RAM running at 756 MHz. The graphics card is a 9600GT with ForceWare 177.83 drivers, and the OS is Windows XP Pro 32-bit. To get DGAVCIndexNV to run, I only had to unpack it and copy nvcuvid.dll into my system32 directory, as instructed. Both the denoising and the sharpening sliders in the Nvidia control panel were set to 0% during these tests.


The first clip I used for my tests is a DVB recording from "Anixe HD", about 2 minutes long. It is 1080i50 and appears to have been shot with a video camera, since it is truly interlaced. It has an AR of 1.78:1 and looks very clean (no noise or grain).

1.) Decoded with DGAVCDecode and encoded as interlaced:

Code:
LoadPlugin("D:\MPEG\AVISYNTH\PLUGINS\DGAVCDecode.DLL")
AVCsource("anixe.dga")

D:\MPEG\megui\tools\x264>x264.exe --crf 22.0 --level 4.1 --ref 4 --mixed-refs --no-fast-pskip --bframes 3 --b-rdo --bime --weightb --trellis 1 --partitions p8x8,b8x8,i4x4,i8x8 --8x8dct --me umh --threads auto --thread-input --sar 1:1 --progress --no-psnr --interlaced --output "H:\anixe.mkv" "G:\Aufnahmen\anixe.avs"

avis [info]: 1920x1080 @ 25.00 fps (3023 frames)
x264 [warning]: NAL HRD parameters require VBV max bitrate and buffer size to be specified
x264 [info]: using SAR=1/1
x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 Cache64
x264 [info]: slice I:14    Avg QP:21.34  size:229347
x264 [info]: slice P:1330  Avg QP:23.36  size: 65230
x264 [info]: slice B:1679  Avg QP:25.90  size: 20724
x264 [info]: consecutive B-frames:  9.8% 41.6% 17.0% 31.5%
x264 [info]: mb I  I16..4: 13.0% 50.0% 37.0%
x264 [info]: mb P  I16..4:  1.4%  6.9%  3.2%  P16..4: 44.7% 15.1% 10.9%  0.0%  0.0%    skip:17.8%
x264 [info]: mb B  I16..4:  0.1%  0.3%  0.3%  B16..8: 39.7%  1.1%  2.0%  direct: 6.7%  skip:49.8%  L0:36.9% L1:54.5% BI: 8.6%
x264 [info]: 8x8 transform  intra:58.1%  inter:55.6%
x264 [info]: ref P L0  52.5% 20.0% 12.9%  4.0%  3.7%  2.1%  3.6%  1.2%
x264 [info]: ref B L0  65.5% 20.6%  7.2%  2.4%  3.0%  1.3%
x264 [info]: ref B L1  70.9% 29.1%
x264 [info]: SSIM Mean Y:0.9713192
x264 [info]: kb/s:8254.2

encoded 3023 frames, 5.64 fps, 8254.32 kb/s
2.) Decoded with DGAVCDecodeNV with "deinterlace=false" and encoded as interlaced:

Code:
LoadPlugin("D:\MPEG\AVISYNTH\PLUGINS\DGAVCDecodeNV.DLL")
AVCsource("anixe_nv.dga", deinterlace=false)

D:\MPEG\megui\tools\x264>x264.exe --crf 22.0 --level 4.1 --ref 4 --mixed-refs --no-fast-pskip --bframes 3 --b-rdo --bime --weightb --trellis 1 --partitions p8x8,b8x8,i4x4,i8x8 --8x8dct --me umh --threads auto --thread-input --sar 1:1 --progress --no-psnr --interlaced --output "H:\anixe_nv.mkv" "G:\Aufnahmen\anixe_nv.avs"
avis [info]: 1920x1080 @ 25.00 fps (3026 frames)
x264 [warning]: NAL HRD parameters require VBV max bitrate and buffer size to be specified
x264 [info]: using SAR=1/1
x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 Cache64
x264 [info]: slice I:14    Avg QP:21.01  size:228003
x264 [info]: slice P:1162  Avg QP:22.76  size: 66468
x264 [info]: slice B:1850  Avg QP:24.86  size: 16716
x264 [info]: consecutive B-frames:  9.0% 18.8% 25.3% 46.9%
x264 [info]: mb I  I16..4: 12.9% 47.7% 39.4%
x264 [info]: mb P  I16..4:  1.3%  7.5%  3.3%  P16..4: 40.3% 15.9% 11.0%  0.0%  0.0%    skip:20.9%
x264 [info]: mb B  I16..4:  0.1%  0.2%  0.3%  B16..8: 37.4%  0.9%  1.5%  direct: 5.2%  skip:54.5%  L0:33.0% L1:58.9% BI: 8.1%
x264 [info]: 8x8 transform  intra:59.7%  inter:55.5%
x264 [info]: ref P L0  56.6% 22.8%  8.5%  3.8%  2.7%  2.1%  2.3%  1.2%
x264 [info]: ref B L0  71.8% 20.6%  3.7%  2.0%  1.0%  0.9%
x264 [info]: ref B L1  76.9% 23.1%
x264 [info]: SSIM Mean Y:0.9719650
x264 [info]: kb/s:7359.7

encoded 3026 frames, 5.99 fps, 7359.81 kb/s
3.) Decoded with DGAVCDecodeNV with "deinterlace=true" and encoded as progressive:

Code:
LoadPlugin("D:\MPEG\AVISYNTH\PLUGINS\DGAVCDecodeNV.DLL")
AVCsource("anixe_nv.dga", deinterlace=true)

D:\MPEG\megui\tools\x264>x264.exe --crf 22.0 --level 4.1 --ref 4 --mixed-refs --no-fast-pskip --bframes 3 --b-rdo --bime --weightb --trellis 1 --partitions p8x8,b8x8,i4x4,i8x8 --8x8dct --me umh --threads auto --thread-input --sar 1:1 --progress --no-psnr --output "H:\anixe_nv_deint.mkv" "G:\Aufnahmen\anixe_nv_deint.avs"
avis [info]: 1920x1080 @ 25.00 fps (3026 frames)
x264 [info]: using SAR=1/1
x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 Cache64
x264 [info]: slice I:14    Avg QP:19.71  size:161555
x264 [info]: slice P:1117  Avg QP:21.46  size: 54310
x264 [info]: slice B:1895  Avg QP:23.47  size: 14413
x264 [info]: consecutive B-frames:  8.6% 15.1% 22.5% 53.8%
x264 [info]: mb I  I16..4: 10.8% 70.6% 18.7%
x264 [info]: mb P  I16..4:  1.8% 14.8%  1.6%  P16..4: 41.4% 11.9%  7.8%  0.0%  0.0%    skip:20.7%
x264 [info]: mb B  I16..4:  0.2%  0.7%  0.1%  B16..8: 32.7%  0.7%  1.1%  direct: 6.1%  skip:58.3%  L0:34.0% L1:58.0% BI: 7.9%
x264 [info]: 8x8 transform  intra:80.1%  inter:68.1%
x264 [info]: ref P L0  74.8% 13.3%  7.6%  4.2%
x264 [info]: ref B L0  89.6%  7.0%  3.4%
x264 [info]: SSIM Mean Y:0.9820854
x264 [info]: kb/s:5964.2

encoded 3026 frames, 9.01 fps, 5964.27 kb/s

The second clip I used is also a DVB recording, this time from "Premiere HD", and is also about 2 minutes long. It is 1080i50 but seems to have been shot on film, since the footage is progressive. This clip has an AR of 1.85:1 and looks quite noisy to me; it might be film grain.


4.) Decoded by DGAVCDecode and encoded as progressive:

Code:
LoadPlugin("D:\MPEG\AVISYNTH\PLUGINS\DGAVCDecode.DLL")
AVCsource("Premiere.dga")

D:\MPEG\megui\tools\x264>x264.exe --crf 22.0 --level 4.1 --ref 4 --mixed-refs --no-fast-pskip --bframes 3 --b-rdo --bime --weightb --trellis 1 --partitions p8x8,b8x8,i4x4,i8x8 --8x8dct --me umh --threads auto --thread-input --sar 1:1 --progress --no-psnr --output "H:\Premiere.mkv" "G:\Aufnahmen\Premiere.avs"
avis [info]: 1920x1080 @ 25.00 fps (2977 frames)
x264 [info]: using SAR=1/1
x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 Cache64
x264 [info]: slice I:42    Avg QP:14.42  size:171198
x264 [info]: slice P:2020  Avg QP:16.96  size: 84495
x264 [info]: slice B:915   Avg QP:19.28  size: 27135
x264 [info]: consecutive B-frames: 39.6% 55.8%  2.1%  2.5%
x264 [info]: mb I  I16..4: 14.2% 77.5%  8.4%
x264 [info]: mb P  I16..4:  2.4% 11.8%  0.7%  P16..4: 45.1% 20.9% 10.1%  0.0%  0.0%    skip: 8.9%
x264 [info]: mb B  I16..4:  0.3%  0.8%  0.0%  B16..8: 35.5%  0.8%  1.3%  direct:16.4%  skip:44.8%  L0:43.3% L1:50.3% BI: 6.3%
x264 [info]: 8x8 transform  intra:78.7%  inter:60.6%
x264 [info]: ref P L0  57.8% 20.9% 11.3%  9.9%
x264 [info]: ref B L0  80.5% 10.5%  9.0%
x264 [info]: SSIM Mean Y:0.9782869
x264 [info]: kb/s:13617.7

encoded 2977 frames, 5.89 fps, 13617.88 kb/s
5.) Decoded by DGAVCDecodeNV with "deinterlace=false" and encoded as progressive:

Code:
LoadPlugin("D:\MPEG\AVISYNTH\PLUGINS\DGAVCDecodeNV.DLL")
AVCsource("Premiere_nv.dga",deinterlace=false)

D:\MPEG\megui\tools\x264>x264.exe --crf 22.0 --level 4.1 --ref 4 --mixed-refs --no-fast-pskip --bframes 3 --b-rdo --bime --weightb --trellis 1 --partitions p8x8,b8x8,i4x4,i8x8 --8x8dct --me umh --threads auto --thread-input --sar 1:1 --progress --no-psnr --output "H:\Premiere_nv.mkv" "G:\Aufnahmen\Premiere_nv.avs"
avis [info]: 1920x1080 @ 25.00 fps (2980 frames)
x264 [info]: using SAR=1/1
x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 Cache64
x264 [info]: slice I:37    Avg QP:14.35  size:149138
x264 [info]: slice P:1752  Avg QP:16.64  size: 78769
x264 [info]: slice B:1191  Avg QP:19.28  size: 25838
x264 [info]: consecutive B-frames: 24.9% 59.9% 10.1%  5.0%
x264 [info]: mb I  I16..4: 14.3% 77.9%  7.8%
x264 [info]: mb P  I16..4:  2.4% 12.1%  0.5%  P16..4: 44.7% 20.5% 10.0%  0.0%  0.0%    skip: 9.9%
x264 [info]: mb B  I16..4:  0.2%  0.7%  0.0%  B16..8: 39.4%  0.6%  1.2%  direct:14.4%  skip:43.4%  L0:39.8% L1:54.4% BI: 5.8%
x264 [info]: 8x8 transform  intra:80.1%  inter:60.0%
x264 [info]: ref P L0  63.5% 18.4% 12.5%  5.6%
x264 [info]: ref B L0  81.5% 11.8%  6.7%
x264 [info]: SSIM Mean Y:0.9784119
x264 [info]: kb/s:11697.7

encoded 2980 frames, 6.41 fps, 11697.84 kb/s
Since the second clip is noisy, I ran two more tests on it with the denoise slider in the Nvidia control panel set to 25% and 51%. This had no impact on the result: neither the performance nor the file size was influenced in any way by the slider setting. This makes me think it might be necessary to explicitly activate the denoise post-processing through the CUDA API.


My preliminary conclusions are:

When using a quad-core CPU, offloading the decoding process to the GPU without deinterlacing currently speeds things up by about 5% - 10%. This is a good step in the right direction, but it's not as big a one as some might have hoped for. Nevertheless, I suppose the performance gain from GPU decoding might be much higher on a single- or dual-core CPU.

On the other hand, GPU decoding has a huge impact when used together with the GPU deinterlacing feature. Here I observed a performance gain of over 50%.
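
The percentages above can be checked against the fps figures in the x264 summaries of tests 1 to 5. Here is a minimal Python sketch of that arithmetic; the helper name `speedup_percent` is made up for illustration. Keep in mind that the test 1 vs 3 pair is not a like-for-like comparison, since test 3 also switches x264 from interlaced to progressive encoding.

```python
def speedup_percent(baseline_fps: float, new_fps: float) -> float:
    """Relative throughput gain of new_fps over baseline_fps, in percent."""
    return (new_fps / baseline_fps - 1.0) * 100.0


# fps figures copied from the "encoded N frames, X fps" lines of tests 1-5
runs = {
    "Anixe, interlaced encode (test 1 vs 2)": (5.64, 5.99),
    "Premiere, progressive encode (test 4 vs 5)": (5.89, 6.41),
    "Anixe, GPU deinterlace + progressive encode (test 1 vs 3)": (5.64, 9.01),
}

for label, (cpu_fps, gpu_fps) in runs.items():
    print(f"{label}: {speedup_percent(cpu_fps, gpu_fps):+.1f}%")
```

The first two pairs land inside the 5% - 10% range quoted above, and the third is a little under 60%.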

I also observed that decoding the video through the GPU seems to have a notable impact on picture quality, resulting in significantly higher compressibility of the video during encoding.
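
To put a number on the compressibility difference, the kb/s and SSIM values from the x264 logs can be compared directly. This is a rough Python sketch using only the figures posted in tests 1, 2, 4 and 5; `bitrate_reduction_percent` is a hypothetical helper name. Note that x264 measures SSIM against whatever frames the decoder delivered, so the SSIM values of two different decoders are not strictly comparable.

```python
def bitrate_reduction_percent(reference_kbps: float, new_kbps: float) -> float:
    """How much smaller new_kbps is than reference_kbps, in percent."""
    return (1.0 - new_kbps / reference_kbps) * 100.0


# (label, DGAVCDecode kb/s, DGAVCDecodeNV kb/s, DGAVCDecode SSIM, DGAVCDecodeNV SSIM)
comparisons = [
    ("Anixe (test 1 vs 2)", 8254.2, 7359.7, 0.9713192, 0.9719650),
    ("Premiere (test 4 vs 5)", 13617.7, 11697.7, 0.9782869, 0.9784119),
]

for label, sw_kbps, gpu_kbps, sw_ssim, gpu_ssim in comparisons:
    saved = bitrate_reduction_percent(sw_kbps, gpu_kbps)
    print(f"{label}: {saved:.1f}% lower bitrate, SSIM change {gpu_ssim - sw_ssim:+.7f}")
```

Both encodes use the same --crf 22.0 setting, so the lower bitrate reflects a difference in the decoded input rather than in the encoder settings.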

And lastly, libavcodec.dll seems to have a bug which makes it lose 3 frames compared to GPU decoding.

Summing up the results of my tests, it's my opinion that, even at this early stage of development, GPU decoding through DGAVCDecodeNV is absolutely recommendable if you own a suitable graphics card. Even if you "only" get a speedup of 5% - 10% in the worst case, the better compressibility, the better picture quality and the very good deinterlacing feature make DGAVCDecodeNV superior to DGAVCDecode.


C.U. NanoBot

Last edited by NanoBot; 19th September 2008 at 10:57. Reason: Typo
Old 19th September 2008, 10:58   #246  |  Link
lucassp
Registered User
 
Join Date: Jan 2007
Location: Romania, Timisoara
Posts: 223
Quote:
Originally Posted by Quark.Fusion View Post
DXVA doesn't support post-processing. BTW, resizing and sharpening can also be done on the GPU.

The biggest benefit would come if someone combined decoding, lanczos/blackman/spline resizing and FFT3DGPU in one package, to avoid transferring the image back and forth. (Like this)
All the HQV tests have shown that DXVA-capable cards can do sharpening, noise reduction and other post-processing.

First, you'd need FFT3DGPU ported to CUDA, because right now it uses the Direct3D backend.

So we need a good CUDA programmer to do all this.
Old 19th September 2008, 10:59   #247  |  Link
Daodan
unrecognized user
 
Join Date: Oct 2005
Location: home of Stella Artois
Posts: 303
Better picture quality? That, I think, shouldn't happen. Do you maybe have comparison pics? Or did you mean only when the deinterlacer is used, compared to software ones?
__________________
zzz
Old 19th September 2008, 11:03   #248  |  Link
NanoBot
Registered User
 
Join Date: Sep 2003
Posts: 209
Hi Daodan,

perhaps I used misleading wording:

Decoding through libavcodec.dll gives me macroblock errors, which result in a loss of compression. Those errors are gone if I use GPU decoding.
So "better picture quality" was the wrong term in that context; I should have said "better decoding result".

C.U. NanoBot
Old 19th September 2008, 11:07   #249  |  Link
Quark.Fusion
Registered User
 
Join Date: Jun 2008
Posts: 177
AFAIK DXVA can't do FFT-NR or high-quality resizing, and its sharpening is limited, so you get medium-quality NR, bilinear or at most bicubic resize, and don't-know-what sharpening.
And if you want to encode that to view later on your not-so-powerful notebook or HTPC, you're out of luck with DXVA.
Old 19th September 2008, 11:19   #250  |  Link
blubberbirne
Registered User
 
Join Date: Sep 2004
Location: Germany, Hamm
Posts: 161
Quote:
Originally Posted by qyqgpower View Post
@blubberbirne
You are running Vista 64-bit, so maybe nvcuvid.dll (is it a 32-bit runtime?) should go in the SysWOW64 folder.
Great tip, I will test it when I'm at home.
Old 19th September 2008, 11:22   #251  |  Link
sazanon
Registered User
 
Join Date: Apr 2002
Posts: 7
Quote:
Originally Posted by blubberbirne View Post
Great tip, I will test it when I'm at home.
Yes, it works that way
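
For anyone hitting the same issue: on 64-bit Windows, 32-bit DLLs belong in the SysWOW64 directory, while System32 holds the 64-bit ones. Below is a small Python sketch of that rule, assuming the nvcuvid.dll in question really is a 32-bit build (the thread suggests this but never states it outright). `dll_install_dir` and its parameters are names made up for illustration.

```python
import ntpath  # Windows path rules, usable on any OS


def dll_install_dir(os_is_64bit: bool, dll_is_32bit: bool,
                    windir: str = r"C:\Windows") -> str:
    """Return the system directory a DLL of the given bitness should go in."""
    if os_is_64bit and dll_is_32bit:
        # 32-bit binaries on 64-bit Windows live under SysWOW64
        return ntpath.join(windir, "SysWOW64")
    # Native DLLs (32-bit on 32-bit Windows, 64-bit on 64-bit Windows)
    return ntpath.join(windir, "System32")


# Vista 64-bit with a 32-bit nvcuvid.dll (assumed), as discussed above
print(ntpath.join(dll_install_dir(True, True), "nvcuvid.dll"))
```

On a 32-bit system such as the XP Pro machine mentioned earlier in the thread, the same function returns the System32 directory.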
Old 19th September 2008, 11:52   #252  |  Link
Guest
Guest
 
Join Date: Jan 2002
Posts: 21,901
Quote:
Originally Posted by JK1974 View Post
Thanks a lot for your clarification - don't have the time to check the forum daily, so I must have missed it. With dumb bob you mean a simple SeparateFields()? Going to check if I find this thread...
Please take this off-topic discussion to the appropriate thread.
Old 19th September 2008, 12:08   #253  |  Link
JK1974
Registered User
 
Join Date: Mar 2005
Posts: 89
Quote:
Originally Posted by neuron2 View Post
Please take this off-topic discussion to the appropriate thread.
I have found the thread in the meantime, so no more discussion on this apart from a little request: If there is a special bobbing function inside the drivers that can be implemented with just a few commands, please give it a try - we can compare the results then - in a separate thread
Old 19th September 2008, 12:26   #254  |  Link
squid_80
Registered User
 
Join Date: Dec 2004
Location: Melbourne, AU
Posts: 1,963
I haven't tried it but bobbing should be possible by setting the deinterlace mode to cudaVideoDeinterlaceMode_Bob and toggling the second_field flag of the CUVIDPROCPARAMS struct when mapping the frame (set to 0 when an even frame is requested, 1 for an odd frame). No idea if this functionality is implemented in cuda yet though.
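
The even/odd toggling described above comes down to simple index arithmetic: each decoded interlaced frame yields two output frames, one per field. Here is a minimal Python sketch of just that mapping. It makes no CUDA calls, `FieldRequest` and `bob_request` are hypothetical names, and `second_field` only mirrors the struct flag mentioned in the post.

```python
from typing import NamedTuple


class FieldRequest(NamedTuple):
    source_frame: int  # index of the decoded interlaced frame to map
    second_field: int  # 0 for an even output frame, 1 for an odd one


def bob_request(output_frame: int) -> FieldRequest:
    """Map a bobbed (double-rate) output frame number to frame + field flag."""
    if output_frame < 0:
        raise ValueError("output_frame must be non-negative")
    # Two output frames per source frame; odd output frames take the second field
    return FieldRequest(output_frame // 2, output_frame % 2)


for n in range(4):
    print(n, bob_request(n))
```

Output frames 0 and 1 both map to source frame 0 (flag 0, then flag 1), frames 2 and 3 to source frame 1, and so on, which doubles the frame count and frame rate of the clip.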
Old 19th September 2008, 13:15   #255  |  Link
rack04
Registered User
 
Join Date: Mar 2006
Posts: 1,538
Quote:
Originally Posted by Sagekilla View Post
As I said, I'm curious what the results would be on a workhorse like I just described. I plan on upgrading to such a system in the near future, so it will be helpful to have one thing offloaded from my chain.

Also, the encode doesn't need to run at "50+ fps" for it to be a bottleneck. The decode eats up a significant portion of my cpu time on my desktop computer, where I have a lowly Opteron 170.
My Q6600 was running at 3.2GHz.
Old 19th September 2008, 13:16   #256  |  Link
Guest
Guest
 
Join Date: Jan 2002
Posts: 21,901
It affects the entire infrastructure of DGAVCIndex and DGAVCDecode. Even if CUDA can support it, I don't plan to.
Old 19th September 2008, 20:24   #257  |  Link
Sagekilla
x264aholic
 
Join Date: Jul 2007
Location: New York
Posts: 1,752
I suppose this is, at the very least, a big win for anyone with an nvidia card and some HD interlaced material.
__________________
You can't call your encoding speed slow until you start measuring in seconds per frame.
Old 19th September 2008, 20:42   #258  |  Link
Inventive Software
Turkey Machine
 
Join Date: Jan 2005
Location: Lowestoft, UK (but visit lots of places with bribes [beer])
Posts: 1,953
Quote:
Originally Posted by crypto View Post
Ok, here are my results dgavcdecode vs. dgavcdecodeNV:

First good news: Using x264 cli two pass encodings are no problem.
Second good news: Acceleration is massive especially on the first pass.
Third good news: The quality of the final encode seems to be higher. I don't know why and have to do more tests.

Source: 1080p30 Apple music clip downscaled to 720p30
CPU: Q6600 2.4GHz
GPU: GF 8600 GTS 1.450 GHz

dgavcdecode:
pass #1: encoded 6580 frames, 32.68 fps, 4811.95 kb/s
pass #2: encoded 6580 frames, 21.48 fps, 5005.68 kb/s
x264 [info]: SSIM Mean Y:0.9689052
x264 [info]: PSNR Mean Y:41.892 U:48.994 V:48.284 Avg:43.078 Global:42.345 kb/s:5005.53


dgavcdecodeNV:
pass #1: encoded 6580 frames, 57.11 fps, 4811.95 kb/s
pass #2: encoded 6580 frames, 24.45 fps, 5005.42 kb/s
x264 [info]: SSIM Mean Y:0.9691189
x264 [info]: PSNR Mean Y:41.931 U:49.607 V:49.447 Avg:43.258 Global:42.404 kb/s:5005.27
Wow, that's a big increase! Same encoder settings I assume?

Quote:
Originally Posted by rack04 View Post
Don't really know how to translate the results but here goes:

Q6600
8800GT

DGAVCDecode


DGAVCDecodeNV


Directshow
Only a slight increase, but it helps. Good one.
__________________
On Discworld it is clearly recognized that million-to-one chances happen 9 times out of 10. If the hero did not overcome huge odds, what would be the point? Terry Pratchett - The Science Of Discworld
Old 19th September 2008, 22:49   #259  |  Link
Atak_Snajpera
RipBot264 author
 
Join Date: May 2006
Location: Poland
Posts: 7,803
Quote:
I suppose this is, at the very least, a big win for anyone with an nvidia card and some HD interlaced material.
Exactly! A high-quality deinterlacer for free, along with PAFF decoding, is the most important thing for me. However, I'm an ATI user.
Old 19th September 2008, 23:51   #260  |  Link
ooferomen
Registered User
 
Join Date: May 2008
Posts: 39
Vista SP1
Athlon64 3500+ 2.25ghz
1GB ram
8400GS with VP3

I took a 1080p HD DVD clip, cropped a few lines off the top and bottom, bilinear-resized it to 640x352, and encoded it using the MeGUI iPhone profile.

no gpu:
pass 1 encoded 2168 frames, 6.03 fps, 1003.51 kb/s
pass 2 encoded 2168 frames, 6.06 fps, 1000.23 kb/s

gpu:
pass 1 encoded 2170 frames, 11.91 fps, 1003.26 kb/s
pass 2 encoded 2170 frames, 12.06 fps, 1000.27 kb/s
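
Summary lines like these are easy to compare programmatically. The Python sketch below parses the "encoded N frames, X fps, Y kb/s" format and derives wall-clock time per pass; the regular expression and the helper names `parse_summary` and `elapsed_seconds` are assumptions for illustration, not part of x264 or MeGUI.

```python
import re

# Matches the trailing part of an x264 summary line as pasted in this thread
SUMMARY = re.compile(r"encoded (\d+) frames, ([\d.]+) fps, ([\d.]+) kb/s")


def parse_summary(line: str) -> tuple[int, float, float]:
    """Return (frames, fps, kbps) from one summary line."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError(f"not an x264 summary line: {line!r}")
    return int(match.group(1)), float(match.group(2)), float(match.group(3))


def elapsed_seconds(line: str) -> float:
    """Approximate encode time for the pass: frames divided by fps."""
    frames, fps, _ = parse_summary(line)
    return frames / fps


no_gpu = "pass 1 encoded 2168 frames, 6.03 fps, 1003.51 kb/s"
gpu = "pass 1 encoded 2170 frames, 11.91 fps, 1003.26 kb/s"
print(f"no gpu: {elapsed_seconds(no_gpu):.0f} s, gpu: {elapsed_seconds(gpu):.0f} s")
```

With the figures above, the first pass drops from roughly six minutes to roughly three.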