Old 10th July 2009, 07:54   #1  |  Link
St Devious
Hardware Aspirin
 
Join Date: Jul 2007
Posts: 461
CUDA H.264 vs x264 Speed and Image Quality Benchmarks, discussion

Hardware
  • Intel Q9450 @ 3.2 GHz
  • 4GB RAM @ 800 MHz 5-4-4-12 2T
  • GTS 250 512 MB @ 738/1836/1100 MHz (Core/Shader/Memory)



Software
  • Nvidia GeForce 186.18 WHQL Drivers
  • MeGUI 0.3.1.1047
  • x264 r1178 Jeeb's Build
  • Mediacoder 0.7.1.4475 for CUDA GPU encoding


Source Videos
VforVendetta 2000Kbps 1280x720 Clip

Mediainfo on Source
Code:
Video
ID                               : 2
Format                           : AVC
Format/Info                      : Advanced Video Codec
Format profile                   : High@L4.0
Format settings, CABAC           : Yes
Format settings, ReFrames        : 4 frames
Muxing mode                      : Container profile=Unknown@4.0
Codec ID                         : V_MPEG4/ISO/AVC
Duration                         : 1mn 52s
Nominal bit rate                 : 2 000 Kbps
Width                            : 1 280 pixels
Height                           : 720 pixels
Display aspect ratio             : 2.35
Frame rate                       : 23.976 fps
Resolution                       : 24 bits
Colorimetry                      : 4:2:0
Scan type                        : Progressive
Bits/(Pixel*Frame)               : 0.091
Writing library                  : x264 core 65 r999+1 eb3ef1b
Encoding settings                : cabac=1 / ref=4 / deblock=1:-1:-1 / analyse=0x3:0x113 / me=tesa / subme=9 / psy_rd=1.0:0.0 / 
mixed_ref=1 / me_range=32 / chroma_me=1 /trellis=0 / 8x8dct=1 / cqm=0 / deadzone=4,4 / chroma_qp_offset=-2 / threads=3 / 
nr=0 / decimate=0 / mbaff=0 / bframes=3 / b_pyramid=1 / b_adapt=2 / b_bias=0 / direct=3 / wpredb=1 / keyint=250 / keyint_min=25 / 
scenecut=40(pre) / rc=2pass / bitrate=2000 / ratetol=1.0 / qcomp=0.60 / qpmin=10 / qpmax=51 / qpstep=4 / cplxblur=20.0 / qblur=0.5 / ip_ratio=1.40
 / pb_ratio=1.30 / aq=1:1.00
Encoding Settings
CUDA GPU



MeGUI x264
Preset ultrafast used here
Code:
program --bitrate 800 --no-mixed-refs --bframes 1 --no-weightb --direct temporal --nf --no-cabac --subme 1 --partitions none --scenecut 0 --me dia 
--threads auto --thread-input --aq-mode 0 --output "output" "input" --subme 0 --preset ultrafast
Speed Results
  • CUDA GPU H.264 - 23.5s 114.4 FPS
  • x264 preset ultrafast - 30s 90.8 FPS
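
A quick sanity check on these figures, as a rough sketch only: the frame count is not given in the post, so it is estimated here from the MediaInfo duration (1mn 52s, rounded to the second) and the 23.976 fps frame rate. The function names are made up for illustration.

```python
# Rough check of the reported speeds. The frame count is an assumption:
# MediaInfo's duration (1mn 52s) is rounded, so this is only approximate.
DURATION_S = 112          # 1mn 52s from the MediaInfo dump
FRAME_RATE = 23.976       # fps from the MediaInfo dump


def estimated_frames(duration_s: float, frame_rate: float) -> float:
    """Approximate number of frames in the clip."""
    return duration_s * frame_rate


def encode_fps(frames: float, wall_seconds: float) -> float:
    """Average encoding speed in frames per second."""
    return frames / wall_seconds


frames = estimated_frames(DURATION_S, FRAME_RATE)
cuda_fps = encode_fps(frames, 23.5)   # reported: 114.4 FPS
x264_fps = encode_fps(frames, 30.0)   # reported: 90.8 FPS (time likely rounded)
speedup = 114.4 / 90.8                # ratio of the reported FPS figures

print(f"~{frames:.0f} frames")
print(f"CUDA ~{cuda_fps:.1f} fps, x264 ~{x264_fps:.1f} fps")
print(f"CUDA is ~{speedup:.2f}x as fast by the reported numbers")
```

The estimates land close to the reported values, and by the reported figures the CUDA encode was about 1.26x as fast as the x264 run.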

Image comparison

Source



x264 @ 800 Kbps



CUDA GPU @ 800 Kbps


Output File Mediainfo

x264 @ 800 Kbps

Code:
Video
ID                               : 1
Format                           : AVC
Format/Info                      : Advanced Video Codec
Format profile                   : Main@L3.1
Format settings, CABAC           : No
Format settings, ReFrames        : 2 frames
Codec ID                         : avc1
Codec ID/Info                    : Advanced Video Coding
Duration                         : 1mn 52s
Bit rate mode                    : Variable
Bit rate                         : 900 Kbps
Nominal bit rate                 : 800 Kbps
Maximum bit rate                 : 2 229 Kbps
Width                            : 1 280 pixels
Height                           : 720 pixels
Display aspect ratio             : 16/9
Frame rate mode                  : Constant
Frame rate                       : 23.976 fps
Resolution                       : 24 bits
Colorimetry                      : 4:2:0
Scan type                        : Progressive
Bits/(Pixel*Frame)               : 0.041
Stream size                      : 12.1 MiB (100%)
Writing library                  : x264 core 68 r1179M 96e2229
Encoding settings                : cabac=0 / ref=1 / deblock=0:0:0 / analyse=0:0 / me=dia / subme=0 / psy_rd=0.0:0.0 / mixed_ref=0 / 
me_range=16 / chroma_me=1 / trellis=0 / 8x8dct=0 / cqm=0 / deadzone=21,11 / chroma_qp_offset=0 / threads=6 / nr=0 / decimate=1 / mbaff=0 /
 bframes=1 / b_pyramid=0 / b_adapt=1 / b_bias=0 / direct=1 / wpredb=0 / keyint=250 / keyint_min=25 / scenecut=0 / rc=abr / bitrate=800 /
 ratetol=1.0 / qcomp=0.60 / qpmin=10 / qpmax=51 / qpstep=4 / ip_ratio=1.40 / pb_ratio=1.30 / aq=0
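
The "Encoding settings" line is easier to read once split into key/value pairs. A minimal sketch, assuming the " / " separator seen in the dump above; `parse_x264_settings` is a hypothetical helper, not part of MediaInfo or x264, and the string passed to it is only an excerpt of the full line.

```python
def parse_x264_settings(line: str) -> dict:
    """Split a MediaInfo 'Encoding settings' string into a dict of strings."""
    settings = {}
    for item in line.split("/"):
        item = item.strip()
        if not item:
            continue  # skip empty pieces left by stray separators
        key, _, value = item.partition("=")
        settings[key.strip()] = value.strip()
    return settings


# Excerpt of the x264 output settings shown above (not the full line).
output_settings = parse_x264_settings(
    "cabac=0 / ref=1 / deblock=0:0:0 / analyse=0:0 / me=dia / subme=0 / "
    "bframes=1 / rc=abr / bitrate=800 / aq=0"
)
for key in ("cabac", "ref", "me", "subme", "rc", "bitrate"):
    print(f"{key:8} = {output_settings[key]}")
```

Read this way, the dump shows the encode ran with subme=0 (full-pixel motion estimation only) and analyse=0:0 (no partition analysis).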
CUDA GPU @ 800 Kbps

Code:
Video
ID                               : 1
Format                           : AVC
Format/Info                      : Advanced Video Codec
Format profile                   : High@L5.1
Format settings, CABAC           : Yes
Format settings, ReFrames        : 2 frames
Codec ID                         : avc1
Codec ID/Info                    : Advanced Video Coding
Duration                         : 1mn 52s
Bit rate mode                    : Variable
Bit rate                         : 869 Kbps
Maximum bit rate                 : 1 776 Kbps
Width                            : 1 280 pixels
Height                           : 720 pixels
Display aspect ratio             : 16/9
Frame rate mode                  : Constant
Frame rate                       : 23.976 fps
Resolution                       : 24 bits
Colorimetry                      : 4:2:0
Scan type                        : Progressive
Bits/(Pixel*Frame)               : 0.039
Stream size                      : 11.6 MiB (100%)
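
The Bits/(Pixel*Frame) values in the dumps can be reproduced from the bit rate, resolution, and frame rate. A small sketch, assuming MediaInfo's Kbps means 1000 bits per second; the function name is illustrative.

```python
def bits_per_pixel_frame(bitrate_kbps: float, width: int, height: int, fps: float) -> float:
    """Average number of bits spent per pixel per frame."""
    return bitrate_kbps * 1000 / (width * height * fps)


# Bit rates taken from the three MediaInfo dumps in this post.
for label, kbps in [("source", 2000), ("x264", 900), ("CUDA", 869)]:
    value = bits_per_pixel_frame(kbps, 1280, 720, 23.976)
    print(f"{label:6} {value:.3f}")
```

Both outputs come out at roughly 0.04 bits per pixel per frame, since the actual bit rates were 900 and 869 Kbps rather than exactly 800 Kbps.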


The screenshot shows that the GPU temperature rose by 6 °C while encoding. CPU usage was almost 95% on all 4 cores during the encode.


More soon...

Suggest settings and comparisons you would like to see.

EDIT: Thank you for your suggestions, guys.

As I said in my OP, there is more to come with different sources at different resolutions.

I'm not trying to advertise anyone here. I'm just feeding my curiosity to see what kind of encoding the free CUDA encoder does, and probably helping others who feel the same way in the process.

The settings in this encode were used to test the pure speed of x264, to see if it could be as fast as the GPU encode at similar or better quality. Since that didn't happen, I will try to match the quality and see what the difference in performance is.

As to the hardware used, this is the best thing I have access to right now. Also, as others said, the GTS 250 is a last-generation GPU based on the G92 chip used in the 8800 GTS 512 MB, 9800 GTX, and 9800 GTX+.

Also, I may use Badaboom and MediaShow Espresso in the future if time permits.

Also I plan on using this video

1080p VBR Video Quality Test Streams
Sony HDW-F900 footage, 1080p@25, 18 Mbps average, 30 Mbps peak in a 35 Mbps Transport Stream (259,534,064 bytes)

on this page http://www.w6rz.net/ as one of the sources. Please let me know if there is another uncompressed source you would like me to try.

I'm not too sure about the questions regarding the decoder; I only have ffdshow + Haali Media Splitter installed on my system.
I would really appreciate it if you let me know if I need to change something with the decoders.

Last edited by St Devious; 10th July 2009 at 15:03.
Old 10th July 2009, 08:45   #2  |  Link
roozhou
Registered User
 
Join Date: Apr 2008
Posts: 1,181
MeGUI uses avisynth to frameserve but mediacoder uses mencoder to decode and pipes raw data to encoders.
With "ultrafast" settings decoding may become a bottleneck for x264.
Old 10th July 2009, 08:56   #3  |  Link
kumi
Straight to video
 
Join Date: Jun 2005
Posts: 637
I wonder what settings MediaCoder used to arrive at these benchmarks?
Old 10th July 2009, 08:59   #4  |  Link
stanleyhuang
MediaCoder author
 
Join Date: Sep 2005
Location: Shanghai
Posts: 65
Absolutely. Though the decoding is done in a separate process and there is a large ring buffer connecting the decoder and encoder, decoding is still a bottleneck on a fast multi-core processor. Fortunately, mplayer-mt/ffmpeg-mt has multi-threaded H.264 decoding.
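
The decoder/encoder hand-off described here can be pictured as a bounded producer/consumer queue. This is only a toy sketch in Python, not MediaCoder's actual code: the stage functions, the buffer size, and the integer "frames" are all made up for illustration.

```python
import queue
import threading

BUFFER_FRAMES = 8      # hypothetical ring-buffer capacity
END_OF_STREAM = None   # sentinel marking the end of the stream


def decoder(frame_count: int, buf: queue.Queue) -> None:
    """Producer: blocks whenever the buffer is full (encoder too slow)."""
    for n in range(frame_count):
        buf.put(n)             # stand-in for a decoded raw frame
    buf.put(END_OF_STREAM)


def encoder(buf: queue.Queue, encoded: list) -> None:
    """Consumer: blocks whenever the buffer is empty (decoder too slow)."""
    while True:
        frame = buf.get()
        if frame is END_OF_STREAM:
            break
        encoded.append(frame)  # stand-in for encoding the frame


def run_pipeline(frame_count: int) -> list:
    """Run both stages in separate threads joined by a bounded queue."""
    buf = queue.Queue(maxsize=BUFFER_FRAMES)
    encoded = []
    producer = threading.Thread(target=decoder, args=(frame_count, buf))
    consumer = threading.Thread(target=encoder, args=(buf, encoded))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return encoded


result = run_pipeline(100)
print(len(result), "frames passed through the buffer")
```

The buffer smooths out short stalls, but sustained throughput is still capped by the slower stage, which is why a very fast encoder ends up waiting on the decoder.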

Quote:
Originally Posted by roozhou View Post
MeGUI uses avisynth to frameserve but mediacoder uses mencoder to decode and pipes raw data to encoders.
With "ultrafast" settings decoding may become a bottleneck for x264.
__________________
When things work together, things work.
MediaCoder makes audio and video things work.
Old 10th July 2009, 09:03   #5  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,690
Why did you use such retardedly low quality settings with x264?

At a minimum your goal should be to match the quality of the two; it makes no sense to test two encoders against each other in terms of speed at vastly different compression settings.

Also, if you're going to compare two encoders, use raw video input, not a highly compressed H.264 stream whose decoding method differs between the two encoders.
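
To get a feel for what raw input means in practice, here is a back-of-the-envelope sketch of the size and data rate of this clip as uncompressed 8-bit YUV 4:2:0 (1.5 bytes per pixel). The 112 s duration is the rounded MediaInfo figure, so the totals are approximate, and the 114.4 fps rate is simply the CUDA speed reported in the first post.

```python
WIDTH, HEIGHT = 1280, 720
FPS = 23.976
DURATION_S = 112               # 1mn 52s, rounded by MediaInfo

# 8-bit 4:2:0: one full-resolution luma plane plus two quarter-size chroma planes
bytes_per_frame = WIDTH * HEIGHT * 3 // 2
bytes_per_second = bytes_per_frame * FPS       # at playback speed
total_bytes = bytes_per_second * DURATION_S

# Read rate needed to feed an encoder running at the reported 114.4 fps
feed_rate = bytes_per_frame * 114.4

print(f"{bytes_per_frame:,} bytes per frame")
print(f"{bytes_per_second / 1e6:.1f} MB/s at {FPS} fps")
print(f"{total_bytes / 1e9:.2f} GB for the whole clip")
print(f"{feed_rate / 1e6:.0f} MB/s to sustain 114.4 fps")
```

So a raw version of this clip is a few gigabytes, and an encoder running at over 100 fps would need its input delivered at well above 100 MB/s, which the storage or frame server has to keep up with.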

Last edited by Dark Shikari; 10th July 2009 at 09:05.
Old 10th July 2009, 09:06   #6  |  Link
stanleyhuang
MediaCoder author
 
Join Date: Sep 2005
Location: Shanghai
Posts: 65
I think the Q9450 and GTS 250 are not hardware of the same level, at least not the same price. ;-)
Old 10th July 2009, 09:13   #7  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,690
Quote:
Originally Posted by stanleyhuang View Post
I think the Q9450 and GTS 250 are not hardware of the same level, at least not the same price. ;-)
This as well. It's rather disingenuous to pick a last-generation CPU and compare to a current GPU (and then question why the former is slower).
Old 10th July 2009, 09:14   #8  |  Link
stanleyhuang
MediaCoder author
 
Join Date: Sep 2005
Location: Shanghai
Posts: 65
Actually x264 does generate better quality when it is configured for maximum quality, but that will also make the transcoding extremely slow.
Old 10th July 2009, 09:18   #9  |  Link
Fr4nz
Registered User
 
Join Date: Feb 2003
Posts: 448
This is a totally borked comparison...
Old 10th July 2009, 09:19   #10  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,690
Quote:
Originally Posted by stanleyhuang View Post
Actually x264 does generate better quality when it is configured for maximum quality, but that will also make the transcoding extremely slow.
You hardly need "maximum quality"; even the (reasonable) faster settings beat the crappy CUDA encoder easily quality-wise, as has been tested dozens of times before with exactly this encoder.

Of course, it doesn't help that he even turned off subpixel motion vectors though... his settings are completely ridiculous.
Quote:
Originally Posted by Fr4nz View Post
This is a totally b0rked comparison...
Yes, and he posted it in a very official-looking fashion despite the entire thing being done in a completely idiotic and haphazard manner.

I mean seriously:

1. Pick a GPU that's more costly and faster than the CPU.
2. Go out of the way to pick the worst possible settings for x264 (literally!)
3. Use two different decoders to feed the different encoders.
4. Show how the GPU encoder looks so much better than x264.

I'm not going to bother with this anymore as it's clear that this guy is here solely to try to advertise crappy encoders by performing intentionally bad tests.

Last edited by Dark Shikari; 10th July 2009 at 09:22.
Old 10th July 2009, 09:30   #11  |  Link
stanleyhuang
MediaCoder author
 
Join Date: Sep 2005
Location: Shanghai
Posts: 65
I've published the x264 options in my benchmark. Under this configuration, both encoders produce similar output quality (x264 is slightly better). The CPU I used costs about US$ 195; the GPU (display adapter with 896MB onboard GDDR3) I used costs about US$ 235.

PS1: For serious encoding, I myself use x264.
PS2: I don't know St Devious and have no relationship with him, and I don't think he can gain anything by advertising the crappy encoder. I just saw this post via a back-link to my blog.

Last edited by stanleyhuang; 10th July 2009 at 09:44.
Old 10th July 2009, 10:04   #12  |  Link
Fr4nz
Registered User
 
Join Date: Feb 2003
Posts: 448
Quote:
Originally Posted by Dark Shikari View Post
2. Go out of the way to pick the worst possible settings for x264 (literally!)
I think this is the most important point that invalidates the comparison: you have to use, as far as possible, the same settings in both encoders in order to make a credible comparison.
Old 10th July 2009, 10:13   #13  |  Link
stanleyhuang
MediaCoder author
 
Join Date: Sep 2005
Location: Shanghai
Posts: 65
He might just want x264 to work as fast as it can.
Old 10th July 2009, 10:52   #14  |  Link
roozhou
Registered User
 
Join Date: Apr 2008
Posts: 1,181
Quote:
Originally Posted by stanleyhuang View Post
He might just want x264 to work as fast as it can.
I wonder why x264 ran so slowly with --preset ultrafast on a Quad-core.
Old 10th July 2009, 11:19   #15  |  Link
slavickas
I'm Shpongled
 
Join Date: Nov 2001
Location: Lithuania
Posts: 303
Quote:
Originally Posted by Dark Shikari View Post
This as well. It's rather disingenuous to pick a last-generation CPU and compare to a current GPU (and then question why the former is slower).
err no GTS 250 = 9800GTX+ ~= 8800 GTS
Old 10th July 2009, 11:44   #16  |  Link
Reimar
Registered User
 
Join Date: Jun 2005
Posts: 278
Quote:
Originally Posted by roozhou View Post
I wonder why x264 ran so slowly with --preset ultrafast on a Quad-core.
Probably due to decoding speed. Unfortunately it is unclear which decoders were used. If the source were uncompressed, or at least DXVA+readback were used for x264, or the source used some format that the GPU can't accelerate, it might make some sense as an encoder comparison. So far the main conclusion is: a fast GPU can decode H.264 a lot faster than a slow CPU with some random (probably single-threaded) decoder. Not exactly news. And not in any way related to x264.
Old 10th July 2009, 11:53   #17  |  Link
roozhou
Registered User
 
Join Date: Apr 2008
Posts: 1,181
Quote:
Originally Posted by Reimar View Post
Probably due to decoding speed. Unfortunately it is unclear which decoders were used. If the source were uncompressed, or at least DXVA+readback were used for x264, or the source used some format that the GPU can't accelerate, it might make some sense as an encoder comparison. So far the main conclusion is: a fast GPU can decode H.264 a lot faster than a slow CPU with some random (probably single-threaded) decoder. Not exactly news. And not in any way related to x264.
How can one perform DXVA+readback? Is there any opensource implementation available?
Old 10th July 2009, 12:06   #18  |  Link
ajp_anton
Registered User
 
Join Date: Aug 2006
Location: Stockholm/Helsinki
Posts: 747
Quote:
Originally Posted by Dark Shikari View Post
This as well. It's rather disingenuous to pick a last-generation CPU and compare to a current GPU (and then question why the former is slower).
He DID pick a last-generation GPU.

And about picking the "worst possible settings", he's trying to match the speeds, not the quality.

However, it doesn't say if x264 was able to use all cores. It says "95% on all 4 cores", was this during the x264 encode? And why only show a picture of the last core? There's a nice picture of the task manager with all 4, separately or combined.
Not to mention what decoder was used (use uncompressed), and why not use the source of the source?
Old 10th July 2009, 13:07   #19  |  Link
Reimar
Registered User
 
Join Date: Jun 2005
Posts: 278
Quote:
Originally Posted by roozhou View Post
How can one perform DXVA+readback? Is there any opensource implementation available?
I actually didn't mean to imply it is possible; I don't know (I know it is possible on Linux with VDPAU). But given that IDirectXVideoDecoderService::CreateVideoDecoder takes IDirect3DSurface9 as render target, I'd expect you should be able to read back from those surfaces.
That is DXVA2 only though...
Old 10th July 2009, 13:19   #20  |  Link
tph
Registered User
 
Join Date: May 2008
Posts: 40
Any video encoder comparison needs to use raw video as input, otherwise you're benchmarking decoder performance.

All times are GMT +1.