20th October 2010, 14:11   #1
Gannjunior
Registered User
 
Join Date: Jul 2002
Location: Italy, Genova
Posts: 206
Max speed from mpeg2 to h264

Hi,

I need more speed when encoding many DVDs to H.264.
I have tried different software such as MeGUI and StaxRip, but I would like to improve the fps dramatically.

I've heard that Nvidia CUDA GPUs work miracles, for example in TMPGEnc 4 and Premiere CS5. What isn't clear to me is whether the "miracle" happens only during editing (accelerating filters etc. to get real-time effects) or also in the final encoding.
The final encoding is what I need: I'm looking for something that speeds up only the compression from MPEG-2 to H.264.

Any suggestion?

Thanks
Ciao!
__________________
P6T| 12Gb DDR3| i7-920 D0@3.8Ghz@1.168v+NOCTUA NH-U12P| Liberty 620| Stacker|2xRAID0 (4x500gb) Seag.72k.11| Seag. 500gb 72k.12| Seag. 1.5Tb 72k.11| GTX470@668/1843| U2410 24"| Se7en x64 on VRapt 300gb
20th October 2010, 14:29   #2
nurbs
Registered User
 
Join Date: Dec 2005
Posts: 1,460
CUDA can be used for decoding and deinterlacing with the DGNVTools. If your content is HD that can be beneficial, especially if you let it do the deinterlacing. For SD there is still the deinterlacing, but apart from that don't expect much of a speed gain.
There are also CUDA encoders. They can be useful if you have a slow CPU and don't care about quality, but in your case, since your GPU is old and your CPU is new, it's probably better to simply use a faster x264 preset. That will also cost some quality, since the faster presets turn off the psy optimizations, but the CUDA encoders aren't good to begin with, so you should still get better quality unless you use the fastest presets.
20th October 2010, 14:36   #3
nm
Registered User
 
Join Date: Mar 2005
Location: Finland
Posts: 2,643
Quote:
Originally Posted by Gannjunior
I need more speed when encoding many DVDs to H.264.
I have tried different software such as MeGUI and StaxRip, but I would like to improve the fps dramatically.
MPEG-2 decoding is fast, so your bottleneck is either in filtering or H.264 encoding. If you don't use heavy filters, speed can be gained by simply changing encoding parameters or by using more CPUs.

What x264 parameters have you tried and what kind of encoding speeds are you getting now?

Last edited by nm; 20th October 2010 at 14:39.
20th October 2010, 14:38   #4
Groucho2004
 
Join Date: Mar 2006
Location: Barcelona
Posts: 5,030
Quote:
Originally Posted by nm
getting a faster CPU.
Look at his sig. It should be very fast.
20th October 2010, 14:40   #5
nm
Registered User
 
Join Date: Mar 2005
Location: Finland
Posts: 2,643
Quote:
Originally Posted by Groucho2004
Look at his sig. It should be very fast.
Yes, I corrected the sentence already.
20th October 2010, 14:42   #6
nurbs
Registered User
 
Join Date: Dec 2005
Posts: 1,460
Yeah, I have an i7-860 clocked a GHz lower, and at SD resolutions it's faster than realtime even with --preset veryslow (for progressive content at --crf 20).

Edit: not entirely true. I forgot that I had limited the number of B-frames to 3 when I did that test.

Last edited by nurbs; 20th October 2010 at 16:36.
20th October 2010, 17:33   #7
prOnorama
Registered User
 
Join Date: Mar 2005
Posts: 249
I did a test on a progressive PAL DVD, aspect ratio ca. 2.40:1 (with cropping and resizing), on an AMD Phenom II X4 955 BE quad-core @ 3.2 GHz:

Code:
x264 --preset <preset> --crf 16
avs [info]: 688x288p 0:0 @ 25/1 fps (cfr), 5001 frames encoded per run

preset             profile, level                 fps     bitrate
--------------------------------------------------------------------
ultrafast          Constrained Baseline, 2.1   186.08   2576.02 kb/s
superfast          High, 2.1                   182.58   2055.69 kb/s
veryfast           High, 2.1                   181.03    859.70 kb/s
faster             High, 2.1                   154.32    923.53 kb/s
fast               High, 2.1                   106.55   1010.96 kb/s
default (medium)   High, 2.1                    91.45    981.13 kb/s
slow               High, 2.1                    67.25    968.15 kb/s
slower             High, 2.2                    28.82    935.38 kb/s
veryslow           High, 3.1                    15.86    819.29 kb/s
placebo            High, 3.1                     7.95    822.16 kb/s

I didn't use multithreaded AviSynth; with normal AviSynth the bottleneck seems to be around 180 fps here.

I'm kind of surprised that the bitrates with --preset veryfast/faster are actually lower than with the default preset; not what I expected.
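To put these numbers side by side, here is a small Python sketch (a hypothetical helper, with the figures copied from the log above) that expresses each preset's speed and bitrate relative to x264's default preset, medium. It only restates the measurements; it says nothing about quality.

```python
# Figures copied from the x264 log above: 5001 frames, --crf 16, 688x288 @ 25 fps.
RESULTS = {
    # preset: (fps, bitrate in kb/s)
    "ultrafast": (186.08, 2576.02),
    "superfast": (182.58, 2055.69),
    "veryfast": (181.03, 859.70),
    "faster": (154.32, 923.53),
    "fast": (106.55, 1010.96),
    "medium": (91.45, 981.13),  # x264's default preset
    "slow": (67.25, 968.15),
    "slower": (28.82, 935.38),
    "veryslow": (15.86, 819.29),
    "placebo": (7.95, 822.16),
}


def relative_to_default(results, default="medium"):
    """Return {preset: (speed ratio, bitrate ratio)} relative to the default preset."""
    base_fps, base_kbps = results[default]
    return {
        preset: (fps / base_fps, kbps / base_kbps)
        for preset, (fps, kbps) in results.items()
    }


if __name__ == "__main__":
    for preset, (speed, size) in relative_to_default(RESULTS).items():
        print(f"{preset:>10}: {speed:5.2f}x speed, {size:5.2f}x bitrate")
```

With these numbers veryfast runs at roughly 1.98x the default speed with about 0.88x the bitrate. Note that the three fastest presets all sit near the ~180 fps AviSynth ceiling mentioned above, so their true encoder speed is probably higher than measured.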
20th October 2010, 18:04   #8
sneaker_ger
Registered User
 
Join Date: Dec 2002
Posts: 5,563
Quote:
Originally Posted by prOnorama
I'm kind of surprised that the bitrates with --preset veryfast/faster are actually lower than with the default preset; not what I expected.
Don't conclude that faster/veryfast give higher quality than the default on this sample at the same bitrate. CRF only means the same quality if the settings are identical, which is obviously not the case here.
20th October 2010, 18:05   #9
nurbs
Registered User
 
Join Date: Dec 2005
Posts: 1,460
CRF only gives you roughly constant quality for the same settings, so faster settings don't necessarily result in higher bitrates. With the same settings and CRF, turning off the psy optimizations reduces the bitrate by a lot. I guess if you had turned off psy, --preset fast and everything slower would get a lower bitrate, because starting with --preset faster, RDO is off anyway (and psy-rd depends on it).
20th October 2010, 21:26   #10
stax76
Registered User
 
Join Date: Jun 2002
Posts: 6,502
Quote:
I've heard that Nvidia CUDA GPUs work miracles.
With DGDecNV, only with slow CPUs. With 4 or more cores the gains are small; even with an HD source it might not be more than 1 fps.

Last edited by stax76; 20th October 2010 at 21:28.
21st October 2010, 01:34   #11
Blue_MiSfit
Derek Prestegard IRL
 
Join Date: Nov 2003
Location: Los Angeles
Posts: 5,757
Unless you have to deinterlace. Then it can be a HUGE improvement.

Derek
22nd October 2010, 01:29   #12
Gannjunior
Registered User
 
Join Date: Jul 2002
Location: Italy, Genova
Posts: 206
Hi guys,

thanks for your answers


Of course I need to deinterlace. I usually use Yadif and a small tweak of brightness and contrast.
And for the compression settings: --preset fast --crf 23 --ref 3 --rc-lookahead 40 --deblock -1:-1 --subme 5 --trellis 0 --output "<target>" "<source>" ...as you can see, the settings are quite "light"...

The thing is, I'm trying to help a friend who needs to encode many clips per week from MPEG-2 to H.264, each about a couple of hours long. The problem is not H.264 or the encoding software; it's the sheer number of clips, and maybe some hardware help is the only way to really boost encoding. I believed CUDA could help with the final encoding too, but it seems CUDA is only useful while editing a video, to get real-time effects...

Maybe the only way is to upgrade from the i7-920 to an i7-980X?

Any suggestion is welcome

Thanks again

ciao!
22nd October 2010, 01:43   #13
Blue_MiSfit
Derek Prestegard IRL
 
Join Date: Nov 2003
Location: Los Angeles
Posts: 5,757
Uhh... of course you need to deinterlace? You do realize that most newer DVDs aren't actually interlaced at all, but are coded as 24p and contain RFF flags that apply a soft 3:2 pulldown during playback on devices that have to output interlaced analog signals... right?

Stuff originated on film is either coded this way, or as hard 3:2 pulldown, in which case you don't deinterlace either - you IVTC (inverse telecine) via field matching and decimation, neither of which can run on a GPU.
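A toy Python model may help picture what field matching and decimation do. It assumes a clean 2:3 pulldown and treats every field as a label naming the film frame it came from; real AviSynth filters such as TIVTC's TFM and TDecimate compare actual pixel data, so the function names and logic here are purely illustrative.

```python
from fractions import Fraction


def telecine(film):
    """2:3 pulldown: film frames alternately contribute 2 and 3 fields,
    and consecutive fields are woven into (top, bottom) video frames."""
    fields = []
    for i, frame in enumerate(film):
        fields.extend([frame] * (2 if i % 2 == 0 else 3))
    return [(fields[j], fields[j + 1]) for j in range(0, len(fields) - 1, 2)]


def field_match(video):
    """Keep each frame's top field and pair it with a matching bottom field from the
    current, previous or next frame (a label-based stand-in for c/p/n matching)."""
    matched = []
    for i, (top, bottom) in enumerate(video):
        candidates = [bottom]
        if i > 0:
            candidates.append(video[i - 1][1])
        if i + 1 < len(video):
            candidates.append(video[i + 1][1])
        # A frame with no matching bottom field would stay combed.
        matched.append(top if top in candidates else f"{top}/{bottom}")
    return matched


def decimate(frames, cycle=5):
    """Drop the first frame in each cycle that duplicates the frame before it."""
    out, prev = [], None
    for start in range(0, len(frames), cycle):
        dropped = False
        for frame in frames[start:start + cycle]:
            if not dropped and frame == prev:
                dropped = True
            else:
                out.append(frame)
            prev = frame
    return out


if __name__ == "__main__":
    film = list("ABCDEFGH")
    video = telecine(film)  # 8 film frames -> 20 fields -> 10 video frames
    print(video)            # the 3rd, 4th, 8th and 9th frames mix two film frames
    print(decimate(field_match(video)))  # ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
    # 4 film frames became 5 video frames, so dropping 1 in 5 restores the film rate:
    print(Fraction(30000, 1001) * Fraction(4, 5))  # 24000/1001, i.e. 23.976 fps
```

Field matching alone leaves one duplicate per cycle of five; decimation removes it, which is why both steps are needed.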

If you mostly process content that is actually natively interlaced (sitcoms, porn, some animated content, live music etc) then yes GPU decoding will help you a lot.

To my eyes, CUDA deinterlacing looks at least as good as YADIF, if not better. It is also quite fast, and frees CPU cycles for encoding. The latter is usually the biggest benefit, though with very fast deinterlacers like YADIF, it may or may not be a huge issue.

Try it, and tell us your results.

Derek
22nd October 2010, 10:52   #14
nm
Registered User
 
Join Date: Mar 2005
Location: Finland
Posts: 2,643
Quote:
Originally Posted by Blue_MiSfit
Uhh... of course you need to deinterlace? You do realize that most newer DVDs aren't actually interlaced at all, but are coded as 24p and contain RFF flags that apply a soft 3:2 pulldown during playback on devices that have to output interlaced analog signals... right?
And since he's from Italy, most R2 DVD movies are straight 25p. Sometimes euro pulldown (2:2:2:2:2:2:2:2:2:2:2:3) may be used, but more often there are bad, field-blended conversions from NTSC sources. Those can sometimes be fixed with SRestore.
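The arithmetic behind the euro pulldown cadence is easy to check: eleven film frames shown for 2 fields and one for 3 fields give 25 fields per 12 film frames, so 24 fps film becomes 50 fields/s, i.e. 25 interlaced PAL frames/s. A minimal Python sketch (the helper name is hypothetical), with NTSC's 2:3 cadence for comparison:

```python
from fractions import Fraction

# Euro pulldown cadence from the post above: fields shown per film frame.
EURO_PULLDOWN = [2] * 11 + [3]


def field_rate(film_fps, cadence):
    """Field rate produced by repeating `cadence` over film running at film_fps."""
    fields_per_cycle = sum(cadence)
    frames_per_cycle = len(cadence)
    return Fraction(film_fps) * fields_per_cycle / frames_per_cycle


if __name__ == "__main__":
    print(field_rate(24, EURO_PULLDOWN))            # 50
    print(field_rate(Fraction(24000, 1001), [2, 3]))  # 60000/1001, i.e. 59.94 fields/s
```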
22nd October 2010, 20:53   #15
Didée
Registered User
 
Join Date: Apr 2002
Location: Germany
Posts: 5,394
Quote:
Originally Posted by Gannjunior
I would like to improve the fps dramatically.
Weee ... "dramatically".

Basic question: What is your actual encoding speed?

Related: Show a typical Avisynth script you're using.
__________________
- We´re at the beginning of the end of mankind´s childhood -

My little flickr gallery. (Yes indeed, I do have hobbies other than digital video!)
24th October 2010, 02:29   #16
Gannjunior
Registered User
 
Join Date: Jul 2002
Location: Italy, Genova
Posts: 206
Thanks again for your support, Blue, nm and Didée!

nm,
unfortunately, the football match clips I'm talking about are captured from Sky TV, and these clips are always interlaced.

Here is a little sample:
http://www.speedyshare.com/files/24838975/1.mpg

Here are some numbers, based on converting the whole match linked above (1h 43').

I did two tests, encoding with StaxRip and using the same H.264 settings in both (see my preset in the attached file at the bottom; it is close to this: --preset fast --crf 22 --ref 2 --rc-lookahead 40 --deblock 0:0 --subme 5 --trellis 0).

The difference between the two tests is a denoiser: the source is often quite degraded, so, since I use CRF, I need more compressibility to keep the size down and to reduce the visible pixelation.


First test

Quote:
MPEG2Source("F:\wisport\VIDEO_TS\VTS_01_1 temp files\VTS_01_1.d2v", CPU=6)
LoadCplugin("D:\Program Files (x86)\StaxRip\Applications\AviSynth plugins\Yadif\yadif.dll")
Yadif()
Crop(8,6,-20,-6)
BicubicResize(640,480,0,0.5)
tweak(bright=15,cont=0.85,sat=1.03)
Final average bitrate: about 1350 kbps

About 45% CPU utilization during encoding

Speed: 105-110 fps, i.e. 24-25 minutes per clip

If I run a second StaxRip instance to saturate the CPU, I can encode 2 clips at the same time, which is equivalent to about 200 fps.
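These timings can be sanity-checked with simple arithmetic. Assuming a constant 25 fps output (plain Yadif() keeps one output frame per input frame), a 1h 43' clip has 103 × 60 × 25 = 154,500 frames, and the encode time is just frames divided by encoding speed. A minimal Python sketch with a hypothetical helper:

```python
def encode_minutes(duration_min, fps_encode, fps_source=25):
    """Minutes needed to encode a clip of duration_min minutes at fps_encode frames/s,
    assuming the output keeps the source frame rate fps_source."""
    frames = duration_min * 60 * fps_source
    return frames / fps_encode / 60


if __name__ == "__main__":
    for fps in (105, 110, 41, 42):
        print(f"{fps} fps -> {encode_minutes(103, fps):.1f} min")
```

This gives about 23.4 to 24.5 minutes at 105-110 fps, in line with the reported 24-25 minutes, and roughly 61 to 63 minutes at 41-42 fps.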
------

Second test

Quote:
MPEG2Source("F:\wisport\VIDEO_TS\VTS_01_1 temp files\VTS_01_1.d2v", CPU=6)
LoadCplugin("D:\Program Files (x86)\StaxRip\Applications\AviSynth plugins\Yadif\yadif.dll")
Yadif()
Crop(8,6,-20,-6)
BicubicResize(640,480,0,0.5)
VagueDenoiser(threshold=3, chromaT=3)
tweak(bright=15,cont=0.85,sat=1.03)
Final average bitrate: about 1250 kbps **

About 20-25% CPU utilization during encoding

Speed: 41-42 fps, i.e. about 1 hour per clip

If I run 3 more StaxRip instances to saturate the CPU, I can encode almost 4 clips at the same time, which is equivalent to about 150 fps.
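For a queue of many clips, what matters is the aggregate frame rate of all parallel jobs. This small Python sketch (hypothetical helper) converts the reported totals of about 200 fps and 150 fps into clips per hour, assuming every clip has 154,500 frames like the 1h 43' test match at 25 fps:

```python
FRAMES_PER_CLIP = 103 * 60 * 25  # 1h 43' at 25 fps, as in the test match


def clips_per_hour(aggregate_fps, frames_per_clip=FRAMES_PER_CLIP):
    """Clips finished per hour when all parallel jobs together reach aggregate_fps."""
    return aggregate_fps * 3600 / frames_per_clip


if __name__ == "__main__":
    for label, fps in (("Yadif only, 2 jobs", 200), ("Yadif + VagueDenoiser, ~4 jobs", 150)):
        print(f"{label}: {clips_per_hour(fps):.2f} clips/hour")
```

That is roughly 4.66 versus 3.50 clips per hour, so the denoised setup costs about a quarter of the throughput once the CPU is saturated.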

I hope this makes it clearer,

ciao!!


** I often need to use other denoisers, or add more of them, to get a more consistent effect on compressibility (but of course encoding time grows...).
Attached file: test.rar (1.6 KB)

Last edited by Gannjunior; 24th October 2010 at 02:43.
24th October 2010, 14:48   #17
nm
Registered User
 
Join Date: Mar 2005
Location: Finland
Posts: 2,643
OK, since you need to deinterlace and sometimes denoise, DGDecNV should help with both speed and deinterlacing quality (compared to Yadif).

I don't know how many parallel processes you can run with it though, if a single encode is still not fast enough to saturate your CPU. I also don't know if it exposes denoising and sharpening controls.
24th October 2010, 22:43   #18
Didée
Registered User
 
Join Date: Apr 2002
Location: Germany
Posts: 5,394
You are bottlenecked by the performance of the source filter. Obviously, you can not encode faster than you can decode the source.
Specifically, it's the postprocessing that sucks lots of performance. Without postprocessing, decoding is so much faster.

Alas, it seems multithreading is not a way to boost performance here. I tried MTSource() with mpeg2source, and performance was worse. For ffms2, the internal multithreading gives a big boost when not using pp... but with postprocessing, there's not much difference. With pp, mpeg2source was faster than ffms2 anyway.
DGDecodeNV does not offer postprocessing at all. And without postprocessing, though it's quite fast, the competitors are even faster.


The numbers for decoding speed:

(PAL DVD source, i7-860, NV GT240)

Code:
postprocessing    off       H+V+dering
--------------------------------------
mpeg2source       460 fps   170 fps
DGDecodeNV        396 fps   -------
ffmpegsource 1T   495 fps  (125 fps)  (singlethread + postprocessing = crashes VDub )
ffmpegsource 4T   777 fps   131 fps

Bottom line: the "dramatic" performance increase you wished for is definitely possible... but only if you do *not* use postprocessing.
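Didée's point can be put into numbers. The frame server cannot deliver frames faster than the source filter decodes them, so the source filter's speed is a hard ceiling on the whole chain. If, as a simplifying assumption, decoding and everything downstream run strictly one after another for each frame, their per-frame times add up. In this Python sketch the 460 and 170 fps figures are mpeg2source's from the table above, while the 120 fps for filtering plus encoding is a made-up placeholder.

```python
def serial_fps(*stage_fps):
    """Throughput when each frame passes through all stages one after another:
    the per-frame times (1/fps) of the stages add up."""
    return 1 / sum(1 / fps for fps in stage_fps)


DOWNSTREAM_FPS = 120  # hypothetical speed of Yadif + resize + x264 on their own

if __name__ == "__main__":
    for label, decode_fps in (("mpeg2source, no pp", 460), ("mpeg2source, H+V+dering", 170)):
        fps = serial_fps(decode_fps, DOWNSTREAM_FPS)
        ceiling = min(decode_fps, DOWNSTREAM_FPS)
        print(f"{label}: ~{fps:.0f} fps serial estimate, {ceiling} fps ceiling")
```

In practice x264 runs in its own threads, so the stages overlap and the real figure lies somewhere between the serial estimate and the min() ceiling; either way, dropping postprocessing raises both.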
24th October 2010, 23:00   #19
nm
Registered User
 
Join Date: Mar 2005
Location: Finland
Posts: 2,643
Quote:
Originally Posted by Didée
You are bottlenecked by the performance of the source filter. Obviously, you can not encode faster than you can decode the source.
Specifically, it's the postprocessing that sucks lots of performance. Without postprocessing, decoding is so much faster.
I think he would also need to split Yadif and any denoisers he's using to multiple threads?
24th October 2010, 23:24   #20
Didée
Registered User
 
Join Date: Apr 2002
Location: Germany
Posts: 5,394
Yes, probably. I just wanted to point out the influence of source filters with internal deblocking/postprocessing. There's not much point going into multithreading and searching for "faster filters" when the source filter happens to be the main bottleneck.

OTOH, when it's about converting a huge pile of DVDs anyway, the easiest way is to run multiple encodes in parallel. When the queue contains dozens or hundreds of jobs, why jump through hoops trying to make a single job run faster, when you can simply run several jobs in parallel? Of course you need sufficient resources (RAM) for that... but seeing that Gannjunior has 12GB available, that shouldn't be a problem.
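Running several jobs in parallel is easy to script. A minimal Python sketch, assuming an x264 build with AviSynth input support on the PATH and one .avs script per clip; the folder and file names are hypothetical, and the x264 options are the ones from Gannjunior's test post. The runner is injectable, so the queue logic can be tried without actually encoding anything.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# x264 settings quoted in Gannjunior's test post; adjust as needed.
X264_OPTIONS = ["--preset", "fast", "--crf", "22", "--ref", "2",
                "--rc-lookahead", "40", "--deblock", "0:0",
                "--subme", "5", "--trellis", "0"]


def build_command(script, x264="x264"):
    """x264 command line that encodes one AviSynth script to an .mkv next to it."""
    target = Path(script).with_suffix(".mkv")
    return [x264, *X264_OPTIONS, "--output", str(target), str(script)]


def run_queue(scripts, max_parallel=2, runner=subprocess.run):
    """Run the encodes with at most max_parallel jobs at once; results keep queue order.
    check=True makes a failed encode raise instead of passing silently."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = [pool.submit(runner, build_command(s), check=True) for s in scripts]
        return [f.result() for f in futures]


if __name__ == "__main__":
    # Hypothetical queue folder holding one .avs per match.
    run_queue(sorted(Path("queue").glob("*.avs")), max_parallel=3)
```

Threads are enough here because each job is a separate x264 process; max_parallel is the knob to raise until CPU usage gets close to 100%.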
All times are GMT +1.