What settings get used/ignored/disabled on 1st pass?
plugh
7th October 2006, 15:33
Title pretty much says it all...
FAQ seem to gloss over this area - for example,
By default XviD switches certain options off in the first pass to speed it up a bit. The options that are switched off are not really necessary for a 'normal' first pass (normal as in: you don't keep it afterwards) and turning them off can increase encoding speed considerably.
So what *is* turned off vs used on the first pass?
I'm using 1.1.0-final; have 'discard' UNchecked and 'Full quality' UNchecked, and have observed something about playback of the result I'm trying to understand.
It is a proper video file, and you can play it but it is not recommended that you keep it. You should think of it as no more than a rough approximation of the end result and even though you can play it, it might not be MPEG-4 compliant
So what *is* different / possibly non-compliant about it?
Thanks!
foxyshadis
7th October 2006, 18:38
Fastest motion estimation, trellis off, a couple other things off, constant q2. GMC (?), QPel, B-frames are all the same. The "not mpeg-4 compliant" statement on AMV.org was outdated a long time ago, I think, it's basically just a really big xvid file right now.
plugh
7th October 2006, 19:43
Fastest motion estimation, trellis off, a couple other things off, constant q2.
Please expand...
Motion search precision = ?
VHQ mode = 1?
Use VHQ for bframes = off?
Use chroma motion = off?
Turbo = ?
And is it reasonable to conclude that the first pass encode imposes less 'motion compensation' processing at decode time?
Thanks!
sysKin
7th October 2006, 19:51
GMC (?), QPel, B-frames are all the same.
B-frames yes, but GMC and QPel are off.
The "not mpeg-4 compliant" statement on AMV.org was outdated a long time ago, I think, it's basically just a really big xvid file right now.
Unless it uses quant zones. Then, settings go back to originals at the zone boundary, which creates noncompliant file if GMC or qpel suddenly kick in.
@plugh: internal XviD settings don't correspond to GUI settings very well, so it's not easy to answer your question. But: motion precision 1-4 (it's all the same), vhq 0, turbo or most of turbo (turbo is several flags put together), bvhq off, chroma motion off, trellis off. Also qpel off and gmc off. And high precision AC/DC prediction off, something you can't do with VfW gui.
http://cvs.xvid.org/cvs/chora/co.php/xvidcore/src/plugins/plugin_2pass1.c?login=2&r=1.3
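As a rough illustration of the list above (this is not the code from plugin_2pass1.c; the struct and field names are made up for the example and are not xvidcore's real flags), a fast first pass can be thought of as taking the user's settings and forcing the expensive options off, except inside a quant zone, where the original settings come back:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical settings record for illustration only; these are NOT
 * xvidcore's real structures or flag names. */
typedef struct {
    int  quant;          /* 0 = let rate control decide */
    int  vhq_mode;       /* 0 = off */
    bool turbo;
    bool bvhq;           /* VHQ for B-frames */
    bool chroma_motion;
    bool trellis;
    bool qpel;
    bool gmc;
    bool hq_acpred;      /* high precision AC/DC prediction */
    int  max_bframes;    /* B-frame settings are left as the user set them */
} enc_settings;

/* Return the settings a fast first pass would encode with, following the
 * list in the post above. in_quant_zone models the caveat that inside a
 * quant zone the user's original settings apply unchanged. */
enc_settings first_pass_settings(enc_settings user, bool in_quant_zone)
{
    assert(user.max_bframes >= -1);

    if (in_quant_zone)
        return user;

    enc_settings s = user;
    s.quant         = 2;      /* constant quantizer 2 */
    s.vhq_mode      = 0;
    s.turbo         = true;   /* "turbo or most of turbo" */
    s.bvhq          = false;
    s.chroma_motion = false;
    s.trellis       = false;
    s.qpel          = false;
    s.gmc           = false;
    s.hq_acpred     = false;
    return s;
}
```

Calling first_pass_settings(user, true) for frames inside a quant zone returns the user's settings untouched, which is exactly the case where QPel or GMC can suddenly appear mid-file.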
plugh
7th October 2006, 21:06
@plugh: internal XviD settings don't correspond to GUI settings very well, so it's not easy to answer your question. http://cvs.xvid.org/cvs/chora/co.php/xvidcore/src/plugins/plugin_2pass1.c?login=2&r=1.3
I figured as much - thanks for the pointer.
The observation I referred to...
I've been doing some "HD" encodes targeted at xbox / xbmc, and I observed that *in certain respects* the 1st pass output seemed to impose less decoding load than the second pass output. From observation, my guess was that in some fashion the 1st pass file incurred less 'motion related' processing (note: my max bframes=1, vhq=4, etc).
Building on that hunch, the thought occurred to me that I might trade off *some* compression efficiency for reduced cpu load during decode - if I could find the right knob to twist, and the cost / benefit ratio was attractive. ie say an average 100kb/s bitrate increase for a 10% reduction in decoder cpu load (arbitrary figures, but you should get the drift).
Is 'motion search precision' that knob? VHQ? ...?
Didée
7th October 2006, 21:23
The knobs are QPel and GMC. Mostly these, the rest is minor.
plugh
7th October 2006, 21:47
qpel and gmc are already off - they definitely incurred too much decoder cpu load.
Dropping max consecutive bframes down to 1 seemed to trade off some increase in bitrate for reduced decoding load (at an acceptable cost/benefit ratio). Setting to zero was unacceptable cost/benefit.
My understanding is that if motion search is unsuccessful, a given macroblock will be encoded less efficiently (more like an I-frame block), thus trading extra space against the more computationally intense application of motion vectors at decode time.
If so, are motion search settings a 'blunt instrument' or a 'fine tuning' option to control this trade-off? And would the cost (size) vs benefit (cpu load) ratio be 'attractive'?
Thanks!
sysKin
8th October 2006, 15:09
*in certain respects* the 1st pass output seemed to impose less decoding load than the second pass output
I think that was just a noise in your measurement. Overall, there's no way to decrease CPU load this way - should be exactly contrary, the higher the bitrate the higher cpu load.
Limiting motion estimation precision will INCREASE this load as well.
plugh
8th October 2006, 17:02
Overall, there's no way to decrease CPU load this way - should be exactly contrary, the higher the bitrate the higher cpu load.
If all other things are held constant, I agree with you. However, as a gross example, if I encode using I-frames exclusively, I increase bitrate but decrease decoder cpu load - relative to the same source encoded using P and B frames as well. Not all bits are equal - some present a more demanding decoder load than others.
That's what I was wondering - does the motion search precision 'knob' allow me to trade off {a greater quantity of significantly less computationally demanding bits} for {a smaller quantity of very computationally demanding bits}.
Limiting motion estimation precision will INCREASE this load as well.
OK, thanks for the input...
sysKin
9th October 2006, 10:24
That's what I was wondering - does the motion search precision 'knob' allow me to trade off {a greater quantity of significantly less computationally demanding bits} for {a smaller quantity of very computationally demanding bits}.
There's a couple of knobs that *could* be there but aren't. The only one I can think of (that's available from gui) is cartoon mode.
plugh
9th October 2006, 16:27
I am still very much in learning mode, and I appreciate you taking the time to answer my questions.
My next question is a bit more technical, but relates to some other 'knobs' that I am experimenting with by patching constants in the xvidvfw.dll.
I am referencing this http://www.m4if.org/resources/profiles/index.php
in particular table A.3 therein, against the profile_t structure and associated data declarations in the vfw gui dll.
For reference (edited extracts):
/* default vbv_occupancy is (64/170)*vbv_buffer_size */
/* name p@l w h fps obj Tvmv vmv vcv ac% vbv pkt max_bps vbv_peak dbf flags */
{ "asp5", 0xf5, 720, 576, 30, 4, 4860, 1620, 48600, 25, 112*16368, 16384, 8000000, 0, -1, PROFILE_AS }
{ "dxnhdtv", 0x00, 1280, 720, 30, 1,10800, 3600, 108000, 100, 768*8192, -1, 9708400, 16000000, 2, PROFILE_4MV|PROFILE_ADAPTQUANT|PROFILE_BVOP|PROFILE_INTERLACE|PROFILE_DXN }
typedef struct
{
    char * name;
    int id; /* p@l: mpeg-4 profile id; iso/iec 14496-2:2001 table G-1 */
    int width; /* w */
    int height; /* h */
    int fps; /* fps */
    int max_objects; /* obj */
    int total_vmv_buffer_sz; /* Tvmv: macroblock memory; when BVOPS=false, vmv = 2*vcv; when BVOPS=true, vmv = 3*vcv */
    int max_vmv_buffer_sz; /* vmv: max macroblocks per vop */
    int vcv_decoder_rate; /* vcv: macroblocks decoded per second */
    int max_acpred_mbs; /* ac%: percentage */
    int max_vbv_size; /* vbv: max vbv size (bits) 16368 bits */
    int max_video_packet_length; /* pkt: bits */
    int max_bitrate; /* max_bps: bits per second */
    int vbv_peakrate; /* vbv_peak: max bits over anyone second period; 0=don't care */
    int xvid_max_bframes; /* dbf: xvid: max consecutive bframes */
    unsigned int flags;
} profile_t;
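The macroblock columns in the two rows above can be cross-checked with a little arithmetic. The sketch below assumes 16x16 macroblocks and simply reproduces the relations visible in those two rows (vmv = macroblocks per frame, vcv = vmv * fps, Tvmv = 3 * vmv); the function names are invented for the example, and nothing here is taken from the xvid sources or the standard's text:

```c
#include <assert.h>

/* Number of 16x16 macroblocks needed to cover a width x height frame. */
int macroblocks_per_frame(int width, int height)
{
    assert(width > 0 && height > 0);
    return ((width + 15) / 16) * ((height + 15) / 16);
}

/* Macroblocks the decoder must handle per second (the vcv column). */
int macroblocks_per_second(int width, int height, int fps)
{
    assert(fps > 0);
    return macroblocks_per_frame(width, height) * fps;
}

/* Total macroblock memory as it appears in both rows: 3 x vmv. */
int total_macroblock_memory(int width, int height)
{
    return 3 * macroblocks_per_frame(width, height);
}
```

For 720x576 at 30 fps this gives 1620, 48600 and 4860; for 1280x720 at 30 fps it gives 3600, 108000 and 10800, matching the asp5 and dxnhdtv entries.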
To start off a very simple question:
Table A.3 says the VBV values are in units of 16384 bits.
The asp5 entry above uses a multiplier of 16368 ie 112*16368 vs Table A.3 says 112 (*16384)
Conversely, the dxnhdtv profile value IS a multiple of 16384
Am I spotting a typo, or is there something more subtle here...
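To put numbers on that discrepancy, here is a small sketch that checks a VBV size against the 16384-bit unit quoted from Table A.3. The helper names are invented for the example; only the constants come from the extract above:

```c
#include <assert.h>

enum { VBV_UNIT_BITS = 16384 };  /* unit size quoted from Table A.3 */

/* Returns 1 if bits is a whole number of 16384-bit units, else 0. */
int is_whole_vbv_units(long bits)
{
    assert(bits >= 0);
    return bits % VBV_UNIT_BITS == 0;
}

/* How many bits short of (units * 16384) the actual value is. */
long vbv_shortfall_bits(long units, long actual_bits)
{
    return units * VBV_UNIT_BITS - actual_bits;
}
```

With the values above, 112*16368 = 1833216 is not a whole number of units and falls 1792 bits (112 * 16) short of 112*16384 = 1835008, while 768*8192 = 6291456 is exactly 384 units.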
Next question is more involved, but basically it boils down to matching columns in the data against columns in Table A.3, and wondering which are actually USED by xvid 1.1.0-final.
For example, Is the 'max_objects' parameter enforced?
Further, to what extent are the models described in that document representative of the implementation in xvid?
So far, my research would indicate that the vmv and vcv are not used / enforced, but vbv is in two pass mode. Also, the dxn 'total over one second' is not part of the standard profiles, and I remember reading someplace that xvid actually implements 'total over three seconds' (but that may be out of date info).
For general info, I'm trying to characterize / parameterize a particular target platform (an xbox running xbmc, which uses mplayer and the lav codec for playback). The dxn "hdtv" profile has proven effective at avoiding dropped frames for my HD'ish encodes, but I suspect that I could actually use higher values for some of those parameters - I see some pretty heavy 'squashing' of frame sizes (with significantly higher quants) during some difficult segments, more so than I suspect is actually needed.
At the moment, I'm experimenting with increasing the 'total over one second' value, but if I go much further I'm going to need to know which values in that dataset are actually used, and a better understanding of interdependencies among them.
Thanks again for taking the time...
plugh
10th October 2006, 04:23
Hmmm... tried a couple of movie encodes with different values of 'vbv_peakrate', and saw no effect.
Don't know if I'm looking at the right module revisions but...
http://cvs.xvid.org/cvs/chora/co.php/xvidcore/vfw/src/codec.c?login=2
revision 1.21, routine compress_begin, has the following
// XXX: xvidcore current provides a "peak bits over 3secs" constraint.
// according to the latest dxn literature, a 1sec constraint is now used
pass2.vbv_peakrate = profiles[codec->config.profile].vbv_peakrate * 3;
http://cvs.xvid.org/cvs/chora/co.php/xvidcore/src/plugins/plugin_2pass2.c?login=2
revision 1.7, routine check_curve_for_vbv_compliancy
if (peakrate>0.f && 8.f*bytes3s > 3*peakrate)
return(VBV_PEAKRATE);
which looks to me like the value in the profile table is multiplied by three in the vfw gui, and THIS value is then multiplied by three again in the compliancy check...
I'm probably missing something; there is a lot of code in between those two routines...
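Taking only the two quoted lines at face value (and assuming nothing in between rescales the value, which is exactly the open question), the arithmetic works out as in this sketch. The function names are invented for the example; the expressions mirror the quoted code:

```c
#include <assert.h>

/* What the quoted codec.c line hands to the second pass. */
double pass2_peakrate(double profile_peakrate)
{
    assert(profile_peakrate >= 0.0);
    return profile_peakrate * 3.0;
}

/* Mirrors the quoted compliancy test: 1 if the 3-second window fails. */
int peakrate_violated(double peakrate, double bytes3s)
{
    return peakrate > 0.0 && 8.0 * bytes3s > 3.0 * peakrate;
}

/* Largest number of bits in a 3-second window that still passes. */
double max_bits_per_3s(double profile_peakrate)
{
    return 3.0 * pass2_peakrate(profile_peakrate);
}
```

Under that reading, the dxnhdtv figure of 16000000 would allow up to 144000000 bits in any 3-second window, i.e. nine times the table value. That is far above the profile's max_bitrate of 9708400 bits per second, which would be consistent with changes to vbv_peakrate showing no visible effect, but it is only a reading of these two lines.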
plugh
26th October 2006, 17:25
*in certain respects* the 1st pass output seemed to impose less decoding load than the second pass output
I think that was just a noise in your measurement. Overall, there's no way to decrease CPU load this way - should be exactly contrary, the higher the bitrate the higher cpu load.
Limiting motion estimation precision will INCREASE this load as well.
I've been poking at this some more (for example a 'normal' 1st pass file vs a 'full quality' 1st pass file), and it is starting to look like XVID_VOP_INTER4V (controlled by, among other things, the PROFILE_4MV flag) may be the culprit.
Does it make sense that turning this flag off would produce an encode that requires less cpu load during decode (at the same bitrate etc)?
Given I'm doing HD encodes (lots of macroblocks), what are the other ramifications of disabling this? Significantly lower quality? Significantly higher bitrate?
Thanks!
foxyshadis
26th October 2006, 19:28
Inter4V/4MV? Isn't that QPel? If it is, then it would make sense, since QPel does most calculations on images double the size of the hpel normally used. It sometimes makes a large quality difference, most of the time it's small but just noticeable, and occasionally (mostly low bitrate) hurts. It's also generally standalone-incompatible because of the high cpu usage.
plugh
26th October 2006, 20:24
From what I've been able to find online, this flag enables motion vectors for each of the 8x8 blocks within a 16x16 macroblock. Not sure, but I don't think it's qpel related...
Manao
26th October 2006, 20:27
No, Inter4V is the possibility to use 4 motion vectors per macroblock instead of one. It's standalone-compatible.
plugh
26th October 2006, 21:41
No, Inter4V is the possibility to use 4 motion vectors per macroblock instead of one. It's standalone-compatible.
Yeah, that's what I thought.
From what I can gather from scanning sources, a 'normal' first pass disables this flag. It is set if profile_4mv flag is on and motion search precision is greater than 4.
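Written out as a predicate, that reading of the sources looks like the sketch below. The function and parameter names are hypothetical and only restate the two conditions just described; this is not the actual VfW code:

```c
#include <assert.h>
#include <stdbool.h>

/* Whether 4-motion-vector macroblocks would be enabled, per the reading
 * above: a 'normal' (fast) first pass always disables it; otherwise it
 * needs the profile flag and a motion search precision above 4. */
bool inter4v_enabled(bool profile_4mv, int motion_precision,
                     bool fast_first_pass)
{
    assert(motion_precision >= 0);
    if (fast_first_pass)
        return false;
    return profile_4mv && motion_precision > 4;
}
```

So clearing the profile flag is one way to get the same effect in the second pass that the fast first pass gets implicitly.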
And my question isn't compatibility (target isn't a standalone) but cpu load 'cost'. Logically, it should 'cost' four times as much cpu for motion processing, compared to having the flag off, since it quadruples the number of motion vectors. (?)
Which might explain my observation...
Manao
26th October 2006, 21:45
With one motion vector, you move a 16x16 block along the motion vector. With four, you move an 8x8 block along each motion vector. So it's not 4 times slower. However, it's slower (there's an overhead per move, and you need to average the 4 motion vectors to compute the vector for chroma). So I'd say it's slower, though marginally.
Coding efficiency-wise, the bigger the resolution, the less useful inter4v is.
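A tiny sketch of why the cost is not 4x: the same 256 luma pixels get moved either way, only the number of separate moves changes, plus one extra step to derive a chroma vector. The names are invented for the example, and the plain truncating average below is a simplification; real MPEG-4 decoders use a specific rounding rule for the chroma vector:

```c
#include <assert.h>

typedef struct { int x, y; } mv_t;

/* Luma pixels copied when moving `blocks` square blocks of `size` pixels. */
int luma_pixels_moved(int blocks, int size)
{
    assert(blocks > 0 && size > 0);
    return blocks * size * size;
}

/* Simplified chroma vector derivation for a 4MV macroblock: the plain
 * average of the four luma vectors (integer division, truncating). */
mv_t average_mv4(const mv_t v[4])
{
    int sx = 0, sy = 0;
    for (int i = 0; i < 4; i++) {
        sx += v[i].x;
        sy += v[i].y;
    }
    mv_t avg = { sx / 4, sy / 4 };
    return avg;
}
```

One 16x16 move and four 8x8 moves both cover 256 pixels; the 4MV case just pays the per-move overhead four times and does the averaging once per macroblock.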
plugh
26th October 2006, 22:57
That just might be the missing piece of my puzzle...
The vbv params were allowing me some control over the data flow, not overloading the target cpu with sequences of large frames. But it was apparent that even so, there were some cases, particularly during high motion / busy scenes, that loaded the cpu more than it could handle, resulting in frame drops. What puzzled me was that the not-full-quality first pass output files, even without the benefit of 2nd pass vbv adjustment, exhibited fewer frame drops during those particular scenes.
Technically, I wonder if there is also a cache / memory bandwidth element to it. The xbox has a 733MHz P3 with half the normal cache (like a celeron) but with a 133MHz FSB. However, memory is shared with the video subsystem. A single move of the larger block would make better use of the cache while dealing with GPU memory contention ??
Or it could simply be the way the libavcodec decoder handles this case, I guess.
Off to do some more tests...
plugh
27th October 2006, 20:35
Just finished some tests, and it appears that this flag IS related to the difference I observed in decoder cpu load on my target.
I cleared profile_4mv in my profile, reran the second pass (keeping all other settings the same) and playback got through the 'difficult' scenes more easily. I didn't notice any visible difference in image quality, however average quant for the 1.5 hour movie increased from 2.94 to 2.95. Looking at the quant distribution through some of these difficult scenes, there seemed to be a slightly greater spread; a few more frames at higher AND lower quants, a few less at the intermediate quants. At least one of the scenes was being 'squeezed' significantly by the vbv scaler, so not sure how to interpret this...
Anyway, thanks for the help people...