P4x4 and x264 level compliance...


Quarkboy
12th May 2007, 22:32
This is really a question for akupenguin, but if anyone else knows the answer, feel free.

In a previous thread somewhere, akupenguin mentioned that x264 doesn't produce level-compliant streams for levels above 3 if p4x4 macroblock analysis is enabled. After poring over the H.264 specification to figure out why, I saw that it was because (Table A.4 in ITU-T Rec. H.264 (03/2005)): in High Profile, for levels above 3, MinLumaBiPredSize is 8x8.

But I'm having trouble figuring out why P4x4 encoding would affect this... MinLumaBiPredSize being 8x8 means (Table 7-18) that the sub-macroblock types for B macroblocks can't be B_Bi_8x4, B_Bi_4x8, or B_Bi_4x4; in other words, the minimum bi-predicted block size is 8x8. But according to Table 7-17, the only sub-macroblock types for P macroblocks are predictive, not bi-predictive. So how does a restriction on MinLumaBiPredSize affect P4x4?

It seems like something that needs to be addressed, what with all this new hardware coming out that actually enforces level compliance.

It would be a really nice feature if, by default, x264 assigned the minimum compliant level based on a final analysis of the encoded stream.
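Something along these lines, say; the level rows below are quoted from memory of Table A-1 (with the High-profile bitrate scaling), and all the names are purely illustrative, not x264 code:

#include <stdio.h>

typedef struct {
    int level_idc;      /* 10 * the level number, e.g. 41 = Level 4.1 */
    int max_frame_mbs;  /* MaxFS: macroblocks per frame */
    int max_mbps;       /* MaxMBPS: macroblocks per second */
    int max_br_kbps;    /* MaxBR for High profile, kbit/s */
} level_limits_t;

static const level_limits_t levels[] = {
    { 30,  1620,  40500,  12500 },
    { 31,  3600, 108000,  17500 },
    { 40,  8192, 245760,  25000 },
    { 41,  8192, 245760,  62500 },
    { 51, 36864, 983040, 300000 },
};

/* Return the smallest level_idc whose limits cover the measured stream,
   or -1 if even the highest listed level is exceeded. */
int min_compliant_level( int frame_mbs, int mbps, int bitrate_kbps )
{
    for( int i = 0; i < (int)(sizeof(levels)/sizeof(levels[0])); i++ )
        if( frame_mbs    <= levels[i].max_frame_mbs &&
            mbps         <= levels[i].max_mbps &&
            bitrate_kbps <= levels[i].max_br_kbps )
            return levels[i].level_idc;
    return -1;
}

int main( void )
{
    /* 1080p at 24 fps is 8160 MBs/frame and 195840 MBs/s, so at
       30000 kbit/s this picks Level 4.1. */
    printf( "level_idc = %d\n", min_compliant_level( 8160, 195840, 30000 ) );
    return 0;
}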

akupenguin
12th May 2007, 23:26
It's not MinLumaBiPredSize, since I never implemented B4x4 blocks. The limit is MaxMvsPer2Mb.

Quarkboy
13th May 2007, 00:06
Okay, so continuing to inspect the specs, tell me if I'm understanding this correctly:

With a max MV count of 16, if you have a P4x4-encoded macroblock in a P frame, you have 4*4 = 16 sub-blocks to get motion vectors for.

According to the decoding spec, each of these sub-blocks has a subMvCnt equal to 0, 1, or 2, depending on what prediction mode it is encoded in. So, in general, for a particular 4x4 partitioning to be allowed, the total of the 16 subMvCnt values would have to be <= 16?

I can see why you didn't implement this, as it would require either a per-macroblock optimization routine for all P4x4 macroblocks, or simply clamping all of the subMvCnt values to 1 by not using the predFlagL0 and predFlagL1 modes simultaneously.


As a practical matter, can you say anything about whether a normally encoded file commonly ends up breaking this limit?

Thanks a lot for answering my questions. I'm not sure if they are annoyingly specific for you, or actually interesting.

And is there any reason other than time for not implementing B4x4? I can imagine it's a case of diminishing returns.

akupenguin
13th May 2007, 00:58
predFlagL1 can't be used in P-frames. But MaxMvsPer2Mb applies to any 2 consecutive MBs (hence the name), which have a total of up to 32 partitions.
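
To make the two-MB window concrete, here's a toy sketch of the check, assuming the encoder already keeps a per-macroblock MV tally; the names and the standalone main() are just for illustration, not x264 code:

/* Toy check of MaxMvsPer2Mb: for every pair of consecutive macroblocks,
   the combined number of motion vectors must not exceed the level limit
   (16 for levels above 3, per Table A-1). */
#include <stdio.h>

int count_2mb_violations( const int *mv_count, int num_mbs, int max_mvs_per_2mb )
{
    int violations = 0;
    for( int i = 1; i < num_mbs; i++ )
        if( mv_count[i-1] + mv_count[i] > max_mvs_per_2mb )
            violations++;
    return violations;
}

int main( void )
{
    /* Two adjacent P4x4 macroblocks with one MV per 4x4 partition:
       16 + 16 = 32 > 16, so that pair alone violates the limit. */
    int mv_count[4] = { 16, 16, 4, 1 };
    printf( "violations: %d\n", count_2mb_violations( mv_count, 4, 16 ) );
    return 0;
}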

In normal encoding... it depends on bitrate. A test encode violated MaxMvsPer2Mb 650 times in 2500 frames at crf=18, or 120 times at crf=24, or 8 times at crf=30.

As a practical matter, the purpose of MaxMvsPer2Mb is to limit the CPU time needed to decode (since more separate invocations of the motion compensation functions take more time). So even though it's not defined in the standard, what really matters is MaxMvsPerFrame. And that's not likely to be a problem when P4x4 is used in only 0.5% of macroblocks.

I haven't implemented B4x4 due to programmer time and CPU time. I assume it would provide even less compression gain than P4x4, since B-frames use larger partitions on average than P-frames, and since there are already more alternative block types in B-frames.

Quarkboy
13th May 2007, 01:14
Aha, yes. That makes sense; I was wondering how a P frame could have two motion vectors per sub-block. The answer is, it can't :).

So in the end, the restriction is probably nothing to worry about in most real-world encodes.

For the purpose of hardware playback of x264 encodes, then, it would be no problem to mark them as Level 4.1, say, even if P4x4 was used in this violating manner.

P.S. I've been procrastinating on writing my PhD thesis on string theory by decoding the H.264 spec. I'm such a freak.

P.P.S. Then, if 4x4 sub-partitioning is only used in 0.5% of macroblocks anyway, would it really be such a performance hit to actually conform to the standard in the encoder? You'd only need to check the total number of MVs used in the previous macroblock and cap the prediction type in the current one if you'd go over the limit. It'd degrade quality slightly, but you'd be level compliant without TOO much trouble.
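
Roughly what I mean, as a sketch; the limit is the levels-above-3 value from Table A-1, and every name here is invented rather than taken from x264's analysis code:

/* Sketch: before analysing macroblock i, look at how many MVs macroblock
   i-1 used, and restrict the partition types of MB i so that the pair
   can't exceed MaxMvsPer2Mb. */
#include <stdio.h>

#define MAX_MVS_PER_2MB 16   /* limit for levels above 3 (Table A-1) */

/* Worst-case MVs a macroblock can use for a given allowed granularity. */
enum { ALLOW_P4X4 = 16, ALLOW_P8X4_P4X8 = 8, ALLOW_P8X8 = 4,
       ALLOW_P16X8 = 2, ALLOW_P16X16 = 1 };

/* Largest (worst-case) MV count the current MB may use so that
   previous + current stays within the limit. */
int max_allowed_mvs( int prev_mb_mv_count )
{
    int budget = MAX_MVS_PER_2MB - prev_mb_mv_count;
    if( budget >= ALLOW_P4X4 )      return ALLOW_P4X4;
    if( budget >= ALLOW_P8X4_P4X8 ) return ALLOW_P8X4_P4X8;
    if( budget >= ALLOW_P8X8 )      return ALLOW_P8X8;
    if( budget >= ALLOW_P16X8 )     return ALLOW_P16X8;
    return ALLOW_P16X16;  /* if even this busts the budget, the previous MB
                             should itself have been capped at 15 MVs */
}

int main( void )
{
    /* Previous MB used 12 MVs, so this MB gets at most 4, i.e. plain P8x8
       with no 8x4/4x8/4x4 sub-partitions. */
    printf( "cap: %d MVs\n", max_allowed_mvs( 12 ) );
    return 0;
}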

akupenguin
13th May 2007, 05:07
It wouldn't be a speed hit even if every block used P4x4. I just don't like the number of lines of code it would take to "cap the pred type".

Quarkboy
13th May 2007, 06:04
It wouldn't be a speed hit even if every block used P4x4. I just don't like the number of lines of code it would take to "cap the pred type".

Hmm, you are really tempting me to put on my C-hacker hat from 5 years ago and try and make a patch. But thesis comes first!

While I'm at it, are there any other places where x264 doesn't check for violating the level constraints?

akupenguin
13th May 2007, 06:15
Interlacing.

SliceRate, because I don't even know what it means. Actually, x264 currently complies with SliceRate because it doesn't use multiple slices at all, but when/if slice support is re-added (and there's a patch for that on the mailing list), it may go back to being non-compliant.

MinCompressionRatio, but you're very unlikely to violate that one unless you set QP=1.

And the VBV check warns in two stages: it warns if you ask for a VBV that doesn't fit in your level, and it warns during the encode if x264 didn't manage to respect the VBV you asked for. But if you don't enable VBV at all, then it doesn't check whether the resulting bit distribution complies with the implicit maximum VBV for your level.
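
For the first stage, the check amounts to a comparison like the one below; the Level 4.1 High-profile numbers are quoted from memory of Table A-1 and its High-profile scale factor, and the function name is an illustration, not x264's actual code:

#include <stdio.h>

/* Stage one: warn if the requested VBV exceeds what the declared level
   allows.  Returns 0 if it fits, nonzero otherwise. */
int check_vbv_against_level( int vbv_maxrate_kbps, int vbv_bufsize_kbit,
                             int level_max_br_kbps, int level_max_cpb_kbit )
{
    int bad = 0;
    if( vbv_maxrate_kbps > level_max_br_kbps )
    {
        fprintf( stderr, "warning: VBV maxrate %d exceeds level limit %d\n",
                 vbv_maxrate_kbps, level_max_br_kbps );
        bad = 1;
    }
    if( vbv_bufsize_kbit > level_max_cpb_kbit )
    {
        fprintf( stderr, "warning: VBV bufsize %d exceeds level limit %d\n",
                 vbv_bufsize_kbit, level_max_cpb_kbit );
        bad = 1;
    }
    return bad;
}

int main( void )
{
    /* Level 4.1 High profile limits, from memory: roughly 62500 kbit/s
       max bitrate and 78125 kbit CPB.  An 80 Mbit/s VBV would warn. */
    check_vbv_against_level( 80000, 90000, 62500, 78125 );
    return 0;
}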

Zero1
13th May 2007, 07:44
It seems like something that needs to be addressed, what with all this new hardware coming out that actually enforces level compliance.

It would be a really nice feature if, by default, x264 assigned the minimum compliant level based on a final analysis of the encoded stream.
Or perhaps a couple of new options, --enforce-level and --enforce-profile, which would check your command line and warn you about, or disable, any options that would produce a technically non-compliant stream, as best it can. Obviously it cannot be done so easily for the VBV stuff, but it would be a help if we could set an option to stick to the spec as closely as possible; then again, you may disagree and say that this isn't the job of the software, but something the end user should be aware of (which, to be fair, I also agree with).


Actually, x264 currently complies with SliceRate because it doesn't use multiple slices at all, but when/if slice support is re-added (and there's a patch for that on the mailing list), it may go back to being non-compliant.
What is the advantage of using slices over the current threading model? Does it scale more efficiently for more CPUs/cores?

If I remember correctly, sliced threading was not as fast and incurred more of a quality hit. Also, isn't there some level of support required on the decoder side?

Also, a little bit OT, but did anything ever come of lookahead, or was it simply too computationally complex?

Thanks

akupenguin
13th May 2007, 08:41
What is the advantage of using slices over the current threading model? Does it scale more efficiently for more CPUs/cores?
Slices provide no benefit for threading. People want it back for error resilience.

Also, isn't there some level of support required on the decoder side?
Slices are visible in the bitstream, so it's theoretically possible for a decoder not to support them. But I wouldn't think to ask about slice support; it's in all of the profiles and it's used by most hardware encoders. Then again, I would have said the same about mixed references, and Apple managed to bork that.

did anything ever come of lookahead, or was it simply too computationally complex?
That depends on whether you call 250 times slower than HQ-Insane "too complex" ;)