Trying to understand inter prediction in PAFF mode


ArnoF
26th June 2007, 16:31
Hi everybody!

I am currently implementing PAFF support in an H.264 decoder as my Master's thesis (no, sorry, it's not x264), but I am having trouble understanding how inter prediction works when fields and frames are mixed in a video stream. Judging from the DPB code in the JM reference decoder, it seems that each frame is split into its two fields so that these can be used later for decoding fields, and conversely a complementary field pair is combined into a full frame so that it can be used later for decoding frames.

However, looking at the standard in chapter 8.4.1, this assumption seems to be wrong, as it talks about Frm_To_Fld modes and such, so apparently a single field is still used when it is referenced from a frame. I do not fully understand how this works, as fields obviously have only half the vertical resolution of frames.

Could somebody clarify how this works? It would help me understand the standard a lot better.

Thank you for your time!

Regards
Arno

akupenguin
26th June 2007, 18:29
Judging from the DPB code in the JM reference decoder, it seems that each frame is split into its two fields so that these can be used later for decoding fields, and conversely a complementary field pair is combined into a full frame so that it can be used later for decoding frames.
Yes. As usual, JM implements it inefficiently. You don't have to actually merge or split anything, just always store a field pair interleaved in a frame buffer, and you can access it as either frame or field with a little pointer manipulation.
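To illustrate the pointer manipulation, here is a minimal sketch in C, not taken from JM or any particular decoder. The `PlaneView` struct and the `frame_view`, `field_view` and `pixel_at` helpers are hypothetical names, and the sketch assumes a single 8-bit plane with an even number of lines stored interleaved (top-field lines on even rows, bottom-field lines on odd rows).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view onto one picture plane: a base pointer, the byte
 * distance between vertically adjacent lines of the view, and its size. */
typedef struct {
    uint8_t  *data;
    ptrdiff_t stride;
    int       width;
    int       height;
} PlaneView;

enum { TOP_FIELD = 0, BOTTOM_FIELD = 1 };

/* Frame access: every line of the interleaved buffer, in order. */
static PlaneView frame_view(uint8_t *buf, ptrdiff_t stride, int width, int height)
{
    PlaneView v = { buf, stride, width, height };
    return v;
}

/* Field access: start on line 0 (top) or line 1 (bottom) and step two
 * buffer lines at a time. Nothing is copied, split or merged. */
static PlaneView field_view(uint8_t *buf, ptrdiff_t stride, int width, int height,
                            int parity)
{
    assert(parity == TOP_FIELD || parity == BOTTOM_FIELD);
    assert(height % 2 == 0); /* assumption: even frame height */
    PlaneView v = { buf + parity * stride, 2 * stride, width, height / 2 };
    return v;
}

/* Address of sample (x, y) in whichever view was requested. */
static uint8_t *pixel_at(PlaneView v, int x, int y)
{
    assert(x >= 0 && x < v.width && y >= 0 && y < v.height);
    return v.data + y * v.stride + x;
}
```

With this layout, line `y` of the top-field view is buffer line `2*y` and line `y` of the bottom-field view is buffer line `2*y + 1`, so the same motion compensation code can run on a frame or on a field simply by being handed a different view.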

However, looking at the standard in chapter 8.4.1, this assumption seems to be wrong, as it talks about Frm_To_Fld modes and such, so apparently a single field is still used when it is referenced from a frame.
8.4.1 deals with deriving motion vectors. Frm_To_Fld applies there. But once you've derived the motion vectors and start interpolating pixels, then you no longer care which mode the reference picture was coded with.
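As a rough illustration of where Frm_To_Fld shows up: as far as I recall, in the temporal direct derivation the vertical component of the co-located motion vector is halved for Frm_To_Fld, doubled for Fld_To_Frm, and left alone for One_To_One. The sketch below shows only that one step; the type and function names are hypothetical, and the exact clause should be checked against the standard before relying on it.

```c
#include <assert.h>

/* Hypothetical names mirroring the three modes mentioned in the standard. */
typedef enum { ONE_TO_ONE, FRM_TO_FLD, FLD_TO_FRM } VertMvScale;

typedef struct {
    int x; /* horizontal component, quarter-sample units */
    int y; /* vertical component, quarter-sample units */
} MotionVector;

/* Adjust the co-located motion vector for a frame/field mismatch between
 * the current picture and the co-located picture. Only the vertical
 * component changes, because a field has half the lines of a frame.
 * C99 integer division truncates toward zero, which is assumed here to
 * match the "/" operator used in the standard. */
static MotionVector scale_colocated_mv(MotionVector mv, VertMvScale mode)
{
    assert(mode == ONE_TO_ONE || mode == FRM_TO_FLD || mode == FLD_TO_FRM);
    switch (mode) {
    case FRM_TO_FLD: mv.y = mv.y / 2; break; /* frame MV used in a field */
    case FLD_TO_FRM: mv.y = mv.y * 2; break; /* field MV used in a frame */
    case ONE_TO_ONE: break;                  /* same structure, unchanged */
    }
    return mv;
}
```

This only affects how the vector is derived. The interpolation that follows works on whichever frame or field the reference index selects.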

ArnoF
27th June 2007, 13:20
Yes. As usual, JM implements it inefficiently. You don't have to actually merge or split anything, just always store a field pair interleaved in a frame buffer, and you can access it as either frame or field with a little pointer manipulation.
I understand that this is the case. However, I am still wondering how the standard deals with the combination of frames and fields:
Let's assume we are in a frame A that references a complementary field pair B with the fields B1 and B2. Does the inter prediction only reference field B1 with the missing lines interpolated from the lines above and below in field B1 or does it reference B1 and B2 together (interleaved as usual)?

Consequently, when I am in a field C that references a frame D (with fields D1 and D2) and I have a reference in the motion compensation, does that reference either D1 or D2 or does it reference D as a whole?

I hope you understand my question. I really appreciate your help.

akupenguin
27th June 2007, 16:03
frame A is predicted from frame B.
field C1 is predicted from field D1.

ArnoF
27th June 2007, 16:36
Okay, that is what I originally thought. 8.4.1 confused me a little bit though. ;)
Thank you!

Sergey A. Sablin
28th June 2007, 07:13
Okay, that is what I originally thought. 8.4.1 confused me a little bit though. ;)
Thank you!

Just a small correction: any frame refers only to frames (all frames stored in the DPB, subject to the maximum number of references in listX), and any field refers only to fields (the same applies here, but for fields).
Thus C1 may be predicted from D1, from D2, or from any other field in the DPB.
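
To make the field case concrete, here is a simplified sketch of how a field reference list can be built from the frames in the DPB. As I understand the initialisation process for fields, candidates are taken alternately by parity, starting with the parity of the current field, and a missing field is skipped. The sketch assumes the input frames are already in the order the standard prescribes and ignores long-term references and list reordering; `RefFrame`, `RefField` and `build_field_ref_list` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int frame_id;     /* hypothetical identifier of the stored frame */
    int has_field[2]; /* [0] top, [1] bottom: usable as a reference? */
} RefFrame;

typedef struct {
    int frame_id;
    int parity; /* 0 = top field, 1 = bottom field */
} RefField;

/* Advance *pos to the next frame that has a field of the given parity.
 * Returns 1 and stores its index in *found, or 0 if none is left. */
static int next_with_parity(const RefFrame *frames, size_t n, int parity,
                            size_t *pos, size_t *found)
{
    while (*pos < n) {
        size_t i = (*pos)++;
        if (frames[i].has_field[parity]) {
            *found = i;
            return 1;
        }
    }
    return 0;
}

/* Fill out[] with reference fields, alternating parity and starting with
 * the parity of the current field. When one parity runs out, the remaining
 * fields of the other parity are appended in order. Returns the count. */
static size_t build_field_ref_list(const RefFrame *frames, size_t n_frames,
                                   int cur_parity, RefField *out, size_t max_out)
{
    assert(cur_parity == 0 || cur_parity == 1);
    size_t pos[2] = { 0, 0 }; /* independent scan position per parity */
    size_t count = 0;
    int parity = cur_parity;

    while (count < max_out) {
        size_t idx;
        if (!next_with_parity(frames, n_frames, parity, &pos[parity], &idx)) {
            parity ^= 1; /* this parity is exhausted, try the other one */
            if (!next_with_parity(frames, n_frames, parity, &pos[parity], &idx))
                break;   /* both parities exhausted */
        }
        out[count].frame_id = frames[idx].frame_id;
        out[count].parity = parity;
        count++;
        parity ^= 1;
    }
    return count;
}
```

With two complete reference frames in the DPB, a field therefore gets up to four reference fields, which is why C1 can point at D1, D2, or a field of any other stored frame.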