YV12 chroma placement

Guest · 28th April 2003, 05:48

Be very careful about averaging! You can make things worse. Let's discuss it here before you change it. The 75%-25% weighting is arguably an error. I believe simple copying is fine. Do it as shown here:

http://forum.doom9.org/showthread.ph...236#post297236

[EDITED to remove inflammatory language.]

sh0dan · 28th April 2003, 08:53

Quote:

The 75%-25% weighting is an ERROR made by people who don't know what they are doing.

Be careful what you proclaim! Averaging is needed for proper chroma placement, and it helps with stepping. The dilemma is the same as with RGB<->YUY2.

Averaging chroma gives visibly better results than copying - especially when dealing with interlaced sources. Yes - chroma will get slightly blurred, but there is no easy way around that. 25/75 is not 100% accurate, but the difference from 20/80 is so extremely small that it's not worth the relatively big speed penalty it would incur.

We could implement a "copy-mode" conversion routine, just as the first versions of the converters were, and offer it as ConvertBackToYUY2. An "interpolate" flag would probably be the best solution for this.
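
A minimal sketch of what such a flag could look like for progressive material, written in Python for readability. This is not AviSynth's actual routine: the function name `upsample_chroma_rows` and its `interpolate` parameter are hypothetical, and it assumes chroma rows sit midway between pairs of luma lines, so the nearest chroma row gets 75% and the next nearest 25%.

```python
def upsample_chroma_rows(rows, interpolate=True):
    """Double the number of chroma rows (progressive 4:2:0 -> 4:2:2).

    rows: list of rows, each a list of ints (one chroma plane).
    interpolate=False duplicates every row ("copy mode").
    interpolate=True mixes 75% of the nearest row with 25% of the
    next nearest one, clamping at the top and bottom edges.
    """
    out = []
    last = len(rows) - 1
    for i, row in enumerate(rows):
        if not interpolate:
            out.append(list(row))
            out.append(list(row))
            continue
        above = rows[max(i - 1, 0)]     # clamp at the top edge
        below = rows[min(i + 1, last)]  # clamp at the bottom edge
        # (3*c + n + 2) >> 2 is a rounded 75/25 mix in integer arithmetic.
        out.append([(3 * c + n + 2) >> 2 for c, n in zip(row, above)])
        out.append([(3 * c + n + 2) >> 2 for c, n in zip(row, below)])
    return out


plane = [[0], [100]]
print(upsample_chroma_rows(plane, interpolate=False))
print(upsample_chroma_rows(plane, interpolate=True))
```

Copy mode keeps the hard 0/100 step, while the interpolated version spreads it over the four output rows.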

Note - the luma lines are lying between the black lines, in the following YV12 -> YUY2 illustration (The black lines illustrate the separation of two luma lines):

Si · 28th April 2003, 22:06

@sh0dan
I'm sure your diagrams mean something to you but to us ...

(esp the bottom pair)

regards
Simon

TheRealMoh · 29th April 2003, 02:04

@neuron2:

My code works exactly as your illustration in the other thread.

@sh0dan

I understand your diagrams, but why do you say it's 80/20? According to my calculations, it's 75/25.

Here is the calc for non-interlaced (for interlaced, I always think of it as separatefields().convert().weave()):
Suppose you start with a YUV picture. Consider the first four rows, which have V values V1, V2, V3, V4. When these values are converted to YV12, there are only two rows left: newV1 = (V1+V2)/2 and newV2 = (V3+V4)/2.

Let's make a simplifying assumption: the progression of pixel values is linear. Linear progression is the basis of all averaging, so it's not a *terrible* assumption. In other words, the original 4 pixels are really:
V1 = V
V2 = V + d
V3 = V + 2 d
V4 = V + 3 d

where 'd' is the constant delta between the values. The YV12 values are therefore: newV1 = (2 V + d)/2 and newV2 = (2 V + 5 d)/2.

The point of this exercise is to reconstruct V2. Notice that
newV2-newV1 = 2 d
V2 is then (newV2 - newV1)/4 + newV1 = (newV2 + 3*newV1)/4
In other words, a 75/25 ratio.

Confused yet?

At any rate, the 25% doesn't seem so important. If anyone really wants it that way, they can just load up the avi using avisynth and use its conversion routines.
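
The 75/25 derivation above can be checked numerically. The snippet below is only a sketch under the same linear-ramp assumption; `reconstruct_v2` is a name made up for this example, and exact fractions are used so that no rounding gets in the way.

```python
from fractions import Fraction


def reconstruct_v2(v, d):
    """Return (true V2, 75/25 estimate) for the ramp V, V+d, V+2d, V+3d."""
    v1, v2, v3, v4 = v, v + d, v + 2 * d, v + 3 * d
    new_v1 = Fraction(v1 + v2, 2)  # YV12 row covering original rows 1 and 2
    new_v2 = Fraction(v3 + v4, 2)  # YV12 row covering original rows 3 and 4
    estimate = (3 * new_v1 + new_v2) / 4
    return v2, estimate


for v, d in [(100, 4), (16, -3), (200, 0)]:
    true_v2, estimate = reconstruct_v2(v, d)
    print(v, d, true_v2, estimate)
```

For a strictly linear ramp the estimate equals the original V2; real material is not linear, so this only shows that the weights are consistent with the assumption.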

trbarry · 29th April 2003, 05:14

Quote:

V2 is then (newV2 - newV1)/4 + newV1 = (newV2 + 3*newV1)/4

I believe that is correct. And even nicer, since there is a pavgb assembler instruction that averages:

V2 = pavgb(newV1,pavgb(newV2,newV1))

or

Code:

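  movq    mm0, newV2    // eight 1-byte newV2's
  pavgb   mm0, newV1    // per byte: (newV2 + newV1 + 1) >> 1
  psubusb mm0, AllOnes  // adjust for repeated rounding bias (AllOnes: presumably eight 0x01 bytes)
  pavgb   mm0, newV1    // per byte: roughly (3*newV1 + newV2) / 4
  movq    V2, mm0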

- Tom

sh0dan · 29th April 2003, 08:30

@neuron: Glad we agree - I just had the 80/20 from some page in the "Please explain this...."-thread, but they also recommended the 25/75 for speed. Don't know where they got their result from though.

@trbarry: I use pavgb now - but I can't see why you would subtract 1? Just using pavgb twice should produce correct rounding.

Guest · 29th April 2003, 12:57

Quote:

Originally posted by sh0dan
@neuron: Glad we agree - I just had the 80/20 from some page in the "Please explain this...."-thread, but they also recommended the 25/75 for speed. Don't know where they got their result from though.

Who said I agree? This whole averaging business is silly.

I'll throw the onus on you. Justify it. Then I'll refute you.

trbarry · 29th April 2003, 14:00

Quote:

@trbarry: I use pavgb now - but I can't see why you would subtract 1? Just using pavgb twice should produce correct rounding.

There is a white paper somewhere on the Intel site that discusses this issue but today I couldn't find the darn thing. I'll try to reconstruct as I see it.

Say you wanted the average of 4 values a,b,c,d. First you take the avg(a,b). 50% of the time the sum a+b will be even and after adding 1 and dividing by 2 the answer will be the same as if you did it with real numbers. Error = 0

But the other 50% of the time a+b will be an odd number. After you add 1 and divide by 2 your answer will be too high by 1/2 over the real number answer. Error = +1/2.

So over the long run with random input there is a .25 upward bias from rounding up this way. And if both avg(a,b) and avg(c,d) are too high by .25, then pavgb(pavgb(a,b), pavgb(c,d)) is too high by an expected average of .5 after the second round of averaging.

But when you use pavgb, if you first subtract 1 from one of the numbers then you are subtracting .5 from the answer. Combined with the .25 upward bias of the instruction, this means a .25 downward bias, or the same as rounding downward instead of up. So every second time when repeating pavgb combinations on the same values it's more accurate to round down this way.

Note in my example from the post above the second pavgb was applied not to the result of 2 previous pavgb instructions but only to one and one unaveraged number. Thus my final result was actually low by an expected 1/8, but still better than high by 3/8.
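
The bias argument for averaging four values can be reproduced with a scalar model of pavgb. This is a sketch, not the Intel white paper's method: `pavgb` here is a plain Python function modelling a single byte lane, and the sweep uses a reduced 4-bit input range (an assumption made only to keep the four-value loop small).

```python
from itertools import product


def pavgb(a, b):
    """Scalar model of one pavgb lane: unsigned average, rounded up."""
    return (a + b + 1) >> 1


values = range(16)  # reduced range; every combination is visited once

# Single average: error against the exact mean, counted in units of 1/2.
single = [2 * pavgb(a, b) - (a + b) for a, b in product(values, repeat=2)]
bias_single = sum(single) / (2 * len(single))

# Nested average of four values: error counted in units of 1/4.
nested = [4 * pavgb(pavgb(a, b), pavgb(c, d)) - (a + b + c + d)
          for a, b, c, d in product(values, repeat=4)]
bias_nested = sum(nested) / (4 * len(nested))

print(bias_single, bias_nested)
```

Over this uniform sweep a single pavgb comes out 0.25 too high on average and the nested form 0.5 too high, in line with the reasoning above.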

- Tom

sh0dan · 29th April 2003, 14:36

@neuron:

Please - I don't want to get into that discussion again. Look at:

http://www.hometheaterhifi.com/volum...ug-4-2001.html See "Encoding 4:4:4 to 4:2:0" and "Converting 4:2:0 Back to 4:2:2 or 4:4:4". Also look at the Last pages of the Sticky "Video Frame Properties" threads.

In short, there are pros and cons of both. I get VISIBLY better results on chroma intensive material using interpolation.

@trbarry:

Did a small test program (accumulating all errors between the two routines and float values), and it seems like you (and Intel) are right - there is slightly less bias when subtracting one before the second average. Except for the slight memory read penalty, there is no problem implementing it into the current conversion routines.

No manual rounding: Average bias: +0.3750
Manual rounding: Average Bias: -0.1250

Well spotted!
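
For anyone who wants to reproduce these two figures, here is a small sketch. It is not the original test program: it models one byte lane in Python, assumes AllOnes means a 1 in every byte, and sweeps every pair of 8-bit inputs, comparing against the exact (3*newV1 + newV2)/4.

```python
def pavgb(a, b):
    """Scalar model of one pavgb lane: unsigned average, rounded up."""
    return (a + b + 1) >> 1


def psubusb(a, b):
    """Scalar model of one psubusb lane: unsigned subtract, saturating at 0."""
    return max(a - b, 0)


def mix_plain(new_v1, new_v2):
    # Two pavgb steps, no manual rounding adjustment.
    return pavgb(pavgb(new_v2, new_v1), new_v1)


def mix_adjusted(new_v1, new_v2):
    # Subtract 1 between the two averages, as in the MMX sequence.
    return pavgb(psubusb(pavgb(new_v2, new_v1), 1), new_v1)


def average_bias(mix):
    total = 0  # accumulated error in units of 1/4, so it stays an integer
    for a in range(256):
        for b in range(256):
            total += 4 * mix(a, b) - (3 * a + b)
    return total / (4 * 256 * 256)


print(average_bias(mix_plain), average_bias(mix_adjusted))
```

Under this model the exhaustive sweep gives +0.375 without the adjustment and -0.125 with it, matching the numbers above.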

sh0dan · 29th April 2003, 16:24

I implemented the new testing, and made a minor adjustment to progressive YV12 -> YUY2 upsampling.

PSNR is about 55-60dB even after 5 conversions back and forth.

Guest · 29th April 2003, 16:33

@sh0dan

I know all about those links. If you're not interested in having your dogmas challenged so be it.

Still for the record... Sampling 80-20 or 75-25 when doing interlaced YV12 chroma subsampling is just wrong. That is not to say that people don't do it. But more and more people are realizing how wrong it is and they're stopping it. Whether our upsampling should allow for clips done the wrong way as well as the right way (possibly via an option) is another matter.

sh0dan · 29th April 2003, 16:38

Quote:

That is not to say that people don't do it. But more and more people are realizing how wrong it is and they're stopping it.

Then please provide us with references. Right now we are getting nowhere with this discussion.

Guest · 29th April 2003, 17:39

Quote:

Originally posted by sh0dan
Right now we are getting nowhere with this discussion.

I'm simply raising the point that there are different subsampling schemes used in the real world.

Unfortunately my references are STMicroelectronics and customer proprietary information. I will try to find public-domain citations for you. I remember seeing one and I'll try to find the link again.

You can do what you want but you have to acknowledge that there are 50-50 samplers out there. If the upsampler always assumes 80-20, that is wrong. Should it be configurable?

sh0dan · 29th April 2003, 18:18

Quote:

Originally posted by neuron2
You can do what you want but you have to acknowledge that there are 50-50 samplers out there. If the upsampler always assumes 80-20, that is wrong. Should it be configurable?

Yes - there are 50/50 upsamplers out there. There are also 100/0 upsamplers out there. We can also agree that there probably is a lot of YV12 (or I420) material out there that has been created with slightly displaced chroma.

IMO however AviSynth should assume that material is delivered according to the specs. In the specs, there is no doubt that YV12 and I420 have chroma placed between luma lines. Therefore a 25/75 is the correct upsampling method for this.
If chroma is NOT placed where it should, it is not the job of the converter to correct this.

Can we agree on some of the above?

PS. I feel this is becoming one of the "arguments for the sake of arguments" discussion. Kinda like Hifi-nerds discussing gold-cables and green markers on their CD's.

MfA · 29th April 2003, 19:09

Where does this 80:20 mix come from BTW? H.263 specifies a shift of 1/4 pel, so I don't see why the 75:25 mix is termed an approximation (strangely enough the MPEG4 standard, while showing a vertical shift, does not specify the exact sampling position of chroma samples, but given the backward compatibility to H.263 we can take that as saying MPEG4 uses the 1/4 pel shift too).

sh0dan · 29th April 2003, 19:17

Quote:

Where does this 80:20 mix come from BTW?

Probably some strange place in my mind

Not sure where I got it from - just ignore it, and think 25/75 instead

MfA · 29th April 2003, 19:37

I think it is all MPEG's fault, at least ITU saw the ludicrousness of leaving something this fundamental undefined ... so they just put the quarter pel shift in an annex.

AFAICS MPEG not only forgot to define it in their MPEG-2 standard (beyond saying and showing it is neither halfway between luma samples nor co-sited) but has been trying to pretend for all these years that simply no problem exists by not acknowledging it ... leaving the industry and us to sort out the mess.

Marco

PS. does anyone actually understand MPEG's justification for the positioning BTW?

Quote:

In each field of an interlaced frame, the chrominance samples do not lie (vertically) mid way between the luminance samples of the field, this is so that the spatial location of the chrominance samples in the frame is the same whether the frame is represented as a single frame-picture or two field-pictures.

I just don't see it.

PPS. on second thought, MPEG probably didn't forget to define it ... it probably was just as controversial a topic at the time as it is now, and they didn't have the guts to make a decision

PPPS. looking around I see that Philips uses vertical chrominance filtering in their encoder ICs for 4:2:2 to 4:2:0 conversion, which assumes neither a co-sited nor a half-way sampling position (although the exact position they assume cannot be deduced from the datasheet). Also the MSSG MPEG-2 encoder based on TM5 (an unofficial document where MPEG did specify the sampling position and interpolation) uses interpolation for 4:2:2 to 4:2:0 conversion which assumes a 1/4 pel shift. I think a statement that they, Sh0dan and a lot of other people are all wrong based on some internal info from STM is silly ... there is no absolute right or wrong here. It is non normative for MPEG-2 (for MPEG-4 our friends at ITU have thankfully forced through a useful standard ... and for it 75:25 interpolation is more right than 100:0 or 50:50).

MfA · 29th April 2003, 21:22

It would be nice to have a util which used correlation of edges between luminance and chrominance planes to find what kind of interpolation has been used on existing 4:2:0 sources.

BTW, the argument that everyone should do it wrong because the majority does it wrong is a reasonable one ... if MPEG leaves too many things in the standard open-ended it is up to the industry to reach a consensus. In this case the consensus does not seem to have been reached entirely, so to me it seems best to leave it up to the user. Philips seems to disagree with STM, as does the closest thing to MPEG-2 reference software which is available in the open (the decoder also assumes 1/4 pel offset).

Guest · 29th April 2003, 22:02

H.263 does not define interlaced encoding. So taking it as a guideline for that and concluding that 75-25 is correct is the error I am talking about.

Think about it. If you encode a field picture, the chroma is sited between two adjacent lumas of that field. That is a 50-50 sampling. There is no justification for doing anything else, other than that so many have done it wrong.

I am curious to know whether it is worse to upsample a 50-50 as a 75-25, or to upsample a 75-25 as a 50-50. Perhaps some experiments are in order. But it seems to me that the best implementation would support both.

MfA · 29th April 2003, 22:17

As I said ... in an annex. It is the closest thing to a specification we have, since MPEG seems so determined to be vague (well apart from TM5).

Quote:

W.6.3.11 Interlaced field indications
<snip>
interlaced field coding of a top field picture are specified as shifted up by 1/4 luminance sample height relative to the field sampling grid in order for these samples to align vertically to the usual position relative to the full-picture sampling grid. The vertical sampling positions of the chrominance samples in interlaced field coding of a bottom field picture are specified as shifted down by 1/4 luminance sample height relative to the field sampling grid in order for these samples to align vertically to the usual position relative to the full-picture sampling grid. The horizontal sampling positions of the chrominance samples are specified as unaffected by the application of interlaced field coding

Also the MPEG standards clearly are indicating that the sampling position is not at the halfway position ... the piece of text I quoted earlier was from the MPEG-4 standard. Once more :

Quote:

In each field of an interlaced frame, the chrominance samples do not lie (vertically) mid way between the luminance samples of the field, this is so that the spatial location of the chrominance samples in the frame is the same whether the frame is represented as a single frame-picture or two field-pictures.

When you say that the vast majority of decoders out there assume 50:50 I'll believe you ... but it ain't consistent with the standard.
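
To put numbers on the quoted annex text, here is a small sketch of the linear-interpolation weights that follow from a given chroma position. It rests on assumptions not stated in the quote: the "usual" position within a grid is taken to be midway between two luma lines, plain linear interpolation is used, and `own_row_weights` is a name invented for this example.

```python
from fractions import Fraction


def own_row_weights(offset):
    """Weight given to chroma row k when rebuilding luma lines 2k and 2k+1.

    Chroma row k is assumed to sit at 2k + offset in luma-line units of
    the grid being processed (0 <= offset <= 1). Chroma rows are 2 luma
    lines apart, so the neighbouring chroma row gets the remaining weight.
    """
    offset = Fraction(offset)
    upper = 1 - offset / 2        # luma line 2k, other neighbour is row k-1
    lower = 1 - (1 - offset) / 2  # luma line 2k+1, other neighbour is row k+1
    return upper, lower


midway = Fraction(1, 2)
layouts = {
    "frame grid, midway": midway,
    "top field, shifted up 1/4": midway - Fraction(1, 4),
    "bottom field, shifted down 1/4": midway + Fraction(1, 4),
}
for name, offset in layouts.items():
    print(name, own_row_weights(offset))
```

With these assumptions the frame case gives the familiar 3/4 (75:25) for both lines, while the shifted field positions give 7/8 and 5/8 in the top field and 5/8 and 7/8 in the bottom field, rather than 75:25 or 50:50 within a field.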
