Old 30th October 2024, 22:48   #1  |  Link
GeoffreyA
Registered User
 
Join Date: Jun 2024
Location: South Africa
Posts: 79
Dithering: Sierra 2-4A vs. Floyd-Steinberg

Just a quick question. When dithering down from a higher bit depth, is Floyd-Steinberg error diffusion better than Sierra 2-4A, or are they visually the same? Thanks.
Old 31st October 2024, 01:44   #2  |  Link
Asmodian
Registered User
 
Join Date: Feb 2002
Location: San Jose, California
Posts: 4,428
For 8-bit output? 1-bit?

They are very similar. Better or worse probably depends on the exact image and which aspect you are paying attention to. For something like 16 bit to 8 bit I don't think anyone would be able to notice a difference.
Old 31st October 2024, 01:51   #3  |  Link
FranceBB
Broadcast Encoder
 
 
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,087
Good question. The answer is: it depends.
There are lots of error diffusion methods besides the two you mentioned, like Stucki and Atkinson (both available in the old Dither Tools back when 16bit stacked was the norm), not to mention ordered dithering. However, the most common ones are Floyd Steinberg and Sierra, for one simple reason: the first is integrated directly into ConvertBits() and the second is integrated within x264.

Before we dig into the differences between the two, let's take an extreme example: you have an 8bit full PC range image, with 0 being black and 255 being white, and you wanna encode it so that each pixel is either black or white. Plain rounding would simply check whether the pixel you're trying to encode is closer to 0 or to 255 and mark it as black or white accordingly. A pixel with a value of 82 would be converted to black, but so would pixels with values of 44, 55, 61, 73, 96 and so on, leading to huge regions of black until there's a sudden jump to white for pixels whose values are 130, 140, 150 etc.

What "error diffusion" dithering does is diffuse (i.e. spread) the error of each rounding to the neighboring pixels. For instance, suppose you have a pixel whose value is 82 and we still need to make it either black or white. We would still turn it black, as it's closer to 0 than it is to 255, but we would then "remember" that the value was actually 82 steps away from black. When we move to the neighboring pixel, which has, let's say, a similar value of 92, instead of making it black we add the error (82) to it, so 92 + 82 = 174, which is closer to 255, so we turn it white. Once again we "remember" the error, 'cause 174 is actually -81 steps away from white, and we add that to the value of the next pixel, and so on.

This creates an "alternating" pattern of black and white pixels that is much "smoother" than a hard jump from a totally black region to a totally white one, because on average the pattern does a better job of "mimicking" a run of pixels whose values are close to one another (like a gradient). Once we finish processing a line of the image, we "forget" the error and move on to the next line starting from scratch (i.e. starting from an error of 0).
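To make the one-dimensional example concrete, here's a minimal C++ sketch of it. It's only an illustration of the idea, not code from Avisynth or x264, and `diffuse_row_1d` is a made-up helper name. It assumes the threshold sits at 128 (anything from 128 up is closer to 255 than to 0).

```cpp
#include <cassert>
#include <vector>

// Quantise one row of 8-bit full-range pixels to pure black (0) or white (255),
// carrying the whole rounding error over to the next pixel on the right.
// Hypothetical helper for illustration; not taken from any library.
std::vector<int> diffuse_row_1d(const std::vector<int>& row)
{
    std::vector<int> out;
    out.reserve(row.size());
    int error = 0;                                  // every line starts from scratch
    for (int value : row) {
        assert(value >= 0 && value <= 255);
        const int wanted = value + error;           // pixel plus the remembered error
        const int chosen = (wanted >= 128) ? 255 : 0;
        error = wanted - chosen;                    // e.g. 82 - 0 = 82, then 174 - 255 = -81
        out.push_back(chosen);
    }
    return out;
}
```

Feeding it the row 82, 92, 90 gives black, white, black, whereas plain rounding would turn all three pixels black.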
Obviously this is not how dithering actually works, 'cause in this example I've only shifted the error to the next pixel (i.e. to the right), so in one dimension, whereas real error diffusion algorithms are two dimensional: they spread the error both horizontally and vertically. In principle there are many ways to diffuse an error in two dimensions (to one or more pixels on the right, on the left, up or down), but dithering algorithms always push the error forward and never backward. If you start from the top left pixel of an image and move across it one pixel at a time, going right (->) and moving down one line at a time, you never add errors to the left on the same line or to the lines above, as those are pixels you have already processed. Pushing the error "backwards" onto pixels that have already been decided would mean it could no longer be compensated for.

In other words, dithering algorithms only spread the error forward: to the right on the current line and to the lines below. The way in which they spread it is what sets them apart, and it's what makes Floyd Steinberg and Sierra different.

Let's suppose we have the following pixels in a section of an 8bit image:

Code:
 42  44 115 116 116
100 120 126 126 126
If we were to map it to black (0) or white (255) by plain rounding, those pixels would all turn black, as they're all closer to 0 than they are to 255:

Code:
0 0 0 0 0
0 0 0 0 0

Floyd Steinberg distributes the error across the neighboring pixels (where "Pixel" is the pixel we're starting with) like this:


Code:
     0   Pixel  7/16
   3/16  5/16   1/16
In our example, the Pixel in question has a value of 44 and we wanna turn it either black or white. 44 is closer to 0 than it is to 255, so we turn it black, but we "remember" the error of 44 (i.e. 44 - 0 = 44). Then we spread the error onto the neighboring pixels by multiplying it by the fractions shown above and rounding to the nearest integer:
44 * 0 = 0
44 * 7/16 = 19.25 ≈ 19
44 * 3/16 = 8.25 ≈ 8
44 * 5/16 = 13.75 ≈ 14
44 * 1/16 = 2.75 ≈ 3

So we're not spreading any error to the left on the same line or up (backwards), only to the right, bottom left, bottom center and bottom right pixels (forward). In this case we're diffusing 19 to the right pixel, 8 to the bottom left, 14 to the bottom center and 3 to the bottom right neighbors of the pixel we started from. Note that 19 + 8 + 14 + 3 = 44, so the whole error is handed on.
Our new pixels will therefore be:
42 + 0 = 42
44 + 0 = 44
115 + 19 = 134
116 + 0 = 116
116 + 0 = 116
100 + 8 = 108
120 + 14 = 134
126 + 3 = 129
126 + 0 = 126
126 + 0 = 126

so our block subject to Floyd Steinberg error diffusion would be

Code:
 42  44 134 116 116
108 134 129 126 126
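which means that when we make the decision to turn each pixel black or white, we're now gonna have 3 pixels turning white, as 134, 134 and 129 are closer to 255 than to 0 (to keep things simple I'm only spreading the error of this one pixel here; a real pass would do the same for every pixel in turn):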

Code:
0  0  255 0 0
0 255 255 0 0
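The single step above can be sketched in C++ as follows. This is just a sketch under a few assumptions: `Image`, `add_if_inside` and `spread_floyd_steinberg` are hypothetical names, each share is rounded to the nearest integer as in the hand calculation, and only one pixel's error is spread (a real filter would walk over every pixel in turn).

```cpp
#include <cassert>
#include <cmath>
#include <vector>

using Image = std::vector<std::vector<int>>;   // rows of 8-bit values

// Add `amount` to image[y][x], silently skipping positions outside the image.
void add_if_inside(Image& image, int y, int x, int amount)
{
    if (y < 0 || y >= static_cast<int>(image.size())) return;
    if (x < 0 || x >= static_cast<int>(image[y].size())) return;
    image[y][x] += amount;
}

// Spread the black/white rounding error of pixel (y, x) onto its neighbours
// with the Floyd-Steinberg weights 7/16, 3/16, 5/16 and 1/16.
// The pixel itself is left untouched, as in the worked example.
void spread_floyd_steinberg(Image& image, int y, int x)
{
    assert(y >= 0 && y < static_cast<int>(image.size()));
    assert(x >= 0 && x < static_cast<int>(image[y].size()));

    const int value  = image[y][x];
    const int chosen = (value >= 128) ? 255 : 0;   // nearest of black and white
    const int error  = value - chosen;

    auto share = [error](int numerator) {
        return static_cast<int>(std::lround(error * numerator / 16.0));
    };

    add_if_inside(image, y,     x + 1, share(7));  // right
    add_if_inside(image, y + 1, x - 1, share(3));  // bottom left
    add_if_inside(image, y + 1, x,     share(5));  // bottom centre
    add_if_inside(image, y + 1, x + 1, share(1));  // bottom right
}
```

Calling `spread_floyd_steinberg(image, 0, 1)` on the 2x5 block from the example (the pixel with value 44) reproduces the 134, 108, 134 and 129 worked out by hand.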
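Sierra's filters work the same way but distribute the error differently. A note on naming first: Sierra 2-4A proper (a.k.a. "Filter Lite", the variant used by x264 and fmtconv) is the smallest of the family and only has three taps, pushing 2/4 of the error to the right, 1/4 to the bottom left and 1/4 straight down. For the worked example I'm using its bigger brother, Sierra's two-row filter, which distributes the error like this

Code:
  0    0    Pixel 4/16 3/16
1/16  2/16  3/16  2/16 1/16
As you can see, not only are the coefficients different, but the two-row filter takes more pixels into consideration than Floyd Steinberg, so it spreads the error "more" (Sierra 2-4A, with its three taps, goes the other way). If we apply the same logic to our example Pixel whose value is 115 we have:

115 * 0 = 0
115 * 0 = 0
115 * 0 = 0
115 * 4/16 = 28.75 ≈ 29
115 * 3/16 = 21.56 ≈ 22
115 * 1/16 = 7.19 ≈ 7
115 * 2/16 = 14.38 ≈ 14
115 * 3/16 = 21.56 ≈ 22
115 * 2/16 = 14.38 ≈ 14
115 * 1/16 = 7.19 ≈ 7

So the two-row Sierra error diffusion calculation for our block would be:

42 + 0 = 42
44 + 0 = 44
115 + 0 = 115
116 + 29 = 145
116 + 22 = 138
100 + 7 = 107
120 + 14 = 134
126 + 22 = 148
126 + 14 = 140
126 + 7 = 133

so our block subject to two-row Sierra error diffusion would be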

Code:
 42  44 115 145 138
107 134 148 140 133
which means that when we make the decision to turn the pixel black or white we're now gonna have 6 pixels turning white:

Code:
0  0   0  255 255
0 255 255 255 255
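Since the only thing that changes from one error diffusion method to another is the table of weights, the step can be sketched in C++ with the kernel passed in as data. Everything here (`Grid`, `Tap`, `Kernel`, `spread_error`, `kFloyd`, `kWide`) is a hypothetical name chosen for illustration; `kWide` holds the seven weights used in the example above, and each share is rounded to the nearest integer.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

using Grid = std::vector<std::vector<int>>;    // rows of 8-bit values

// One kernel entry: offset from the current pixel plus the weight numerator.
struct Tap { int dy; int dx; int numerator; };
struct Kernel { std::vector<Tap> taps; int denominator; };

// Floyd-Steinberg: 7 right, then 3 / 5 / 1 on the next row (over 16).
const Kernel kFloyd = {{{0, 1, 7}, {1, -1, 3}, {1, 0, 5}, {1, 1, 1}}, 16};

// The seven-tap kernel of the worked example: 4 and 3 to the right,
// then 1 / 2 / 3 / 2 / 1 on the next row (over 16).
const Kernel kWide = {{{0, 1, 4}, {0, 2, 3},
                       {1, -2, 1}, {1, -1, 2}, {1, 0, 3}, {1, 1, 2}, {1, 2, 1}}, 16};

// Spread the black/white rounding error of pixel (y, x) according to `kernel`.
// Taps that fall outside the grid are skipped; the pixel itself is not modified.
void spread_error(Grid& grid, int y, int x, const Kernel& kernel)
{
    assert(y >= 0 && y < static_cast<int>(grid.size()));
    assert(x >= 0 && x < static_cast<int>(grid[y].size()));

    const int value = grid[y][x];
    const int error = value - ((value >= 128) ? 255 : 0);

    for (const Tap& tap : kernel.taps) {
        const int ty = y + tap.dy;
        const int tx = x + tap.dx;
        if (ty < 0 || ty >= static_cast<int>(grid.size())) continue;
        if (tx < 0 || tx >= static_cast<int>(grid[ty].size())) continue;
        const double share = error * tap.numerator / static_cast<double>(kernel.denominator);
        grid[ty][tx] += static_cast<int>(std::lround(share));
    }
}
```

With `kWide` and the pixel at row 0, column 2 (value 115) this gives the same 145, 138, 107, 134, 148, 140 and 133 as the hand calculation; swapping in `kFloyd` at column 1 gives the Floyd Steinberg numbers from earlier.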

Now, after having said all that, if you're using x264, which supports internal dithering via Sierra 2-4A, then feed it a 16bit planar input and let it dither down to either 10bit or 8bit (whatever your target bit depth is), so that the dithering is done by the encoder itself, in a way that should let it detect those patterns and encode them more efficiently. If, however, you're using an encoder that doesn't support that (like HCEnc for MPEG-2), then use Floyd Steinberg within Avisynth and feed the encoder the already dithered 8bit stream.

Last edited by FranceBB; 31st October 2024 at 02:01.
Old 31st October 2024, 09:12   #4  |  Link
GeoffreyA
Registered User
 
Join Date: Jun 2024
Location: South Africa
Posts: 79
Quote:
Originally Posted by Asmodian View Post
For 8-bit output? 1-bit?

They are very similar. Better or worse probably depends on the exact image and which aspect you are paying attention to. For something like 16 bit to 8 bit I don't think anyone would be able to notice a difference.
Yes, 8- and 10-bit output from 16 bits, as well as 8-bit from 32-bit float.
Old 31st October 2024, 10:13   #5  |  Link
FranceBB
Broadcast Encoder
 
 
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,087
Quote:
Originally Posted by GeoffreyA View Post
Yes, 8- and 10-bit output from 16 bits,
If you're using x264/x265 then feed them with the 16bit planar input and let them dither. If you're using x262, dither down to 8bit in Avisynth with Floyd Steinberg instead.

Quote:
Originally Posted by GeoffreyA View Post
8-bit from 32-bit float.
Avisynth doesn't support 32bit float dithering. If you're applying ConvertBits(bits=8, dither=1) to a 32bit float input what's gonna happen is that Avisynth first converts to 16bit planar and then it applies the Floyd Steinberg error diffusion to get to 8bit. As for x264/x265, they don't support 32bit float input either, so either you convert to 16bit inside Avisynth and then you let them dither or you convert already to the target bit depth.

x264 Sierra 2-4A is here: Link

Avisynth Floyd Steinberg is here: Link


About the Avisynth section, if you scroll down to line 355 of the source code, you're gonna see the coefficients I mentioned above.

You remember the

Code:
     0   Pixel  7/16
   3/16  5/16   1/16
for Floyd Steinberg I mentioned last night, right?
Well, here they are at line 355

Code:
static AVS_FORCEINLINE void diffuse_floyd_f(float err, float& nextError, float* error_ptr)
{
#if defined (FS_OPTIMIZED_SERPENTINE_COEF)
  const float    e1 = 0;
  const float    e3 = err * (4.0f / 16);
#else
  const float    e1 = err * (1.0f / 16);
  const float    e3 = err * (3.0f / 16);
#endif
  const float    e5 = err * (5.0f / 16);
  const float    e7 = err * (7.0f / 16);

  nextError = error_ptr[direction];
  error_ptr[-direction] += e3;
  error_ptr[0] += e5;
  error_ptr[direction] = e1;
  nextError += e7;
}

Sound familiar?

7/16 is (7.0f / 16)
3/16 is (3.0f / 16)
5/16 is (5.0f / 16)
1/16 is (1.0f / 16)

and the * you see in the code stands for "multiplication" of the error by those values, exactly like in the calculation I mentioned last night.
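For the 16-bit to 8-bit case from the original question, a whole-plane version could look roughly like the sketch below. To be clear, this is not the Avisynth code: `dither_16_to_8` is a made-up function, and it assumes plain left-to-right scanning on every row (no serpentine), scaling by 1/256, and two float error rows padded by one element on each side.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reduce a 16-bit plane to 8 bits with Floyd-Steinberg error diffusion.
// Sketch only: left-to-right scanning, 1/256 scaling, float error rows.
std::vector<uint8_t> dither_16_to_8(const std::vector<uint16_t>& src, int width, int height)
{
    assert(width > 0 && height > 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);

    std::vector<uint8_t> dst(src.size());
    // Errors waiting for the current row and for the next one. Index x + 1
    // belongs to pixel x, so the edge pixels need no special casing.
    std::vector<float> current(width + 2, 0.0f);
    std::vector<float> next(width + 2, 0.0f);

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const float wanted  = src[y * width + x] / 256.0f + current[x + 1];
            const float rounded = std::min(255.0f, std::max(0.0f, std::round(wanted)));
            dst[y * width + x]  = static_cast<uint8_t>(rounded);
            const float err     = wanted - rounded;

            current[x + 2] += err * (7.0f / 16);   // right
            next[x]        += err * (3.0f / 16);   // bottom left
            next[x + 1]    += err * (5.0f / 16);   // bottom centre
            next[x + 2]    += err * (1.0f / 16);   // bottom right
        }
        current.swap(next);                         // the next row becomes the current one
        std::fill(next.begin(), next.end(), 0.0f);
    }
    return dst;
}
```

A flat plane whose 16-bit value sits exactly halfway between two 8-bit steps (32896, i.e. 128.5) comes out as a mix of 128 and 129 instead of one solid value, which is the whole point of dithering.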
Old 31st October 2024, 10:53   #6  |  Link
GeoffreyA
Registered User
 
Join Date: Jun 2024
Location: South Africa
Posts: 79
Quote:
Originally Posted by FranceBB View Post
Good question. The answer is: it depends....
Hats off to you, FranceBB! What a descriptive, thorough, and simple explanation. Before, I had a vague picture of what error diffusion does, but now I understand it considerably better. I appreciate your taking the time to write that, and I'm sure it will also be useful to others. Many thanks!

I asked because, recently, encoding a set of anime in AV1, HEVC, and AVC, I used the default dithering of fmtconv, Sierra 2-4A, after debanding. I wondered whether Floyd-Steinberg would have been better or the same. Now, as chance would have it, I have to re-encode the anime, and thought I'd get to the bottom of this. Also, for encoding live-action films after colour-space conversion and tone mapping, it is being dithered from 32-bit float to 8 bits (fmtconv, Sierra; and zimg/zscale, FS).

I had been piping the final, dithered result to FFmpeg. To follow your advice of sending the 16-bit data to the encoder, letting it handle dithering, I think I'll have to switch to using the x264/5 executable directly. I stand to be corrected, but in FFmpeg, I don't think it's possible for the encoding library to handle the final bit depth; it seems that is higher up in the architecture.
Old 31st October 2024, 11:42   #7  |  Link
GeoffreyA
Registered User
 
Join Date: Jun 2024
Location: South Africa
Posts: 79
Quote:
Originally Posted by FranceBB View Post
If you're using x264/x265 then feed them with the 16bit planar input and let them dither. If you're using x262, dither down to 8bit in Avisynth with Floyd Steinberg instead.

Avisynth doesn't support 32bit float dithering. If you're applying ConvertBits(bits=8, dither=1) to a 32bit float input what's gonna happen is that Avisynth first converts to 16bit planar and then it applies the Floyd Steinberg error diffusion to get to 8bit. As for x264/x265, they don't support 32bit float input either, so either you convert to 16bit inside Avisynth and then you let them dither or you convert already to the target bit depth.
Once again, excellent description and it makes perfect sense. I do understand the code because I knew amateur C++ programming back in the day, but those days are past, and I'm a layman now.

I'm using VapourSynth: fmtconv does support dithering straight from 32-bit float to whatever integer format and has a couple of algorithms, Sierra 2-4A being the default. For the anime, because the f3kdb debanding works at 16 bits, I'll try to pipe that straight to the encoders. For the films, I'll dither from 32 to 8 bits and then pipe to FFmpeg.
Old 31st October 2024, 11:58   #8  |  Link
Z2697
Registered User
 
Join Date: Aug 2024
Posts: 113
I'm trying to sound like an ___ (you name it) but since you are doing video encoding, based on my own testing the better choice is to avoid any kind of error diffusion method. That leaves you with dmode 0, 8 and 9. For an 8-bit target at least; for 10-bit I just follow the result from 8-bit, because the difference is less significant and so harder to compare.
Because this again is like "source: trust me bro" and you are only asking about error diffusion algorithms, I'll pause here. If you want some explanations (which will probably also sound like "source: trust me bro") I shall continue.
Old 31st October 2024, 12:28   #9  |  Link
GeoffreyA
Registered User
 
Join Date: Jun 2024
Location: South Africa
Posts: 79
Quote:
Originally Posted by Z2697 View Post
I'm trying to sound like an ___ (you name it) but since you are doing video encoding, based on my own testing the better choice is to avoid any kind of error diffusion method. That leaves you with dmode 0, 8 and 9. For an 8-bit target at least; for 10-bit I just follow the result from 8-bit, because the difference is less significant and so harder to compare.
Because this again is like "source: trust me bro" and you are only asking about error diffusion algorithms, I'll pause here. If you want some explanations (which will probably also sound like "source: trust me bro") I shall continue.
Indeed, this is another question: of whether to use ordered or error diffusion when encoding. Do you think there's a big loss of efficiency when using the latter?
Old 31st October 2024, 13:53   #10  |  Link
Z2697
Registered User
 
Join Date: Aug 2024
Posts: 113
https://f3kdb.readthedocs.io/en/late...rg-dither-algo
Quote:
Visual quality of mode 3 is the best, but the debanded pixels may easily be destroyed by x264, you need to carefully tweak the settings to get better result
My understanding is that due to the nature of error diffusion, the dither "pattern" is largely different from frame to frame, making it "harder to be preserved", but if your video already has a lot of dynamic noise that might be less of an issue.

This f3kdb doc only says Floyd-Steinberg as it's the one in f3kdb but it also applies to other error diffusions. And it's NOT only specific to the debanded pixels. The debanding and dithering of f3kdb is basically two independent parts, so this actually applies to any dither filter. I quoted f3kdb as a trusty source, not because the debanding is important in the context.
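For comparison, here's a minimal C++ sketch of ordered dithering, the kind of method that doesn't diffuse errors. It uses a generic 4x4 Bayer matrix and is not meant to reproduce what fmtconv's dmode 0 (or 8, or 9) does internally; `ordered_16_to_8` and `kBayer4` are made-up names. The point it illustrates is that the rounding offset depends only on the pixel position, so a static area gets the same pattern in every frame.

```cpp
#include <cassert>
#include <cstdint>

// Classic 4x4 Bayer threshold matrix (values 0..15).
static const int kBayer4[4][4] = {
    { 0,  8,  2, 10},
    {12,  4, 14,  6},
    { 3, 11,  1,  9},
    {15,  7, 13,  5},
};

// Reduce one 16-bit sample to 8 bits with ordered dithering.
// The offset added before truncation depends only on (x, y), never on
// neighbouring pixels or on what happened earlier in the frame.
uint8_t ordered_16_to_8(uint16_t value, int x, int y)
{
    assert(x >= 0 && y >= 0);
    // Spread the 16 thresholds evenly over one 8-bit step (256 input levels).
    const int offset  = kBayer4[y & 3][x & 3] * 16 + 8;   // 8 .. 248
    const int shifted = (value + offset) >> 8;            // drop the low 8 bits
    return static_cast<uint8_t>(shifted > 255 ? 255 : shifted);
}
```

With an input of 32896 (128.5 in 8-bit terms), position (0, 0) always yields 128 and position (1, 0) always yields 129, no matter what the rest of the frame contains; with error diffusion the result at a given position depends on everything that was processed before it.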

Last edited by Z2697; 31st October 2024 at 14:00.
Old 31st October 2024, 13:55   #11  |  Link
FranceBB
Broadcast Encoder
 
 
Join Date: Nov 2013
Location: Royal Borough of Kensington & Chelsea, UK
Posts: 3,087
Quote:
Originally Posted by GeoffreyA View Post
in FFmpeg, I don't think it's possible for the encoding library to handle the final bit depth; it seems that is higher up in the architecture.
Yeah, that's correct. Unfortunately only the main executables of x264/x265 can do that, not the library versions which FFmpeg uses, so you're already doing the right thing by dithering in the frameserver before feeding FFmpeg and libx264/libx265.

Quote:
Originally Posted by GeoffreyA View Post
I appreciate your taking the time to write that, and I'm sure it will also be useful to others.
I'm glad to be playing my part in the community. I should really add this to the Wiki, but I would then have to find the time to mention Stucki and Atkinson as well if I do that, so we'll see.
Old 1st November 2024, 07:49   #12  |  Link
GeoffreyA
Registered User
 
Join Date: Jun 2024
Location: South Africa
Posts: 79
Quote:
Originally Posted by Z2697 View Post
https://f3kdb.readthedocs.io/en/late...rg-dither-algo


My understanding is that due to the nature of error diffusion, the dither "pattern" is largely different from frame to frame, making it "harder to be preserved", but if your video already has a lot of dynamic noise that might be less of an issue.

This f3kdb doc only says Floyd-Steinberg as it's the one in f3kdb but it also applies to other error diffusions. And it's NOT only specific to the debanded pixels. The debanding and dithering of f3kdb is basically two independent parts, so this actually applies to any dither filter. I quoted f3kdb as a trusty source, not because the debanding is important in the context.
I think, in theory, error diffusion should be similar to a very mild grain; and if there already is much temporal noise, that should outweigh any dithering. I'd be interested in testing this. Probably, a good starting point would be a clean, likely digitally-shot source, and then one with strong grain. For the latter, I've got 1987's "RoboCop," Arrow's restoration, and its grain is fearsome, killing any encoder thrown at it!
Old 1st November 2024, 07:53   #13  |  Link
GeoffreyA
Registered User
 
Join Date: Jun 2024
Location: South Africa
Posts: 79
Quote:
Originally Posted by FranceBB View Post
I should really add this to the Wiki, but I would then have to find the time to mention Stucki and Atkinson as well if I do that, so we'll see.
Indeed, there are so many methods.
All times are GMT +1.

