VQ Test for MPEG-2 Encoders


FranceBB
15th July 2019, 22:18
VQ Test for MPEG-2 Encoders

Broadcast Encoder : Francesco Bucciantini (FranceBB (https://forum.doom9.org/member.php?u=219051))
Senior Video Editor : Livio Aloja (algia (https://forum.doom9.org/member.php?u=228816))

The files analysed are named “Test4” as we ran several tests; the aim is to
encode lossless masters into mezzanine files for internal usage.

1) Input file and encoding target
2) The impact of Dithering on objective metrics
3) Comparison between encoders
4) Results and final thoughts
5) Bibliography

1) Input file and encoding target

For this test, the original master file is an Apple ProRes, FULL HD (1920x1080), 10bit,
4:2:2 planar BT709 Limited TV Range, with both progressive and interlaced content at
25fps. The target is an XDCAM50 lossy mezzanine file for broadcast usage, which is an MPEG-2,
FULL HD, 50Mbit/s, 8bit, 4:2:2 planar (yv16), BT709 Limited TV Range, closed GOP, with both
progressive (flagged as interlaced) and interlaced content at 25fps.
The test reel contains different types of content to test how the encoders behave.

2) The impact of Dithering on objective metrics

Internally, whenever we get a high bit depth source, we apply dithering in order to
avoid introducing banding while bringing it down to 8bit. In particular, we use Floyd-Steinberg
error diffusion.
The algorithm achieves dithering using error diffusion, meaning it pushes (adds) the residual
quantization error of each pixel onto its neighboring pixels, to be dealt with later. It spreads the
debt out according to the following distribution (shown as a map of the neighboring pixels):

https://i.imgur.com/Sj5RtzN.png

The pixel marked with a star is the one currently being scanned, and the blank pixels are
the previously-scanned ones. The algorithm scans the image from left to right, top to
bottom, quantizing pixel values one by one. Each time, the quantization error is transferred to
the neighboring pixels without affecting the pixels that have already been quantized. Hence, if
a number of pixels have been rounded downwards, it becomes more likely that the next pixel is
rounded upwards, so that on average the quantization error is close to zero.
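
To make the mechanics concrete, here is a minimal, unoptimised Python sketch of the idea (illustrative only, not necessarily the production implementation): each pixel is quantized from 10bit to 8bit and the residual error is pushed onto the not-yet-scanned neighbors with the classic 7/16, 3/16, 5/16 and 1/16 weights.

import numpy as np

def fs_dither_10_to_8(plane10):
    # Floyd-Steinberg error diffusion on a single 10bit plane (values 0..1023).
    buf = plane10.astype(np.float64)
    h, w = buf.shape
    out = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            old = buf[y, x]
            new = int(np.clip(np.rint(old / 4.0), 0, 255))  # quantize to 8bit
            out[y, x] = new
            err = old - new * 4.0  # residual error, kept in 10bit scale
            # Spread the error onto the pixels that haven't been scanned yet.
            if x + 1 < w:
                buf[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    buf[y + 1, x - 1] += err * 3 / 16
                buf[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    buf[y + 1, x + 1] += err * 1 / 16
    return out

The serpentine variant mentioned below reverses the scan direction on alternate rows (mirroring the weights), which breaks up the diagonal "worm" artifacts of a plain left-to-right scan.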
The original lossless master file for this test is 10bit, but the encoded file has to be 8bit due to
the XDCAM specifications, so we wondered whether dithering has a positive or a negative
impact on objective metrics compared to truncation.

We ran some internal tests with three different types of dithering:

– Serpentine Floyd-Steinberg error diffusion
– Stucki error diffusion
– Atkinson error diffusion

When we compared the tests, we noticed that Serpentine Floyd-Steinberg error diffusion is
a well-balanced algorithm (which confirms why we use it internally), Stucki
error diffusion looks “sharp” and preserves light edges and details well, and Atkinson
error diffusion generates distinct patterns but keeps flat areas clean. Unfortunately, though,
even if they look “better” to the human eye, this is strictly subjective, as they only look “different”.
As a matter of fact, on both SSIM and PSNR, each dithering method got a lower score
than truncation.
The interesting thing, though, is that dithering didn't score lower on every single frame: there are a
few frames in which the dithering algorithms managed to get a higher value than
truncation. Overall, however, truncation outperformed the dithering algorithms by 1.51% in SSIM and 0.3%
in PSNR, which is why we decided not to use dithering for the reference encodes.
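
For reference, this kind of check can be reproduced along these lines (a sketch under stated assumptions: a single luma plane, PSNR computed against the 10bit master by scaling the 8bit result back up; the function name is ours):

import numpy as np

def psnr_vs_10bit_master(ref10, test8):
    # Scale the 8bit plane back to the 10bit range so both are comparable.
    diff = ref10.astype(np.float64) - test8.astype(np.float64) * 4.0
    mse = np.mean(diff ** 2)
    return 10 * np.log10(1023.0 ** 2 / mse)

# Truncation simply drops the two least significant bits:
# trunc8 = (plane10 >> 2).astype(np.uint8)

Truncation keeps the per-pixel error small and deterministic (at the cost of banding), while error diffusion deliberately trades per-pixel accuracy for perceptual smoothness, which helps explain the metric gap.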

3) Comparison between encoders

In this part, we are going to compare the following encoders: Ateme, AWS, Selenio, Telestream and x262.
For the reasons already explained above, the file encoded with x262 was produced without any dithering algorithm, using plain truncation.

https://i.imgur.com/tnqPiYY.png
https://i.imgur.com/8Wa4OFD.png

The first graph shows how the different encoders behave over the whole video.
Since SSIM ranges from 0 to 1 with many digits after the decimal point, we re-scaled it to make it more human-readable. In our tests, AWS performed better than the other encoders, with a score of 289921, followed by Ateme by a very narrow margin (289639). In third position, with a rather significant quality drop, we have Telestream with 281010, followed by Selenio with 279577.
At the bottom we have x262, which scored 276854 and is outperformed by the top scorer by 4.51%.

https://i.imgur.com/yWA6JCY.png
https://i.imgur.com/uhOUXGR.png

PSNR pretty much confirms what SSIM shows.
AWS performed better than all the other encoders and scored 733272, followed by Ateme by a narrow margin with 731639. In third position there's Telestream with 729195, followed by Selenio, which scored 722449. At the bottom there's x262 with a total score of 720755.
According to PSNR, though, Selenio is closer to the quality reached by x262 than to the one reached by Telestream.

SSIM Individual Charts (from best to worst):

https://i.imgur.com/CD1kDRU.png
https://i.imgur.com/lq4t45U.png
https://i.imgur.com/IJkpWqe.png
https://i.imgur.com/BfZbCVi.png
https://i.imgur.com/XUes4sv.png

PSNR Individual Charts (from best to worst):

https://i.imgur.com/itTULrg.png
https://i.imgur.com/W33rgCK.png
https://i.imgur.com/nh32ha2.png
https://i.imgur.com/gpdPeD5.png

4) Results and final thoughts

AWS managed to achieve a better score than all the other encoders, but its advantage comes only from slightly higher spikes on a few scenes; overall it performed pretty much the same as Ateme. In particular, grain retention was fine on both, but when random noise recorded by the camera came into the equation, AWS handled it slightly better than Ateme. Again, though, overall they performed pretty much the same, which is why the margin was so narrow.

In third place, Telestream performed worse than Ateme by a not-so-high margin. Even so, it's still closer to the upper part of the chart than to the lower part; it just didn't quite manage to reach the same level as Ateme on too many scenes, which is why it ended up third. In fourth position there's Selenio, which performed significantly worse than AWS and Ateme, and worse than Telestream by a still significant margin.

At the bottom of the table there's x262, which is apparently the worst MPEG-2 encoder among these at such a high bitrate: its performance was pretty low overall and, on top of that, it struggled to encode sport properly. On the other hand, x262 is free and open source, while all the other encoders are closed source, have to be purchased at a very high cost, and don't support Avisynth input, so we would still choose x262 over the others. We're already using x262 and we're not planning to change anytime soon.

5) Bibliography

– Visgraf (Vision and Graphics Laboratory), Mathematics of the FS algorithm
– Proceedings of the Society for Information Display (adaptive algorithm)

kolak
16th July 2019, 19:07
AWS is an ex-Elemental encoder, no?
I also found FS dithering to be the best, but I also kept adding a tiny amount of noise (that was for Blu-ray encoding, though).

Blue_MiSfit
17th July 2019, 00:27
Fascinating :)

I was always impressed with Elemental's MPEG-2 encoder, so I'm not surprised to see it doing so well here. Particularly when doing low-bitrate 15 Mbps CableLabs-compliant 1080i, it absolutely crushed my go-to at that point: Harmonic ProMedia Carbon, aka Rhozet Carbon Coder.

Any particular reason you left Harmonic out of the mix?

kolak
17th July 2019, 11:03
Is it that good?
Rhozet was a reference for interlaced content for me.

Blue_MiSfit
18th July 2019, 02:46
Rhozet was fine, and WAY better than Digital Rapids (now Imagine's Selenio product line), but at low bitrates (15 Mbps for 1080i) it had a lot of blocking that totally went away with Elemental. Plus, the latter was WAY faster :)

Gosh, I haven't thought about this in a while. It's been years since I've done any MPEG-2 encoding.

ifb
14th July 2021, 02:15
Do you remember the x262 commandline? I sure hope you used --tune ssim/psnr. :)


I should probably check in here more often. This thread is 2 years old and I haven't seriously worked on x262 in 8 years. I'm glad people still find it somewhat useful, though.

FranceBB
14th July 2021, 13:50
Do you remember the x262 commandline?
I should probably check in here more often. This thread is 2 years old and I haven't seriously worked on x262 in 8 years. I'm glad people still find it somewhat useful, though.

I'm glad you're back, though! :D
About x262, I don't remember the command line, but I remember that I had to make sure the content was progressive 25p because of this:

x262_64.exe "AVS Script.avs" --mpeg2 --preset medium --profile 422 --bitrate 50000 --vbv-maxrate 50000 --vbv-bufsize 17825792
--keyint 12 --bframes 2 --tff --deblock -1:-1 --overscan show --colormatrix bt709 --range tv --transfer bt709 --colorprim bt709
--videoformat component --nal-hrd cbr --output-csp i422 --output "\\mibctvan000\Ingest\MEDIA\temp\raw_video.h262"

pause

ends up with:


avs [info]: 1920x1080i 0:0 @ 25/1 fps (cfr)
x262 [error]: interlaced 4:2:2 not implemented


so I had to settle for something like:

x262_64.exe "AVS Script.avs" --mpeg2 --preset medium --profile 422 --bitrate 50000 --vbv-maxrate 50000 --vbv-bufsize 17825792
--keyint 12 --bframes 2 --deblock -1:-1 --overscan show --colormatrix bt709 --range tv --transfer bt709 --colorprim bt709 --videoformat component
--nal-hrd cbr --output-csp i422 --output "\\mibctvan000\Ingest\MEDIA\temp\raw_video.h262"

pause



Now, here is the thing:

first of all, to create a compliant XDCAM-50 stream we need:

--level

which currently doesn't work if I try anything over 1, but we need "HL", so High Level.
The MPEG-2 levels are:
LL Low Level
ML Main Level
H-14 High 1440
HL High Level

we need the last one.

--nal-hrd cbr

should actually signal constant bitrate properly rather than showing it as variable.
Then:

--keyint 12 --bframes 2

are meant to be used (I think) so that the GOP is:

pict_type=I
pict_type=B
pict_type=B
pict_type=P
pict_type=B
pict_type=B
pict_type=P
pict_type=B
pict_type=B
pict_type=P
pict_type=B
pict_type=B

and then the sequence repeats, giving M=3, N=12.
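
(Incidentally, a fixed pattern like the one above can be verified on an encoded stream with ffprobe, assuming FFmpeg can read the file:

ffprobe -v error -select_streams v:0 -show_entries frame=pict_type -of default=noprint_wrappers=1 raw_video.h262

which prints one pict_type=... line per frame, exactly as listed.)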

This is an example of an XDCAM file encoded with FFmpeg:

General
Complete name : S:\MEDIADIRECTOR\ARCA\F1 1993 GP ITALIA GARA 930912 1P (7754523).mxf
Format : MXF
Commercial name : XDCAM HD422
Format version : 1.3
Format profile : OP-1a
Format settings : Closed / Complete
File size : 45.8 GiB
Duration : 1 h 49 min
Overall bit rate : 60.0 Mb/s
Encoded date : 2021-06-23 14:23:37.404
Writing application : FFmpeg OP1a Muxer 58.65.101.0.0

Video
ID : 2
Format : MPEG Video
Commercial name : XDCAM HD422
Format version : Version 2
Format profile : 4:2:2@High
Format settings : CustomMatrix / BVOP
Format settings, BVOP : Yes
Format settings, Matrix : Custom
Format settings, GOP : M=3, N=12
Format settings, picture structure : Frame
Format settings, wrapping mode : Frame
Codec ID : 0D01030102046001-0401020201040300
Duration : 1 h 49 min
Bit rate mode : Constant
Bit rate : 50.0 Mb/s
Width : 1 920 pixels
Height : 1 080 pixels
Display aspect ratio : 16:9
Frame rate : 25.000 FPS
Standard : Component
Color space : YUV
Chroma subsampling : 4:2:2
Bit depth : 8 bits
Scan type : Interlaced
Scan order : Top Field First
Compression mode : Lossy
Bits/(Pixel*Frame) : 0.965
Time code of first frame : 00:00:00:00
Time code source : Group of pictures header
GOP, Open/Closed : Open
GOP, Open/Closed of first frame : Closed
Stream size : 38.2 GiB (83%)
Color range : Limited
Color primaries : BT.709
Transfer characteristics : BT.709
Matrix coefficients : BT.709

and this is the command line for FFmpeg:


avs2yuv.exe "S:\00_INGEST_MAM\A.R.C.A\02_ALTRO\NR_DJF_AVISYNTH_TEST_SPORT_SD.avs" -csp AUTO -o - | ffmpeg.exe -i - -pix_fmt yuv422p -vcodec mpeg2video
-s 1920:1080 -aspect 16:9 -vf setfield=tff -flags +ildct+ilme+cgop -b_strategy 0 -mpv_flags +strict_gop -sc_threshold 1000000000 -r 25
-b:v 50000k -minrate 50000k -maxrate 50000k -bufsize 17825792 -g 12 -bf 2 -profile:v 0 -level:v 2 -color_range 1 -color_primaries 1 -color_trc 1
-colorspace 1 -y "\\MIBCSSDA001\Media Ingest\filetemporanei\server0\output.mxf"
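
For anyone reading along, the flags doing the XDCAM-specific work there are: -flags +ildct+ilme (interlaced DCT and interlaced motion estimation) and +cgop (closed GOPs); -b_strategy 0, -mpv_flags +strict_gop and the huge -sc_threshold (to pin the fixed IBBP pattern and suppress scene-cut I-frames); -g 12 -bf 2 (N=12, M=3); and -profile:v 0 -level:v 2, which in FFmpeg's mpeg2video numbering should correspond to 4:2:2 profile @ High level.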



Can you help me out a bit here?
I mean, can we try to support XDCAM-50 properly this time?
Let's be honest, aside from professional formats like XDCAM and IMX, there's not much use for MPEG-2 nowadays.
On top of that, I think x262 should not encode in H.264. I think it should be MPEG-2 only, with the same features as x264, but that's it. This way, if we get it MPEG-2 only and XDCAM/IMX compliant, then I'm pretty sure it can be used professionally and it could even be included in FFmpeg as libx262! ;)

I sure hope you used --tune ssim/psnr. :)

Wouldn't that be cheating? ehehehehe
I mean the goal was to offer a real life scenario, but yeah I guess picking them would have achieved a higher score, definitely eheheheh

Emulgator
14th July 2021, 15:26
There is still room for improvement to pull closer to CCE; DVDs are still needed.

FranceBB
14th July 2021, 15:43
There is still room for improvement to pull closer to CCE; DVDs are still needed.

True. But making a DVD-compliant stream is also professional in some sense, so yeah.

Anyway, if ifb gets XDCAM right I'm gonna be happy xD

benwaggoner
14th July 2021, 17:23
What's the goal in using objective metrics here? Particularly for dithering, which is absolutely a subjective optimization.

The "best" encoder is the one that delivers the best subjective results in double-blind comparisons. Objective metrics are an okay first-order approximation of some things, but not of things like how different dithering modes impact the subjective quality of the final output. For MPEG-2, a dithering mode that looks better pre-compression might wind up compressing less efficiently, and any potential quality gain gets eaten up by artifacts from the encode requiring a higher QP.

FranceBB
14th July 2021, 19:34
Yes, which is why I tested whether dithering was going to affect quality in terms of objective metrics or not. It did, negatively, so in the end the version I evaluated was the one obtained via truncation, which is the one you see in the charts.

benwaggoner
15th July 2021, 01:23
Yes, which is why I tested whether dithering was going to affect quality in terms of objective metrics or not. It did, negatively, so in the end the version I evaluated was the one obtained via truncation, which is the one you see in the charts.
But truncation looks bad, and would never be used in real-world DVD encoding.

There's a lot of secret sauce in dithering for 8-bit codecs, particularly as simple a one as MPEG-2.

ifb
21st July 2021, 15:28
The MPEG-2 levels are:
LL Low Level
ML Main Level
H-14 High 1440
HL High Level

we need the last one.
There are five levels supported (https://git.videolan.org/?p=x262.git;a=blob;f=x264.h;h=8b10b7a1cb9a5f8aaf86340c35a189855f8f8cb2;hb=HEAD#l88) (you omitted HighP). Using --output-csp i422 --level high reports
x262 [info]: 4:2:2 profile @ High level
so I don't see the problem. It's the same behavior as x264. The minimal level is assumed, and you can force High if needed (e.g. with SD resolutions).
--keyint 12 --bframes 2
are meant to be used (I think) so that the GOP is:
I don't know that a fixed GOP pattern is strictly required for XDCAM, but a fixed M,N pattern is achieved the same as with x264. Use --no-scenecut (https://forum.doom9.org/showthread.php?t=167216).

Can you help me out a bit here?
I mean, can we try to support XDCAM-50 properly this time?
You're implying some sort of failure, but XDCAM was never a goal. 4:2:2 was added just because I could. Interlacing was added (and paid for) by a company that needed it. Extending that to 4:2:2 is outside of my expertise and interest.

On top of that, I think x262 should not encode in H.264. I think it should be MPEG-2 only with the same features as x264 but that's it. This way, if we get that MPEG-2 only and XDCAM/IMX compliant, then I'm pretty sure it can be used professionally and it could even be included in FFMpeg as libx262! ;)
x262 IS x264. It's a patch that would have been merged into mainline x264. The whole genius/stupid troll is that you can dumb down an AVC encoder and have it spit out MPEG-2 (even MPEG-1) with minimal effort. Unless and until there is some requirement/feature that would negatively affect the main AVC codebase, separating them makes little sense. They are 95% the same code.

The x262 repository hasn't been updated largely because of the x264 history rewrite. I'm not a git-filter-branch expert enough to fix it.

Wouldn't that be cheating? ehehehehe
I mean the goal was to offer a real life scenario, but yeah I guess picking them would have achieved a higher score, definitely eheheheh
It's not cheating at all. The entire point of those tunes (https://web.archive.org/web/20150119234005/http://x264dev.multimedia.cx/archives/458) is to not cripple x264 when doing codec comparisons. You're using metrics that the AQ and psy-opts in x264/x262 bias against.

A "real life scenario" uses your eyes, not PSNR/SSIM.

FranceBB
21st July 2021, 22:21
Extending that to 4:2:2 is outside of my expertise and interest.


That's really a shame... :(
If you ever feel like getting into it, lots of people will definitely appreciate it, I'm sure. :)


x262 IS x264. It's a patch that would have been merged into mainline x264. The whole genius/stupid troll is that you can dumb down an AVC encoder and have it spit out MPEG-2 (even MPEG-1) with minimal effort. Unless and until there is some requirement/feature that would negatively affect the main AVC codebase, separating them makes little sense. They are 95% the same code.


Well, if it actually gets merged into x264, then that's even better. :)


The x262 repository hasn't been updated largely because of the x264 history rewrite. I'm not a git-filter-branch expert enough to fix it.


Ah, I see.



It's not cheating at all. The entire point of those tunes (https://web.archive.org/web/20150119234005/http://x264dev.multimedia.cx/archives/458) is to not cripple x264 when doing codec comparisons. You're using metrics that the AQ and psy-opts in x264/x262 bias against.


Ah, right, due to the psychovisual optimizations. In that case I can try to repeat the tests with --tune ssim




I don't know that a fixed GOP pattern is strictly required for XDCAM, but a fixed M,N pattern is achieved the same as with x264.


Ok, so I guess

--no-scenecut --keyint 12 --bframes 2

should give me:

pict_type=I
pict_type=B
pict_type=B
pict_type=P
pict_type=B
pict_type=B
pict_type=P
pict_type=B
pict_type=B
pict_type=P
pict_type=B
pict_type=B

right?


So I guess the final command line would be:

x262_64.exe "AVS Script.avs" --mpeg2 --preset medium --level high --profile 422 --bitrate 50000 --vbv-maxrate 50000 --vbv-bufsize 17825792
--keyint 12 --bframes 2 --no-scenecut --deblock -1:-1 --overscan show --colormatrix bt709 --range tv --transfer bt709 --colorprim bt709 --videoformat component
--nal-hrd cbr --output-csp i422 --output "\\mibctvan000\Ingest\MEDIA\temp\raw_video.h262"

pause

and then use --tune ssim for the comparison. :)


I'll test again and let you know if it works correctly but I have faith it will :P



The only "sad" thing is that it's gonna be progressive only... :(

If you ever introduce interlaced encoding in 4:2:2, please let me know.
I also hope that one day or another your patch will be merged into the mainline x264 repository, although, if it hasn't been merged after so many years, I doubt it ever will be...

By the way, just let me say that what you've done was brilliant; I mean, using the x264 features to encode an MPEG-2 stream is very interesting and it's a shame that it hasn't been used by more broadcasters... :(

pandy
22nd July 2021, 17:13
I have one question: is there any justification for not testing ordered dither alongside (or as an alternative to) "noise"-like dither?
FS is a "noise"-type dither and it will unavoidably increase entropy, so sources coded with a lossy encoder may deliver suboptimal results.

ifb
22nd July 2021, 18:01
Ok, so I guess

--no-scenecut --keyint 12 --bframes 2

should give me:

pict_type=I
pict_type=B
pict_type=B
pict_type=P
pict_type=B
pict_type=B
pict_type=P
pict_type=B
pict_type=B
pict_type=P
pict_type=B
pict_type=B

right?
I didn't test it, but I think so.


So I guess the final command line would be:

x262_64.exe "AVS Script.avs" --mpeg2 --preset medium --level high --profile 422 --bitrate 50000 --vbv-maxrate 50000 --vbv-bufsize 17825792
--keyint 12 --bframes 2 --no-scenecut --deblock -1:-1 --overscan show --colormatrix bt709 --range tv --transfer bt709 --colorprim bt709 --videoformat component
--nal-hrd cbr --output-csp i422 --output "\\mibctvan000\Ingest\MEDIA\temp\raw_video.h262"

pause

--deblock, --overscan, --range do nothing in MPEG-2
--profile 422 isn't needed with --output-csp i422
--level high isn't needed either if you are encoding HD resolutions

Maybe add --open-gop if XDCAM allows it. Short keyints are the worst and that helps some.

If you ever introduce interlaced encoding in 4:2:2, please let me know.
I also hope that one day or another your patch will be merged into the mainline x264 repository, although, if it hasn't been merged after so many years, I doubt it ever will be...
x264 development is in different hands than when this started, so that's a big part of the issue.

By the way, just let me say that what you've done was brilliant;
kierank did a lot, so I can't take all the credit (or even most of it). There were rumors about xvp8 being written, plus I wanted something I could use for ATSC, thus x262 was born. That it turned out kinda OK is a pleasant surprise. The very informal eyeball comparisons I did using park_joy were favorable, IMHO.

2010-10-09 22:28:47 < kierank> awww BBB's not here
2010-10-09 22:28:57 < Dark_Shikari> what do you need to harass him for?
2010-10-09 22:29:09 < kierank> the fact that x262 is making progress as opposed to xvp8
2010-10-09 22:29:15 < ifb> lol
2010-10-09 22:29:45 < Dark_Shikari> hell yes
2010-10-09 22:29:53 < Dark_Shikari> \o/
2010-10-09 22:29:59 < Dark_Shikari> Now, what you need to do
2010-10-09 22:30:02 < Dark_Shikari> is show x262 beating libvpx
2010-10-09 22:31:55 < Dark_Shikari> troll troll troll troll troll
2010-10-09 22:32:08 < Dark_Shikari> we can really troll the shit out of michael with this one too
2010-10-09 22:32:12 < Dark_Shikari> "lol x264 is a better ffmpeg than ffmpeg"

benwaggoner
23rd July 2021, 04:57
I have one question: is there any justification for not testing ordered dither alongside (or as an alternative to) "noise"-like dither?
FS is a "noise"-type dither and it will unavoidably increase entropy, so sources coded with a lossy encoder may deliver suboptimal results.
Patterned dithers are more efficient with GIF and PNG as they result in similar pixel sequences; better for lossless RGB entropy codecs. But they're no help for DCT codecs like MPEG-2. High quality 8-bit movie encoders are really dependent on high quality, highly tuned dithering. Tools like xscaler are/were used and different modes could be used for different scenes.

Really, metrics should be calculated comparing the dithered input to the encoder against the final encoder output. A subjective analysis comparing different dithering modes is interesting, but objective metrics just aren't going to be helpful.

Dithering is a fascinating area of study that I'm hoping to purge from my memory as we enter the 10+ bit era ;).

But heck, it was probably 20 years ago I first said "thank goodness I'll never have to do an Animated GIF again!"

pandy
2nd August 2021, 20:59
Patterned dithers are more efficient with GIF and PNG as they result in similar pixel sequences; better for lossless RGB entropy codecs. But they're no help for DCT codecs like MPEG-2. High quality 8-bit movie encoders are really dependent on high quality, highly tuned dithering. Tools like xscaler are/were used and different modes could be used for different scenes.


Don't get me wrong, but a dither like FS is nothing but stress for the encoder (and it is usually filtered out where possible).

Also, the temporal characteristics of FS make this kind of dither impossible to compress.
From my perspective FS is like noise, i.e. suboptimal from both a spatial and a temporal perspective.

Besides this, here is a clear way to show how FS behaves in video: take the same video twice; modify one copy by placing a single static pixel in a static position (so no change from a temporal perspective); apply FS to both videos; subtract the original from the modified one and compare the spatial residue. Then, as a second experiment, modify a single pixel whose position changes each frame, subtract the reference from the modified video, and analyse the residue in both the spatial and the temporal domain.

Perhaps I'm a bit naive, but from my perspective the residue is noise, and noise is extremely difficult to compress, especially by a lossy encoder...

Perhaps my example is too simplistic, but a similar effect can be achieved simply by adding filtered noise (blue?) to, let's say, the luma channel.

Also, instead of FS, some refined ordered dither such as Ulichney's could be better, at least from a temporal perspective.


Really, metrics should be calculated comparing the dithered input to the encoder against the final encoder output. A subjective analysis comparing different dithering modes is interesting, but objective metrics just aren't going to be helpful.

It depends what your goal is: if "objective", then definitely yes, apples shall always be compared with apples; if subjective, then, provided you define the area properly, apples can be compared with, for example, pineapples.
And always: you need to clearly define the goal and the methodology.


Dithering is a fascinating area of study that I'm hoping to purge from my memory as we enter the 10+ bit era ;).

But heck, it was probably 20 years ago I first said "thank goodness I'll never have to do an Animated GIF again!"

Dithering is unavoidable: whenever quantization is involved, dithering (ideally with psychovisually matched noise shaping/error shaping) is mandatory. 10bit solves some problems, and of course modern display technology is quickly reaching a level where that will be enough for the average consumer, but human eyes are capable of far more than 10 bits (depending on conditions and context, somewhere between 12 and 14 bits).
It's the same with ultra high frame rates: 300...600 frames per second will be the key to full immersion...

But my question was triggered by FS dithering: from my experience, FS dither raises QP dramatically and literally steals bits from video details.

Blue_MiSfit
31st August 2021, 02:58
Any suggestions on tuning ffmpeg's mpeg-2 encoder or x262 for quality? I'm happy to burn as much CPU time as possible!

I'm targeting 6 Mbps for 720p59.94. Aggressive? ... Yes :devil:

GMJCZP
1st September 2021, 01:33
Any suggestions on tuning ffmpeg's mpeg-2 encoder or x262 for quality? I'm happy to burn as much CPU time as possible!

I'm targeting 6 Mbps for 720p59.94. Aggressive? ... Yes :devil:

Please search in my Arsenal for the script for DVD with FFmpeg.

EDIT: Ok, I see, this is another class of encoders, my comment is not exactly applicable here.

rwill
1st September 2021, 08:35
Any suggestions on tuning ffmpeg's mpeg-2 encoder or x262 for quality? I'm happy to burn as much CPU time as possible!

I'm targeting 6 Mbps for 720p59.94. Aggressive? ... Yes :devil:

I could pitch y262 here, but I'd rather not.

benwaggoner
2nd September 2021, 19:25
Don't get me wrong, but a dither like FS is nothing but stress for the encoder (and it is usually filtered out where possible).

Also, the temporal characteristics of FS make this kind of dither impossible to compress.
From my perspective FS is like noise, i.e. suboptimal from both a spatial and a temporal perspective.
Perhaps I'm a bit naive, but from my perspective the residue is noise, and noise is extremely difficult to compress, especially by a lossy encoder...
You are totally correct. It's generally a desired property of dithering to look like noise rather than have an obvious repeated pattern to it, like older techniques yielded.

A key thing is the strength of the dither. With xscaler, a strength of about 0.5 worked pretty well to prevent banding without adding too much high-entropy noise that is harder to compress. Different kinds of content benefit from different strengths. Lots of film grain can make dither unneeded, while clean, noise-free CGI may need it stronger, as banding can become extra obvious. Anime can benefit from more adaptive techniques, as dithering flat areas is undesired but gradients need it, and thresholding would be a big problem when using a "dither/don't dither" mask instead of variable strength.
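
A hypothetical sketch of that last idea, continuing the Python example from earlier in the thread: instead of a binary "dither/don't dither" mask, scale the diffused error by a per-pixel strength map so the dither fades in smoothly over gradients.

# Hypothetical variable-strength variant of the earlier sketch:
# strength is a float map in [0, 1], e.g. from a local flatness measure.
err = (old - new * 4.0) * strength[y, x]
# ...then diffuse err exactly as before; flat areas get little or no dither,
# gradients get the full amount, with no hard thresholding seam.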

Also, instead of FS, some refined ordered dither such as Ulichney's could be better, at least from a temporal perspective.

Because you'd get better temporal matches? Definitely true for an Animated GIF, but I've not seen it proven out for block-based motion-compensation codecs. It's tricky to separate the similar high-frequency AC coefficients while having quite different lower-frequency AC coefficients.

It depends what your goal is: if "objective", then definitely yes, apples shall always be compared with apples; if subjective, then, provided you define the area properly, apples can be compared with, for example, pineapples.
And always: you need to clearly define the goal and the methodology.
A good test would be to do a matrix of different dithering techniques with different encoders. Some encoder products may have built-in dithering that can be compared as well (x265 has a basic mode, and a more advanced mode triggered by --dither).

Dithering is unavoidable: whenever quantization is involved, dithering (ideally with psychovisually matched noise shaping/error shaping) is mandatory. 10bit solves some problems, and of course modern display technology is quickly reaching a level where that will be enough for the average consumer, but human eyes are capable of far more than 10 bits (depending on conditions and context, somewhere between 12 and 14 bits).
It's the same with ultra high frame rates: 300...600 frames per second will be the key to full immersion...
Yes, dithering is always needed when converting between color spaces. But the visible and compression impact of dithering is far greater in SDR 8-bit than in SDR 10-bit, and especially PQ 10-bit.

But my question was triggered by FS dithering: from my experience, FS dither raises QP dramatically and literally steals bits from video details.
It is a tradeoff, reducing banding at the cost of somewhat higher QPs. For lots of content and scenarios, it's a good tradeoff.

Truncation can also increase QP and ringing, as it can leave sharper edges that aren't DCT-friendly.

An optimal ditherer would probably use different strengths at different luma levels, and would even be in-loop with the encoder so it could dynamically adjust how dithering is done relative to QPs.

kolak
18th September 2021, 14:38
FS + a bit of noise was my choice for high-bitrate Blu-rays. Anything below 30Mbit would almost destroy your dithering efforts anyway. With today's 10bit pipelines it's not much needed, though.

FranceBB
18th September 2021, 21:32
With today's 10bit pipelines it's not much needed, though.

Yeah, well, generally post-processing like LUT conversion and other kinds of filtering is done in 16bit anyway, so we still need something to go back to 10bit, as that's how we go out to our viewers for UHD material, and Floyd-Steinberg is still very much needed eheheheh



About FULL HD... well... In Italy we're still stuck with XDCAM-50 MPEG-2 8bit, so...

https://i.pinimg.com/474x/ea/03/ee/ea03ee43fe934928e0fe0ba5e3e9895f.jpg

(that's my face every time I have to encode in MPEG-2 in 2021)

kolak
18th September 2021, 21:35
This is not just an Italy problem. Probably way more than half of the broadcast industry is still XDCAM-based.
I would never even bother doing any dithering for broadcast anyway :)

Blue_MiSfit
19th September 2021, 09:27
LOL yep, who cares? They're going to just uplink with some craptastic live encoder that's probably not even set up in a reasonable way (whether due to ignorance or negligence). Broadcast is brutal when it comes to video quality.

FranceBB
20th September 2021, 13:58
LOL yep, who cares? They're going to just uplink with some craptastic live encoder that's probably not even set up in a reasonable way (whether due to ignorance or negligence). Broadcast is brutal when it comes to video quality.

Well, even if the mezzanine is an MPEG-2 at 50 Mbit/s, when it goes to the live H.264 hardware encoder there's a 4s delay to encode the stream live (which can be raised to a maximum of 20s for added complexity if absolutely necessary). The final H.264 25i yv12 FULL HD .ts stream is 12 Mbit/s over here.

The UHD one instead is an XAVC Intra Class 300 10bit 50p at 500 Mbit/s, which is encoded live by a hardware encoder into H.265 10bit at 25 Mbit/s.

I mean, it's not such terrible quality, to be fair, especially considering that all of this is live. (Of course VOD gets more bitrate and, most importantly, better, more complex encoding, but satellite is still very much alive and kicking eheheheh).

Blue_MiSfit
20th September 2021, 23:20
How nice that your facility has reasonable bit budgets and probably sensibly configured hardware :)

Lyris
16th August 2023, 06:17
This thread makes me want to benchmark all the available MPEG2 options for the rare times I have to make a DVD. I'm using CCE SP3 for this, even still. Although the last time I compared encoders was in the late 2000s.

benwaggoner
22nd August 2023, 19:20
This thread makes me want to benchmark all the available MPEG2 options for the rare times I have to make a DVD. I'm using CCE SP3 for this, even still. Although the last time I compared encoders was in the late 2000s.
Canopus ProCoder/Carbon Coder were really impressive back then for automated encoding without scene-by-scene optimization. I used them to good effect on some weird content types, like the first Criterion Collection Stan Brakhage DVDs.

I know that Elemental was still making good money with big MPEG-2 compression-efficiency improvements within the last decade. Cable companies would pay a lot to get more channels out of fixed bandwidth.

The MIPS/pixel we can apply to MPEG-2 encoding today is >>100x what it was back in the Minerva days. If there was a market reason for a new MPEG-2 encoder, I bet we could squeeze out another 25% using ever more advanced preprocessing and psychovisual optimization.
The MIPS/pixel we can apply to MPEG-2 encoding today is >>100x than back in the Minerva days. If there was a market reason for a new MPEG-2 encoder, I bet we could squeeze out another 25% using ever more advanced preprocessing and psychovisual optimization.