Lossless Encoding Comparison 2024 [Archive]

FranceBB

22nd April 2024, 19:11

Hi there guys,
the other day we were talking about lossless encoding in one of the x264 topics (https://forum.doom9.org/showthread.php?t=143748&page=3) and it made me realize that I haven't done one of those comparisons in a while, so it was worth exploring whether something had changed over the last few years or not.

Source: Digital BetaCAM
Recording Date: July 13th 2006
Playback Date: April 16th 2024
VTR: Sony DVW-A500P
Capture Card: Blackmagic Decklink Studio
Video Codec: v210 720x576 25i TFF 4:2:2 BT601 10bit planar
Audio Codec: 1 track 4ch PCM 24bit 48000Hz
Container: avi

https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSa8EZHF2w6T8YtAlIBDJLLLQjWVz8fXH77nbLyA4_bsg&s

The source lasts 2h and 7 minutes and contains several kinds of footage from 2006 (a few examples are shown below in a deinterlaced preview).

https://i.imgur.com/6UfQiBT.jpeg
https://i.imgur.com/i7mviy8.jpeg
https://i.imgur.com/iVCa8ri.jpeg
https://i.imgur.com/DVcqTpG.jpeg
https://i.imgur.com/eFTmbSV.jpeg
https://i.imgur.com/bzAzD69.jpeg
https://i.imgur.com/8IoTz0G.jpeg
https://i.imgur.com/LMF6uAR.jpeg

Lossless Encoding candidates:

- UTVideo 720x576 25i TFF 4:2:2 BT601 10bit planar
- H.264 720x576 25i TFF 4:2:2 BT601 10bit planar
- H.265 720x576 25i TFF 4:2:2 BT601 10bit planar
- FFV1 720x576 25i TFF 4:2:2 BT601 10bit planar
- HuffYUV 720x576 25i TFF 4:2:2 BT601 10bit planar

The UTVideo file has been encoded using the installed codec, while for HuffYUV the ffvhuff variant has been used as it's the only one that is currently maintained. As for FFV1, lavc (the FFmpeg encoder) has been used as it's the official encoder. Last but not least, H.264 and H.265 were encoded using x264 and x265 respectively, both with preset medium. Unfortunately, since Lagarith is 8bit only, it couldn't be included in the tests as it wouldn't have been able to losslessly encode the original v210 source given that it's 10bit.

To make sure that every single file had been encoded losslessly, PSNR and SSIM were run against the v210 lossless source and sure enough they all returned ∞ for PSNR and 1.0000 for SSIM which means that they were all indeed losslessly encoded.

https://i.imgur.com/j7WjoPK.png

As you can see from the screenshot, we've got the following bitrates (from highest to lowest):

0) v210 332 Mbit/s (315 GB)
1) HuffYUV 99 Mbit/s (97 GB)
2) H.265 72 Mbit/s (71 GB)
3) H.264 64 Mbit/s (65 GB)
4) UTVideo 54 Mbit/s (56 GB)
5) FFV1 37 Mbit/s (39 GB)

where v210 is the source and all the others are the lossless encodes.
This is reflected in the following chart (from which I've excluded the v210 source):

https://i.imgur.com/IRdSsbb.png

where we can see HuffYUV using the most bitrate and FFV1 using the least bitrate.
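
For a quick sense of scale, the bitrates listed above can also be expressed as a share of the v210 source. This is only a minimal Python sketch using the figures from the list; the dictionary and function names are made up for illustration.

```python
# Bitrates in Mbit/s as listed in the post; v210 is the uncompressed source.
SOURCE_MBPS = 332
ENCODES_MBPS = {
    "HuffYUV": 99,
    "H.265": 72,
    "H.264": 64,
    "UTVideo": 54,
    "FFV1": 37,
}


def percent_of_source(encode_mbps, source_mbps=SOURCE_MBPS):
    """Return the encode bitrate as a percentage of the source bitrate."""
    return 100.0 * encode_mbps / source_mbps


for name, mbps in ENCODES_MBPS.items():
    # One line per encode: name, bitrate, share of the source bitrate.
    print(f"{name:8s} {mbps:3d} Mbit/s  {percent_of_source(mbps):5.1f}% of source")
```

The percentages are relative to the 332 Mbit/s figure as reported for the source file, so they describe the saving on disk rather than the efficiency of each codec in absolute terms.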

We can easily collect the results in a chart (v210 is in orange as it's the source):

https://i.imgur.com/vc8impH.png

Although HuffYUV is holding up thanks to ffvhuff, which allows it to encode up to 16bit, it's very clear that the internal encoding tools used by the codec are beginning to show their age. Nonetheless, a reduction of the bitrate from 332 Mbit/s to 99 Mbit/s compared to v210 (which is uncompressed) isn't exactly nothing, and in a 2h+ long clip it allows us to go from a whopping 315 GB to right under the triple digit mark with 97 GB.

The one I didn't expect to find after it, however, was H.265. I know that lossless encoding in x265 hasn't been polished that much, but I initially didn't expect it to perform worse than x264. That was until I remembered one important thing: this is 25i, interlaced top field first, and as far as I know there has been very little focus on interlaced encoding in most H.265 encoders, as it was slotted into the standard at the very end. This is true for x265 as well, so here we have a combination of two things that didn't get much focus in the overall x265 development: lossless encoding and interlaced encoding. This combo made x265 produce a larger file than x264 and, what's worse, it took a remarkable amount of time to encode compared to x264.

Speaking of x264, at 64 Mbit/s it produced a file as small as 65 GB. Down the line we have UTVideo, which performed remarkably well while not taking that long to encode; at 54 Mbit/s (56 GB) it's actually achieving the best tradeoff between encoding time and compression in my opinion.

Last but not least, the one that actually performed better than all the other codecs is FFV1 at as little as 37 Mbit/s (39 GB). This is not surprising given that FFV1 is a codec specifically created for this very purpose and that it's used by several national archives across the world. However, it took its sweet time to encode, so although it can be useful for archivists and those who're gonna save the file on an LTO tape somewhere for future generations, it might not be a practical solution for temporary lossless files that need to be moved around and re-encoded to a lossy output.

hajj_3

22nd April 2024, 21:09

you didn't try MagicYUV?

FranceBB

22nd April 2024, 22:13

Well no but only because it's not free and I don't have it installed on my system. I could have technically purchased the $25 licence but I wanted to focus on the free open source solutions readily available to everyone. :)

Emulgator

22nd April 2024, 23:12

I have bought a MagicYUV license in the past.
If I can get the sample I can try to append that comparison here.
I would guess it might be in the range of UT.

Emulgator

22nd April 2024, 23:28

Dirty test: Sidejob while running Topaz Proteus on a 11900K 21..68% CPU + 35..70% GPU (did not interrupt that 17 hrs job running)
v210 720x576 25i TFF 4:2:2 BT601 10bit v210 -> Virtual Dub -> M0Y2 (MagicYUV 10bit 422)
45.593.132KiB into 21.397.784KiB | 221Mbit/s -> 103Mbit/s (46,9%); 41612f in 63s -> 660fps encode speed SSD to SSD.
So, well, halfsize and quick, sails along Huffyuv, and yes, Huffman encoding is inside.
I am just wondering, 4:2:2 PAL-SD@10bit uncompressed: I get a bitrate of 720*576*25*20 = 207.360.000bit/s.
How come the 332Mbit/s ?

rwill

23rd April 2024, 05:32

How come the 332Mbit/s ?

10bit values in a 16bit container?

FranceBB

23rd April 2024, 06:05

10bit values in a 16bit container?

Yes that's exactly right.
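
The arithmetic behind the two figures can be checked quickly. What follows is a rough Python sketch, assuming that 4:2:2 averages two samples per pixel (one luma sample plus one chroma sample). The third case assumes the usual v210 packing of three 10-bit samples per 32-bit word, which is not stated anywhere in this thread.

```python
from fractions import Fraction

WIDTH, HEIGHT, FPS = 720, 576, 25
SAMPLES_PER_PIXEL = 2  # 4:2:2: Y for every pixel, Cb and Cr for every other pixel


def video_bitrate(bits_per_sample):
    """Uncompressed video bitrate in bit/s for a given storage size per sample."""
    return WIDTH * HEIGHT * FPS * SAMPLES_PER_PIXEL * bits_per_sample


significant = video_bitrate(10)          # only the 10 significant bits
padded = video_bitrate(16)               # each 10-bit value stored in 16 bits
packed = video_bitrate(Fraction(32, 3))  # assumption: 3 samples per 32-bit word

for label, rate in [("10 bits per sample", significant),
                    ("16 bits per sample", padded),
                    ("32/3 bits per sample", packed)]:
    print(f"{label:22s} {float(rate) / 1e6:8.2f} Mbit/s")
```

Under these assumptions the 16-bit case lands on the 332 Mbit/s reported in the first post, the 10-bit case reproduces the 207.36 Mbit/s worked out above, and the packed case comes out near the 221 Mbit/s seen in Emulgator's test.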

Emulgator

23rd April 2024, 07:41

That clears it up. So for compression efficiency calculations one should only apply the significant bitrate.
Thanks for reminding me of FFV1. Let's see, 10-bit FFV1, same dirty conditions
45.593.132KiB into 20.256.213KiB | 221Mbit/s -> 97,4Mbit/s (44,42%); 41612f in 665s -> 62,5fps encode speed SSD to SSD.
Hm, not much to compress out of my dirty source (VHS capture with 5..8% of virgin tape.)
UT install/test to come when that Topaz task has finished here.

Now after finishing sidejob, 10bit Compression: Almost same size
UT 50s CPU 34-38% -> 832fps -> 21.420.361KiB
MagicYUV 29s CPU 66-69% -> 1434fps -> 21.397.784KiB
After some more MagicYUV passes it became obvious that SSD transfer rates broke down, SSD thermic limiting had set in.
CPU load halved.
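
The percentages and frame rates in these tests follow directly from the sizes, the frame count and the wall-clock times. A small Python sketch of that arithmetic, with helper names made up for illustration and the input figures taken from the posts above:

```python
FRAMES = 41612           # frame count of the test clip
SOURCE_KIB = 45_593_132  # size of the v210 source in KiB


def ratio_percent(encoded_kib, source_kib=SOURCE_KIB):
    """Encoded size as a percentage of the source size."""
    return 100.0 * encoded_kib / source_kib


def encode_fps(seconds, frames=FRAMES):
    """Average encode speed in frames per second."""
    return frames / seconds


# (label, encoded size in KiB, encode time in seconds)
runs = [
    ("MagicYUV (during sidejob)", 21_397_784, 63),
    ("FFV1 (during sidejob)", 20_256_213, 665),
    ("UT", 21_420_361, 50),
    ("MagicYUV", 21_397_784, 29),
]

for label, size_kib, seconds in runs:
    print(f"{label:26s} {ratio_percent(size_kib):5.1f}% of source, "
          f"{encode_fps(seconds):7.1f} fps")
```

Since the first two runs shared the machine with another job, the fps values are only comparable within the same conditions.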

richardpl

23rd April 2024, 09:54

FFmpeg can encode magicyuv for yuv422p source, but never bother, i'm talking to the wall.

huhn

23rd April 2024, 10:07

AFAIK there is barely anything to optimize for h265 because it doesn't even support p frames.
the source is very hard to compress losslessly, maybe try something clean in 8 bit,
because lossless 8 bit h264 is very powerful.

Emulgator

23rd April 2024, 10:14

richardpl
FFmpeg can encode magicyuv for yuv422p source, but never bother, i'm talking to the wall.
Nice to have, thank you for your work and good that you mention it, but I did not know that it can handle 10bit ?
static av_cold int magy_encode_init(AVCodecContext *avctx)
{
    MagicYUVContext *s = avctx->priv_data;
    PutByteContext pb;

    switch (avctx->pix_fmt) {
    case AV_PIX_FMT_GBRP:
        avctx->codec_tag = MKTAG('M', '8', 'R', 'G');
        s->correlate = 1;
        s->format = 0x65;
        break;
    case AV_PIX_FMT_GBRAP:
        avctx->codec_tag = MKTAG('M', '8', 'R', 'A');
        s->correlate = 1;
        s->format = 0x66;
        break;
    case AV_PIX_FMT_YUV420P:
        avctx->codec_tag = MKTAG('M', '8', 'Y', '0');
        s->hshift[1] =
        s->vshift[1] =
        s->hshift[2] =
        s->vshift[2] = 1;
        s->format = 0x69;
        break;
    case AV_PIX_FMT_YUV422P:
        avctx->codec_tag = MKTAG('M', '8', 'Y', '2');
        s->hshift[1] =
        s->hshift[2] = 1;
        s->format = 0x68;
        break;
    case AV_PIX_FMT_YUV444P:
        avctx->codec_tag = MKTAG('M', '8', 'Y', '4');
        s->format = 0x67;
        break;
    case AV_PIX_FMT_YUVA444P:
        avctx->codec_tag = MKTAG('M', '8', 'Y', 'A');
        s->format = 0x6a;
        break;
    case AV_PIX_FMT_GRAY8:
        avctx->codec_tag = MKTAG('M', '8', 'G', '0');
        s->format = 0x6b;
        break;

MoSal

23rd April 2024, 11:20

* How does ffv1 perform with -coder 2 -context 1? Ditto with -coder -2 -context 1 if possible.

* Why not use -f framemd5 to actually confirm losslessness and avoid doubts about possible limited precision in metrics or their implementations?
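
A framemd5 check like the one suggested here boils down to comparing per-frame hashes of the decoded video. Below is a minimal Python sketch of that comparison, assuming the usual framemd5 text layout (header lines starting with #, then comma-separated fields with the hash last). The function names and the sample hashes are invented for illustration and do not come from the files in this thread.

```python
def frame_hashes(framemd5_text):
    """Return (stream index, hash) pairs from framemd5-style text."""
    pairs = []
    for line in framemd5_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = [field.strip() for field in line.split(",")]
        pairs.append((int(fields[0]), fields[-1]))
    return pairs


def first_mismatch(reference_text, candidate_text):
    """Return the index of the first differing frame, or None if all match."""
    ref = frame_hashes(reference_text)
    cand = frame_hashes(candidate_text)
    for index, (r, c) in enumerate(zip(ref, cand)):
        if r != c:
            return index
    if len(ref) != len(cand):
        return min(len(ref), len(cand))  # one file has extra frames
    return None


# Made-up sample data in the assumed layout (hashes are placeholders).
reference = """#format: frame checksums
#stream#, dts, pts, duration, size, hash
0, 0, 0, 1, 1658880, 00000000000000000000000000000001
0, 1, 1, 1, 1658880, 00000000000000000000000000000002
"""
candidate = reference.replace("00000000000000000000000000000002",
                              "000000000000000000000000000000ff")

print(first_mismatch(reference, reference))  # None
print(first_mismatch(reference, candidate))  # 1
```

For the hashes to be comparable, both files would need to be decoded to the same pixel format before hashing, otherwise identical pictures can still hash differently.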

kolak

23rd April 2024, 15:10

Try ffmpeg's ffvhuff. It probably won't be efficient but it's fast.

Jamaika

23rd April 2024, 18:11

Unfortunately, due to the fact that Lagarith is 8bit only, it couldn't be included in the tests as it wouldn't have been able to losslessly encode the original v210 source given that it's 10bit.

So what codec is this? What lossless film do we think about when we have an old processed bt709 interlaced film. Are we talking about the whole progressive conversion and color retouching thing?

Despite the huge space saving (only 10% the size of the uncompressed Canon raw files) the codec is visually lossless and gives you full control over the raw data and all that dynamic range. It is 12bit RGB.

AVI/MPEG converter is very old under bt709.

https://www.youtube.com/watch?v=b1Ded-mHJAE
Who recorded it and with what equipment? I don't know. Is this a deliberate job? 2000year

https://giorgiolovecchio.com/wp-content/uploads/2020/01/50-Intermediate-Codecs-Comparison.png

FranceBB

24th April 2024, 09:31

FFmpeg can encode magicyuv

Well the fact that it has been included in FFmpeg is definitely a good thing and it makes me reconsider MagicYUV, however, when I tested it on FFmpeg N-114842-g639013aafc from the 17th of April 2024 (last time I compiled from the master) it errored out saying that it couldn't produce 10bit.

ffmpeg.exe -hide_banner -i "\\mibctvan000\Ingest\MEDIA\temp\Lossless_test_files\v210.avi" -pix_fmt yuv422p10le -c:v magicyuv -c:a copy -y "\\mibctvan000\Ingest\MEDIA\temp\Lossless_test_files\MagicYUV.avi"

pause

Incompatible pixel format 'yuv422p10le' for codec 'magicyuv', auto-selecting format 'yuv422p'

This means that unfortunately I have to keep it out for the same reason I kept out Lagarith: it cannot do 10bit and therefore it couldn't losslessly encode the source.
That being said, it's nice to see that it's now freely available to the public as part of the FFmpeg suite. :)

Try ffmpeg's ffvhuff. It probably won't be efficient but it's fast.

Yep, that's the one used here as it's the only one supporting high bit depth. Unfortunately the old huffyuv encoder was 8bit only so this was the only choice. :)

So what codec is this? What lossless film do we think about when we have an old processed bt709 interlaced film.

It's old interlaced BT601 and there's no codec technically, it's a tape.
v210 is just the format the Blackmagic Decklink capture card saved it in; it's 10bit uncompressed, so lossless.

Are we talking about the whole progressive conversion and color retouching thing?

No, no, this is just what was used to perform a comparison between lossless codecs and their encoders.
In the actual workflow currently in production the v210 intermediate is only a temporary file that is then indexed, bobbed, filtered, upscaled to Full HD and re-encoded to XDCAM-50 before being stored on LTO8 tapes for archival purposes. All of this is done through Avisynth of course: the bobbing is done via QTGMC; filtering uses MDegrain via MVTools for news (temporal) and only DFTTest for sports (spatial); primaries, transfer and matrix are converted from BT601 PAL to BT709 via avsresize (i.e. zimg); and the upscale is done using NNEDI3. The whole processing is done with 16bit planar precision before dithering down to 8bit 4:2:2 with Floyd-Steinberg error diffusion to encode the final file, mux everything in mxf and deliver the result for archival.

https://i.imgur.com/0vxsBRP.png

Anyway that's beside the point of this topic, it was just an overview on why I had those sources and what I'm actually doing with them.

tormento

24th April 2024, 11:56

Last but not least, H.264 and H.265 were encoded using x264 and x265 respectively using preset medium
Preset medium is really not efficient in compression. You should try at least slow or very slow. For x265 you should also try the lossless option.

huhn

24th April 2024, 12:52

it would be more interesting if he got lossless results that can compete with x264 if keyint=1 is not needed using x265.

FranceBB

24th April 2024, 14:29

Preset medium is really not efficient in compression. You should try at least slow or very slow.

Eh, the problem is that it took me 2 days to encode the whole file with an old i7 5930K 6c/12th at --preset medium --lossless, it would have probably taken me ages to encode with --preset veryslow --lossless.

For x265 you should also try the lossless option.

Yep I used --lossless.

huhn

24th April 2024, 14:38

can you share the profile x265 used for that?

poisondeathray

24th April 2024, 15:55

I did some 10bit422 tests from an Arri Alexa a few years back

1920x1080 23.976p, 10bit422 (709,SDR) test, default settings in ffmpeg incl. gop size (lossless output verified with psnr)

x264 445Mb/s
aom-av1 447Mb/s (bloody slow)
x265 454Mb/s
ffv1 462Mb/s

Note these were long GOP - if you used ut video, magic yuv - those are intra only , or any of those encoders in intra mode - the compression rates would be worse

benwaggoner

24th April 2024, 19:37

Yes that's exactly right.
What container format did you use for the v210?

IIRC, a .y4m will encode with proper 10-bit to give a more realistic source file size.

benwaggoner

24th April 2024, 19:39

Preset medium is really not efficient in compression. You should try at least slow or very slow. For x265 you should also try the lossless option.
I've been startled by the impact that --preset can have. I've seen --lossless save 10% more bits with --preset placebo than --preset veryslow. That's a much bigger compression efficiency delta than with lossy x265 encoding.

benwaggoner

24th April 2024, 19:45

AFAIK there is barely anything to optimize for h265 because it doesn't even support p frames.
the source is very hard to lossless compress maybe try something clean 8 bit.
because lossless 8 bit h264 is very powerful.
x265 absolutely supports P frames! it has the full IPBb hierarchy of H.264. The new 3.6 version added support for multiple layers of reference B-frames.

I agree interlaced 4:2:2 10-bit is not a representative sample of most real-world lossless encoding these days. That probably explains a lot of the x264/x265 gap, as H.264 supports MBAFF while HEVC only does field/frame.

For a more representative test, I'd suggest progressive 4:2:0 10-bit 720p24 (1080p is more common, but 720p is close enough for extrapolation and will encode >2x faster).

benwaggoner

24th April 2024, 19:45

can you share the profile x265 used for that?
I'd like to see the command lines used for each output, myself. It can be tricky to do apples-to-apples comparisons!

benwaggoner

24th April 2024, 19:54

it would be more interesting if he got lossless results that can compete with x264 if keyint=1 is not needed using x265.
Yeah, IDR-only makes sense for a lot of lossless scenarios, as the bitrate hit is a lot smaller than for lossy encoding, and makes encoding and decoding a lot faster.

Since a good number of the encoders being tested have speed/quality tradeoff options, it would be great to have comparisons based on:

- time required to hit a specific bitrate (tuning to optimize compression efficiency)
- bitrate delivered at the same encoding time (tuning to optimize encoding speed)

Comparing speed and efficiency together based on default parameters isn't all that helpful to choose what to use for a given use case.

After all, spending 20x longer encoding to get 20% lower bitrate can be a great idea for some scenarios, and a terrible idea for others.

tormento

24th April 2024, 19:57

Nvenc can encode and DgSource can decode hevc up to 4:4:4 entirely on GPU. That’s more than enough to choose it as mezzanine file over other formats unless storage is an issue.

huhn

24th April 2024, 20:51

x265 absolutely supports P frames! it has the full IPBb hierarchy of H.264. The new 3.6 version added support for multiple layers of reference B-frames.

I agree interlaced 4:2:2 10-bit is not a representative sample of most real-world lossless encoding these days. That probably explains a lot of the x264/x265 gap, as H.264 supports MBAFF while HEVC only does field/frame.

For a more representative test, I'd suggest progressive 4:2:0 10-bit 720p24 (1080p is more common, but 720p is close enough for extrapolation and will encode >2x faster).

of course it does. this is just about lossless, where it might not be able to do so, at least according to some sources. without that context my statement is just stupid.
but that could all be scenario based.
it may not be an option, that's my issue. having an intra profile is fine, forcing intra is not.
hevc has a lot of levels and profiles.

in my limited test h264 lossless stomps lossless h265 into the ground, and my guess is no P frames with the profile i used.
h265 is hands down the better codec if it is allowed to use its compression features.
Nvenc can encode and DgSource can decode hevc up to 4:4:4 entirely on GPU. That’s more than enough to choose it as mezzanine file over other formats unless storage is an issue.
which h264 has too, just 8 bit only.

Blue_MiSfit

26th April 2024, 06:26

What about JPEG2000? OpenJPEG has a competent encoder AFAIK.

The studios all chose J2k for IMF mastering for a reason I suppose? It's... not fast, but I'd be curious to see how it stacks up against x264 and FFV1

In my quick test with my default animated test clip (in 1080p24 4:2:2 10 bit SDR / BT.709) on my Ryzen 9 5900x:

FFV1
30fps encoding speed
449.45 Mbps

OpenJPEG in lossless mode
26 fps
456.847 Mbps

FFV1 only used a couple of cores, whereas OpenJPEG used about 70% CPU.

Not bad, FFV1! Decode is even funnier, FFV1 decodes real-time with 30% CPU, and OpenJPEG again needs 70%.

Man... maybe we should have used FFV1 instead of J2K for IMF archival, huh?? I guess J2K is flexible enough to be the "one" codec for these use cases, but still. Very impressive lossless performance. I totally see why it's the codec du jour of lots of archives.

excellentswordfight

26th April 2024, 10:28

Eh, the problem is that it took me 2 days to encode the whole file with an old i7 5930K 6c/12th at --preset medium --lossless, it would have probably taken me ages to encode with --preset veryslow --lossless.

Yep I used --lossless.
Preset medium is really not efficient in compression. You should try at least slow or very slow. For x265 you should also try the lossless option.
It looks like the same results I saw when I did the same test a few years back. I even got a lower filesize with x264 preset fast than with x265 veryslow. My guess is that there has been very little optimization done for the lossless mode in x265 (and tbh, that's totally understandable).

My takeaway was that FFV1 seems like the best option for lossless encoding if speed isn't one of the primary factors. Robust, efficient, open, allows for a ton of formats, and not super slow.

tormento

26th April 2024, 10:54

My takeaway was that FFV1 seems like the best option for lossless encoding if speed isn't one of the primary factors. Robust, efficient, open, allows for a ton of formats, and not super slow.
I prefer to offload to GPU as much as possible. It’s faster and more energy efficient. I rarely used lossless format but my 1660 Super dealt with it very well and it’s nicely integrated in AVS+ plugins.

jpsdr

26th April 2024, 16:23

Where can you download FFV1 codec ?

EDIT
Discovered it's included in the VDUB2 plugin made by v0lt.

benwaggoner

26th April 2024, 21:53

Nvenc can encode and DgSource can decode hevc up to 4:4:4 entirely on GPU. That’s more than enough to choose it as mezzanine file over other formats unless storage is an issue.
And since Lossless is mandatory for Main and Main10, all HEVC decoders can handle those lossless mezzanines. That's not true for H.264 or most other lossy codecs. I think AV1 may also have mandatory lossless support, but I'm not aware of anyone having tried it as SVT-AV1 doesn't support lossless except in IDR-only.

benwaggoner

26th April 2024, 23:33

What about JPEG2000? OpenJPEG has a competent encoder AFAIK.

The studios all chose J2k for IMF mastering for a reason I suppose? It's... not fast, but I'd be curious to see how it stacks up against x264 and FFV1
A big reason was that J2K was fully vetted by golden eyes and technical experts for Digital Cinema, so there was a lot of confidence in it. It also had support for 12-bit, 444, RGB and XYZ earlier than other codecs. And it is fine for the task at hand in terms of compression efficiency. Encode/decode speed have been J2K's biggest drawbacks. And the promise of being able to do proxy editing using the lower resolution subbands was compelling, although it didn't amount to much in practice.

In my quick test with my default animated test clip (in 1080p24 4:2:2 10 bit SDR / BT.709) on my Ryzen 9 5900x:

FFV1
30fps encoding speed
449.45 Mbps

OpenJPEG in lossless mode
26 fps
456.847 Mbps
Thank you Moore's Law! J2K speed was a big bottleneck 15 years ago.

FFV1 only used a couple of cores, whereas OpenJPEG used about 70% CPU.
That may be a multithreading limitation in FFV1, then. With lossless IDR-only encoding, each frame can be done in parallel, so it can use as many threads as are available, ultimately gated by source processing speed.

Not bad, FFV1! Decode is even funnier, FFV1 decodes real-time with 30% CPU, and OpenJPEG again needs 70%.
That is probably more reflective of the single-thread processing speed difference.

Man... maybe we should have used FFV1 instead of J2K for IMF archival, huh?? I guess J2K is flexible enough to be the "one" codec for these use cases, but still. Very impressive lossless performance. I totally see why it's the codec du jour of lots of archives.
Yeah, flexibility and familiarity made J2K the obvious first choice. I expect to see a lot of ProRes IMF in the future, which offers similar flexibility and familiarity with faster encode/decode, and often lets the transcode be skipped when sources are already in ProRes.

benwaggoner

26th April 2024, 23:41

It looks like the same results I saw when I did the same test a few years back. I even got a lower filesize with x264 preset fast than with x265 veryslow. My guess is that there has been very little optimization done for the lossless mode in x265 (and tbh, that's totally understandable).
There actually was some optimization around lossless for x265. Among other things, it had the --cu-lossless mode which allows for mixed lossy/lossless areas in the same frame.

Can you share the full command lines you used for both x264 and x265? Your results don't match mine, and I'd like to dig a little deeper on why.

Also, note that there is a big increase in compression efficiency in x265 between veryslow and placebo, for reasons I've not deep dived on. I suspect there's a specific parameter that makes up most of the difference, and hope it's possible to get most of the efficiency improvements with only some of the painful performance hit.

My takeaway was that FFV1 seems like the best option for lossless encoding if speed isn't one of the primary factors. Robust, efficient, open, allows for a ton of formats, and not super slow.
Yeah, pretty impressive results.

Intuitively I think HEVC should be better in a variety of ways, but is being held back by parameter configuration or broader x265 defects. I can't come up with any reason in theory why x264 should be able to match HEVC, let alone beat it by a big margin in both efficiency and performance.

I'm very appreciative of the study! Very thought provoking.

benwaggoner

26th April 2024, 23:42

I prefer to offload to GPU as much as possible. It’s faster and more energy efficient. I rarely used lossless format but my 1660 Super dealt with it very well and it’s nicely integrated in AVS+ plugins.
Do you mean you have a GPU-accelerated FFV1 decoder?

Or are you saying you prefer HEVC because it has HW GPU support?

huhn

27th April 2024, 02:14

And since Lossless is mandatory for Main and Main10, all HEVC decoders can handle those lossless mezzanines. That's not true for H.264 or most other lossy codecs. I think AV1 may also have mandatory lossless support, but I'm not aware of anyone having tried it as SVT-AV1 doesn't support lossless except in IDR-only.

all lossless files i can create are HEVC rext 8.1 so basically no hardware decoder can do that currently on PC except nvdec.

but as you said HEVC will gladly use p and even b frames for lossless.
this is what happens when you blindly follow instructions: https://trac.ffmpeg.org/wiki/Encode/H.265
Generally, options are passed to x265 with the -x265-params argument, as in -x265-params "keyint=1:lossless=1".
and not noticing that it is an unrelated example...

-preset slow -map 0 -c:v libx265 -x265-params "keyint=1:lossless=1" "FSRCNNX2lossless.mkv"
116 sec 533mb 12 bit
-preset slow -map 0 -c:v libx265 -x265-params "lossless=1" "FSRCNNX2lossless.mkv"
76 sec i1 p17 b76 415 mb 12 bit
-preset slow -map 0 -c:v libx264 -qp 0 "FSRCNNXh26410.mkv"
10 sec i1 p95 229mb 10 bit
-preset slow -map 0 -c:v libx265 -x265-params "lossless=1:output-depth=10" "FSRCNNX2losslessplacebo10true.mkv"
76 sec i1 p17 b76 415 mb 10 bit

i guess that's telling: it seems to ignore the output bit depth and is always storing the same internal bit depth.

if i switch to ffmpeg -y -vsync 0 -i 1.mkv -init_hw_device vulkan -vf format=yuv420p10le,hwupload,libplacebo=w=iw*2:h=ih*2:custom_shader_path=FSRCNNX_x2_16-0-4-1.glsl,hwdownload,format=yuv420p10le -preset slow -map 0 -c:v libx265 -x265-params "lossless=1:output-depth=10" "FSRCNNX2losslessplacebo10true.mkv"
74 sec i1 p20 b76 271 mb

maybe this is a bug?

full cmd:
ffmpeg -y -i 1.mkv -init_hw_device vulkan -vf format=yuv420p16le,hwupload,libplacebo=w=iw*2:h=ih*2:custom_shader_path=FSRCNNX_x2_16-0-4-1.glsl,hwdownload,format=yuv420p16le

this is lossless, h264 should never ever win here. there is nothing subjective here where someone could conclude that h264 looks better than h265 at the same bit rate, or some way to trick PSNR by blurring the image.

placebo:
AVC 10 bit 220 just 9 mb difference took 67 sec
HEVC 12 bit none IDR 400 just 15 mb difference took 914 sec

8 bit test:
ffmpeg -y -vsync 0 -i 1.mkv -init_hw_device vulkan -vf format=yuv420p,hwupload,libplacebo=w=iw*2:h=ih*2:custom_shader_path=FSRCNNX_x2_16-0-4-1.glsl,hwdownload,format=yuv420p -preset slow -map 0 -c:v libx265 -x265-params "lossless=1:output-depth=8" "FSRCNNX2losslessplacebo8true.mkv"

HEVC 57 sec 149 mb i1 p22 b73
AVC 3 sec 146 mb i1 p95

my best guess is 10 bit or more HEVC lossless is bugged and is actually storing 12-16 bit lossless or whatever the internal precision is.

i'll let people who know what they are doing take care of this, not me.

tormento

27th April 2024, 10:07

Do you mean you have a GPU-accelerated FFV1 decoder?

Or are you saying you prefer HEVC because it has HW GPU support?

The second.

benwaggoner

5th May 2024, 22:00

all lossless files i can create are HEVC rext 8.1 so basically no hardware decoder can do that currently on PC except nvdec.
Plenty of HW decoders will decode levels more than they nominally support, just at a slower speed. I did do all my testing with a RTX a6000 card, however.

but as you said HEVC will gladly use p and even b frames for lossless.
Ah, that explains the size difference!

i guess that's telling: it seems to ignore the output bit depth and is always storing the same internal bit depth.
Internal frequency-domain precision of HEVC is 16-bit for Main and 32-bit for Main10 and other >8-bit profiles. The higher internal precision is why HEVC and AVC are more efficient encoding even 8-bit in 10-bit mode.

maybe this is a bug?

For apples-to-apples testing, I find it best to just output a .y4m lossless file and encode directly from that using command line encoders, so the documented parameters can be used directly and you're not reliant on whatever version is built into ffmpeg. For example, x265 3.6 came out last month, but I don't believe it is in ffmpeg yet.

my best guess is 10 bit or more HEVC lossless is bugged and is actually storing 12-16 bit lossless or what ever the internal precision is.
Main10 would have finer precision, but I've not seen that ever increase bitrate. I'm not sure how internal precision works in lossless mode, TBH; since lossless skips the transform stage, that might not even apply. If so, 8-bit could encode smaller than 10-bit for 8-bit source in some scenarios.

That AVC supports MBAFF and HEVC only field/frame could account for a difference doing interlaced encoding, which would be a rare thing to want to encode lossless these days.

MoSal

6th May 2024, 00:49

For apples-to-apples testing, I find it best to just output a .y4m lossless file and encode directly from that using command line encoders, so the documented parameters can be used directly

Using the encoder's CLI directly is indeed the way to go. But keeping a possibly huge y4m lossless file around is maybe what stops people from trying this (big hindrance, or completely impossible).

This is compounded by some tools seemingly not supporting y4m piping, and insisting on an input file with a y4m extension.

I wrote "seemingly" above because a simple solution exists, mkfifo ;):

mkfifo /tmp/t.y4m
# first terminal
ffmpeg -i <input_file> -f yuv4mpegpipe - > /tmp/t.y4m
# second terminal
x265 <options> /tmp/t.y4m <output_file>

(Using two terminals to keep things simple, and see progress from both tools.)
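For anyone who would rather stay in one terminal, the same FIFO idea can be sketched as a small shell function. This is only a sketch under a few assumptions: encode_via_fifo is a made-up helper name, the temporary directory is arbitrary, and -strict -1 is included on the assumption that the source is more than 8 bit (as far as I know ffmpeg will not write such y4m streams without it). Extra x265 options go after the two file names.

```shell
#!/bin/sh
# encode_via_fifo is a hypothetical helper, not part of ffmpeg or x265.
# Usage: encode_via_fifo <input_file> <output_file> [x265 options...]
encode_via_fifo() {
    input=$1
    output=$2
    shift 2
    fifo_dir=$(mktemp -d) || return 1
    fifo=$fifo_dir/t.y4m
    mkfifo "$fifo" || { rm -rf "$fifo_dir"; return 1; }

    # Producer: decode to y4m and write into the FIFO in the background.
    ffmpeg -nostdin -loglevel error -i "$input" -strict -1 -f yuv4mpegpipe - > "$fifo" &
    producer=$!

    # Consumer: x265 reads the FIFO as if it were a regular .y4m file.
    x265 "$@" --input "$fifo" --output "$output"
    status=$?

    # If the encoder failed early, stop the producer so that wait cannot hang.
    [ "$status" -eq 0 ] || kill "$producer" 2>/dev/null
    wait "$producer" 2>/dev/null
    rm -rf "$fifo_dir"
    return "$status"
}
```

Something like `encode_via_fifo input.avi output.hevc --lossless --preset medium` would then run both tools, at the cost of losing the separate progress output that the two-terminal version gives.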

and you're not reliant on whatever version is built into ffmpeg. For example, x265 3.6 came out last month, but I don't believe it is in ffmpeg yet.

huh?

There are no external library versions "built into" FFmpeg. There is no specific x264 or x265 version "in" ffmpeg (compile-time preprocessor checks may exist to support APIs of older library versions alongside newer ones).

FFmpeg is distributed in source form, and it does not vendor external libraries in any way. It interfaces with external libraries which, in the case of Linux distributions for example, are updated by the system. They are even updated independently when there are no SONAME bumps involved. Assuming no API changes which break the interface, FFmpeg will work with whatever SO version (or whatever it's called on other platforms) it linked against at build time.

The concern is not the external encoder library version. The concern is the possibility that FFmpeg may set/override/ignore/not-support some encoding parameters passed by users, and there is no way to ensure that the result you get will match what you get when you use the external encoder CLI directly.
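On a Linux system with a dynamically linked ffmpeg, which libx265 actually gets loaded can be read from the ldd output. A minimal sketch, where linked_x265 is a made-up helper name that only filters the text it is given. A statically linked build will show nothing here, and in that case the x265 version really is fixed when the binary is built.

```shell
#!/bin/sh
# linked_x265 is a hypothetical helper: it reads `ldd` output on stdin and
# prints the libx265 shared object name and the path it resolves to.
linked_x265() {
    awk '/libx265\.so/ { print $1, $3 }'
}
```

Typical use would be `ldd "$(command -v ffmpeg)" | linked_x265`, compared against what `x265 --version` reports for the standalone CLI.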

huhn

6th May 2024, 12:50

Plenty of HW decoders will decode levels higher than they nominally support, just at a slower speed. I did do all my testing with an RTX A6000 card, however.

Ah, that explains the size difference!

Internal frequency-domain precision of HEVC is 16-bit for Main and 32-bit for Main10 and other >8-bit profiles. The higher internal precision is why HEVC and AVC are more efficient encoding even 8-bit in 10-bit mode.

For apples-to-apples testing, I find it best to just output a .y4m lossless file and encode directly from that using command line encoders, so the documented parameters can be used directly and you're not reliant on whatever version is built into ffmpeg. For example, x265 3.6 came out last month, but I don't believe it is in ffmpeg yet.

Main10 would have finer precision, but I've not seen that ever increase bitrate. I'm not sure how internal precision works in lossless mode, TBH; since lossless skips the transform stage, that might not even apply. If so, 8-bit could encode smaller than 10-bit for 8-bit source in some scenarios.

That AVC supports MBAFF and HEVC only field/frame could account for a difference doing interlaced encoding, which would be a rare thing to want to encode lossless these days.
my point is 10 and 12 bit HEVC have the same size from 16 bit.
this is so utterly unlikely that they have to be the "same" internally, and they are massively bigger than AVC 10 bit. but if i pipe 10 bit directly, HEVC and AVC are comparable, cutting the size down to about 65%.

benwaggoner

6th May 2024, 22:42

Using the encoder's CLI directly is indeed the way to go. But keeping a possibly huge y4m lossless file around is maybe what stops people from trying this (big hindrance, or completely impossible).
Having done hours of 8K 120 fps video, perhaps my sense of "huge" is a bit skewed. But a half-hour of 10-bit 420 doesn't seem particularly big.
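To put a number on "huge" for the source used in this thread (720x576, 4:2:2, 10-bit, 25 fps, 2 h 07 min), here is a rough size calculation. It assumes that y4m stores every sample deeper than 8 bits in two bytes and it ignores the small per-frame headers; y4m_bytes is a made-up helper name.

```shell
#!/bin/sh
# y4m_bytes is a hypothetical helper: approximate payload size of an
# uncompressed y4m file.
# Arguments: width height chroma(420|422|444) bit_depth fps seconds
y4m_bytes() {
    w=$1 h=$2 chroma=$3 depth=$4 fps=$5 secs=$6
    case $chroma in
        420) num=3; den=2 ;;   # 1.5 samples per pixel
        422) num=2; den=1 ;;   # 2 samples per pixel
        444) num=3; den=1 ;;   # 3 samples per pixel
        *) echo "unsupported chroma: $chroma" >&2; return 1 ;;
    esac
    # Assumption: samples deeper than 8 bits occupy two bytes each.
    if [ "$depth" -gt 8 ]; then bps=2; else bps=1; fi
    echo $(( w * h * num / den * bps * fps * secs ))
}

# 2 h 07 min = 7620 s
y4m_bytes 720 576 422 10 25 7620   # prints 316016640000
```

That is about 316 GB for the whole tape, which is the kind of intermediate file the FIFO approach avoids.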

This is compounded by some tools seemingly not supporting y4m piping, and insisting on an input file with a y4m extension.

I wrote "seemingly" above because a simple solution exists, mkfifo ;):

mkfifo /tmp/t.y4m
# first terminal
ffmpeg -i <input_file> -f yuv4mpegpipe - > /tmp/t.y4m
# second terminal
x265 <options> /tmp/t.y4m <output_file>

(Using two terminals to keep things simple, and see progress from both tools.)
Nice trick!

There are no external library versions "built into" FFmpeg. There is no specific x264 or x265 version "in" ffmpeg (compile-time preprocessor checks may exist to support APIs of older library versions alongside newer ones).
I am speaking of the version of x265 that is compiled into ffmpeg. I don't know if any of the common builds include x265 3.6 yet as it only came out a few weeks ago.

Unless you rolled your own, of course.

I don't know that 3.6 did anything impactful for lossless, however. I've not tested it for that. At a minimum there would be some performance improvements (more than 2x on ARM, more modest on Intel). On a current CPU --avx512 might improve performance some, particularly in --preset placebo.

The concern is not the external encoder library version. The concern is the possibility that FFmpeg may set/override/ignore/not-support some encoding parameters passed by users, and there is no way to ensure that the result you get will match what you get when you use the external encoder CLI directly.
There are new parameters and syntax added in 3.6, for example --rskip used to be a boolean, and now can take a 0, 1, or 2 as a parameter. I don't think any of those changes would impact lossless.

benwaggoner

6th May 2024, 22:44

my point is 10 and 12 bit HEVC have the same size from 16 bit.
Yeah, with the same 10-bit input I would expect Main10 and Main12 to have essentially the same file size, as they both use the same internal 32-bit precision.

this is so utterly unlikely that they have to be the "same" internally, and they are massively bigger than AVC 10 bit. but if i pipe 10 bit directly, HEVC and AVC are comparable, cutting the size down to about 65%.
Pipe versus file shouldn't make any difference! Very strange. I find piping and reading from .y4m to provide identical results in x265.

rwill

7th May 2024, 08:56

In this thread: confused people.

Not really something OpenAI can train its LLM with...

huhn

7th May 2024, 10:21

Yeah, with the same 10-bit input I would expect Main10 and Main12 to have essentially the same file size, as they both use the same internal 32-bit precision.
with 16 bit input and lossless output. there is no way lossless 10 bit has the same size as lossless 12. lossless 10 bit needs less information to recreate.
and 10 bit input and 16 bit input should have roughly the same size, or 16 should be lower with lossless. the dithering option was not used, so what it is supposed to output is rounded 10 bit.

for lossy i can understand the result, at least in part.

rwill

7th May 2024, 14:31

Yeah, with the same 10-bit input I would expect Main10 and Main12 to have essentially the same file size, as they both use the same internal 32-bit precision.

Can you please provide a source for that "internal 32-bit precision", I am willing to learn.

rwill

7th May 2024, 14:32

I don't know what you are all going on about when it's this simple to test:

./x264 foreman_cif.yuv --profile high444 --input-depth 8 --preset veryslow --input-res 352x288 --fps 25 --qp 0 -o test.264 --output-depth 8 -A p8x8,b8x8,i8x8
-> encoded 300 frames, 157.56 fps, 10722.74 kb/s

./x264 foreman_cif.yuv --profile high444 --input-depth 8 --preset veryslow --input-res 352x288 --fps 25 --qp 0 -o test.264 --output-depth 10 -A p8x8,b8x8,i8x8
-> encoded 300 frames, 93.17 fps, 18207.77 kb/s

./x265 --input foreman_cif.yuv --profile main --input-depth 8 --preset veryslow --input-res 352x288 --fps 25 --lossless -o test.265
-> encoded 300 frames in 54.74s (5.48 fps), 10633.94 kb/s, Avg QP:4.00

./x265 --input foreman_cif.yuv --profile main10 --input-depth 8 --preset veryslow --input-res 352x288 --fps 25 --lossless -o test.265
-> encoded 300 frames in 54.97s (5.46 fps), 18111.93 kb/s, Avg QP:4.00

./x265 --input foreman_cif.yuv --profile main12 --input-depth 8 --preset veryslow --input-res 352x288 --fps 25 --lossless -o test.265
-> encoded 300 frames in 52.86s (5.67 fps), 26905.68 kb/s, Avg QP:4.00
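Expressed as ratios against the 8-bit encodes, the numbers above work out as follows. This is just arithmetic on the posted bitrates; ratio is a made-up helper name.

```shell
#!/bin/sh
# ratio is a hypothetical helper: divide two bitrates, print two decimals.
ratio() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

ratio 18207.77 10722.74   # x264, 10-bit output vs 8-bit output: prints 1.70
ratio 18111.93 10633.94   # x265, main10 vs main:               prints 1.70
ratio 26905.68 10633.94   # x265, main12 vs main:               prints 2.53
```

So from the same 8-bit source, both encoders spend roughly 1.7x the bits at 10-bit output, and x265 about 2.5x at 12-bit output.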

poisondeathray

7th May 2024, 14:56

It looks like the -x265-params "output-depth=10" switch is not passed through with ffmpeg and libx265. This happens regardless of lossless=1 vs. lossy. The output encode is 12-bit from 16-bit input.

But it works OK with the x265 CLI: the --output-depth 10 switch is honored.

I didn't test the latest git; my ffmpeg binary is about a month old, so maybe it's fixed by now.
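One way to confirm what depth actually ended up in an output file is to look at the pixel format that ffprobe reports for the video stream. A small sketch, assuming the usual planar YUV naming (yuv420p, yuv422p10le, yuv444p12le and so on); pix_fmt_depth is a made-up helper name and it does not handle other naming schemes such as p010le.

```shell
#!/bin/sh
# pix_fmt_depth is a hypothetical helper: derive bits per sample from a
# planar YUV pixel format name as printed by ffmpeg/ffprobe.
pix_fmt_depth() {
    fmt=${1%le}          # drop an endianness suffix, if any
    fmt=${fmt%be}
    case $fmt in
        *p[0-9][0-9]|*p[0-9]) echo "${fmt##*p}" ;;  # e.g. yuv422p10 -> 10
        *) echo 8 ;;                                # e.g. yuv420p   -> 8
    esac
}
```

The name itself can be obtained with `ffprobe -v error -select_streams v:0 -show_entries stream=pix_fmt -of default=noprint_wrappers=1:nokey=1 <output_file>` and passed to the helper as its only argument.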

huhn

7th May 2024, 17:19

@rwill
this makes sure the input is 8 bit: --input-depth 8
right?

so why would the output be massively bigger?
is there well defined behaviour for these cases?
like adding the first 4 bits of the 8 bit input to the end to create accurate 12 bit?
does x264 even have an output flag? i tried to find the commit but that seems to be too old:
https://git.videolan.org/?p=x264.git;a=commit;h=71ed44c7312438fac7c5c5301e45522e57127db4

rwill

7th May 2024, 19:38

@rwill
this makes sure the input is 8 bit: --input-depth 8
right?

It DEFINES that the input raw .yuv has 8 bit depth.

so why would the output be massively bigger?

For 10bit output bit depth? Because the pictures are encoded at 10bit precision and not with 8.

is there well defined behaviour for these cases?
like adding the first 4 bits of the 8 bit input to the end to create accurate 12 bit?

What? Given 8bit input depth and an output depth of 10, the 8 significant bits are just shifted up by 2, I guess. Maybe there is even correct scaling... I don't know.
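The two candidate conversions mentioned here can be written out in plain shell arithmetic. This only illustrates the arithmetic and says nothing about which one x264 or x265 actually uses; both function names are made up. For limited (studio) range video the plain shift is the exact mapping, since 16..235 in 8 bit corresponds to 64..940 in 10 bit.

```shell
#!/bin/sh
# Both function names are hypothetical helpers for illustration only.

# Zero padding: shift the sample left, the new low bits are 0.
pad_depth() {
    value=$1 from=$2 to=$3
    echo $(( value << (to - from) ))
}

# Bit replication: shift left, then copy the top bits of the input into
# the new low bits. Valid while (to - from) <= from, e.g. 8 -> 10 or 8 -> 12.
replicate_depth() {
    value=$1 from=$2 to=$3
    extra=$(( to - from ))
    echo $(( (value << extra) | (value >> (from - extra)) ))
}

pad_depth 255 8 10         # prints 1020 (the 10-bit maximum is 1023)
replicate_depth 255 8 10   # prints 1023
replicate_depth 255 8 12   # prints 4095 (top 4 bits appended at the end)
```

Either way the mapping is one-to-one, so only 256 distinct values per sample ever occur in the higher-depth picture.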

does x264 even have an output flag? i tried to find the commit but that seems to be too old:
https://git.videolan.org/?p=x264.git;a=commit;h=71ed44c7312438fac7c5c5301e45522e57127db4
Output flag? Have you taken a look at the CLI fullhelp options yet? Your git link 404's btw. I see no reason to look up something like a CLI option in git anyway.

huhn

8th May 2024, 00:47

padding should cost barely any more space.
scaling quite a bit.
and if it does stacking to 16 bit followed by dithering the space needed should explode.