PICO (Perceptual Image Codec) - the first learned codec by Apple
birdie
22nd May 2026, 12:32
https://apple.github.io/ml-pico/
Looks terrific. Classic codecs are left in the dust.
Sadly, JXL was not tested.
Zebulon84
22nd May 2026, 16:34
To my eyes there seems to be some sort of FGS and/or AI generation involved. Details are not preserved; they are reinvented. Other formats are blurrier but closer to the target in many places. Look at the shape of the lipstick on the upper lip in the third image, for instance.
Z2697
22nd May 2026, 16:46
People complain about AI slop images/videos and now they are gonna make everything look like AI slop :devil:
Games (DLSS 5)! Photos! Movies!
We don't need to test JXL at this point.
rwill
23rd May 2026, 11:12
Damn, that third image with the lipstick encoded by DCVC-RT, girl just looks like the RealDoll I ordered from Temu...
On a more serious note, the made-up details in PICO are quite apparent on the left parrot in the last picture. The pattern on the parrot's cheek below the eye is quite different.
Not something I would use for medical images.
Z2697
23rd May 2026, 18:33
Maybe it'll improve with more bpp.
There's a 9+ GB "dataset", but it only contains "PICO-treated" images in PNG, so it's useless. (WTF...)
The effect and performance are of course pretty amazing, but this is not something I would want to use (based on what they showed on the webpage and in the paper).
(Are traditional codecs really that bad at around 0.3 bpp? Maybe they used extra-bad parameters, considering their VVC example is often on par with or worse than BPG.)
rwill
24th May 2026, 07:00
There are no extra-bad parameters when encoding intra-only. Maybe if you pick some fast preset of a production encoder, but here they used the reference software most of the time.
The traditional codecs kind of hit a brick wall in intra coding starting with HEVC, and since then there have been only very small gains. So, given information theory and stuff, I guess every extra detail you see beyond HEVC is made up by some sort of synthesis.
PICO is impressive but probably not designed for comparing the PICO image against the original image.
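Comparing a decoded image "against the original" usually means a full-reference metric. Below is a minimal sketch of PSNR on flat lists of 8-bit samples (an illustrative assumption, not what the PICO authors used). Because it scores every pixel's deviation from the original, reinvented texture such as the parrot's cheek pattern counts as error even when it looks sharp.

```python
import math
from typing import Sequence


def psnr(original: Sequence[int], decoded: Sequence[int], peak: int = 255) -> float:
    """Peak signal-to-noise ratio (dB) between two equal-length sample lists."""
    if len(original) != len(decoded) or len(original) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all samples: any deviation from the original counts.
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # bit-exact reconstruction
    return 10 * math.log10(peak ** 2 / mse)
```

For example, `psnr([100, 100, 100, 100], [101, 99, 100, 100])` has an MSE of 0.5, giving roughly 51.1 dB. A perceptual codec can trade some of this pixel fidelity for plausibility, which is exactly the trade-off debated in this thread.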
Sadly, JXL was not tested.
I don't think it matters much in this case. JPEG XL was not tuned or designed for low bpp; it excels at 0.8 to 1 bpp and beyond, which covers 80%+ of the images served on today's web. AV1 is much better at low bpp for a lot of uses (there are some uses where JPEG XL is as good or better, but they tend to be the smaller share), and we can infer that AV2 and VVC are even better. Not to mention it also includes ECM, which we know is the current state of the art in low-bpp encoding. I would actually have liked to see how H.264 does in this test.
Surprisingly, if we take this image test as an indication of video codec quality, it would suggest VVC is better than both AV1 and AV2 while using less time for encoding and decoding. This sort of reconfirms VVC as the most efficient codec in that compression range. I think that is going to trigger a lot of people in the AOM camp.
As others have pointed out, until these models are deterministic and actually preserve details rather than reinvent them, I still think there is no place for them. Nvidia seems to have some deterministic model in mind with DLSS 5, but I haven't read much detail about it beyond the massive backlash.
nhw_pulsar
24th May 2026, 16:38
Yes, PICO is very impressive: it has the neatness and the details, very high compression and practical speed! It seems a major milestone.
I also think the PICO image is deformed relative to the original, but it has good visual quality, at least in the examples here at very high compression.
In the past I always thought this approach was "too good to be true", with some "hidden model data" involved, but these are codecs from Apple, Microsoft, JPEG, ... to name a few, so they must be very serious products.
Cheers,
Raphael
GeoffreyA
25th May 2026, 08:01
As much as I'm against the "making-up-details" trend in compression, one's eyes can't deny that PICO performs strongly.
I did two rough tests against JXL (PICO on the right):
https://slow.pics/s/JaTxdZSU
cjxl "%%i" "out-jxl\%%~ni.jxl" -q 30 -e 10
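The line above is the body of a Windows batch `for` loop (`%%i` is the loop variable, `%%~ni` its file name without extension). A rough cross-platform equivalent is sketched below; the `*.png` input pattern and the helper names are hypothetical assumptions, and only the `-q 30 -e 10` settings come from the post.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_cjxl_command(src: Path, out_dir: Path, quality: int = 30, effort: int = 10) -> list[str]:
    """Return the cjxl argument list for one input image (does not run it)."""
    dst = out_dir / (src.stem + ".jxl")  # same role as out-jxl\%%~ni.jxl
    return ["cjxl", str(src), str(dst), "-q", str(quality), "-e", str(effort)]


def encode_folder(in_dir: Path, out_dir: Path, pattern: str = "*.png") -> None:
    """Encode every file matching `pattern` in `in_dir` into `out_dir` (hypothetical helper)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(in_dir.glob(pattern)):
        # check=True raises if cjxl exits with a non-zero status.
        subprocess.run(build_cjxl_command(src, out_dir), check=True)
```

Calling `encode_folder(Path("."), Path("out-jxl"))` would mirror the batch loop, assuming `cjxl` is on the PATH. Keeping command construction separate from execution makes it easy to print or check the exact settings before encoding.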
Z2697
25th May 2026, 08:13
One problem is that the small image they use in the slide (if it is used as-is) doesn't really give the larger blocks a good chance.
GeoffreyA
25th May 2026, 08:35
Yes. Also, how would PICO perform in the visually-transparent range against JXL?
birdie
25th May 2026, 19:07
A discussion on Hacker News: https://news.ycombinator.com/item?id=48256565
Z2697
26th May 2026, 06:12
No one likes it LOL.
GeoffreyA
26th May 2026, 06:42
One commenter made the important point that, in the classic sense, ringing, blocking, blurring, etc., are also not part of the original but are put there by the process. Do we tolerate that form of deviation because of decades of custom? Why is it that we are against this new species of deviation? After all, lossy compression's job is to be perceptually close.
Now, I'm not in favour of it. I think the answer is that, with ordinary lossy, everything, including the degradation, is happening to the original data. It's as if we were dirtying the window and storing the result but the world outside doesn't change.
In the ML codec, there are parts where detail is being made from prior training, guided by the input. It's as if an artist were given a pencil sketch of the original and asked to complete the picture based on their skill. The result will be strikingly close, like a Dutch or Flemish painter, but it is more like art than photographic reproduction.
Z2697
26th May 2026, 06:43
Video codecs on larger images aren't that bad.
A random 4K game screenshot:
https://files.catbox.moe/q4g3jv.avif
(This is roughly 0.3 bpp, am I right?)
I used AV1 here because it's more convenient on the web... but HEVC and above are similar; like rwill said, we kinda hit a brick wall.
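Bits per pixel is simply the file size in bits divided by the pixel count, so the question above can be checked from the file size alone. A minimal sketch, assuming "4K" means 3840x2160 (UHD); the actual size of the linked file is not stated here.

```python
def bits_per_pixel(file_size_bytes: int, width: int, height: int) -> float:
    """Compressed bits per pixel: total bits divided by number of pixels."""
    return file_size_bytes * 8 / (width * height)


def target_size_bytes(bpp: float, width: int, height: int) -> float:
    """Inverse: file size in bytes needed to hit a given bpp."""
    return bpp * width * height / 8


# Byte budget for a 3840x2160 image at 0.3 bpp.
print(round(target_size_bytes(0.3, 3840, 2160)))  # 311040
```

So a 4K UHD file of roughly 311 KB (about 304 KiB) sits at 0.3 bpp. By the same arithmetic, 0.292 bpp at 1920x1080 is only about 76 KB, which is worth remembering when comparing files at different resolutions.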
Z2697
26th May 2026, 06:57
If I had to guess, based on some pictures in the "dataset" that I randomly opened and compared, the coded images should be able to reach transparency at higher bpp.
(It has to be a guess: there's not a single reference image in it, only PICO results at different bpp.)
GeoffreyA
26th May 2026, 09:32
JXL, 0.292 bpp, at 1080p:
https://slow.pics/s/O7v0CbHL