View Full Version : "The first end-to-end learned machine-targeted image codec"


birdie
30th August 2021, 16:26
In this paper, we propose an image codec for machines which is neural network (NN) based and end-to-end learned. In particular, we propose a set of training strategies that address the delicate problem of balancing competing loss functions, such as computer vision task losses, image distortion losses, and rate loss. Our experimental results show that our NN-based codec outperforms the state-of-the-art Versatile Video Coding (VVC) standard on the object detection and instance segmentation tasks, achieving -37.87% and -32.90% of BD-rate gain, respectively, while being fast thanks to its compact size. To the best of our knowledge, this is the first end-to-end learned machine-targeted image codec.

https://arxiv.org/abs/2108.09993

Comments?

rwill
30th August 2021, 19:37
"In this paper, we propose an image codec for machines"

I am not a machine.

ksec
4th September 2021, 22:16
The results are certainly amazing, especially when compared to VVC, which is already the best at low-bpp encoding. They do state the encoding time:

"Our system contains only 1.5M parameters, and is also extremely fast: the average encoding time for a 2048 × 1024 image in the val set is around 0.15 seconds, with batch size of 1 on a single RTX 2080Ti GPU"

It doesn't state its decoding time. Most (if not all) users expect images to be decoded without any special hardware, so the paper needs to present both its CPU-only decoding time and its hardware-assisted (NPU) decoding time.

Google shared telemetry data showing that ~80% of images received in Chrome have a bpp of 1 or more. So as it turns out, ultra-low-bpp images aren't actually that popular.

It is still very interesting, though, how much further we could push with ML.
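
For reference, bpp (bits per pixel) is just the compressed size in bits divided by the pixel count, and the quoted encoding time can be turned into a throughput figure the same way. A minimal sketch: the 256 KiB file size is a made-up number for illustration, the function names are hypothetical, and only the 2048 × 1024 resolution and ~0.15 s encoding time come from the paper.

```python
def bits_per_pixel(file_size_bytes: int, width: int, height: int) -> float:
    """Compressed size in bits divided by the number of pixels."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return file_size_bytes * 8 / (width * height)


def megapixels_per_second(width: int, height: int, seconds: float) -> float:
    """Throughput in megapixels per second for a single image."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return width * height / seconds / 1e6


# Hypothetical file size: a 2048 x 1024 image compressed to 256 KiB.
bpp = bits_per_pixel(262_144, 2048, 1024)

# Encoding time quoted from the paper: ~0.15 s per 2048 x 1024 image.
throughput = megapixels_per_second(2048, 1024, 0.15)

print(f"{bpp:.2f} bpp, {throughput:.1f} MP/s")
```

With these numbers the hypothetical file lands exactly at 1 bpp, the threshold mentioned in the Chrome telemetry, and the quoted encoding speed works out to roughly 14 megapixels per second on that GPU.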

Gravitator
5th September 2021, 05:51
Where are the encoder and decoder? Empty talk.

benwaggoner
8th September 2021, 02:16
"In this paper, we propose an image codec for machines"

I am not a machine.
Yeah. Without double-blind subjective ratings, it quite likely is a machine optimizing for another machine. Can't say much about what it would look like without being able to see what it looks like.