H.264 I-frames for still images?


benwaggoner
19th November 2008, 18:02
Apropos of nothing in particular...

So, as we see H.264 becoming more and more common for web use, I was wondering if anyone has done any experimentation with using H.264 High Profile I-frames as a still image format. In-loop deblocking, variable block size, and CABAC seem like they should make for much better quality at lower bitrates.

And what would one use as a file format for that? Just a normal MPEG-4 wrapper with just a single frame in it? Too much overhead?

Dark Shikari
19th November 2008, 18:32
Oh, it works, and IMO is pretty far superior to JPEG-2000 as well.

The only problem is that, like JPEG-2000, it's heavily patent-encumbered, and nobody wants to use an image format that makes the browser cost money...

The only thing I can think of is to use Flash in some way to display it.

Sergey A. Sablin
19th November 2008, 18:36
Apropos of nothing in particular...

So, as we see H.264 becoming more and more common for web use, I was wondering if anyone has done any experimentation with using H.264 High Profile I-frames as a still image format. In-loop deblocking, variable block size, and CABAC seem like they should make for much better quality at lower bitrates.

there are lots of comparisons out there. here is one:
http://www.scribd.com/doc/2581731/H264AVC-intra-coding-and-JPEG-2000-comparison

And what would one use as a file format for that? Just a normal MPEG-4 wrapper with just a single frame in it? Too much overhead?

Elementary stream? Why would one want more?

benwaggoner
19th November 2008, 18:50
Oh, it works, and IMO is pretty far superior to JPEG-2000 as well.
Do you have any data/examples?

The only problem is that, like JPEG-2000, it's heavily patent-encumbered, and nobody wants to use an image format that makes the browser cost money...
Sure. But for platforms/browsers/devices that already have a H.264 decoder...

Dark Shikari
19th November 2008, 18:54
Sure. But for platforms/browsers/devices that already have a H.264 decoder...
Do we really want to use Flash to display images?

I mean really? :p
Do you have any data/examples?
There's some papers floating around.

If someone can point me to a competitive JPEG-2000 encoder, I could make my own test.

benwaggoner
19th November 2008, 19:10
Do we really want to use Flash to display images?

I mean really? :p
Certainly not Flash in particular. It'd probably only make sense once you've committed to displaying somewhere that there's already going to be a H.264 decoder, and file size matters a lot.

There's some papers floating around.
Anything you can point me towards?

How would one make one of these anyway? x264 with just a single frame?

If someone can point me to a competitive JPEG-2000 encoder, I could make my own test.
There's ones built into QuickTime and Photoshop. No idea if they're competitive or not.

Dark Shikari
19th November 2008, 19:40
Anything you can point me towards?
Not really, I don't keep track of papers. Google Scholar is probably your friend though.
How would one make one of these anyway? x264 with just a single frame?
Yup, with subme9 and 8x8dct (and psy-wise, probably psy-rd/trellis, and maybe lower deblock).

akupenguin
19th November 2008, 23:19
If someone can point me to a competitive JPEG-2000 encoder, I could make my own test.

MSU (http://compression.ru/video/codec_comparison/jpeg2000_codecs_comparison_en.html) prefers ACDSee.

If you want open source, there's only Jasper or OpenJPEG, which are perfectly interchangeable down to individual artifacts.

Esurnir
19th November 2008, 23:30
I did the test, and the resulting file can't be read by any software I know: not Elecard Stream Analyzer, not DGAVCIndex, nothing. I used ImageSource("blabla.jpg", 0, 0).

Esurnir
19th November 2008, 23:37
However, versus a run-of-the-mill JPEG the compression is outstanding. As for JPEG2000, the Photoshop JPEG2000 plugin I had was more than dubious quality-wise.

jethro
19th November 2008, 23:50
MSU (http://compression.ru/video/codec_comparison/jpeg2000_codecs_comparison_en.html) prefers ACDSee.

If you want open source, there's only Jasper or OpenJPEG, which are perfectly interchangeable down to individual artifacts.

I like kakadu jpeg2000 (good at preserving small details and very fast).

http://www.kakadusoftware.com/


http://i38.tinypic.com/29mlmpc.png
Kakadu jpeg2000 49.5 KB (50775 bytes)
/*Batchenc*/
kdu_compress -i <infile> -o <outfile.jp2> -rate 1.32
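As a rough check on the numbers above: as far as I know, kdu_compress takes -rate in bits per pixel, so the target size follows from the frame dimensions. A minimal sketch, assuming the image is the 640x480 Touhou frame that ffmpeg reports later in the thread (the helper name rate_to_bytes is made up for illustration):

```python
def rate_to_bytes(width, height, bits_per_pixel):
    """Approximate compressed size in bytes for a bits-per-pixel rate target."""
    return width * height * bits_per_pixel / 8


# Assumption: 640x480 frame, -rate 1.32 as in the command line above.
target = rate_to_bytes(640, 480, 1.32)
print(round(target))            # 50688
print(round(target / 1024, 1))  # 49.5
```

That lands within about 100 bytes of the reported 50775 bytes; the remainder is presumably container and header overhead.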

Dark Shikari
20th November 2008, 00:23
Here's your comparison. ACDSee Pro 2.5 used for JPG and JPEG2K encoding. Target compression ratio: 10:1. Source: Touhou.

http://i38.tinypic.com/x534bc.png
Source

http://i38.tinypic.com/29bo6bt.png
x264 (--8x8dct --trellis 2 --subme 9 --psy-rd 1.0:0.1 --crf 27.94)

http://i35.tinypic.com/mr4b9g.png
JPEG2K (target filesize: 50K)

http://i37.tinypic.com/256btw9.jpg
JPG (quality 32)

Download the x264 and JPEG2K versions (http://www.mediafire.com/?wimmddynuvz).

Edit: Alternate x264 version that (IMO) looks better:

http://i36.tinypic.com/2djtrmx.png
(--8x8dct --deblock -3:-3 --trellis 2 --subme 9 --psy-rd 2.5:0.3 --crf 29.1) (download (http://www.mediafire.com/?0tnux0lywlm))

cyberbeing
20th November 2008, 02:13
From the looks of it, x264 retains better luminance detail but loses a lot of chroma detail when compared to JPEG2000. This lack of color detail in the x264 images is likely what causes the x264 PNGs to always be smaller than the JPEG2000 PNGs. Overall the chroma differences are not very noticeable when viewed at 100%, and the luminance advantage seems to make the x264 images look nicer.

I did a test with a 4592x3056 somewhat noisy image (processed from RAW) of my cat that I took the other day and here are the resulting images:
1.18MB cat test http://www.mediafire.com/?2eldnotlojt 54.1MB
3.29MB cat test http://www.mediafire.com/?nynzizyn5hu 69.8MB
The archives above are large because I included a PNG of each image and source as well.
1.18MB & 3.29MB test w/o PNG http://www.mediafire.com/?mmzezx0zzht 31.2MB

x264 had pretty much all settings maxed (-3,-3 deblock on 3.29MB, 0,0 deblock on 1.18MB), jpeg2000 made with ACD7.0 (used in the MSU link above), jpeg made with Photoshop CS4.

Also for some reason whenever I tried to encode only 1 frame x264 encoded 2. No idea why. I removed the extra frame by splitting the MKV.

4592x3056 top down view of a pizza.
1.38MB pizza test http://www.mediafire.com/?oumkotdy2qt 64.5MB
x264 with 0,0 deblock.

All of the x264 encodes were decoded with ffmpegsource 2.0b3 and saved to BMP with VirtualDub.

Edit: Adobe's JPEG2000 included with Photoshop CS4 (settings: 1024x1024 tiles, float wavelet) in my opinion seems to give considerably better quality when compared to ACDSee 7.0 and kakadu (kakadu was worse than ACDSee on my images). I've uploaded the jp2's produced by Photoshop CS4 here: http://www.mediafire.com/download.php?0jz1zzximmo (5.86MB). x264 still seems to win on luminance detail but it's closer now.

Edit2: Microsoft's HDPhoto Photoshop plug-in (set to 4:2:0 sampling to match x264) seems much better than Adobe's JPEG2000 and seems very close to x264, but x264 seems to still have a slight edge in noisy areas. http://www.mediafire.com/?fmqmm2yk4zj (5.7MB)

akupenguin
21st November 2008, 09:08
Snow (http://akuvian.org/images/touhou11x534bc.snow.nut) (decoded png (http://i36.tinypic.com/2cep7rb.png)). Same wavelet, better entropy coder and/or quantization.
(ffmpeg -vcodec snow -qscale 3.9)

btw, progressive jpeg (http://akuvian.org/images/touhou11x534bc.jpg) saves 5% (as usual).

shark37
1st December 2008, 21:17
Hi, all!

I'd like to go a step further and compute Y PSNR and SSIM metrics for the posted samples. What color transform should I use to restore luma values from the RGB data in the .PNG files?

Thanks!
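For readers with the same question, here is a minimal sketch of one common answer: full-range BT.601 luma weights applied to 8-bit R'G'B', followed by PSNR on the luma plane. This is an assumption about what a tool would do, not a description of any particular one; real tools may use limited-range (16-235) scaling, rounding to integers, or BT.709 weights, and any of those will shift the numbers. The function names are hypothetical.

```python
import math


def luma_bt601(r, g, b):
    """Full-range BT.601 luma from 8-bit R'G'B' (weights 0.299 / 0.587 / 0.114)."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def y_psnr(ref_pixels, test_pixels, peak=255.0):
    """Luma-only PSNR in dB between two equal-length sequences of (r, g, b) tuples."""
    if len(ref_pixels) != len(test_pixels) or not ref_pixels:
        raise ValueError("pixel lists must be non-empty and the same length")
    squared_error = sum(
        (luma_bt601(*a) - luma_bt601(*b)) ** 2
        for a, b in zip(ref_pixels, test_pixels)
    )
    mse = squared_error / len(ref_pixels)
    if mse == 0:
        return math.inf  # identical luma planes
    return 10 * math.log10(peak ** 2 / mse)


# Two flat gray patches that differ by 10 code values: MSE = 100.
print(round(y_psnr([(100, 100, 100)] * 4, [(110, 110, 110)] * 4), 2))  # 28.13
```

Pixel tuples can come from any PNG reader; the sketch deliberately leaves file loading out so it stays dependency-free.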

LoRd_MuldeR
1st December 2008, 21:21
I'd like to go a step further and compute Y PSNR and SSIM metrics for the posted samples.

You know that PSNR and SSIM don't say anything about perceived quality?

shark37
1st December 2008, 21:30
You know that PSNR and SSIM don't say anything about perceived quality?
I would consider "anything" a rather strong claim :D

EDIT: In fact I have made some preliminary calculations of PSNR values and found that objective figures correlate quite well with perceived quality.

LoRd_MuldeR
1st December 2008, 21:34
I would consider "anything" a rather strong claim :D

At least it will be useless if you want to compare compressors that use Psy optimization, such as x264's Psy RDO and Psy Trellis.

These will improve perceived quality significantly, but will hurt metrics like PSNR or SSIM ...

In fact I have made some preliminary calculations of PSNR values and found that objective figures correlate quite well with perceived quality.

The development of x264 has shown the opposite. As soon as Psy optimizations are involved, metrics become useless ...

Dark Shikari
2nd December 2008, 00:44
EDIT: In fact I have made some preliminary calculations of PSNR values and found that objective figures correlate quite well with perceived quality.
PSNR correlates perfectly well with perceived quality when you're dealing solely with encoders that are optimizing for PSNR. That is, optimizing for PSNR is almost always better than optimizing for nothing at all. That's why tests that measure how well a metric correlates with perceived quality don't use video/image encoders to decrease the image quality.

Once you hit psy optimizations, you can easily get cases where 35 dB looks better than 39 dB.
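To put that 4 dB gap in perspective, PSNR can be inverted back to mean squared error. A small sketch, assuming 8-bit samples with a peak of 255 (the helper name is made up):

```python
def mse_from_psnr(psnr_db, peak=255.0):
    """Invert PSNR = 10 * log10(peak^2 / MSE) to recover the mean squared error."""
    return peak ** 2 / 10 ** (psnr_db / 10)


# How much more squared error does a 35 dB encode carry than a 39 dB one?
ratio = mse_from_psnr(35) / mse_from_psnr(39)
print(round(ratio, 2))  # 2.51
```

So the encode that looks better in such a case carries roughly two and a half times the squared error, which is the scale of disagreement between metric and eye being described.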

shark37
2nd December 2008, 19:38
Sorry, guys, for bothering you with color conversion issue. It appears that the only tool I need is MSU Video Quality Measurement Tool 1.52 (plus of course one of the PNG to BMP converters)

Here we go

Touhou source

+==========+==========+=========+
| Sample   | Y PSNR   | SSIM    |
+==========+==========+=========+
| ACDSee   | 33.47667 | 0.92316 |
| kakadu   | 31.82535 | 0.93912 |
| x264 (1) | 35.27010 | 0.95864 |
| x264 (2) | 34.35264 | 0.95064 |
| Snow     | 36.05429 | 0.95594 |
+==========+==========+=========+

Dark Shikari
2nd December 2008, 22:55
Sorry, guys, for bothering you with color conversion issue. It appears that the only tool I need is MSU Video Quality Measurement Tool 1.52 (plus of course one of the PNG to BMP converters)

Here we go

Touhou source

+==========+==========+=========+
| Sample   | Y PSNR   | SSIM    |
+==========+==========+=========+
| ACDSee   | 33.47667 | 0.92316 |
| kakadu   | 31.82535 | 0.93912 |
| x264 (1) | 35.27010 | 0.95864 |
| x264 (2) | 34.35264 | 0.95064 |
| Snow     | 36.05429 | 0.95594 |
+==========+==========+=========+
If you want to compare SSIM or PSNR, you should probably make sure you're comparing encodes without psyopts. Well, at least for PSNR. AQ helps SSIM.

shark37
2nd December 2008, 23:44
If you want to compare SSIM or PSNR, you should probably make sure you're comparing encodes without psyopts. Well, at least for PSNR. AQ helps SSIM.

Thanx for commenting, DS

I do not draw any conclusions from the measurements. I just wanted to squeeze some more info out of the submitted material. I must say it was very interesting to spot the differences in the Touhou samples.

Let me continue:

cyberbeing's cat image samples 4592x3056 with target size 1.18 MB
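+===============+========+=========+===============================+
| Sample        | Y PSNR | SSIM    | File Name                     |
+===============+========+=========+===============================+
| ACDSee 7.0    | 42.964 | 0.96530 | J2000-1.18MB.png              |
| x264          | 41.699 | 0.96361 | x264-1.18MB.png               |
| Photoshop CS4 | 41.758 | 0.96033 | photoshopJ2000_cat_1.18MB.png |
| JPEG          | 41.445 | 0.95893 | jpeg-1.18MB.png               |
+===============+========+=========+===============================+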
EDIT (06-DEC-08): Added JPEG results for cat image

cyberbeing
3rd December 2008, 21:58
For some reason I'm not surprised that x264 and Photoshop produce images with a lower PSNR and SSIM, because it seems like Photoshop (from the looks of it) as well as x264 (as we know) use psy optimizations. The ACDSee 7.0 image is very soft, but it seems like that is what these metrics prefer. From a visual point of view x264 and Photoshop produce images with better detail and noise retention.

shark37
6th December 2008, 11:19
cyberbeing's pizza image samples 4592x3056 with target size 1.38 MB
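+===============+========+=========+===========================================================+
| Sample        | Y PSNR | SSIM    | File Name                                                 |
+===============+========+=========+===========================================================+
| ACDSee 7.0    | 41.230 | 0.96084 | pizzaj2k1.38.png                                          |
| x264          | 40.382 | 0.96402 | pizzax2641.38.png                                         |
| Photoshop CS4 | 40.327 | 0.95530 | photoshopJ2000_pizza_1.38MB.jp2 decoded by IrfanView 4.20 |
| Photoshop CS4 | 40.305 | 0.95511 | photoshopJ2000_pizza_1.38MB.jp2 decoded by ACDSee 7.0     |
| JPEG          | 39.683 | 0.95696 | pizzajpeg.png                                             |
+===============+========+=========+===========================================================+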

cyberbeing
6th December 2008, 12:01
shark37, were you planning to calculate the PSNR and SSIM for the HDPhoto versions as well, for comparison's sake?

shark37
6th December 2008, 13:34
shark37, were you planning to calculate the PSNR and SSIM for the HDPhoto versions as well, for comparison's sake?

Hi, cyberbeing!

I was sure I would be unable to decode them, but to my surprise IrfanView 4.20 turned out to be HDPhoto-ready :)

So I will later today, but I anticipate performance inferior to both modern rivals, based on these:
"A comparative study of color image compression standards using perceptually driven quality metrics" (http://infoscience.epfl.ch/record/125933)
Francesca De Simone, Daniele Ticca, Frederic Dufaux, Michael Ansorge, Touradj Ebrahimi, SPIE Optics and Photonics, Applications of Digital Image Processing XXXI, 2008
"VISUAL QUALITY IMPROVEMENT TECHNIQUES OF HDPHOTO/JPEG-XR" (http://www.mediafire.com/?ezz1uejmykt)
Thomas Richter, ICIP2008

Cheers!

Dark Shikari
6th December 2008, 13:36
From what I read HD Photo is pretty close to H.264 with all i16x16 blocks... no adaptive transform, no intra pred.

Comatose
6th December 2008, 19:07
This would be nice if browsers came to support it. It could also be a much needed replacement for GIF if it was also used as an animated format.

shark37
6th December 2008, 23:49
-SUMMARY-

cyberbeing's cat image samples 4592x3056 with target size 1.18 MB
+===============+========+=========+===========+===============================+
| Sample        | Y PSNR | SSIM    | File Size | File Name                     |
+===============+========+=========+===========+===============================+
| ACDSee 7.0    | 42.964 | 0.96530 | 1,239,177 | J2000-1.18MB.jp2              |
| Photoshop CS4 | 41.758 | 0.96033 | 1,242,396 | photoshopJ2000_cat_1.18MB.jp2 |
| x264          | 41.699 | 0.96361 | 1,237,473 | x264-1.18MB.mkv               |
| HDPhoto       | 42.297 | 0.96057 | 1,252,146 | HDPhoto_cat_1.19M.wdp         |
| JPEG          | 41.445 | 0.95893 | 1,244,590 | jpeg-1.18MB.jpg               |
+===============+========+=========+===========+===============================+


cyberbeing's cat image samples 4592x3056 with target size 3.29 MB
+===============+========+=========+===========+===============================+
| Sample        | Y PSNR | SSIM    | File Size | File Name                     |
+===============+========+=========+===========+===============================+
| ACDSee 7.0    | 48.283 | 0.98931 | 3,508,426 | jpeg2000.jp2                  |
| Photoshop CS4 | 44.953 | 0.98391 | 3,455,701 | photoshopJ2000_cat_3.29MB.jp2 |
| x264          | 47.574 | 0.98999 | 3,452,985 | x264image.mkv                 |
| HDPhoto       | 48.180 | 0.98933 | 3,372,526 | HDPhoto_cat_3.21M.wdp         |
| JPEG          | 44.267 | 0.97787 | 3,446,101 | jpeg.jpg                      |
+===============+========+=========+===========+===============================+


cyberbeing's pizza image samples 4592x3056 with target size 1.38 MB
+===============+========+=========+===========+===========================================================+
| Sample        | Y PSNR | SSIM    | File Size | File Name                                                 |
+===============+========+=========+===========+===========================================================+
| ACDSee 7.0    | 41.230 | 0.96084 | 1,450,081 | j2kpizza.jp2                                              |
| Photoshop CS4 | 40.327 | 0.95530 | 1,450,118 | photoshopJ2000_pizza_1.38MB.jp2 decoded by IrfanView 4.20 |
| Photoshop CS4 | 40.305 | 0.95511 | 1,450,118 | photoshopJ2000_pizza_1.38MB.jp2 decoded by ACDSee 7.0     |
| x264          | 40.382 | 0.96402 | 1,444,829 | x264pizza.mkv                                             |
| HDPhoto       | 40.887 | 0.96020 | 1,423,731 | HDPhoto_pizza_1.35M.wdp                                   |
| JPEG          | 39.683 | 0.95696 | 1,444,535 | pizzajpeg.jpg                                             |
+===============+========+=========+===========+===========================================================+


Note: Please note the substantial undershoot for 'HDPhoto_cat_3.21M.wdp'

cyberbeing
7th December 2008, 00:17
Note: Please note the substantial undershoot for 'HDPhoto_cat_3.21M.wdp'
Yeah, that was as close as I could get it. It was either 0.08MB too small or something like 0.14MB too big so I ended up going with the small one since it was closer to the target size. Same issue with the others (to a lesser extent), considering you can't specify file size with the Photoshop HDPhoto plugin.
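The size of that undershoot can be worked out from the byte counts in the summary tables. A minimal sketch, assuming "MB" in this thread means 2^20 bytes (which matches the 1.18 MB files being about 1,237,000 bytes); the helper name is hypothetical:

```python
MIB = 1024 * 1024  # assumption: the thread's "MB" is 2**20 bytes


def deviation_percent(size_bytes, target_mib):
    """Signed deviation of a file size from its target, in percent of the target."""
    target_bytes = target_mib * MIB
    return 100 * (size_bytes - target_bytes) / target_bytes


# File sizes taken from the 3.29 MB cat table above.
print(round(deviation_percent(3_372_526, 3.29), 2))  # -2.24  (HDPhoto)
print(round(deviation_percent(3_508_426, 3.29), 2))  # 1.7    (ACDSee 7.0)
```

Under that assumption the HDPhoto file is a bit over 2% below target, while the ACDSee file is a bit under 2% above it, so the metric comparison is slightly tilted in both directions.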

roozhou
9th December 2008, 16:48
Has anyone tried Snow? In my own test when compression ratio reaches 1:50, snow looks better than jp2k and x264.

shark37
9th December 2008, 17:09
Snow (http://akuvian.org/images/touhou11x534bc.snow.nut) (decoded png (http://i36.tinypic.com/2cep7rb.png)). Same wavelet, better entropy coder and/or quantization.
(ffmpeg -vcodec snow -qscale 3.9)
Has anyone tried Snow? In my own test when compression ratio reaches 1:50, snow looks better than jp2k and x264.

Hey, aku and roozhou!

Could you plz post your command lines for snow encoding?

Cheers!

roozhou
9th December 2008, 17:15
Hey, aku and roozhou!

Could you plz post your command lines for snow encoding?

Cheers!

So far I have found that no parameters have an impact on the final output except qscale and v4mv. And remember to use nut instead of avi as the container for snow, since avi brings significant overhead.

shark37
9th December 2008, 17:23
So far I have found that no parameters have an impact on the final output except qscale and v4mv. And remember to use nut instead of avi as the container for snow, since avi brings significant overhead.

Thanks
Could you please be more specific?

roozhou
9th December 2008, 17:32
mencoder mf://xx%d.png -vf format=yv12 -ovc lavc -lavcopts vcodec=snow:vstrict=-2:qscale=3:v4mv -of lavf -o xxx.nut

shark37
9th December 2008, 17:39
mencoder mf://xx%d.png -vf format=yv12 -ovc lavc -lavcopts vcodec=snow:vstrict=-2:qscale=3:v4mv -of lavf -o xxx.nut
:thanks:

Mr VacBob
11th December 2008, 10:13
It's 'vqscale', not qscale. IIRC 3 is actually a rather high qp for Snow and 1.5 is a lower one, but I haven't touched it in a while.

shark37
11th December 2008, 21:48
It's 'vqscale', not qscale. IIRC 3 is actually a rather high qp for Snow and 1.5 is a lower one, but I haven't touched it in a while.

Mr VacBob, thanx for correction.
I still can't reproduce aku's snow encoding results. I realize that it's kind of OT, but I'm having tons of problems trying to get mencoder and ffmpeg to work. Unfortunately the present build of MeGUI does not start properly on my system so this alternative is unavailable for me.

So here we go.
My (naive) encoding/decoding command lines:
ENCODING:
mencoder mf://touhou.png -mf type=png -vf format=yv12 -ovc lavc -lavcopts vcodec=snow:vstrict=-2:vqscale=3.9 -of lavf -o my_snow.nut
Filesize = 48,169
(For comparison filesize of aku's 'touhou11x534bc.snow.nut' = 51,478)
DECODING:
ffmpeg -i my_snow.nut -f image2 my_snow.bmp
PSNR=34.86
DECODING:
ffmpeg -i my_snow.nut -f image2 my_snow.png
PSNR=34.97
DECODING:
ffmpeg -i touhou11x534bc.snow.nut -f image2 touhou11x534bc.snow.bmp
PSNR=27.73 (apparent bug in processing of nut container)

I think that the difference is in the rgb to yv12 color space conversion filters used.

Programs used:
FFmpeg version SVN-r15815, Copyright (c) 2000-2008 Fabrice Bellard, et al.
MEncoder Sherpya-SVN-r27811-4.2.5 (C) 2000-2008 MPlayer Team
OS: Win XP SP2

Any comments and suggestions?

Cheers!

roozhou
12th December 2008, 05:06
I think swscale uses BT.709 if no matrix information is retrieved from decoder.
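Whether or not swscale actually behaves that way, a matrix mismatch of this kind would be enough to move the PSNR figures. A minimal sketch comparing the standard BT.601 and BT.709 luma weights on the same 8-bit pixel (full-range, no offsets; the function name is made up):

```python
# Standard luma weights for R', G', B'.
BT601 = (0.299, 0.587, 0.114)
BT709 = (0.2126, 0.7152, 0.0722)


def luma(rgb, weights):
    """Weighted sum of 8-bit R'G'B' components using the given luma weights."""
    return sum(w * c for w, c in zip(weights, rgb))


red = (255, 0, 0)
gray = (128, 128, 128)
print(round(luma(red, BT601), 1), round(luma(red, BT709), 1))    # 76.2 54.2
print(round(luma(gray, BT601), 1), round(luma(gray, BT709), 1))  # 128.0 128.0
```

Neutral grays come out the same under both matrices, but saturated colors differ by tens of code values, so encoding with one matrix and measuring with the other would cost several dB on colorful content.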

shark37
12th December 2008, 21:49
I think swscale uses BT.709 if no matrix information is retrieved from decoder.
Thanx, roozhou,
I keep learning ffmpeg options :)

@akupenguin

Sensation! :p
>ffmpeg -i touhou11x534bc.snow.nut -f image2 -vcodec png -pix_fmt gray touhou.snow.png
FFmpeg version SVN-r15815, Copyright (c) 2000-2008 Fabrice Bellard, et al.
configuration: --enable-memalign-hack --enable-postproc --enable-swscale --enable-gpl --enable-libfaac --enable-libfaad --enable-libgsm --enable-libmp3lame --enable-libvorbis --enable-libtheora --enable-libx264 --enable-libxvid --disable-ffserver --disable-vhook --enable-avisynth --enable-pthreads
libavutil 49.12. 0 / 49.12. 0
libavcodec 52. 3. 0 / 52. 3. 0
libavformat 52.23. 1 / 52.23. 1
libavdevice 52. 1. 0 / 52. 1. 0
libswscale 0. 6. 1 / 0. 6. 1
libpostproc 51. 2. 0 / 51. 2. 0
built on Nov 13 2008 10:28:29, gcc: 4.2.4 (TDM-1 for MinGW)
[nut @ 0x3daad0]no index at the end
Input #0, nut, from 'touhou.snow.nut':
Duration: N/A, start: 0.000000, bitrate: N/A
Stream #0.0: Video: snow, yuv420p, 640x480, 25.00 tb(r)
Output #0, image2, to 'touhou.snow.png':
Stream #0.0: Video: png, gray, 640x480, q=2-31, 200 kb/s, 25.00 tb(c)
Stream mapping:
Stream #0.0 -> #0.0
Press [q] to stop encoding
frame= 1 fps= 0 q=0.0 Lsize= -0.kB time=0.04 bitrate= -4.4kbits/s
video:236kB audio:0kB global headers:0kB muxing overhead -100.009104%

Y PSNR = 36.52
SSIM = 0.95724

Still dunno though how to reproduce uploaded .png and .nut altogether :-(