MSU MPEG-4 AVC/H.264 codecs comparison RELEASED!


DmitriyV2
12th December 2005, 23:12
SECOND ANNUAL MSU MPEG-4 AVC/H.264 CODECS COMPARISON RELEASED!
(Formal comparison of new standard codecs)

Main features:
* Seven H.264 codecs were compared with the latest DivX.
* All H.264 codecs and their settings were received directly from the codec developers for testing.
* Two presets, provided by the codec developers, were measured:
"Max PSNR" - maximum quality (slow)
"Max speed" - maximum speed
Speed and quality were measured for both presets.
* We measured PSNR, SSIM, VQM, blurring, blocking, bitrate handling and encoding time.
* Pure computation time was more than 20 days on a P4-2400.
* The measurement program has been published (http://www.compression.ru/video/quality_measure/video_measurement_tool_en.html) so anyone can verify the results (and it's free :)).

Tested codecs:
# DivX 6.0 (NOT an H.264 codec; tested as a reference MPEG-4 ASP codec)
# ArcSoft H.264
# Ateme H.264
# ATI H.264
# Elecard H.264
# Fraunhofer IIS H.264
# VSS H.264
# x264

Video sequences used in this comparison:
# “foreman” (standard sequence)
# “susi” (standard sequence)
# “BBC” (standard sequence)
# “battle” (excerpt from the movie "Terminator 2")
# “simpsons” (excerpt from "The Simpsons")
# “Matrix” (excerpt from the movie "The Matrix")
# “Concert” (excerpt from an HDTV video)

Comparison page: http://compression.ru/video/codec_comparison/mpeg-4_avc_h264_2005_en.html
PDF (3.5Mb): http://compression.ru/video/codec_comparison/pdf/msu_mpeg_4_avc_h264_codec_comparison_2005_eng.pdf
PDF in ZIP (2.6Mb): http://compression.ru/video/codec_comparison/zip/msu_mpeg_4_avc_h264_codec_comparison_2005_eng.zip

Planned:
* Measurement of H.264 codecs with formal visual tests (http://www.compression.ru/video/quality_measure/perceptual_video_quality_tool_en.html).
* Use of new metrics.
* More codecs (we have already received requests to include new codecs).

Enjoy! ;)

bond
12th December 2005, 23:19
OK, before I look at it, I have to point this out to everyone again:

Restrictions of codecs’ operation in this comparison:
- only codecs that can work in Main profile of H.264
- and do not use two passes

As has been said before, this doesn't let the better codecs show their full potential, for example:
1) codecs supporting High Profile, like x264
2) codecs supporting 2 passes, like x264 and Nero

Kopernikus
12th December 2005, 23:27
High Profile and 2-pass have also been tested, but have not been compared to codecs that don't have these features.

DmitriyV2
12th December 2005, 23:34
Quote: High Profile and 2-pass have also been tested, but have not been compared to codecs that don't have these features.
Sure.

In the next comparison we will change the rules: in "Maximum PSNR" any settings will be allowed, but this time we keep the current rules.

bond
12th December 2005, 23:39
BTW, I didn't say a big thank you for doing this test @ MSU!!! :)

Quote: High Profile and 2-pass have also been tested, but have not been compared to codecs that don't have these features.
Yes, I just saw that 2-pass and HP have been tested, but there is no comparison _between_ codecs using HP/2-pass?

bond
12th December 2005, 23:44
Quote: In the next comparison we will change the rules: in "Maximum PSNR" any settings will be allowed, but this time we keep the current rules.
Great! I think your results show overall that 2-pass and HP generally bring clearly higher quality (except in the case of Fraunhofer, but I guess that's a different issue ;) )

Sulik
13th December 2005, 00:01
Great comparison!
In the future it would also be interesting to see a Baseline Profile comparison, since that's what is required for the iPod and other portable devices.

acidsex
13th December 2005, 00:07
Can't say I am too surprised by the results, as I have been using the top 2 for a while now and it is hard to tell one from the other. I am a bit surprised that ATI looks to be as fast as it is, but obviously there is somewhat of a quality hit, which is to be expected.

Many thanks for performing the test for us.

Sirber
13th December 2005, 00:37
x264 revision 293..... kinda oldish :(

bond
13th December 2005, 00:59
My comments:

Overall a really nice comparison; it's great that you tried to cover all dimensions: metrics, visual quality and speed.
I especially like the speed/quality combination graphs :)
I hope the visual part gets extended in future comparisons (yeah, I know it's really hard to do ;) )
Also, it's great that you tried different metrics (though they didn't show very different results).

And of course it was great to see that you tried to combine all these findings into one final result via a "scorecard"-like approach.

Actually, it was also nice to see that my personal tests, which showed Ateme and x264 in the lead, were reproduced here :D


You often commented on DivX, which was only a reference; it would be great to have more comments about the actual AVC codecs too (not only the best or worst).
I already mentioned the missing 2-pass/High Profile comparison.

As a side note:
Also very, very interesting IMHO was that DivX could keep up with the AVC codecs on the Matrix clip. IMHO this might be because of two things:
1) it shows DivX tuned its codec for this movie (as it's often used for comparisons, e.g. by Doom9)
2) it might also show that DivX is tuned more for real movies than for these abstract test clips (which I personally don't like for testing)
Maybe this shows that real movie content should get more attention in such tests, because people don't actually use codecs to encode foreman, if you understand what I mean ;)

Now my questions (I hope you can answer all of them, as most are really important IMHO):

- Why wasn't Apple's AVC codec tested (probably the most widely hyped AVC codec)?
- I know matrix1 very well, as I often use it for my tests: you wrote "deinterlace" there (matrix1 is not interlaced here). For Simpsons you wrote it's not interlaced, but my Simpsons DVDs are. Are you sure you wrote that correctly?
When deinterlacing, which deinterlacer did you use? Did you use the same one for all encoders? What are the exact .avs scripts for all your clips (assuming you used AviSynth)?
- Another thing I saw with matrix1: your resolution for Matrix is 720x416 (after cropping and resizing to remove the anamorphic stretch, I have 720x288 here). Did you leave the anamorphic resize in? Did you do an anamorphic resize in the PSNR measurement? Did you use a resizer anywhere? If yes, which one?
- Maybe I overlooked it, but it seems you didn't publish _all the exact_ codec settings you used for each codec. So which ones did you use?
- You are also not always clear about which decoders you used. Did you always use the decoders from the encoder providers (unless mentioned otherwise)? If yes, did they use post-processing, and if so, what kind?
Did you use one decoder for all streams? If yes, which one? Post-processing?
If not, why didn't you use the same decoder for all?
- In the 2-pass/HP graphs you used more metrics for Ateme, but only PSNR for the other codecs. Why?
- What are the coefficients in the "informal comparison", and why were they chosen that way?
- Why didn't you simply feed x264 the YV12 .avi? I think it handles that?

BTW:
- DivX 3 is a totally different codec from DivX 5/6. They share no code base whatsoever (unlike DivX 4-6), so listing DivX 3 next to DivX 5/6 isn't really comparable IMHO when you want to show the "improvements" made to the codec over time.
- I think there is a failure with Elecard in all the "concert" graphs.

puffpio
13th December 2005, 01:20
It's good that 2-pass and High Profile were included, even if they were in a different section... What about 2-pass with High Profile? That's the mode most people use, right?

Sagittaire
13th December 2005, 01:50
Very impressive, good job (I'm actually doing tests too, but at "HDTV", "DVD" and "portable" resolutions).

1) Why did you compute the metrics only separately for the Y, U and V planes...???
2) At this time x264 (rev 385) is certainly the best for PSNR (and I compared it with Ateme AVC HP with the full ME profile -> 0.1 fps at 720p...!!!) thanks to new optimizations (trellis and b-rdo), but the speed of x264 development is incredible, and that's a problem for codec tests.

Sharktooth
13th December 2005, 04:05
Wow, that's a great comparison. Thanks for your effort!
However, despite the old version used, x264 came 2nd in quality after Ateme. That's a great result for both codecs, and considering they have both improved significantly since then (especially x264), it's safe to say they're the best H.264 solutions available.

Revgen
13th December 2005, 04:27
Quote: x264 revision 293..... kinda oldish :(

This was the version submitted by the x264 devs. It's not MSU's fault.

charleski
13th December 2005, 04:46
A truly impressive piece of work.

The Ateme encoder seems to have been excluded from the Quality/Speed tradeoff results though. What was the reason for this? I see some comments about the speed of different Ateme presets in your introduction, but the language there is a bit unclear.

One other thing: in the earlier thread it was implied that you had Ateme's High Profile encoder, is there a reason this wasn't tested? The graphs only show Main, Main 2-pass, and Main+psy (which showed little effect on PSNR, as might be expected).

*.mp4 guy
13th December 2005, 05:01
@bond

If you look at the picture of the Matrix clip used in the comparison, you can see that they encoded it anamorphically, and it definitely doesn't look like it was deinterlaced. Maybe they meant IVTCed, or maybe it was a typo. The picture from the Simpsons clip also looks progressive.

Revgen
13th December 2005, 07:12
Congratulations to the x264 devs!

I looked at the results and I'm amazed that an open-source codec like x264 is a close rival to an expensive professional codec like Ateme's. What's even more amazing is that x264 has improved a whole lot since then and continues to get better.

Manao
13th December 2005, 07:27
I have some criticism:

You chose the same bitrates for all the clips, even though the complexity of the clips varies. That's no issue when the bitrate is reachable, but when it isn't (and obviously, 100 kbit/s isn't reachable on "concert" and "bbc"), that bitrate shouldn't be taken into account. When you test such bitrates, you're asking the codec to deliver something it can't, and then its behavior is not well defined. It shouldn't crash, and it shouldn't do what Elecard did on "concert", but it shouldn't be expected to be optimal either.
As Sagittaire pointed out, you separate PSNR Y, U and V. That's a real shame, for two reasons. First, it makes the tests harder to read: the PSNR graphs for Y, U and V are far apart (PSNR Y << PSNR U and V), and often each codec makes a different tradeoff, which means that some codecs are better than others on Y and worse on U/V. Second, when you summarize the three graphs, there's a bias toward chroma (hint: psnr(distortion Y + distortion U + distortion V) is mainly influenced by the lowest-quality channel, obviously Y, whereas psnr(distortion Y) + psnr(distortion U) + psnr(distortion V) is less influenced by Y). So it would be better to compute only the PSNR for the whole frame (overall or average, BTW?) instead of the PSNR for each channel. And it would have saved you time.
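To make the chroma-bias point concrete, here is a minimal Python sketch comparing the two ways of summarizing Y/U/V quality. It assumes 8-bit samples (peak 255) and 4:2:0 sampling, so each chroma plane has a quarter of the luma samples; the function names and the per-plane MSE values for the two codecs are hypothetical illustrations, not numbers from the comparison.

```python
import math

PEAK = 255.0  # assumption: 8-bit video


def psnr_from_mse(mse: float) -> float:
    """PSNR in dB for a given mean squared error."""
    return 10.0 * math.log10(PEAK * PEAK / mse)


def frame_psnr(mse_y: float, mse_u: float, mse_v: float) -> float:
    """PSNR of the total distortion over all samples of a 4:2:0 frame.

    Summing squared errors over every sample and dividing by the sample
    count weights Y by 4/6 and U, V by 1/6 each.
    """
    return psnr_from_mse((4.0 * mse_y + mse_u + mse_v) / 6.0)


def mean_plane_psnr(mse_y: float, mse_u: float, mse_v: float) -> float:
    """Plain average of the three per-plane PSNR values."""
    return (psnr_from_mse(mse_y) + psnr_from_mse(mse_u) + psnr_from_mse(mse_v)) / 3.0


# Hypothetical codecs: A spends its bits on luma, B spends them on chroma.
codec_a = (20.0, 8.0, 8.0)
codec_b = (24.0, 4.0, 4.0)

for name, mses in (("A", codec_a), ("B", codec_b)):
    print(name, round(frame_psnr(*mses), 2), round(mean_plane_psnr(*mses), 2))
```

With these numbers, A wins on whole-frame PSNR while B wins on the averaged per-plane PSNR. Summing the three PSNRs, as in Manao's hint, ranks codecs the same way as averaging them, so the average is used here.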
Bitrate handling: I already pointed out that some bitrates should have been removed. I can also add that you chose the rate control mode that was perhaps the least suited to reaching a given bitrate. ABR is a cheap/imprecise/fast version of 2-pass, after all. Furthermore, you used a mode that is used neither by D9 users (most of them use two passes) nor by professionals (they only care about CBR). Finally, you discarded 2-pass from your test by saying 'some codecs don't have it'. Well, I'd say, too bad for them. Hmm, why not go further: some codecs don't have RD, so let's disable RD? You see my point: 2-pass isn't a feature of the standard (like High Profile and such, which were excluded from the main part of the test for good reason: we don't compare apples and oranges), but a quality feature of the encoders. So by doing this, you disadvantage codecs that have a two-pass mode.
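One simple way to quantify bitrate handling, sketched below in Python, is the relative error between the achieved and the target bitrate. This is an assumed definition for illustration only; the comparison's own bitrate-handling metric may be computed differently, and the helper names are hypothetical.

```python
def achieved_kbps(stream_bytes: int, frame_count: int, fps: float) -> float:
    """Average bitrate of an encoded stream in kbit/s."""
    duration_s = frame_count / fps
    return stream_bytes * 8 / 1000.0 / duration_s


def relative_bitrate_error(achieved: float, target: float) -> float:
    """Signed deviation from the target: +0.25 means 25% over budget."""
    return (achieved - target) / target


# Hypothetical encode: 300 frames at 25 fps (12 s) in 1,500,000 bytes.
rate = achieved_kbps(1_500_000, 300, 25.0)
print(rate, relative_bitrate_error(rate, 800.0))
```

Following Manao's argument, targets a clip cannot reasonably reach (such as 100 kbit/s on "concert") would be dropped before averaging this error across bitrates.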
Now, let's talk about the unofficial rating system used at the end of the test. You asked codec developers for two configurations: max PSNR and max speed. It was already argued back then that max speed alone was worthless, and that a quality/speed tradeoff should be used instead. You did so, but without telling the codec developers what the tradeoff was. How were we supposed to give you the correct configuration then? That's the reason why Ateme isn't in that part of the test. We delivered two 'high speed' settings, one really fast and one a bit more balanced (I remind you, the tradeoff wasn't known). The balanced one was used, when the really fast one was the one suited to the test. It resulted in a bad ranking for Ateme (which you can see, since they forgot to remove us from the score table). You can see that from 2nd to 6th place the codecs are tied, and that using the proper configuration would have changed our ranking. Another thing is that with the chosen tradeoff (speed * 4 + bitrate handling * 2 + psnr), an MPEG-2 codec would have won the comparison. DivX should have won in any case, but they didn't provide a configuration for speed.
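The weighting Manao quotes (speed * 4 + bitrate handling * 2 + psnr) can be sketched as below. How each criterion is normalized is an assumption here (each codec's value divided by the best value for that criterion, larger is better), and the two codecs and their numbers are hypothetical. The sketch only shows why such a weighting favors a very fast codec even when its PSNR is clearly lower.

```python
WEIGHTS = {"speed": 4.0, "bitrate_handling": 2.0, "psnr": 1.0}


def normalize(values: dict[str, float]) -> dict[str, float]:
    """Scale each codec's value by the best value (larger is better)."""
    best = max(values.values())
    return {codec: v / best for codec, v in values.items()}


def informal_score(metrics: dict[str, dict[str, float]]) -> dict[str, float]:
    """Weighted sum of normalized per-criterion values for each codec."""
    scores: dict[str, float] = {}
    for criterion, per_codec in metrics.items():
        for codec, value in normalize(per_codec).items():
            scores[codec] = scores.get(codec, 0.0) + WEIGHTS[criterion] * value
    return scores


# Hypothetical inputs: fps, a bitrate-accuracy score, and PSNR in dB.
metrics = {
    "speed": {"fast": 100.0, "slow": 10.0},
    "bitrate_handling": {"fast": 0.95, "slow": 0.95},
    "psnr": {"fast": 35.0, "slow": 38.0},
}
print(informal_score(metrics))
```

Because speed carries four times the weight of PSNR, a 10x speed advantage easily outweighs a 3 dB quality loss under this assumed normalization.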
Adding duplicate frames to a clip because the decoder doesn't handle B-frames well is understandable, but you could have chosen a decoder that handles them. Why not always use the JM to decode the H.264 clips (it's slow, I know, but it's the reference decoder, and there are no missing frames, unlike with x264/ffmpeg)?


On a side note, the proper way to test codecs is to say 'we want XXX PSNR at YYY bitrate on clip ZZZ', then choose the settings that give the wanted score at the highest speed. The other proper way is to say 'we want XXX speed at YYY bitrate on clip ZZZ', and choose the settings with the highest PSNR. Any other way of testing speed/PSNR is biased somehow (some more, some less, and the one chosen definitely more).
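Manao's two "proper" protocols amount to a constrained selection over encoder settings, both measured at a fixed bitrate on a fixed clip. A minimal Python sketch follows; the `Run` record, the preset names and the measurements are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Run:
    preset: str
    psnr_db: float  # quality at the fixed bitrate on the fixed clip
    fps: float      # encoding speed


def fastest_reaching_psnr(runs: Sequence[Run], target_psnr: float) -> Optional[Run]:
    """Protocol 1: among settings reaching the PSNR target, take the fastest."""
    ok = [r for r in runs if r.psnr_db >= target_psnr]
    return max(ok, key=lambda r: r.fps) if ok else None


def best_psnr_at_speed(runs: Sequence[Run], target_fps: float) -> Optional[Run]:
    """Protocol 2: among settings reaching the speed target, take the best PSNR."""
    ok = [r for r in runs if r.fps >= target_fps]
    return max(ok, key=lambda r: r.psnr_db) if ok else None


runs = [Run("slow", 38.0, 5.0), Run("medium", 37.5, 20.0), Run("fast", 36.0, 60.0)]
print(fastest_reaching_psnr(runs, 37.0))  # the "medium" run
print(best_psnr_at_speed(runs, 30.0))     # the "fast" run
```

A codec that cannot reach the target at all yields `None`, which is itself a result worth reporting rather than something to average away.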

I won't deny all the effort and goodwill you put into the comparison. It's tremendous work (heck, 2000+ graphs, 8 codecs, 6 clips, 8 bitrates!!!). Yet, when you came forward with the rules of the test, you were told (by users, pengvado and us at least) that there were flaws in the way the test was conducted that would make it somewhat invalid, and you decided nonetheless to go ahead without taking these considerations into account. I don't understand that, because otherwise the communication between you and us (at least; I don't know about the other codecs) has been really good (cf. some notes in the test after the PSNR curves).

Finally, I must add that your test showed us that our codec needed a lot of tweaking at low bitrates. For that, we are really grateful.
Quote: One other thing: in the earlier thread it was implied that you had Ateme's High Profile encoder, is there a reason this wasn't tested? The graphs only show Main, Main 2-pass, and Main+psy (which showed little effect on PSNR, as might be expected).
They forgot it (it's understandable; once again, they did a lot of testing). It's no big deal either, since the test was about Main Profile H.264 codecs.

Quote: That's a great result for both codecs, and considering they have both improved significantly since then (especially x264)
They got a version a bit fresher than beta 2.2. So how can you know whether we improved significantly? x264, however, did (and it might have won the quality contest if the latest build had been used, who knows). But the test began 3 months ago, which is quite a long time considering the development speed of x264.

akupenguin
13th December 2005, 10:31
Quote: x264 revision 293..... kinda oldish
That was the latest version when the comparison started. When Doom9 compares with 385 or so, it too will be obsolete by the time it's published ;)

Doom9
13th December 2005, 11:16
Well, my comparison will be done in less than two weeks, so you can only whip up so many releases during that time. By the way, I never heard anything from you about my PMs, even though I see that you got them. Just trying to make sure that x264 will be in the comparison. If, god forbid, encoding goes flawlessly for once (there are no new codecs in the main round this time, so there's a tiny chance this might actually work out), we'll hopefully only be talking about a few days until the main round results are published.

akupenguin
13th December 2005, 11:49
I'm running test encodes now. I'll submit the settings shortly.

Sagittaire
13th December 2005, 12:44
Personally, I test with this:

x264.exe --bframe 2 --ref 16 --mixed-refs --filter 0:0 --bitrate 1250 --pass 1 --stats "x264_stat.log" --qcomp 0.75 --ipratio 1.25 --pbratio 1.33 --analyse "all" --8x8dct --weightb --me "dia" --subme 5 --progress -o NUL IceAge-720p.avi
x264.exe --bframe 2 --ref 16 --mixed-refs --filter 0:0 --bitrate 1250 --pass 3 --stats "x264_stat.log" --qcomp 0.75 --ipratio 1.25 --pbratio 1.33 --analyse "all" --8x8dct --weightb --me "dia" --subme 5 --progress -o NUL IceAge-720p.avi
x264.exe --bframe 2 --b-rdo --ref 16 --mixed-refs --filter 0:0 --bitrate 1250 --pass 2 --stats "x264_stat.log" --qcomp 0.75 --ipratio 1.25 --pbratio 1.33 --analyse "all" --8x8dct --weightb --me "umh" --subme 6 --trellis 2 --progress -o x264HP_IceAge_720p_1250.mp4 IceAge-720p.avi

I think these are the best settings for metrics, but perhaps not...???

With the same source, XviD H263 q8 VHQ4 gives 1250 kbps, and the quality is not so bad for H.264 720p at 1250 kbps: Ice Age (http://multimediacom.free.fr/Download/x264HP_IceAge_720p_1250.mp4)

Doom9
13th December 2005, 12:49
Well, you know how much I care about metrics ;) I guess I would be a huge disappointment to all the professors who taught me various forms of mathematics during my studies (I just happened to pick the master's program with the most maths apart from actual mathematics and perhaps physics). If c't's professional-grade metric software says that VP3 looks rather good, then you cannot blame anyone for trusting their eyes instead.

And I see a lot of default values in these settings. I find that very confusing, because it distracts from the things that matter (the settings that differ from the defaults).

Sharktooth
13th December 2005, 15:29
Quote: They got a version a bit fresher than beta 2.2. So how can you know whether we improved significantly? x264, however, did (and it might have won the quality contest if the latest build had been used, who knows). But the test began 3 months ago, which is quite a long time considering the development speed of x264.
Well, I can only speak for what I've got... and I don't have the "fresher" version of the Ateme encoder, so I can't speak for Ateme (I can only suppose the new versions are improved, but I don't know by how much), but x264 has vastly improved in both speed and quality.

bond
14th December 2005, 00:10
Well, let's not speculate, but wait for Doom9's test to learn more about how the latest versions perform.

DmitriyV2
27th January 2006, 16:00
To all:
I was away on 3 trips during the last month, sorry for the late reply!

Quote: BTW, I didn't say a big thank you for doing this test @ MSU!!! :)
Kind thanks! :)
Quote: Yes, I just saw that 2-pass and HP have been tested, but there is no comparison _between_ codecs using HP/2-pass?
We try to keep the overall preparation time of the comparison reasonable, so we did not test some combinations. Maybe next time we will add such comparisons for only 2 or 3 codecs.

DmitriyV2
27th January 2006, 16:02
Quote: Great! I think your results show overall that 2-pass and HP generally bring clearly higher quality (except in the case of Fraunhofer, but I guess that's a different issue ;) )
Yes. They sent us substantially different parameters for this pass.

DmitriyV2
27th January 2006, 16:05
Quote: Great comparison!
In the future it would also be interesting to see a Baseline Profile comparison, since that's what is required for the iPod and other portable devices.
Thank you!
More tests require more time, and since the comparison is free, we try to reduce the total time. Maybe we will measure several selected codecs (like x264) in Baseline Profile at low resolution.

DmitriyV2
27th January 2006, 16:50
Quote: Can't say I am too surprised by the results, as I have been using the top 2 for a while now and it is hard to tell one from the other. I am a bit surprised that ATI looks to be as fast as it is, but obviously there is somewhat of a quality hit, which is to be expected.
Many thanks for performing the test for us.
You're welcome! We also tested ATI on another computer; the results are in the FAQ for the comparison.

DmitriyV2
27th January 2006, 17:22
Quote: Overall a really nice comparison; it's great that you tried to cover all dimensions: metrics, visual quality and speed.
I especially like the speed/quality combination graphs :)
Me too! :)
Quote: I hope the visual part gets extended in future comparisons (yeah, I know it's really hard to do ;) )
We will announce a visual comparison soon.

Quote: Also, it's great that you tried different metrics (though they didn't show very different results).
We also plan to "rate" the metrics.

Quote: Also very, very interesting IMHO was that DivX could keep up with the AVC codecs on the Matrix clip. IMHO this might be because of two things:
1) it shows DivX tuned its codec for this movie (as it's often used for comparisons, e.g. by Doom9)
2) it might also show that DivX is tuned more for real movies than for these abstract test clips (which I personally don't like for testing)
Maybe this shows that real movie content should get more attention in such tests, because people don't actually use codecs to encode foreman, if you understand what I mean ;)
Sure. :) This month I talked with developers from a very well-known company (a hardware manufacturer) that is going to use H.264 in its hardware, and I was surprised: they use foreman, container, etc... So I hope our tests will show that this is not such a good approach for developers. ;)

Quote: Now my questions (I hope you can answer all of them, as most are really important IMHO):
- Why wasn't Apple's AVC codec tested (probably the most widely hyped AVC codec)?
We measure ONLY H.264 codecs provided by their developers (to avoid many questions about parameters, version selection, etc.).

And there was no answer.

Now we have a contact for the next comparison.

Quote: - I know matrix1 very well, as I often use it for my tests: you wrote "deinterlace" there (matrix1 is not interlaced here). For Simpsons you wrote it's not interlaced, but my Simpsons DVDs are. Are you sure you wrote that correctly?
When deinterlacing, which deinterlacer did you use? Did you use the same one for all encoders? What are the exact .avs scripts for all your clips (assuming you used AviSynth)?
We use our own matrix1 rip. We used deinterlaced sequences for all encoders, and the name of the deinterlacer is mentioned in the information about each sequence.

The main idea: we know that results would be better with the MSU Deinterlacer, but we deliberately use ordinary, widely used deinterlacers.


Quote: - Another thing I saw with matrix1: your resolution for Matrix is 720x416 (after cropping and resizing to remove the anamorphic stretch, I have 720x288 here). Did you leave the anamorphic resize in? Did you do an anamorphic resize in the PSNR measurement? Did you use a resizer anywhere? If yes, which one?

No resizing. :) We use our own rip.


Quote: - Maybe I overlooked it, but it seems you didn't publish _all the exact_ codec settings you used for each codec. So which ones did you use?
We used command-line versions, and one clever developer noted that he could use any settings internally, other than the ones he declared. So we will change the rules: next time there will be no limitations on profile and settings in the "Best PSNR" section, but all settings must be published.


Quote: - You are also not always clear about which decoders you used. Did you always use the decoders from the encoder providers (unless mentioned otherwise)? If yes, did they use post-processing, and if so, what kind?
Did you use one decoder for all streams? If yes, which one? Post-processing?
If not, why didn't you use the same decoder for all?
We used the developers' decoders when they were provided. This is noted in the text.


Quote: - In the 2-pass/HP graphs you used more metrics for Ateme, but only PSNR for the other codecs. Why?
To reduce the number of pages. :) The situation was the same.

Quote: - What are the coefficients in the "informal comparison", and why were they chosen that way?
See the FAQ.

Quote: - Why didn't you simply feed x264 the YV12 .avi? I think it handles that?

We had several problems with that, which we discussed with the developers.

Quote: - I think there is a failure with Elecard in all the "concert" graphs.
Sure. We reported several bugs to the developers during testing.

DmitriyV2
27th January 2006, 17:25
Quote: Very impressive, good job (I'm actually doing tests too, but at "HDTV", "DVD" and "portable" resolutions).
Thank you! :)

Quote: 1) Why did you compute the metrics only separately for the Y, U and V planes...???
2) At this time x264 (rev 385) is certainly the best for PSNR (and I compared it with Ateme AVC HP with the full ME profile -> 0.1 fps at 720p...!!!) thanks to new optimizations (trellis and b-rdo), but the speed of x264 development is incredible, and that's a problem for codec tests.
1) We think averaging the per-channel PSNRs distorts the real picture of a codec's performance.
2) We plan our third annual comparison this year, don't worry. :)

DmitriyV2
27th January 2006, 17:27
Quote: Wow, that's a great comparison. Thanks for your effort!
Thanks! :)

Quote: However, despite the old version used, x264 came 2nd in quality after Ateme. That's a great result for both codecs, and considering they have both improved significantly since then (especially x264), it's safe to say they're the best H.264 solutions available.
Sure.

DmitriyV2
27th January 2006, 17:31
Quote: A truly impressive piece of work.
Thank you! :)

Quote: The Ateme encoder seems to have been excluded from the Quality/Speed tradeoff results though. What was the reason for this? I see some comments about the speed of different Ateme presets in your introduction, but the language there is a bit unclear.
The reason: negotiations with the Ateme developers. You can measure their speed unofficially. :)

Quote: One other thing: in the earlier thread it was implied that you had Ateme's High Profile encoder, is there a reason this wasn't tested? The graphs only show Main, Main 2-pass, and Main+psy (which showed little effect on PSNR, as might be expected).
We tested only the presets provided by the developers. HP will be tested in our next comparison.

DmitriyV2
27th January 2006, 17:42
We have also received a large number of similar questions by mail, so we wrote an FAQ. Its latest version can be found at:
http://www.compression.ru/video/codec_comparison/h264_2005_comparison_faq_en.html

I was away on 3 long trips during the last month, sorry again for the long delay.

-----------

Second annual MPEG-4 AVC/H.264 codecs comparison

Frequently Asked Questions

Q: How did you choose the set of metrics for your comparison?

A: We used the metrics implemented in the MSU Quality Measurement Tool. In that tool we implemented the most commonly used objective comparison metrics.

The main metric in our comparison is PSNR, because it is used in most objective comparisons, so our results will be understandable to everybody. We will increase the number of metrics in the next comparison, even though this increases measurement time.

Q: Did you try to find mistakes in your comparison?

A: Of course we did. Before the comparison started, we found reviewers for it. They got a draft of our report a month before the public release, and in exchange they sent us a list of comments and mistakes they found. This exchange significantly decreased the number of mistakes in our report.


Q: How did you verify objective measurements?

A: We used several methods:

First, we published our measurement tool. Lots of people download it, use it, and sometimes send us bug reports (as a rule, the bugs concern the handling of file formats, B-frames in AVI, etc.). So the reliability of our tool is now much better than in previous comparisons.

Second, after all measurements were completed, we provided the original sequences and full results to the codec developers (each developer received only the results for its own codec and one of the freeware reference codecs). Developers are very interested in good results for their codecs, so they can verify any strange results.


These are our methods for increasing the reliability of our results.


Q: You mainly test encoders; why don't you call your comparison an "encoder comparison"?

A: We use the developer's decoder if it is provided to us with the encoder. This means developers can improve their results through decoder optimization, postfiltering, etc.

In the next comparison we are going to run additional decoder compatibility tests.


Q: What computers have you used for measurements?

A: You can find information about our computers' configurations on this page, or on the 7th page of the PDF document.


Q: Why did you use deinterlaced sequences?

A: It is our general policy. We choose sources similar to the sequences that ordinary users work with. Nowadays it is very difficult for an ordinary user to get a progressive sequence. As a rule, users get the sequences they compress from DVDs, satellite receivers, DV cameras, etc. They capture these sequences in real time using popular, simple built-in deinterlacing methods. Such methods, along with compression artifacts, decrease encoding performance on those sequences. We think popular codecs should account for such properties of sequences through prefiltering and advanced motion compensation.
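As an illustration of the kind of simple deinterlacing the answer refers to, the Python sketch below keeps one field and line-doubles it back to full height. This is one common cheap method, chosen here as an assumption; the FAQ does not say which methods the capture devices or the test sequences used, and the function name is hypothetical. It shows how such methods throw away half the vertical detail before the encoder ever sees the frame.

```python
def field_discard_deinterlace(frame: list[list[int]]) -> list[list[int]]:
    """Keep the top field (even rows) and repeat each row to restore height."""
    out: list[list[int]] = []
    for row in frame[0::2]:
        out.append(list(row))
        out.append(list(row))
    return out[: len(frame)]  # trim the extra row for odd heights


# A tiny 4x2 "frame" whose rows alternate between two fields.
frame = [[10, 10], [200, 200], [30, 30], [220, 220]]
print(field_discard_deinterlace(frame))
```

The bottom-field rows (200, 220) vanish entirely and the remaining rows are duplicated, producing the kind of stair-stepped, detail-poor image that the encoder then has to compress.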


Q: Is it possible for developers to use information about your test set to improve their results in your comparison?

A: Theoretically, yes. That is why each time we replace two or three sequences with completely new ones, publishing them only after all measurements are finished. So users can pay more attention to those sequences (we also track differences in the results with great interest).

Q: You used ATI graphics accelerators in your computers, and the ATI codec is the fastest in your comparison. Don't you think that's strange?

A: We used the same computers as in last year's comparison. No one knew about the ATI codec at that time.

Anyway, we were also very interested in whether the ATI codec used any hardware acceleration. However, such acceleration does not conflict with the testing rules (we compare codecs for ordinary PCs).

We ran additional tests on another computer with the following configuration:
o Processor: Pentium 4, 3.0 GHz with Hyper-Threading
o Operating system: Windows XP Pro, SP2
o Memory: 1 GB
o Video accelerator: nVidia GeForce 6600 GT
o Hard disk: SATA 200 GB

Measurement results ("Foreman" sequence) can be found here:
http://www.compression.ru/video/codec_comparison/h264_2005_comparison_faq_en.html

The main result is that ATI could theoretically use hardware acceleration in its codec, but if so, they use fairly generic methods that work on different hardware.



Q: I think you did not attempt a visual comparison; there are only a few frame pictures in your comparison!

A: We pay a lot of attention to proper visual comparison. :)

o First, by choosing frames, anyone can show that any codec is better than any other! This is because the quality of individual frames in a decoded sequence varies significantly. You can read about the reasons for this in Introduction to Video Codecs Comparison. But people still ask such questions; that is why we introduced per-frame metrics in our comparison to show the differences in frame quality.

o Second, we developed special software, MSU Perceptual Video Quality, for conducting subjective blind assessments (where the experts don't know which encoder was used for the current sequence) with various testing methods, from the Butterfly test to the ITU-R BT.500-11 recommendation. This program is the only freeware tool for such assessments.

o Third, we are going to carry out a number of subjective visual comparisons. The first one will be released soon.
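The first point, that hand-picked frames can make either codec look better, is easy to check from per-frame metrics. The Python sketch below uses hypothetical per-frame PSNR values and a hypothetical helper name; it lists the frames each codec wins even though one codec has the higher average.

```python
from statistics import mean


def winning_frames(psnr_a: list[float], psnr_b: list[float]) -> tuple[list[int], list[int]]:
    """Indices of frames where codec A, respectively codec B, has higher PSNR."""
    a_wins = [i for i, (a, b) in enumerate(zip(psnr_a, psnr_b)) if a > b]
    b_wins = [i for i, (a, b) in enumerate(zip(psnr_a, psnr_b)) if b > a]
    return a_wins, b_wins


# Hypothetical per-frame PSNR (dB) for two codecs on the same clip.
codec_a = [38.0, 36.5, 39.2, 35.0, 37.8]
codec_b = [37.1, 37.0, 38.0, 35.6, 36.9]

a_wins, b_wins = winning_frames(codec_a, codec_b)
print(mean(codec_a), mean(codec_b), a_wins, b_wins)
```

Showing only frame 1 or frame 3 would suggest B is better, although A has the higher average; plotting the whole per-frame curve avoids that trap.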



Q: Why didn't you add codec X to your comparison?

A: For each codec we used presets provided to us by the developers. So only codecs whose developers we were able to communicate with were added to our comparison.


Q: Why did you measure High Profile separately?

A: This year, as in the previous comparison, we tested only Main Profile. Moreover, when we announced the "Call for codecs", to our knowledge only one codec supported High Profile well enough. However, when the developers provided new versions of their codecs, we discovered that at least three companies had implemented High Profile.

Next year we are going to remove the profile restrictions, but we will publish the codec presets, including the profile used.


Q: Why did you use such strange rules for your informal comparison?

A: People have already sent us a lot of suggestions to combine all the results into one table. Actually, in the beginning we didn't want to make such a table at all, because one codec may be best at low bitrates, another at high bitrates, a third on film material and a fourth for video conferencing. So we can average all the results, but...

Nevertheless, we are going to improve the informal comparison rules, so your suggestions are welcome.


Q: Why are the codec versions in your comparison not the latest ones?

A: The thing is that the comparison measurements take a rather long time, and we didn't update codec versions during the measurements (except for critical bugs) to ensure fair conditions for all developers. So some developers were able to release a new version of their codec before the comparison was released.

In the future we are going to speed up our measurements through more productive work with developers and improved measurement methods.


Q: Why didn't you use this type of diagram?

A: We are constantly increasing both the number of diagram types and the number of graphs. There are seven diagram types in our latest comparison. In the future we are going to add some new diagram types and replace some of the existing ones with improved versions. We have already started work in that direction.

Q: Why do your measurements take so much time?

A: There are a number of factors that increase measurement time:

o People often ask us to add sequences to analyze codec results in new application areas. Pure measurement time alone is now more than 11 days. We are going to use additional computers, but more sequences will increase the amount of human work in any case. Currently all our comparisons are free, so we are trying to limit human work time.
o Another trend is increasing the number of metrics. We know about the drawbacks of PSNR and will add new metrics, while trying to keep their number within reasonable bounds.
o The number of codecs is also increasing. We welcome more codecs, but at the same time this increases report preparation time.
o If we find an obvious bug in a codec, we usually send a bug report to the developer. Sometimes developers fix those bugs and send us a new version. That is the right approach from the developer's point of view, but it also increases the number of measurements in our comparison.
o Report verification takes a lot of time, because we try to make the report as good as possible.


So if we reduced the number of measured metrics and sequences in the set, prohibited developers from fixing bugs and removed report verification, we would speed up our comparison several times over.


Q: When will you make the next comparison?

A: In September 2006, unless we make two comparisons this year. :)