20th June 2010, 18:33   #1
Emanem_
Registered User
 
Join Date: Feb 2010
Posts: 15
VP8 vs x264

Hi all,

I've done a simple comparison between VP8 and x264, computing SSIM and PSNR for some fragments of HD (720p) video.
It's here:
http://qpsnr.youlink.org/vp8_x264/VP8_vs_x264.html

Please let me know what you think!
Cheers,

Ps. I've posted this in the alternative codecs section as well, I hope it's ok, otherwise please close one of them and redirect to the other :-)
20th June 2010, 18:43   #2
LoRd_MuldeR
Software Developer
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,239
You don't show your x264 command-line explicitly. I only see "default settings". So I have to assume you didn't use "--tune psnr" or "--tune ssim", right?

If that was the case, then you had Psy-optimizations enabled (x264 default) and thus the whole PSNR/SSIM comparison is worthless, as Psy-optimizations massively hurt PSNR and SSIM.

Needless to say that metrics, such as SSIM or even PSNR, can only give a very rough idea about the perceived quality anyway...
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊

Last edited by LoRd_MuldeR; 20th June 2010 at 18:47.
20th June 2010, 18:49   #3
Emanem_
Registered User
 
Join Date: Feb 2010
Posts: 15
Quote:
Originally Posted by LoRd_MuldeR
You don't show your x264 command-line explicitly. I only see "default settings". So I have to assume you didn't use "--tune psnr" or "--tune ssim", right?

If that was the case, then you had Psy-optimizations enabled and thus the whole PSNR/SSIM comparison is worthless, as Psy-optimizations massively hurt PSNR and SSIM.
No, I definitely didn't have those on.
The point is, if I turn those on, the codec will cheat by trying to better fit PSNR and SSIM instead of doing what it is expected to do (i.e. trimming the data our eye should not be able to perceive).

Then I used PSNR and SSIM in order to provide a number as well, not only a subjective test.

I see this flag as being like when the ATi drivers detected that quake3 was running and degraded the quality of textures in order to get better FPS...

I'm not trying to look for a better number to satisfy PSNR and/or SSIM, but to satisfy the eye, and then provide some numbers as well.

Btw if there were such optimizations for PSNR and/or SSIM for VP8, I'd leave them off as well...

Cheers,

Ps. I know that even psy optimization can't be good for everyone, that for example our sensitivity to red is different, and even shapes, ...
20th June 2010, 18:55   #4
LoRd_MuldeR
Software Developer
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,239
Quote:
Originally Posted by Emanem_
No, I definitely didn't have those on.
The point is, if I turn those on, the codec will cheat by trying to better fit PSNR and SSIM instead of doing what it is expected to do (i.e. trimming the data our eye should not be able to perceive).
What you say makes no sense. If you decide to use a specific metric to judge quality, but you leave options enabled that intentionally hurt those metrics, you screw up the results. x264 will not "cheat" if you pass "--tune psnr" or "--tune ssim". It will simply disable certain "optimizations" that are known to greatly improve the perceived quality, but inherently hurt quality metrics because of the way they work. This is also further proof of how one can be fooled when relying too much on quality metrics. Sometimes you need to make decisions that hurt those metrics in order to get the visually improved result.

After all it's your test and I only was giving suggestions. If you want to keep a worthless and misleading test on your web-site, then it's your decision...


Quote:
Originally Posted by Emanem_
I'm not trying to look for a better number to satisfy PSNR and/or SSIM, but to satisfy the eye, and then provide some numbers as well.
Then do the visual comparison with default settings (i.e. Psy-optimizations enabled) and do the comparison of PSNR or SSIM numbers with the appropriate settings (i.e. Psy-optimizations off).

Last edited by LoRd_MuldeR; 20th June 2010 at 19:01.
20th June 2010, 19:04   #5
Emanem_
Registered User
 
Join Date: Feb 2010
Posts: 15
Quote:
Originally Posted by LoRd_MuldeR
What you say makes no sense. If you decide to use a metric to judge quality, but you leave options enabled that intentionally hurt those metrics, you screw up the results. x264 will not "cheat" if you pass "--tune psnr" or "--tune ssim". It will simply disable certain "optimizations" that are known to greatly improve the perceived quality, but inherently hurt quality metrics because of the way they work. This is also further proof of how one can be fooled when relying too much on quality metrics. Sometimes you need to make decisions that hurt those metrics in order to get visually improved results.

After all it's your test and I only was giving suggestions. If you want to keep a worthless and misleading test on your web-site, then it's your decision...




Then do the visual comparison with default settings (i.e. Psy-optimizations enabled) and do the comparison of numbers (PSNR or SSIM) with the appropriate settings.
First of all, thanks for your suggestions.

Nevertheless, I probably wasn't clear enough.
From my point of view I follow this logic:
1) A video codec has to digitally compress the video stream to provide best quality for the eye.
2) If I want to test a video codec, I have to test it with the settings that codec is supposed to be used with in day-to-day usage.
3) I can't just provide a subjective test and I need some numbers. Again, asking the codec to tune its output to satisfy a test makes the test useless.
4) Seeing as a number is better than no number, I provide SSIM and PSNR, but due to points (1) and (2) I don't turn on optimizations that would hurt quality for the eye just to get a better PSNR/SSIM.

It's a bit like if we had to compare trucks.
But then we have a special truck, that has an option to morph into a Ferrari.
Now, let's say we're measuring the speed of trucks on a circuit. We're testing, as trucks, those vehicles.
Everyone that would buy that truck will use it as a truck.
Would you turn Ferrari mode on just for this test?

Cheers,

Btw, to a certain extent I think we're both right.

Last edited by Emanem_; 20th June 2010 at 19:08.
20th June 2010, 19:08   #6
Keiyakusha
契約者
 
Join Date: Jun 2008
Posts: 1,577
Are there any other metrics that can be used so that the --tune ssim/psnr question doesn't apply anymore? PEVQ, ITU-T J.247?
20th June 2010, 19:20   #7
Manao
Registered User
 
Join Date: Jan 2002
Location: France
Posts: 2,855
Emanem_ : I think everybody will agree on your points 1) and 2). But 1) conflicts with 3). Especially since a visual comparison of x264 at default settings against x264 at default settings + --tune psnr will show that a higher PSNR definitely doesn't mean higher visual quality.
20th June 2010, 19:43   #8
LoRd_MuldeR
Software Developer
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,239
Quote:
Originally Posted by Emanem_
It's a bit like if we had to compare trucks.
But then we have a special truck, that has an option to morph into a Ferrari.
Now, let's say we're measuring the speed of trucks on a circuit. We're testing, as trucks, those vehicles.
Everyone that would buy that truck will use it as a truck.
Would you turn Ferrari mode on just for this test?
I think that analogy doesn't capture the facts. But to stay with your "car" analogy, we could think about it like this:


We want to compare cars. And we want to know: Which car drives best ???

As that is very abstract and too hard to compare, we need some "hard" numbers. So we decide to only compare the maximum speed that the cars can reach (on a straight 1/4 mile track).

Now let's assume one car has an "optimization" that greatly improves handling and thus performs extremely well in daily life, but comes at the cost of reduced maximum speed.

In our "hard" number comparison that car will perform worse, if we keep the "optimization" enabled. That's because in our comparison we harshly ignore all aspects, except for the maximum speed.

Still in real life that car would perform better than the others, because maximum speed is worthless without good handling.

Also our comparison, which we have chosen to restrict to one single aspect (maximum speed) is unfair for the one car, as it could perform better in our comparison, but we simply don't allow that.

Instead we keep the "optimization" enabled, although we know that this optimization intentionally (and for good reason) hurts the one aspect we have chosen to compare.


The proper and fair method would be: Tell all car manufacturers that we are going to compare maximum speed only. And give them the chance to set up their cars for this specific contest.

Last edited by LoRd_MuldeR; 20th June 2010 at 20:07.
20th June 2010, 20:18   #9
Emanem_
Registered User
 
Join Date: Feb 2010
Posts: 15
Trying to answer to all in one post :P

I know, that's why I tried to be as objective as I could and compare both PSNR and SSIM.
Personally I think PSNR numbers are definitely to be taken with a grain of salt, and imho I usually prefer SSIM. Or better, I prefer them both.
Sometimes PSNR can capture what SSIM does and vice versa, and sometimes both won't capture anything.

Nevertheless, (1) and (2) conflict with (3), but still we need an objective metric used against the product in the same way as day-to-day use.

It's a bit like VaR/sensitivities in finance: these are numbers that alone don't mean anything, and you have to fully analyze different aspects of the whole to get an idea of (sort of) what's going on.

And indeed, in my analysis I try to focus on the places where I see big gaps in PSNR and/or SSIM, and then I save the frames and visually compare them.
And I watched the segments as well.

Now, I'd rather not turn on psnr and/or ssim optimizations because then if I ran a sort-of-psychovisual benchmark on that product, I'd see bad video, not representing the usage.

Btw I disagree on the following:
Quote:
The proper and fair method would be: Tell all car manufacturers that we are going to compare maximum speed only. And give them the chance to set up their cars for this specific contest.
I could put a rocket on a car and I'd win. Now please tell me you'd buy that car to drive it
Again we're comparing what drives best (to use your analogy)

This is why instead of only PSNR or SSIM I decided to use them both with some subjective analysis (see some screenshots).
I am alone, I can't provide a metric like PEVQ myself, I don't have time and resources to hire a group of persons and ask them to provide evaluation.

If you guys would like to recommend any other objective metric that I can implement in C/C++ in qpsnr I'll be more than happy to extend the software and use that (other than PSNR/SSIM/subjective) to evaluate different options.

Nevertheless, if you read the whole (not too serious) analysis, I don't just say this is better because the PSNR is higher or this is c**p because the SSIM is way below 1.
I provide what I can with my means (that indeed are limited), but I think the analysis is fair.

Cheers guys,
Let me know what you think!

Ps. PSNR is being computed against the RGB colorspace, while SSIM against Y (of YCbCr)... I try to cover color and shapes.
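
For readers unfamiliar with the metric, here is a minimal Python sketch of how PSNR is commonly defined for 8-bit samples: the mean squared error between reference and encoded samples, expressed in dB relative to the peak value. This is only an illustration and is not qpsnr's actual code; the `psnr` helper and the sample values are made up for the example.

```python
import math

def psnr(reference, distorted, max_value=255):
    """PSNR in dB between two equal-length sequences of sample values."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)

# Hypothetical sample values (e.g. four samples of one channel).
ref = [52, 55, 61, 66]
enc = [54, 55, 60, 69]
print(round(psnr(ref, enc), 2))
```

Because the formula only looks at per-sample error, anything the encoder does that deliberately moves samples away from the source (as psy optimizations do) lowers the score, whether or not it looks better.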
20th June 2010, 20:34   #10
LoRd_MuldeR
Software Developer
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,239
Quote:
Originally Posted by Emanem_
Btw I disagree on the following:

Quote:
The proper and fair method would be: Tell all car manufacturers that we are going to compare maximum speed only. And give them the chance to set up their cars for this specific contest.
I could put a rocket on a car and I'd win. Now please tell me you'd buy that car to drive it
Again we're comparing what drives best (to use your analogy)
Now you are joking. Of course we would compare the cars as they are sold to the customers.

But, as we have chosen to make a highly specific test, which concentrates on one single aspect and ignores all the rest, we must allow the manufacturers to adapt their cars for the test.

If we don't allow that, our test result will simply be random! That's because the "default" setup of an individual car might be more or less suitable for our specific test - just by chance.

Of course the manufacturers will only be allowed to tweak the setup of their cars. They will not be allowed to add any additional "non-standard" parts (motors, rockets, whatever) to their car.


That's also how proper Codec comparisons are done in reality:

First the testing methodology is publicly announced. For example the comparison can be announced to be about a certain metric (i.e. PSNR or SSIM) only.

Or the test can be announced to be a "visual" comparison only, performed by human subjects only.

And then, once the testing methodology is clearly defined, the Codec developers get the chance to adapt their Codec for the specific test. Finally they submit their test settings/configuration.

Of course they must use the "standard" version of their Codec, the version that will be used by the end-users too. And using special non-standard hardware isn't allowed either.

Last edited by LoRd_MuldeR; 20th June 2010 at 20:42.
20th June 2010, 20:43   #11
Emanem_
Registered User
 
Join Date: Feb 2010
Posts: 15
Of course I was joking

Anyway, I didn't just use PSNR.
I used SSIM, PSNR and then subjective analysis as well.

Please propose any other objective metric that can be implemented in C/C++ and I'll try to implement and run that as well.

The point of my analysis was to do as objective a review as possible of VP8 and x264, using PSNR/SSIM and actually viewing the results; what do you think about the conclusions?

I've played around with VP8 quite a bit these days.
It's astoundingly slow, really slow. But apparently finally we have something that can be compared with x264.

Have a look and tell me what you think.

Cheers
20th June 2010, 21:01   #12
LoRd_MuldeR
Software Developer
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,239
Quote:
Originally Posted by Emanem_
Of course I was joking

Anyway, I didn't just use PSNR.
I used SSIM, PSNR and then subjective analysis as well.
That's not the problem. Doing a subjective analysis and a metric-based one, is a good idea.

The problem is that when you did the SSIM/PSNR test with x264, you tweaked it (or kept it tweaked) against PSNR/SSIM.

Consequently your results for the PSNR/SSIM test were unfair/biased.


Quote:
Originally Posted by Emanem_
Please propose any other objective metric that can be implemented in C/C++ and I'll try to implement and run that as well.
The problem is: The one objective quality metric that perfectly predicts the perceived quality doesn't exist.

We are far away from fully understanding how the HVS (human visual system) works. How can we accurately model it in a computer program then?

So whatever metric we use, it will always be a simplifying model and it will always be restricted to certain aspects, leaving out others.

That doesn't mean metrics are useless! Sometimes we must have "hard" numbers and then using metrics is the only way. But we should be aware of the limitations of such a test!

This also means: If we know that a certain optimization while being visually advantageous hurts a specific metric, we don't keep that optimization enabled for that metric.


Some interesting reading about quality metrics:
http://www.ece.uwaterloo.ca/~z70wang...ions/SPM09.pdf

Last edited by LoRd_MuldeR; 20th June 2010 at 21:30.
20th June 2010, 21:05   #13
Emanem_
Registered User
 
Join Date: Feb 2010
Posts: 15
Btw I think default VP8 is tweaked for the human eye as well
Who would encode video in VP8 if default would be tweaked for PSNR and/or SSIM?

Anyway, cool, thanks for the paper, I'll give it a look later on.

Now, back to business, what do you think about the analysis itself and VP8?

Let me know!
Cheers,
20th June 2010, 21:23   #14
Blue_MiSfit
Derek Prestegard IRL
 
Join Date: Nov 2003
Location: Los Angeles
Posts: 5,957
I don't think VP8 has any psy optimization currently, if I recall correctly from Dark_Shikari's blog.

A few thoughts

1) --tune ssim and --tune psnr don't enable anything. They actually just DISABLE parts of the encoder that would hurt the respective metrics. If you don't do this... well, you could easily come away with the impression that x264 isn't as good as it actually is!

2) SSIM and PSNR are valid in certain contexts. If you're going to do a comparison, you must use the proper tuning. However, you should always do a subjective analysis as well, with the psy optimizations turned on. You can do one optimized for metrics as well, and be blown away at how good psy optimizations are
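
The two-encode approach described above can be sketched as follows. This is a hedged illustration in Python that only builds the x264 argument lists (it does not run the encoder); the `x264_command` helper, the file names and the 2000 kbps target are all hypothetical. `--tune`, `--bitrate`, `--psnr`, `--ssim` and `-o` are the x264 command-line options assumed here.

```python
def x264_command(source, output, bitrate_kbps, tune=None):
    """Build an x264 argument list; tune is None, "psnr" or "ssim"."""
    # --psnr / --ssim ask x264 to report the metrics in its log.
    cmd = ["x264", "--bitrate", str(bitrate_kbps), "--psnr", "--ssim"]
    if tune is not None:
        cmd += ["--tune", tune]
    cmd += ["-o", output, source]
    return cmd

# Encode for the visual comparison: defaults, psy optimizations left on.
visual = x264_command("clip.y4m", "visual.264", 2000)
# Encode for the PSNR comparison: tuned for the metric being measured.
metric = x264_command("clip.y4m", "psnr.264", 2000, tune="psnr")

print(" ".join(visual))
print(" ".join(metric))
```

The lists could be passed to `subprocess.run` if x264 is installed; the point is simply that the metric numbers and the screenshots come from two different encodes of the same source.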

Derek
__________________
These are all my personal statements, not those of my employer :)

Last edited by Blue_MiSfit; 20th June 2010 at 21:27.
20th June 2010, 21:30   #15
foxyshadis
ангел смерти
 
Join Date: Nov 2004
Location: Lost
Posts: 9,550
Default VP8 is tweaked heavily for PSNR. That's why it wasn't well accepted here once psy opts made such huge strides in x264. Now Google is undoing that and introducing psy into VP8, at the cost of PSNR. By posting PSNR and SSIM scores, you may as well be throwing up random numbers, because they won't correspond to reality. Google is using something like CWSSIM last I heard now, actually.

You want a truly valid objective score, use what HydrogenAudio uses: ABX. Get enough people to ABX or judge perceived quality on a 1-10 scale and statistically you have a hard metric. A computer can't judge visual quality, stop making excuses, and don't bother making comparisons if you aren't going to make useful, valid ones that require effort. It just adds to the overall noise level on the topic.
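
To make the "statistically you have a hard metric" point concrete, here is a minimal Python sketch of the usual way an ABX result is scored: a one-sided binomial test against pure guessing (50% per trial). The `abx_p_value` helper and the 12-out-of-16 example are assumptions for illustration, not something taken from HydrogenAudio's tools.

```python
from math import comb

def abx_p_value(correct, trials):
    """Probability of getting at least `correct` answers right by guessing."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Sum the binomial tail for p = 0.5: C(n, k) / 2^n for k >= correct.
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Hypothetical session: 12 correct identifications out of 16 trials.
print(abx_p_value(12, 16))
```

A small p-value means the listener (or viewer) is unlikely to have been guessing, i.e. the difference between the two encodes is actually perceptible to that person.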
20th June 2010, 21:39   #16
mandarinka
Registered User
 
Join Date: Jan 2007
Posts: 729
You gave a (potentially big) advantage to VP8 by misconfiguring x264. Your comparison is flawed, it is as simple as that.

The metrics are used for evaluating the encoder's prowess in things like motion search and RDO, stuff that (to simplify it) comes before the encoder begins to use the tricks of psy optimizations. Something like raw power. If you measure PSNR or SSIM after psy optimizations do their job, you will get improper information about the thing you wanted to know. As it turns out, when x264 does the tricks to improve visual quality, the raw power as represented by PSNR suddenly seems to be much lower... (With VP8, there are no psy optimizations, so its results won't get harmed.)

Last edited by mandarinka; 20th June 2010 at 21:48. Reason: added explanation
20th June 2010, 21:40   #17
ricardo.santos
Registered User
 
Join Date: Mar 2005
Location: Portugal
Posts: 908
Please correct me if I'm wrong, I'm not an expert on PSNR etc., but I see that you're trying to test the quality of both encoders and I think you missed something very important:

x264 2k Bitrate: 1556
VP8 2k Bitrate: 2378

x264 2k Bitrate: 2030
VP8 2k Bitrate: 2614

Whatever results those tests show, VP8 has a higher bitrate and that to me means better quality... without knowing you're "cheating" the test.

i've come across that bug with ffmpeg (bitrate, oversizing), so to test both encoders i used Nic's ivfenc version with avs support and tested both through the avs with same bitrate.
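
To put the bitrate mismatch in numbers, a small Python sketch using the four figures quoted above (the unit is assumed to be kbps, which the post does not state, and the `bitrate_excess_percent` helper is made up for the example):

```python
def bitrate_excess_percent(vp8_kbps, x264_kbps):
    """How much larger the VP8 bitrate is than the x264 one, in percent."""
    return (vp8_kbps / x264_kbps - 1.0) * 100.0

# (x264 bitrate, VP8 bitrate) pairs as quoted in the post above.
pairs = [(1556, 2378), (2030, 2614)]
for x264_kbps, vp8_kbps in pairs:
    excess = bitrate_excess_percent(vp8_kbps, x264_kbps)
    print(f"x264 {x264_kbps}, VP8 {vp8_kbps}: VP8 is {excess:.1f}% larger")
```

That works out to roughly 53% and 29% more bits for VP8 in the two cases, so the encodes were not compared at equal bitrate.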

Last edited by ricardo.santos; 20th June 2010 at 21:48.
20th June 2010, 21:58   #18
poisondeathray
Registered User
 
Join Date: Sep 2007
Posts: 5,017
nice catch ricardo! did you ever get that sorted out (the bitrate issue for vp8 using ffmpeg version?)

Last edited by poisondeathray; 20th June 2010 at 22:04.
20th June 2010, 22:07   #19
ricardo.santos
Registered User
 
Join Date: Mar 2005
Location: Portugal
Posts: 908
Hi poisondeathray, no I couldn't find a fix; the only way I could get the exact bitrate with VP8 was to use Nic's ivfenc avisynth version.

I suspect a few "good vp8 reviews" are made with ffmpeg and people comment about good quality but are unaware of the bitrate issue.

Last edited by ricardo.santos; 20th June 2010 at 22:09.
21st June 2010, 00:24   #20
Guest
Guest
 
Join Date: Jan 2002
Posts: 21,907
Quote:
Originally Posted by Emanem_
I've posted this in the alternative codecs section as well, I hope it's ok
Not OK. Read and follow our forum rules!
Tags
comparison, psnr, ssim, vp8, x264