MSU HEVC Video Codecs Comparison [2016] [Archive]

IgorC

30th August 2016, 02:38

MSU has released their HEVC Video Codecs Comparison (2016)

http://compression.ru/video/codec_comparison/hevc_2016/

Video Codecs that Were Tested:

HEVC:
Chips&Media HEVC Encoder
Intel MSS HEVC Encoder
Kingsoft HEVC Encoder
nj265
SHBPH.265 Real time encoder
x265

Non-HEVC:
nj264
x264

CruNcher

12th September 2016, 01:55

Nanjing Yunyan what?
Frost & Sullivan award winner for their perceptive H.264 improvements?

A Chinese research institution?

2 Chinese codec implementations and a Korean SoC solution core :D

Nanjing Yunyan’s HEVC/H.265 solution is 30% faster than
competitors. Conversely, the encoder can deliver 20-30% reductions
in bandwidth or bitrate for comparable latency, quality and encoder
complexity. Nanjing Yunyan’s latest AVC encoder is able to match
the performance of state of the art HEVC encoders, providing much
needed gains in compression efficiency without incurring the expense
of overhauling infrastructure or the uncertainty of patent licensing
fees.

joj

26th November 2016, 13:16

What I don't understand is why, with the stated report objective: "The main task of the comparison is to analyze
different encoders for the task of transcoding video — e.g., compressing video for personal use.", they chose SSIM of all metrics.

It's especially confounding as the site the report is published on is also selling a very expensive video comparison tool that touts VQM as a better objective quality comparison algorithm, something that today should bring little controversy I'd think.

We know SSIM doesn't correlate well with subjective quality, so why was it chosen and not something much more suited to the goal such as VQM, VQM-VFD, or even Netflix's VMAF?

I find it unfortunate and sad seeing them spend so much energy compiling a report that attempts to compare encoder quality along the dimensions of visual quality, bit-rate and speed, just to have it all fall flat on its face by choosing SSIM as the quality metric. That choice nullified the entire report given its stated objective and rendered it useless to me. It doesn't matter if things are fast or compress well if expected viewing quality is left as an unknown, which to me is pretty much what using PSNR and SSIM amounts to, as they have been shown to correlate poorly with subjective estimates compared to more modern and readily available options.

CruNcher

26th November 2016, 15:25

Hmm seems nj264 uses a more complex ME

MasterNobody

26th November 2016, 15:53

We know SSIM doesn't correlate well with subjective quality, so why was it chosen and not something much more suited to the goal such as VQM, VQM-VFD, or even Netflix's VMAF?
1) The same can be said about any of the current objective quality metrics (VQM, VQM-VFD and Netflix's VMAF are not exceptions); all of them show a quality drop from psy-optimizations; and I wouldn't say any of them are really better than SSIM.
2) Using a proprietary quality metric is bad for such reports because the results can't be verified by independent parties.

So the real choice here is mostly between different PSNR and SSIM variants. Of those, SSIM is obviously the better choice over PSNR. Of course, having subjective blind test results would be even better, but that needs too many resources (time and money).
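
For reference, plain PSNR is simple enough to sketch in a few lines. This is only a minimal illustration, assuming 8-bit samples (peak value 255) passed as flat lists; the function name `psnr` and the sample values are made up for this example and are not taken from the MSU report or from any measurement tool.

```python
import math


def psnr(reference, distorted, peak=255.0):
    """Return PSNR in dB between two equally sized sequences of samples.

    Hypothetical helper for illustration; assumes 8-bit samples by default.
    """
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10.0 * math.log10(peak ** 2 / mse)


# Made-up luma samples, purely illustrative (not from any real encode).
ref = [52, 55, 61, 66, 70, 61, 64, 73]
enc = [54, 55, 60, 66, 69, 63, 64, 70]
print(f"PSNR: {psnr(ref, enc):.2f} dB")
```

PSNR only looks at per-sample error, which is why encodes tuned for psy-optimizations tend to score worse on it even when they look better.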

CruNcher

26th November 2016, 17:37

In that regard it's also interesting who joined V-Nova's R&D team to enhance Perseus further: the inventor of the MOSp objective metric.

http://ieeexplore.ieee.org/document/5783335/

https://openair.rgu.ac.uk/handle/10059/794

Results show that, by integrating the MOSp metric into the mode selection process, it is possible to make coding decisions based on estimated visual quality rather than mathematical error measures and to achieve visual quality gain in content that is identified as visually important by the MOSp metric

I advise everyone to read her Master's thesis, it's great :)

Even if I was a little surprised about

A thesis submitted as part of the requirements for the
degree of Doctor of Philosophy
awarded by the Robert Gordon University

joj

27th November 2016, 01:31

1) The same can be said about any of the current objective quality metrics (VQM, VQM-VFD and Netflix's VMAF are not exceptions); all of them show a quality drop from psy-optimizations; and I wouldn't say any of them are really better than SSIM.

I was not questioning the use of objective models, which you seem to imply, but the use of one ill-suited to the stated goal of the report when there are models known to be better for the purpose, which studies have shown to correlate better with subjective tests.

Apparently ANSI and ITU considered VQM good enough to adopt[1], and Netflix's own tests indicated a huge correlation boost using the mentioned methods[2].
Since these methods have been published and have reference implementations (at least two of them, only found a paper for VQM-VFD), I don't understand your claim that using something other than SSIM would in any way limit the possibility of reproducing the test results.

2) Using proprietary quality metric is bad for such reports because results can't be verified by independent parties.
Agreed, but what does that have to do with anything discussed here? Which model would you be prevented from using for independent verification of the claimed test results?

So the real choice here is mostly between different PSNR and SSIM variants.
How did you reach that conclusion? The whole point of my post was that we do have other options available, options that would have been more suited to the question the report stated it was trying to answer.

[1] http://www.its.bldrdoc.gov/resources/video-quality-research/vqm-faq.aspx
[2] http://techblog.netflix.com/2016/06/toward-practical-perceptual-video.html

joj

27th November 2016, 01:52

Cruncher, started skimming the paper but quickly got intrigued. You're right, it's captivating and well written. Thanks for the link!

MasterNobody

27th November 2016, 19:45

joj
Ok. Here is a little test. Choose which one of the pictures is better (1 or 2): http://screenshotcomparison.com/comparison/192022
For now without reference frame.

Yups

27th November 2016, 20:00

First picture is better, more details.

CruNcher

27th November 2016, 23:50

@MasterNobody

Spatial comparisons of something we only ever see temporally, in motion, are and will stay useless at certain bitrates.

MasterNobody

28th November 2016, 07:56

iwod

28th November 2016, 11:05

TL;DR: so is it suggesting x265 is not the best encoder? I don't really care about encoding speed. I care about the best quality.

CruNcher

28th November 2016, 22:42

CruNcher
My test was about base image quality metrics. And bitrate doesn't matter here, as good metrics should work at all bitrates. Also, most of the current practical metrics don't analyze temporal changes. And no, I don't say that temporal is not important, but for now don't derail this test and simply answer the question of the test. AFTERWARDS we could compare these encodes with temporal information (I will upload the samples from which I have extracted these images).

Okay, first, both look bad as hell in their core characteristics; I couldn't watch this without getting headaches if it stayed that way in motion ;)

I will also give you a new metric to tinker with: my EMOS score ;)

https://www.youtube.com/watch?v=Lafga42ZnPE

Though it seems pretty obvious that on image 1 AQ or RC or both worked better in this specific case (frame).

VoodooFX

30th November 2016, 03:21

joj
Ok. Here is a little test. Choose which one of the pictures is better (1 or 2): http://screenshotcomparison.com/comparison/192022
For now without reference frame.

For me 1 looks better.

MasterNobody

30th November 2016, 22:27

Looks like joj decided to ignore this test but I hope he is not going to argue that image 1 has better quality than image 2.
And now to the point of this small test. Here are both images together with the reference: images.zip (https://www.datafilehost.com/d/b749ad62). Here are the metrics for these images:
Metric   | Image 1 | Image 2
PSNR Y   | 24.7652 | 24.8432
PSNR YUV | 26.4287 | 26.4992
SSIM Y   |  0.8104 |  0.8064
SSIM YUV |  0.8275 |  0.8236
VQM Y    |  4.6688 |  4.5224
VQM YUV  |  2.1425 |  2.1048
Reminder: for PSNR and SSIM higher is better, for VQM lower is better. I don't have tools to calculate the VQM-VFD and VMAF metrics, so calculate them yourself (and maybe post them in this thread). As you can see, both PSNR and VQM indicate that image 2 is better and only SSIM says that image 1 is better. I don't say that SSIM is the best metric, but at least I wouldn't take for granted that some obscure metric is better (even if it is used as some group standard; even less if it uses trained neural networks like VMAF). So invalidating this report due to the use of SSIM is not a good idea.
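
The verdict of each metric can be read off mechanically from the table above. A minimal sketch, using only the scores and the higher/lower-is-better directions stated in this post; the names `scores`, `higher_is_better` and `preferred_image` are hypothetical and exist only for this illustration.

```python
# Scores copied from the table above: (image 1, image 2).
scores = {
    "PSNR Y": (24.7652, 24.8432),
    "PSNR YUV": (26.4287, 26.4992),
    "SSIM Y": (0.8104, 0.8064),
    "SSIM YUV": (0.8275, 0.8236),
    "VQM Y": (4.6688, 4.5224),
    "VQM YUV": (2.1425, 2.1048),
}

# Direction per metric family, as stated in the reminder above.
higher_is_better = {"PSNR": True, "SSIM": True, "VQM": False}


def preferred_image(metric, pair):
    """Return 1 or 2 for the image the metric prefers, or 0 on a tie."""
    family = metric.split()[0]
    first, second = pair
    if first == second:
        return 0
    if higher_is_better[family]:
        return 1 if first > second else 2
    return 1 if first < second else 2


verdicts = {name: preferred_image(name, pair) for name, pair in scores.items()}
for name, winner in verdicts.items():
    print(f"{name}: image {winner}")
```

Note that every difference here is small, so this only shows which way each metric leans, not how confident it is.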

P.S. CruNcher, here is your headache: samples.zip (https://www.datafilehost.com/d/a391c843). The source was, as you probably already guessed, parkrun (https://media.xiph.org/video/derf/y4m/720p50_parkrun_ter.y4m) (I used the yuv version instead of y4m, but the original ftp link is currently down).

Yups

1st December 2016, 22:06

Good job from SSIM then.

birdie

3rd December 2016, 01:54

No VP9, no Daala, no AV1.

Tell me again what was the point of comparing x264/x265 and a couple of unknown codecs?

CruNcher

4th December 2016, 12:36

I guess that test made especially the analysts from Frost & Sullivan happy ;)