VMAF - Video Multi-Method Assessment Fusion [Archive]

HolyWu

7th November 2018, 11:41

https://github.com/HomeOfVapourSynthEvolution/VapourSynth-VMAF/

poisondeathray

7th November 2018, 16:11

Thanks

lansing

7th November 2018, 16:37

I'm confused, what is this filter supposed to do?

ChaosKing

7th November 2018, 16:52

It calculates a score. You have two clips, and the score tells you how much the second clip differs from the first one. The VMAF algo tries to take human perception into account. https://en.wikipedia.org/wiki/Video_Multimethod_Assessment_Fusion

Selur

7th November 2018, 20:07

Nice thanks!

ChaosKing

8th November 2018, 11:11

Also available via vsrepo now (https://github.com/vapoursynth/vsrepo/commit/d238579f4a32c295f888b828de65affea5402506)

ifb

9th November 2018, 02:56

Very timely for me. Thanks.

I've had mixed results getting yuv422p10le to work correctly with vmafossexec and/or ffmpeg builds with libvmaf. That's probably my fault for trying on Windows, but I was too lazy to try on a Linux VM at the time.

poisondeathray

13th November 2018, 18:23

Request: is it possible to print out the aggregate PSNR, SSIM and MS-SSIM scores? (Currently it's only the aggregate VMAF.)

HolyWu

14th November 2018, 12:21

Update r2.

Scale 10-bit pixel values to 8-bit range for correct score calculation.
Use stricter linear frame request since VMAF score will change if frame order is different.
Report aggregate PSNR, SSIM, and MS-SSIM scores in addition to VMAF score.
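Why frame order matters: VMAF includes a motion feature that compares each frame with the one before it, so the same frames fed in a different order give different per-frame values. A minimal sketch, assuming motion is simply the mean absolute luma difference to the previous frame and 0 for the first frame (libvmaf also blurs the luma before comparing; that step is left out here):

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equally sized luma planes (flat lists)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def motion_scores(frames):
    """Simplified motion feature: 0 for the first frame, then the mean absolute
    difference to the previous frame (no blur, unlike libvmaf)."""
    scores = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        scores.append(mean_abs_diff(prev, cur))
    return scores


# Three tiny 4-pixel "frames": a change, then a static frame
frames = [[10, 10, 10, 10], [12, 12, 12, 12], [12, 12, 12, 12]]
print(motion_scores(frames))        # [0.0, 2.0, 0.0]
print(motion_scores(frames[::-1]))  # [0.0, 0.0, 2.0]
```

Same frames, different order, different per-frame features, and therefore potentially different per-frame VMAF scores.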

edcrfv94

14th November 2018, 15:09

Is it possible to register the scores as frame props, like mvsfunc PlaneStatistics does?

ChaosKing

14th November 2018, 15:51

Is it possible to register the scores as frame props, like mvsfunc PlaneStatistics does?

This would be awesome.

HolyWu

15th November 2018, 03:43

Is it possible to register the scores as frame props, like mvsfunc PlaneStatistics does?

Not possible with the exposed API of libvmaf. Only when all frames are delivered and processed does the library print the aggregate scores and optionally write the log file. There is no way to access the library's internal data to get the per-frame score and add that to the frame properties.

lansing

19th January 2019, 09:30

Can you add an "average score" for each metric at the end of the log?

HolyWu

19th January 2019, 15:42

Can you add an "average score" for each metric at the end of the log?

It's already there, albeit not at the end. For example, an XML log:

<?xml version="1.0" encoding="ISO-8859-1"?>
<VMAF version="1.3.7">
<params subsample="1" scaledHeight="1080" scaledWidth="1920" model=""/>
<fyi execFps="2.3972" aggregateMS_SSIM="0.997629" aggregateSSIM="0.998861" aggregatePSNR="48.5527" aggregateVMAF="98.452" numOfFrames="6"/>
<frames>
<frame vmaf="95.1147" vif_scale3="0.999067" vif_scale2="0.997998" vif_scale1="0.986572" vif_scale0="0.711273" ssim="0.998833" psnr="48.1752" ms_ssim="0.997502" motion2="0" adm2="0.990135" frameNum="0"/>
<frame vmaf="98.0908" vif_scale3="0.999122" vif_scale2="0.998097" vif_scale1="0.987361" vif_scale0="0.725121" ssim="0.998876" psnr="48.5654" ms_ssim="0.997635" motion2="2.08641" adm2="0.991341" frameNum="1"/>
<frame vmaf="98.6133" vif_scale3="0.999151" vif_scale2="0.998111" vif_scale1="0.987686" vif_scale0="0.737886" ssim="0.998892" psnr="48.9502" ms_ssim="0.997769" motion2="2.4176" adm2="0.991782" frameNum="2"/>
<frame vmaf="99.0617" vif_scale3="0.999089" vif_scale2="0.998038" vif_scale1="0.98654" vif_scale0="0.71651" ssim="0.99883" psnr="48.3169" ms_ssim="0.997532" motion2="2.96657" adm2="0.990675" frameNum="3"/>
<frame vmaf="100" vif_scale3="0.999108" vif_scale2="0.998094" vif_scale1="0.987838" vif_scale0="0.738668" ssim="0.998885" psnr="48.9485" ms_ssim="0.997766" motion2="3.60763" adm2="0.991847" frameNum="4"/>
<frame vmaf="100" vif_scale3="0.999115" vif_scale2="0.99804" vif_scale1="0.986751" vif_scale0="0.720485" ssim="0.99885" psnr="48.371" ms_ssim="0.997572" motion2="4.2859" adm2="0.991102" frameNum="5"/>
</frames>
</VMAF>
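To pull the per-frame numbers out of such a log, Python's standard xml.etree.ElementTree is enough. The sketch below uses a trimmed copy of the log above (only frameNum and vmaf kept) and shows that its aggregateVMAF of 98.452 matches the harmonic mean of the per-frame scores to three decimals, not the arithmetic mean. The function names are hypothetical; for a real log file, ET.parse(path).getroot() gives the same kind of root element.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the log above: only frameNum and vmaf are kept
LOG = """<VMAF version="1.3.7">
<fyi aggregateVMAF="98.452" numOfFrames="6"/>
<frames>
<frame vmaf="95.1147" frameNum="0"/>
<frame vmaf="98.0908" frameNum="1"/>
<frame vmaf="98.6133" frameNum="2"/>
<frame vmaf="99.0617" frameNum="3"/>
<frame vmaf="100" frameNum="4"/>
<frame vmaf="100" frameNum="5"/>
</frames>
</VMAF>"""


def frame_scores(root, metric="vmaf"):
    """Map frameNum -> value of one per-frame metric from a libvmaf XML log."""
    return {int(f.get("frameNum")): float(f.get(metric)) for f in root.iter("frame")}


def harmonic_mean(values):
    """Plain harmonic mean: n / sum(1/x)."""
    values = list(values)
    return len(values) / sum(1.0 / v for v in values)


root = ET.fromstring(LOG)  # for a file: ET.parse(path).getroot()
scores = frame_scores(root)
print(root.find("fyi").get("aggregateVMAF"))       # 98.452
print(f"{harmonic_mean(scores.values()):.3f}")      # 98.452
print(f"{sum(scores.values()) / len(scores):.3f}")  # 98.480
```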

lansing

19th January 2019, 16:04

Oh, I was using the JSON format, and it doesn't have it.
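If a JSON log has no summary, the averages can be recomputed from its per-frame entries. The layout assumed below (a top-level "frames" list with a "metrics" dict per frame) is an assumption about libvmaf's JSON output, so check it against an actual log first. Also keep in mind that the aggregate VMAF follows the chosen pooling (reported as harmonic_mean with pool=1), so a plain average of the per-frame VMAF values can differ slightly from it.

```python
import json
from statistics import mean


def per_metric_means(log_text):
    """Arithmetic mean of every per-frame metric in a JSON log.

    Assumed layout (verify against your own log): a top-level "frames" list
    whose items hold a "metrics" dict mapping metric name -> value.
    """
    frames = json.loads(log_text)["frames"]
    names = frames[0]["metrics"].keys()
    return {name: mean(f["metrics"][name] for f in frames) for name in names}


# Hypothetical two-frame log in the assumed layout
sample = '''{"frames": [
  {"frameNum": 0, "metrics": {"vmaf": 95.0, "psnr": 48.0}},
  {"frameNum": 1, "metrics": {"vmaf": 97.0, "psnr": 49.0}}
]}'''
print(per_metric_means(sample))
```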

lansing

20th January 2019, 05:50

So I finally have time to play with this plugin; it's a very good tool to help find optimal encode settings for my content.

My test clip is 1000 frames of 1440x1080 anime with no grain and very little motion. Here is the comparison between x264 and x265 I made with VMAF:

Aggregate VMAF score
source 98.8374

x264 300 kb/s medium animation 79.9185
x265 300 kb/s medium 85.6454
x265 300 kb/s slower 86.9443

x264 2500 kb/s medium animation 95.1616
x265 2500 kb/s medium 94.9675

x264 CRF 25 medium animation 91.3914
x265 CRF 25 medium 91.1363

x264 CRF 18 medium animation 96.1513
x265 CRF 21 medium 93.6972

x264 CRF 18 medium animation 96.1513
x264 CRF 18 slower animation 96.3279

According to the FAQ, comparing the source clip to itself won't give a score of 100, so 98.8 here is the highest score achievable for this clip.

For low-bitrate encoding, x265 clearly wins. But for high-bitrate/transparent encoding, x264 still seems to be better at the same bitrate or the same CRF. I don't have good 4K content to see how that goes.

I have heard people say that x264 CRF 18 is equivalent to x265 CRF 21, so I compared those too, and the scores show that there is a difference.

For the last comparison I used two different presets. So for high-bitrate encoding of my content, using the slower preset is just a waste of time for an insignificant gain. It only makes sense to use it for low-bitrate encodes.
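The preset argument boils down to two subtractions on the table above. Note that the two slower-vs-medium pairs in the table use different encoders (x265 at 300 kb/s, x264 at CRF 18), so this is suggestive rather than a like-for-like test:

```python
# Aggregate VMAF scores copied from the table above
# (the x264 runs also used the animation tune)
scores = {
    ("x265", "300 kb/s", "medium"): 85.6454,
    ("x265", "300 kb/s", "slower"): 86.9443,
    ("x264", "CRF 18", "medium"): 96.1513,
    ("x264", "CRF 18", "slower"): 96.3279,
}


def preset_gain(encoder, rate):
    """VMAF points gained by going from the medium to the slower preset."""
    return scores[(encoder, rate, "slower")] - scores[(encoder, rate, "medium")]


print(f"{preset_gain('x265', '300 kb/s'):.4f}")  # 1.2989
print(f"{preset_gain('x264', 'CRF 18'):.4f}")    # 0.1766
```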

Boulder

20th January 2019, 21:23

Did I understand correctly that model 0 should be used if the tested material is 1080p or less? At first I thought that it meant the viewing device but after reading the FAQ, it looks like it's the content itself.

gonca

20th January 2019, 22:48

Did I understand correctly that model 0 should be used if the tested material is 1080p or less? At first I thought that it meant the viewing device but after reading the FAQ, it looks like it's the content itself.

I think it is the screen; otherwise the --phone-model switch makes no sense.

Boulder

21st January 2019, 04:46

I think it is the screen; otherwise the --phone-model switch makes no sense.

Yes, the models page suggests that. Judging by the slideshow linked there, it also seems that for proper analysis you have to upscale to 4K.

EDIT: With the video upscaled to 4K, it runs out of memory quite often. Even with core.max_cache_size = 1024, the usage jumps to over 8GB quite fast and then the errors appear. I have 16GB on the machine, so it's really not using all the memory.

Start calculating VMAF score...
Script exceeded memory limit. Consider raising cache size.
error: aligned_malloc failed for data_buf.
error: aligned_malloc failed for data_buf.
error: aligned_malloc failed for data_buf.
Exec FPS: 3.574859
VMAF score (harmonic_mean) = nan

HolyWu

21st January 2019, 06:45

EDIT: With the video upscaled to 4K, it runs out of memory quite often. Even with core.max_cache_size = 1024, the usage jumps to over 8GB quite fast and then the errors appear. I have 16GB on the machine, so it's really not using all the memory.

Start calculating VMAF score...
Script exceeded memory limit. Consider raising cache size.
error: aligned_malloc failed for data_buf.
error: aligned_malloc failed for data_buf.
error: aligned_malloc failed for data_buf.
Exec FPS: 3.574859
VMAF score (harmonic_mean) = nan

Can't reproduce. Provide the exact script you used.

Boulder

21st January 2019, 17:21

Here's the one I used:
import vapoursynth as vs

core = vs.get_core()

orig = core.dgdecodenv.DGSource(r'O:\Testclips\test2.dgi', fulldepth=True)
orig = core.f3kdb.Deband(orig, preset="medium", output_depth=10)

clp = core.ffms2.Source(source=r'c:\x265\aq\aqmode3.hevc')
orig = core.resize.Bicubic(orig, width=3840, height=2160, filter_param_a=0, filter_param_b=0.5)
clp = core.resize.Bicubic(clp, width=3840, height=2160, filter_param_a=0, filter_param_b=0.5)

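# log_path as a raw string (r'...'): in a plain string "\x26" is read as "&" and "\a" as a control character
result = core.vmaf.VMAF(orig, clp, model=1, log_path=r'c:\x265\aq\clp1.log', log_fmt=0, pool=1, ci=True)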

result.set_output()
I also noticed that no log file is written if I run that one through "vspipe script.vpy ."

Boulder

21st January 2019, 21:11

The process also hangs without returning to the command prompt. The values are output but I need to use CTRL+C to get the prompt back.

C:\>vspipe c:\x265\aq\compare.vpy .
Start calculating VMAF score...
Exec FPS: 0.501642
VMAF score (harmonic_mean) = 98.925030

The Exec FPS value must be wrong: the clips are 1385 frames long and it only took a couple of seconds to produce that output. When I ran it with the -p option, I got those out-of-memory errors quite soon.

lansing

21st January 2019, 21:34

C:\>vspipe c:\x265\aq\compare.vpy .
Start calculating VMAF score...
Exec FPS: 0.501642
VMAF score (harmonic_mean) = 98.925030

The Exec FPS value must be wrong: the clips are 1385 frames long and it only took a couple of seconds to produce that output.

Have you tried running it with VapourSynth Editor's benchmark? My 1000-frame 1080p anime clip took about 1 minute 20 seconds; comparing 4K clips, yours should take two or three times as long.

Boulder

21st January 2019, 21:40

Just tried: it crashed, vanishing without a warning almost right after it started. I raised the cache size to 2048 MB and it hung at frame 30, which is probably what happens with vspipe as well.

lansing

21st January 2019, 23:03

Just tried: it crashed, vanishing without a warning almost right after it started. I raised the cache size to 2048 MB and it hung at frame 30, which is probably what happens with vspipe as well.

I tried your script; mine stopped at about frame 14 too. I tried lowering the resize resolution to 3000x2000, and it was able to run, with RAM usage jumping between 3.8G and 4.8G. So could it be an insufficient-memory problem?

ChaosKing

22nd January 2019, 02:23

No problems here. Max RAM usage was 10GB.
Maybe it is also a path problem? I used log_path=r"D:\test\clp1.log"
Does the x265 folder exist on C:? Because if not, VapourSynth can't create one without admin rights.

gonca

22nd January 2019, 02:52

Did I understand correctly that model 0 should be used if the tested material is 1080p or less? At first I thought that it meant the viewing device but after reading the FAQ, it looks like it's the content itself.

Could someone expand on this.
Is it the device or video?
Does the video have to be resized?

HolyWu

22nd January 2019, 03:09

I also can't reproduce the issues you two encountered with a 2000-frame 1080p clip upscaled to 4K, whether using vspipe or vsedit. And not prefixing a string containing single backslashes with r is always a bad idea.
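The effect is easy to see in plain Python: without the r prefix, \x26 in a path such as c:\x265\aq is read as the hex escape for "&" and \a as the BEL control character, so the string no longer names the intended folder. That is a likely, though unconfirmed, reason for a missing log file.

```python
# Plain string: "\x26" -> "&", then "5"; "\a" -> BEL (0x07), then "q"
cooked = "c:\x265\aq"
# Raw string: backslashes are kept literally
raw = r"c:\x265\aq"

print(repr(cooked))  # 'c:&5\x07q'
print(repr(raw))     # 'c:\\x265\\aq'
```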

HolyWu

22nd January 2019, 03:44

Could someone expand on this.
Is it the device or video?
Does the video have to be resized?

It's the viewing device. See https://github.com/Netflix/vmaf/blob/master/FAQ.md#q-when-computing-vmaf-on-low-resolution-videos-480-height-for-example-why-the-scores-look-so-high-even-when-there-are-visible-artifacts.

lansing

22nd January 2019, 03:59

I did another test. My PC has 16G of RAM, and I started the benchmark with all my apps closed, dropping total RAM usage to 3.5G. This time the benchmark ran without choking, with the process's RAM usage only occasionally peaking at 6G.

Then I ran another benchmark with starting RAM usage at 8.4G. This time the process stopped after the first few frames, with its RAM usage going 200MB -> 3GB -> 500MB.

So this seems like a not-enough-memory issue, but with 16G I should still have had about 1G of spare memory for the second benchmark.

Boulder

22nd January 2019, 04:50

It's the viewing device. See https://github.com/Netflix/vmaf/blob/master/FAQ.md#q-when-computing-vmaf-on-low-resolution-videos-480-height-for-example-why-the-scores-look-so-high-even-when-there-are-visible-artifacts.

And this seems to confirm that upscaling to that resolution is the way to go, although getting a VMAF score from an originally 1080p source on a 4K display is not optimal.

If, say, for a distorted video of 480 resolution, we still want to predict its quality viewing from 3 times the height (not 6.75), how can this be achieved?

1) If the 480 distorted video comes with a source (reference) video of 1080 resolution, then the right way to do it is to upsample the 480 video to 1080, and calculate the VMAF at 1080, together with its 1080 source.

2) If the 480 distorted video has only a 480 reference, then you can still upsample both distorted/reference to 1080, and calculate VMAF. A caveat is, since the VMAF model was not trained with upsampled references, the prediction would not be as accurate as 1).
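As I read the FAQ, the 6.75 in the quote comes from keeping the viewing distance fixed at 3 times the height of a 1080-line screen: a 480-line video shown pixel-for-pixel is then viewed from 3 * 1080 / 480 = 6.75 times its own height. The same arithmetic as a small helper (the function name is hypothetical):

```python
def relative_viewing_distance(content_height, display_height=1080, distance_in_display_heights=3.0):
    """Viewing distance in multiples of the video's own height when the video
    is shown pixel-for-pixel (no upscaling) on the given display."""
    return distance_in_display_heights * display_height / content_height


print(relative_viewing_distance(480))   # 6.75
print(relative_viewing_distance(1080))  # 3.0
```

Upsampling the 480 video to 1080 before scoring brings it back to the 3-screen-heights condition the default model assumes.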

Boulder

22nd January 2019, 04:56

I did another test. My PC has 16G of RAM, and I started the benchmark with all my apps closed, dropping total RAM usage to 3.5G. This time the benchmark ran without choking, with the process's RAM usage only occasionally peaking at 6G.

Then I ran another benchmark with starting RAM usage at 8.4G. This time the process stopped after the first few frames, with its RAM usage going 200MB -> 3GB -> 500MB.

So this seems like a not-enough-memory issue, but with 16G I should still have had about 1G of spare memory for the second benchmark.

When I first tested things, I didn't have any other big processes running and I had no cache size restrictions in the script. The memory usage jumped to around 8GB and I still got the out-of-memory error. Task Manager showed that total memory usage was around 65-70% when it happened, so there was still more memory available, but maybe VapourSynth restricted it.

It looks like I cannot run an encode simultaneously, because it seems to end abruptly quite often when the other VS process grabs most of the memory on the computer. I'm getting 8GB more this week, which is good for testing this issue.

gonca

22nd January 2019, 23:38

Doing a test run:
4K display, 4K sample
Using VDub2 64-bit for playback of the script
So far it is slow and the CPU is near 100%
RAM for VDub2 is around 12 GB, but overall I see spikes to nearly 16GB; VDub2 itself has spiked to about 14GB on some occasions.
When this run is over I will try VSEditor

gonca

23rd January 2019, 00:17

VDub2 seems to run fine.
Tried VSEditor and got the out-of-memory error early.
Added one line and it seems to go through fine.
The RAM numbers are about the same as with VDub2.

Original script

import vapoursynth as vs
core = vs.get_core()
core.std.LoadPlugin("C:/Program Files (Portable)/dgdecnv/x64 Binaries/DGDecodeNV.dll")
core.std.LoadPlugin(r'C:\Users\LUIS\Desktop\VMAF-r2\plugins64\VMAF.dll')
orig = core.dgdecodenv.DGSource(r'I:\original.dgi', fieldop=0, fulldepth=True)
orig = core.resize.Point(orig, format=vs.YUV420P10)
dist = core.dgdecodenv.DGSource(r'I:\distorted.dgi', fieldop=0, fulldepth=True)
dist = core.resize.Point(dist, format=vs.YUV420P10)
result = core.vmaf.VMAF(orig, dist, model=1, log_path=r'I:\log.log', log_fmt=1, pool=1, ci=True)
result.set_output()

Modified script

import vapoursynth as vs
core = vs.get_core()
core.max_cache_size = 16384  # the one added line: raise the frame cache limit to 16384 MB
core.std.LoadPlugin("C:/Program Files (Portable)/dgdecnv/x64 Binaries/DGDecodeNV.dll")
core.std.LoadPlugin(r'C:\Users\LUIS\Desktop\VMAF-r2\plugins64\VMAF.dll')
orig = core.dgdecodenv.DGSource(r'I:\original.dgi', fieldop=0, fulldepth=True)
orig = core.resize.Point(orig, format=vs.YUV420P10)
dist = core.dgdecodenv.DGSource(r'I:\distorted.dgi', fieldop=0, fulldepth=True)
dist = core.resize.Point(dist, format=vs.YUV420P10)
result = core.vmaf.VMAF(orig, dist, model=1, log_path=r'I:\log.log', log_fmt=1, pool=1, ci=True)
result.set_output()

It seems the minimum amount of RAM for 4K is 16GB plus whatever the background processes use.

lansing

25th January 2019, 04:53

I'm wondering: right now my machine does 12 fps on 1440x1080 clips with 3G of RAM usage. If I upgrade to a machine that is 3 times faster, will my RAM usage also go up 3 times?

HolyWu

25th January 2019, 06:47

I'm wondering: right now my machine does 12 fps on 1440x1080 clips with 3G of RAM usage. If I upgrade to a machine that is 3 times faster, will my RAM usage also go up 3 times?

I guess RAM usage should depend only on the video resolution, bit depth, and number of threads.
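A rough way to see how resolution and bit depth drive memory is the raw size of one YUV 4:2:0 frame. This sketch counts only that; libvmaf's internal working buffers, the VapourSynth frame cache and any frames held per thread are not modeled. It assumes, as VapourSynth does for its 10-bit formats, that samples above 8 bits occupy 2 bytes:

```python
def yuv420_frame_bytes(width, height, bits):
    """Raw size of one YUV 4:2:0 frame in bytes.

    Samples deeper than 8 bits are assumed to be stored in 16-bit words.
    """
    bytes_per_sample = 1 if bits <= 8 else 2
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)  # two quarter-size planes
    return (luma + chroma) * bytes_per_sample


for w, h, bits in [(1440, 1080, 8), (3840, 2160, 10)]:
    mib = yuv420_frame_bytes(w, h, bits) / 2**20
    print(f"{w}x{h} {bits}-bit: {mib:.1f} MiB per frame")
```

That works out to about 2.2 MiB versus 23.7 MiB per frame, so a 10-bit 4K comparison moves roughly ten times as much raw data per frame as an 8-bit 1440x1080 one.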

lansing

25th January 2019, 21:37

I did another test on this: I disabled cores in Task Manager and ran the benchmark with 1 to 8 threads. Speed rose as more threads were used, but RAM usage was the same for all of them.

A side note: setting the thread count in the script doesn't work:
core = vs.get_core(threads=1)

HolyWu

26th January 2019, 03:08

A side note: setting the thread count in the script doesn't work:
core = vs.get_core(threads=1)

Ah, libvmaf uses pthreads for its internal multithreading, and I didn't set it according to the core's number of threads but left it at its default, which uses all available threads. I'll fix it in the next release.

HolyWu

1st February 2019, 06:01

Update r3.

Update libvmaf to v1.3.13, which includes performance improvement.
Change default pool to 1.
Set libvmaf's threads according to core's number of worker threads.

lansing

1st February 2019, 19:28

I couldn't update it through vsrepo?

WorBry

1st February 2019, 22:25

I've been testing the VapourSynth VMAF (r3) plugin with a high-quality 1080/50p source (CrowdRun, lossless x264 8bit 420 Intra) encoded to x264 over a range of CRF values.

When testing the source file against itself (as a control), which should be lossless, I was surprised to find that the VMAF score is not 100.

http://i.imgur.com/DldW0wj.png (https://imgur.com/DldW0wj)

Is this normal?

Script:

import vapoursynth as vs
core = vs.get_core()
clip = core.ffms2.Source(source=r'X:/CrowdRun_x264_lossless.mp4')
result = core.vmaf.VMAF(clip, clip, ssim=True, ms_ssim=True, psnr=True, model=0, log_path=r'X:/VMAF_r3.log')
result.set_output()

Also, the log reports VMAF version="1.3.11", not 1.3.13.

Update r3.

Update libvmaf to v1.3.13, which includes performance improvement.

ChaosKing

1st February 2019, 22:29

https://github.com/Netflix/vmaf/blob/master/FAQ.md#q-when-i-compare-a-video-with-itself-as-reference-i-expect-to-get-a-perfect-score-of-vmaf-100-but-what-i-see-is-a-score-like-987-is-there-a-bug

A: VMAF does not guarantee that you get a perfect score in this case, but you should get a score close enough. Similar things would happen to other machine learning-based predictors (another example is VQM-VFD).

WorBry

1st February 2019, 22:32

Thanks.

WorBry

2nd February 2019, 20:16

When testing the source file against itself (as a control), which should be lossless, I was surprised to find that the VMAF score is not 100.

http://i.imgur.com/DldW0wj.png (https://imgur.com/DldW0wj)

Is this normal?

https://github.com/Netflix/vmaf/blob/master/FAQ.md#q-when-i-compare-a-video-with-itself-as-reference-i-expect-to-get-a-perfect-score-of-vmaf-100-but-what-i-see-is-a-score-like-987-is-there-a-bug

Actually, looking at the per-frame scores in that same log, it is just the VMAF score for the first frame that skews the aggregate result, and the culprit looks like the motion2 metric (which measures temporal difference), which scores 0 on that frame. All of the remaining 499 frames have a VMAF score of 100.

http://i.imgur.com/H17lsxVh.png (https://imgur.com/H17lsxV)


Perhaps there should be an option to exclude the first frame from the aggregate scores?
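Such an option could look like this: pool the per-frame scores with the harmonic mean (what the plugin reports as harmonic_mean with pool=1), optionally dropping frame 0. skip_first is a hypothetical parameter, not an existing plugin option. The scores are the per-frame VMAF values from the six-frame XML log posted earlier in this thread, where frame 0 likewise has motion2="0" and the lowest score:

```python
def harmonic_mean(values):
    """Plain harmonic mean: n / sum(1/x)."""
    values = list(values)
    return len(values) / sum(1.0 / v for v in values)


def pooled_vmaf(scores, skip_first=False):
    """Harmonic-mean pooling; skip_first (hypothetical) drops frame 0,
    whose motion feature is always 0."""
    if skip_first:
        scores = scores[1:]
    return harmonic_mean(scores)


# Per-frame VMAF from the six-frame XML log earlier in the thread
scores = [95.1147, 98.0908, 98.6133, 99.0617, 100.0, 100.0]
print(round(pooled_vmaf(scores), 3))
print(round(pooled_vmaf(scores, skip_first=True), 3))
```

The first value reproduces the log's aggregateVMAF; the second shows how much a single low first frame pulls the pooled score down in a short clip.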

WorBry

5th February 2019, 05:46

I've been testing the VapourSynth VMAF (r3) plugin with a high-quality 1080/50p source (CrowdRun, lossless x264 8bit 420 Intra) encoded to x264 over a range of CRF values.

Interesting results, having not tested VMAF before.

Here I encoded the CrowdRun 1080/50p 'master' to x264 over CRF 0 - 30. This was using the default vmaf_v0.6.1.pkl model (i.e. Predict Quality on a 1080p HDTV screen at distance 3x the screen height). The VMAF, SSIM and MS-SSIM scores are the aggregate values. The 'classic' SSIM tests were run on Zeranoe ffmpeg win64-static nightly build (20190131).

http://i.imgur.com/SumN4ba.png (https://imgur.com/SumN4ba)

http://i.imgur.com/z3s3fgG.png (https://imgur.com/z3s3fgG)

There is a big difference between the libvmaf SSIM and ffmpeg SSIM scores. Apparently, the VMAF SSIM implementation "includes an empirical downsampling process, as described at the Suggested Usage section of https://ece.uwaterloo.ca/~z70wang/research/ssim/", whereas the ffmpeg implementation does not have this step:

https://github.com/Netflix/vmaf/issues/22
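As I understand the Suggested Usage recipe on that page, each image is first low-pass filtered with an F x F averaging window and then only every F-th sample is kept, where F = max(1, round(min(H, W) / 256)). A sketch of that step in plain Python, approximating filter-then-subsample with non-overlapping block averages; whether libvmaf implements it exactly this way is an assumption:

```python
import math


def ssim_downsample_factor(height, width):
    """F = max(1, round(min(H, W) / 256)), rounding halves up like MATLAB."""
    return max(1, math.floor(min(height, width) / 256 + 0.5))


def box_downsample(img, f):
    """Average non-overlapping f x f blocks of a 2-D list of samples
    (an approximation of the averaging filter plus F-times subsampling)."""
    h, w = len(img) // f * f, len(img[0]) // f * f
    return [[sum(img[y + dy][x + dx] for dy in range(f) for dx in range(f)) / (f * f)
             for x in range(0, w, f)]
            for y in range(0, h, f)]


print(ssim_downsample_factor(1080, 1920))  # 4
print(ssim_downsample_factor(2160, 3840))  # 8
print(box_downsample([[0, 2, 4, 6], [2, 4, 6, 8]], 2))  # [[2.0, 6.0]]
```

So at 1080p, SSIM computed this way compares images shrunk by a factor of 4 in each direction, which hides fine detail that a full-resolution SSIM such as ffmpeg's still sees.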

As for the VMAF metric itself, well, I can appreciate its value in the context of 'perceptual quality'. In this example it effectively declares the x264 transcodes to be visually lossless from CRF 0 to around CRF 16, whereas the ffmpeg-SSIM scores show a progressive decline over the entire CRF/bitrate range.

And here I ran a parallel series encoded to x265 for comparison.

http://i.imgur.com/LfAjzft.png (https://imgur.com/LfAjzft)

Clearly VMAF judges x265 to have significantly higher perceptual quality than x264 at the lower bitrate range and more so than revealed by SSIM.

That said, I think 'classic' (ffmpeg) SSIM is still a useful tool for analyzing fine differences at the pixel peeping level and beyond visual acuity, and (by virtue of the differential Y, U and V scores) for determining whether the luma and/or chroma are affected.
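Next to the Y, U and V values, ffmpeg's ssim filter also prints an overall "All" figure. As far as I know it weights each plane by its pixel count, so for 4:2:0 the luma counts four times as much as each chroma plane. A sketch of that weighting, with made-up per-plane scores:

```python
def ssim_all(planes):
    """Combine per-plane SSIM scores into one figure, weighting each plane by
    its pixel count. planes is a list of (pixel_count, ssim) pairs."""
    total = sum(count for count, _ in planes)
    return sum(count * score for count, score in planes) / total


# Hypothetical 1080p 4:2:0 result: Y = 0.99, U = V = 0.96
w, h = 1920, 1080
chroma = (w // 2) * (h // 2)
planes = [(w * h, 0.99), (chroma, 0.96), (chroma, 0.96)]
print(round(ssim_all(planes), 4))  # 0.98
```

Because the combined figure is dominated by luma, a chroma-only problem shows up far more clearly in the separate U and V scores.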

I did record the libvmaf and ffmpeg PSNR scores also, but they are not as interesting.

@HolyWu, btw, thanks for the plugin.

Boulder

5th February 2019, 07:25

Has anyone else noticed that the VMAF scores in some cases tend to be "too perfect" to measure?

https://forum.doom9.org/showthread.php?p=1864721#post1864721

lansing

5th February 2019, 23:09

Clearly VMAF judges x265 to have significantly higher perceptual quality than x264 at the lower bitrate range and more so than revealed by SSIM.

Good comparison to show that x265 really has no advantage over x264 on 1080p material if we're going for transparent encoding.

Now we'll just have to wait for people with high-end computers to do the 4K comparison.

WorBry

6th February 2019, 00:41

I started off testing at original (Crowd Run) 2160/50p resolution but could see I would be in for a long haul ;)

ChaosKing

6th February 2019, 13:58

I couldn't update it through vsrepo?

Updates via vsrepo will never be available immediately. In addition, the new version was not recognized by the update script, so it had to be done by hand. After that, Myrsloik needs to upload the newly compiled repo file to his site. There are many steps, as you can see.

But it's available now :)

WorBry

7th February 2019, 07:52

Actually, looking at the per-frame scores in that same log, it is just the VMAF score for the first frame that skews the aggregate result, and the culprit looks like the motion2 metric (which measures temporal difference), which scores 0 on that frame. All of the remaining 499 frames have a VMAF score of 100.

http://i.imgur.com/H17lsxVh.png (https://imgur.com/H17lsxV)


Perhaps there should be an option to exclude the first frame from the aggregate scores?

There again, that's not always the case. Here, the original 2160/50p Crowd Run (8bit 420, y4m) reference clip was encoded to x264 CRF=0 (i.e. lossless, with the switch to qp 0 and the High 444 Predictive profile), and the clips were compared with VMAF r3 in Model=1 mode:

<VMAF version="1.3.11">
<params model="" scaledWidth="3840" scaledHeight="2160" subsample="1" num_bootstrap_models="0" bootstrap_model_list_str="" />
<fyi numOfFrames="500" aggregateVMAF="100" aggregatePSNR="60" aggregateSSIM="1" aggregateMS_SSIM="1"......
....
<frame frameNum="0" adm2="1" motion2="0" ms_ssim="1" psnr="60" ssim="1" vif_scale0="1" vif_scale1="0.999999" vif_scale2="0.999999" vif_scale3="0.999998" vmaf="100" />
<frame frameNum="1" adm2="1" motion2="8.42311" ms_ssim="1" psnr="60" ssim="1" vif_scale0="1" vif_scale1="0.999999" vif_scale2="0.999999" vif_scale3="0.999998" vmaf="100" />

So I guess you have to let it do its thing and take the scores as they come.
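Identical frames have zero error, so an unclamped PSNR would be infinite; the psnr="60" entries in the lossless log above suggest libvmaf clamps PSNR at 60 dB for this 8-bit input. A minimal sketch assuming that cap (the value is read off the log, not taken from libvmaf's source, and other bit depths may use a different cap):

```python
import math


def psnr(mse, peak=255.0, cap=60.0):
    """PSNR in dB for 8-bit samples, clamped at `cap` so identical frames
    (mse == 0) give a finite value instead of infinity."""
    if mse == 0:
        return cap
    return min(10 * math.log10(peak * peak / mse), cap)


print(psnr(0.0))            # 60.0
print(round(psnr(1.0), 2))  # 48.13
```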