View Full Version : Audio quality prediction – AudioVMAF?


tormento
8th July 2024, 17:30
I was wondering whether something similar to PSNR, SSIM or VMAF exists for audio.

I found a paper about AudioVMAF, but no executable to try.

Does anybody have any experience with it?

Emulgator
8th July 2024, 23:09
SNR indeed comes from audio.
It is simple: subtract, then relate.
Encode - Source = Error signal.
Relate this to the full-scale (FS) level and there it is: SNR.
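
As a rough illustration of the subtract-then-relate idea, here is a minimal Python sketch. It assumes both signals are already time-aligned lists of float samples with full scale at 1.0; the function names and the synthetic test tones are made up for the example and do not come from any particular tool.

```python
import math

def rms(samples):
    """Root-mean-square level of a sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def error_signal(encoded, source):
    """Sample-by-sample difference: encode - source."""
    if len(encoded) != len(source):
        raise ValueError("signals must have equal length and be time-aligned")
    return [e - s for e, s in zip(encoded, source)]

def error_level_dbfs(encoded, source, full_scale=1.0):
    """RMS level of the error signal in dB relative to full scale."""
    err = rms(error_signal(encoded, source))
    if err == 0.0:
        return float("-inf")  # bit-identical signals leave no error
    return 20.0 * math.log10(err / full_scale)

# Synthetic stand-ins (assumption): a 1 kHz sine as "source", and an
# "encode" that is the same sine plus a small 5 kHz distortion tone.
rate = 48000
source = [0.5 * math.sin(2 * math.pi * 1000 * n / rate) for n in range(rate)]
encoded = [s + 0.005 * math.sin(2 * math.pi * 5000 * n / rate)
           for n, s in enumerate(source)]

level = error_level_dbfs(encoded, source)
print(f"error level: {level:.1f} dBFS")
```

With real files you would load both as 32-bit float first; the error list itself is what you would then amplify and listen to.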

To judge what -46 dB from a cassette tape or video encode might mean:
it is about 1/200 of a full-scale (FS) signal. Just amplify that error signal and listen to it.
This tells you more about signal degradation than any artificial unit.
Everything is there, ready to be analysed: kind of distortion, frequency components of the distortion, temporal distortion, noise, modulation products...
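
The "200th of FS" figure follows from the usual amplitude convention, ratio = 10^(dB/20). A small sketch of that arithmetic, plus a gain helper for making the error signal audible; the helper names are hypothetical, and the clipping to ±1.0 is only an assumption about how you would want to protect your ears and speakers.

```python
import math

def db_to_ratio(db):
    """Convert an amplitude level in dB to a linear ratio."""
    return 10.0 ** (db / 20.0)

def ratio_to_db(ratio):
    """Convert a linear amplitude ratio to dB."""
    return 20.0 * math.log10(ratio)

def amplify(samples, gain_db):
    """Apply gain to float samples, clipping to the +/-1.0 full-scale range."""
    g = db_to_ratio(gain_db)
    return [max(-1.0, min(1.0, s * g)) for s in samples]

ratio = db_to_ratio(-46.0)
print(f"-46 dB is a ratio of {ratio:.5f}, i.e. about 1/{1.0 / ratio:.0f}")
print(f"1/200 of full scale is {ratio_to_db(1.0 / 200.0):.2f} dB")
```

So boosting the error signal by roughly 40 dB brings a -46 dB residue up to a comfortable listening level without touching full scale.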

SSIM and VMAF rather conceal those clear and ugly numbers, along the lines of:
"How much dirt can we hide under the carpet before 90% of customers stop noticing?"
Well, then it becomes subjective. Good for content distributors, bad for archival.

PSNR is still my first stop for video quality, but that is the engineer's side of things.

tormento
9th July 2024, 14:21
"Encode - Source = Error signal"
Can you suggest which program(s) you use for this?

Emulgator
9th July 2024, 18:34
Any audio editor will do.
CoolEdit, Audacity, SoundForge, Reaper, Audition, Pro Tools, ...
Subtract, or Invert & Mix.

P.S. And you will want to work in 32-bit float, so specify "open as 32-bit" (the wording depends on the editor used).

j7n
28th July 2024, 02:38
You need to carefully align the two files in time, using a transient as a point of reference. Switch the time ruler to samples. Sometimes the latency is a fractional number of samples. There was a tool called DeltaWave that aimed at doing the alignment automatically. Most lossy codecs introduce some delay, and it is compensated for only in the most popular formats. You can distil the difference to a single number by calculating the RMS power, possibly with Equal Loudness Contour in Sound Forge, so that high frequencies are deweighted.
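
For anyone who wants to script the alignment instead of doing it by eye, here is a minimal sketch that finds a whole-sample delay by brute-force cross-correlation and then reduces the difference to one RMS figure. It makes several assumptions: the signals are short lists of floats, the delay is an integer number of samples (fractional latency is not handled), and no equal-loudness weighting is applied. The function names and the synthetic click-plus-ringing signal are invented for the example, and this is not how DeltaWave or any editor does it.

```python
import math

def best_lag(reference, delayed, max_lag):
    """Integer lag in samples at which `delayed` best matches `reference`.
    A positive result means `delayed` starts that many samples late."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(reference):
            m = n + lag
            if 0 <= m < len(delayed):
                score += r * delayed[m]
        if score > best_score:
            best, best_score = lag, score
    return best

def align(reference, delayed, lag):
    """Trim both signals so that equal indices refer to the same instant."""
    if lag >= 0:
        delayed = delayed[lag:]
    else:
        reference = reference[-lag:]
    n = min(len(reference), len(delayed))
    return reference[:n], delayed[:n]

def rms_db(samples):
    """RMS level in dB relative to a full scale of 1.0."""
    power = sum(s * s for s in samples) / len(samples)
    return float("-inf") if power == 0.0 else 10.0 * math.log10(power)

# Synthetic stand-in (assumption): a click followed by a decaying 440 Hz tone.
rate = 48000
reference = [0.0] * 2000
reference[100] = 1.0  # the transient used as the alignment landmark
for n in range(100, 2000):
    k = n - 100
    reference[n] += 0.3 * math.exp(-k / 200) * math.sin(2 * math.pi * 440 * k / rate)

# "Encode": 37 samples of codec delay plus a tiny alternating error.
delayed = [0.0] * 37 + [r + 0.001 * (1 if n % 2 == 0 else -1)
                        for n, r in enumerate(reference)]

lag = best_lag(reference, delayed, max_lag=100)
ref_a, del_a = align(reference, delayed, lag)
residual = [d - r for d, r in zip(del_a, ref_a)]
print(f"detected delay: {lag} samples, residual: {rms_db(residual):.1f} dBFS")
```

The brute-force search is slow on full-length tracks, so in practice you would run it on a short excerpt around a clear transient and then apply the detected lag to the whole file.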