Very interesting link, tebasuna51! May I ask what your definition of a duplicated frame is? Is it that the audio frames are bit-for-bit identical, or is it that they carry the same PTS timestamp?
People are not very sensitive to audio/video desync, and how noticeable it is also depends on whether the audio is ahead of or behind the video. Some people are more sensitive than others as well. The limits obtained from research are way higher than the limits specified by the standards.
https://ieeexplore.ieee.org/document/4599253
https://core.ac.uk/download/pdf/55846182.pdf
https://www.itu.int/dms_pubrec/itu-r...2-S!!PDF-E.pdf
At my former employer we required our sync algorithms for STBs (set-top boxes) to keep desync to a maximum of 20 ms. So one could argue that it's not worth worrying too much about 16 ms.