View Full Version : Thread-safe ?


Pat357
21st March 2015, 23:58
I noticed that when using lots of threads (12 or so) in VS, the output is no longer identical to the output produced with fewer threads.

I used:
vspipe -y -p -e 999 TDeintMod.vpy 1000frames_thr01_a.yuv

to output 1000 frames and repeated this for different settings of the number of threads used.

Here are the results :

658f6e7349f83ab0643596583ef464c6 *1000frames_thr01_a.yuv
658f6e7349f83ab0643596583ef464c6 *1000frames_thr02_a.yuv
658f6e7349f83ab0643596583ef464c6 *1000frames_thr04_a.yuv
658f6e7349f83ab0643596583ef464c6 *1000frames_thr06_a.yuv
8146826125ccc3c019d76b88707aff40 *1000frames_thr12_a.yuv
3c4745721c519a27e6a8b52e5950de82 *1000frames_thr12_b.yuv
658f6e7349f83ab0643596583ef464c6 *1000frames_thr12_c.yuv
3c4745721c519a27e6a8b52e5950de82 *1000frames_thr12_d.yuv
3ba74da4a1fd324a28ff5920b0849dd9 *1000frames_thr12_e.yuv
658f6e7349f83ab0643596583ef464c6 *1000frames_thr6_a.yuv
658f6e7349f83ab0643596583ef464c6 *1000frames_thr6_ffms220_2.yuv

The filenames ending in thr(x)_a were created with (x) threads configured in the script.
Notice that for threads=12 we have 5 results: one is correct (1000frames_thr12_c.yuv), and two are identical to each other (1000frames_thr12_b.yuv and 1000frames_thr12_d.yuv) but differ from the results for threads=1 to 6.
This should rule out random causes like corruption from bad memory, a bad disk, ...
I was very surprised by the differences in the output for 12 threads: the output is no longer always identical !! :confused:
I normally don't use more than 8 threads on this system (4 cores + 4 HT) because it won't speed things up (>8 threads can even slow the process down).
I did similar tests in the past but never encountered this issue: maybe it's specific to this script/clip combination ?
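For anyone who wants to repeat this kind of comparison, here is a minimal sketch that hashes a set of output files and groups them by digest, so matching and deviating runs are visible at a glance. The helper names (md5_of_file, group_by_digest) and the *.yuv glob are only illustrative assumptions, not part of VapourSynth or vspipe.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_digest(paths):
    """Map each digest to the sorted list of file names that produced it."""
    groups = defaultdict(list)
    for path in paths:
        groups[md5_of_file(path)].append(Path(path).name)
    return {digest: sorted(names) for digest, names in groups.items()}


if __name__ == "__main__":
    # Hypothetical usage: hash every .yuv file in the current directory.
    outputs = sorted(Path(".").glob("*.yuv"))
    for digest, names in group_by_digest(outputs).items():
        print(digest, len(names), ", ".join(names))
```

A digest shared by most runs is presumably the reference output; any digest with only one or two files marks a run worth inspecting more closely.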

Memory consumption is not an issue as even with 12 threads it's well below the limit.

The script I use is straight from the docs:

import vapoursynth as vs
import functools
core = vs.get_core(threads=6, accept_lowercase = True)
core.std.LoadPlugin(path=r"j:\Programs\ffms2-r940c64-avs_vsp\ffms2.dll")
core.std.LoadPlugin(path=r"c:\Program Files (x86)\VapourSynth\filters\vapoursynth-nnedi3-v4.0-win32\libnnedi3.dll")
core.std.LoadPlugin(r"c:\Program Files (x86)\VapourSynth\filters\TDeintMod-r5\Win32\TDeintMod.dll")
clip = core.ffms2.Source(r"j:\film\new_2015-02\WhiteQueen_Sample.h264", fpsnum=30000, fpsden=1001, threads=1)
def conditionalDeint(n, f, orig, deint):
    if f.props._Combed:
        return deint
    else:
        return orig
deint = core.tdm.TDeintMod(clip, order=1, field=1, mode=0, edeint=core.nnedi3.nnedi3(clip, field=1)) #order=1, field=1, mode=1
combProps = core.tdm.IsCombed(clip)
clip = core.std.FrameEval(clip, functools.partial(conditionalDeint, orig=clip, deint=deint), combProps)
clip.set_output()


System & environment :
Input clip is 1080i30 (fps=30000/1001), interlaced TFF containing 1360 frames.
Vapoursynth r26 32 bit
Windows 7 Pro X64
Core i7-4770 CPU with 16 GB RAM
System is not OC'd

GMJCZP
22nd March 2015, 03:35
Your CPU model does not support more than 8 hardware threads:

http://ark.intel.com/products/75122/Intel-Core-i7-4770-Processor-8M-Cache-up-to-3_90-GHz

Myrsloik
22nd March 2015, 08:07
Is the output visually identical? Don't expect ffmpeg to always have bit-exact output.
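One way to narrow this down before looking at pictures is to find out which frames actually differ between two runs. The sketch below compares two files frame by frame. It assumes headerless raw output where every frame has the same byte size; files written with Y4M headers would need the headers stripped first. The function name differing_frames is made up for illustration.

```python
def differing_frames(path_a, path_b, frame_size):
    """Return the indices of fixed-size frames whose bytes differ between two files.

    Assumption: both files are headerless raw video with frame_size bytes per frame.
    A frame that exists in only one of the files is reported as differing.
    """
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    diffs = []
    index = 0
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        while True:
            frame_a = file_a.read(frame_size)
            frame_b = file_b.read(frame_size)
            if not frame_a and not frame_b:
                break  # both files exhausted
            if frame_a != frame_b:
                diffs.append(index)
            index += 1
    return diffs
```

For 8-bit 4:2:0 video at 1920x1080 (an assumption about the clip format), frame_size would be 1920 * 1080 * 3 // 2 bytes. If the differing frames cluster after a single index, that would fit a one-off decoder hiccup rather than random corruption.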

Pat357
22nd March 2015, 14:56
Quote (Myrsloik): Is the output visually identical? Don't expect ffmpeg to have bitexact output always.

The FFMS2 source was what I first thought of too; but this doesn't explain why only the results from the scripts with 12 threads configured differ from the rest.
It looks like for threads <= 6, FFMS creates exactly the same output: 6 tests with #threads<=6 and *all* 6 produced *identical* outputs.

My conclusion is that the differing outputs are *not* only FFMS related, but somehow #threads related.. this does not rule out that it's (#threads AND FFMS) related.
Remember when you were at university: Design of Experiments (DOE, critical parameters, full factorial experiment design, first, second, third, fourth order parameter relations)... maybe this has also been a while for you :-)

Myrsloik
22nd March 2015, 15:00
The more threads you have the more likely it is that a frame gets processed out of order. With 12 threads it will trigger about one seek in ffms2 every 1000 frames. Or that's what my primitive tests showed a while ago. With 6 threads it was closer to one in 10000.

So it's just as expected...
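To make the idea concrete, here is a toy model of it. It is only a sketch under stated assumptions, not how ffms2 really works: the decoder is assumed to move forward for free, and any request for a frame behind its current position counts as one seek. The name count_backward_seeks is invented for this example.

```python
def count_backward_seeks(request_order):
    """Count requests that land behind the decoder's current position.

    Toy model (an assumption, not ffms2's real logic): the decoder starts at
    frame 0, can always decode forward without seeking, and must seek whenever
    the requested frame is earlier than the next frame it would decode.
    """
    next_frame = 0  # the frame the decoder would produce next without seeking
    seeks = 0
    for frame in request_order:
        if frame < next_frame:
            seeks += 1
        next_frame = frame + 1
    return seeks


print(count_backward_seeks([0, 1, 2, 3, 4, 5]))  # 0: strictly in order
print(count_backward_seeks([0, 1, 3, 2, 4, 5]))  # 1: frames 2 and 3 swapped
```

In this model a single swapped pair of requests costs one seek, so the more threads compete to request frames, the more often such swaps can happen.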

jackoneill
22nd March 2015, 15:30
What if you use l-smash instead of ffms2?

Are_
22nd March 2015, 16:59
Quote (jackoneill): What if you use l-smash instead of ffms2?

I'm not the OP but for me:

ffms2-threads6
2b0a35a37e58b37fd6570b8be513d8d1 -

lsmash-threads6
8ec2559e79cfe0f4d0ea6321f91c3d00 -

ffms2-threads12
4491ddfa60061d7e7c393c8be6544bab -
3d93ddf864c0a16da90cadb2fc09de5f -
4fd52a3d7a89fdbd5fa4feded451c695 -
705fdf5c8fec0818837b49f78ca91016 -
b84d73182017ccec2dd9a8cc6a64f0e9 -
69566c1909a9047cf707e503ea7a11e9 -

lsmash-threads12
8ec2559e79cfe0f4d0ea6321f91c3d00 -
8ec2559e79cfe0f4d0ea6321f91c3d00 -
1134822bbd084d06ebabfc1d653d08ca -
11d123ee1e0c9dedff7f3673a5d87757 -
8ec2559e79cfe0f4d0ea6321f91c3d00 -
ba37f8a1c3af6de53e07990bb968910f -

Same script as Pat357.

Pat357
23rd March 2015, 01:38
Quote (jackoneill): What if you use l-smash instead of ffms2?

L-smash LWLibavSource produced identical results most of the time: much more consistent than FFMS.
Even when using 20 threads, the output was bit-identical.

Also, the official FFMS v2.20 build was more consistent than the later C-versions at producing identical outputs across repeated runs.

The worst results come from the latest C-builds from http://forum.doom9.org/showthread.php?t=127037&page=98 .
Maybe it has to do with the fact that the Haali DirectShow code was removed and the default demuxer is now LAVF ?
Also this version has a memory leak that accumulates on successive seeks and may cause crashes.
I just got a BSOD while testing; I believe this is the first BSOD on this system, which has been running for over 2 years now !!