Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion. Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules.
26th January 2015, 14:45 | #81
unsigned int
Join Date: Oct 2012
Location: 🇪🇺
Posts: 760
isse=False makes Degrain3 use C++ functions instead of some SSE2 functions. It probably makes things slower (you can compare Degrain1 with and without isse).
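One rough way to do the comparison suggested above is to time frame requests from Python. This is a minimal sketch: `measure_fps` is a hypothetical helper, not part of VapourSynth or mvtools. It requests frames one after another, so it measures single-request speed, not the multithreaded throughput vspipe reports.

```python
import time
from typing import Any, Callable, Iterable


def measure_fps(get_frame: Callable[[int], Any], frames: Iterable[int]) -> float:
    """Request each frame number in turn and return frames per second."""
    frames = list(frames)
    if not frames:
        raise ValueError("need at least one frame to time")
    start = time.perf_counter()
    for n in frames:
        get_frame(n)  # the frame itself is discarded; only the elapsed time matters
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")
```

For example, build a Degrain1 clip once with `isse=True` and once with `isse=False` (on Analyse and Degrain1), then compare `measure_fps(clip.get_frame, range(200))` for the two; `clip.get_frame(n)` is VapourSynth's call for rendering a single frame.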
__________________
Buy me a "coffee" and/or hire me to write code!
26th January 2015, 14:56 | #82
Registered User
Join Date: Jun 2006
Posts: 452
Quote:
My system is an i7-970 at 3.2 GHz, which should support SSE2 and a lot more. Do you think switching to 64-bit VapourSynth would give the same problems? Is this somehow intrinsic to the 32-bit VS builds?
26th January 2015, 17:01 | #83
unsigned int
Join Date: Oct 2012
Location: 🇪🇺
Posts: 760
Quote:
26th January 2015, 19:00 | #84
Registered User
Join Date: Jun 2006
Posts: 452
Quote:
This one works without the "isse=False" option! I still wonder why I was apparently the first to stumble on this problem. Another system (Haswell 4770, 16 GB, Win7 Pro x64) had the very same problem. I will test this version again on that system.
26th January 2015, 19:05 | #85
Professional Code Monkey
Join Date: Jun 2003
Location: Kinnarps Chair
Posts: 2,548
Quote:
__________________
VapourSynth - proving that scripting languages and video processing isn't dead yet
30th January 2015, 21:29 | #86
unsigned int
Join Date: Oct 2012
Location: 🇪🇺
Posts: 760
v6 is here, and it didn't even take as many years as advertised!
Code:
* Add support for grayscale, 4:4:0, and 4:4:4 video.
* Add support for up to 16 bits per sample.
* Add SCDetection filter.
* Reject overlap greater than half the block size.
* Fix crash in BlockFPS when the input clip's frame rate is unknown (introduced in v5).
* Fix colourful bottom border in Degrain3 when overlap is greater than 0 (introduced in v5).
* Fix possible bug with infinite clips in Compensate.
* Fix frequent crash in Degrain2 and Degrain3 due to stack misalignment, specific to the win32 builds. Probably all previous versions are affected.
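The new overlap rule can be illustrated with a small check. This is a sketch under assumptions: `check_overlap` and `block_count` are hypothetical helpers, not mvtools' own code, and they look at one dimension only. Blocks are assumed to be stepped by `blksize - overlap`, which is the usual MVTools layout, but treat the count as an approximation of what the plugin does.

```python
def check_overlap(blksize: int, overlap: int) -> None:
    """Reject overlaps the v6 changelog says are no longer allowed."""
    if overlap < 0:
        raise ValueError("overlap must not be negative")
    if overlap > blksize // 2:
        raise ValueError(
            f"overlap ({overlap}) must be at most half the block size ({blksize})"
        )


def block_count(length: int, blksize: int, overlap: int) -> int:
    """Number of whole blocks along one dimension when stepping by blksize - overlap."""
    check_overlap(blksize, overlap)
    return (length - overlap) // (blksize - overlap)
```

With the 720-pixel-wide test clip, 8-pixel blocks give 90 blocks per row without overlap and 179 with `overlap=4`, which is one reason overlap costs speed.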
30th January 2015, 21:46 | #87
Registered User
Join Date: Oct 2011
Posts: 204
Quote:
What is 4:4:0 supposed to be? Grey-and-Red video? Do you have any examples? I'm rather confused right now...
30th January 2015, 22:39 | #89
Registered User
Join Date: Oct 2011
Posts: 204
Quote:
Now that you explain it, I remember reading at least once that 4:4:0 is 4:2:2 rotated by 90 degrees...
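VapourSynth stores a format's chroma subsampling as two log2 shifts, `subsampling_w` and `subsampling_h`. The sketch below (the dictionary and `chroma_plane_size` are illustrative, not a VapourSynth API) shows why 4:4:0 behaves like 4:2:2 turned on its side: 4:2:2 halves the chroma width, while 4:4:0 halves the chroma height.

```python
from typing import Tuple

# (subsampling_w, subsampling_h) as log2 shifts, in the style of VapourSynth formats
SUBSAMPLING = {
    "4:4:4": (0, 0),
    "4:2:2": (1, 0),
    "4:4:0": (0, 1),
    "4:2:0": (1, 1),
}


def chroma_plane_size(width: int, height: int, scheme: str) -> Tuple[int, int]:
    """Return (width, height) of each chroma plane for a luma size and scheme."""
    ssw, ssh = SUBSAMPLING[scheme]
    return width >> ssw, height >> ssh
```

For a 720x480 frame, 4:4:0 gives 720x240 chroma planes; 4:2:2 applied to the same frame rotated to 480x720 gives 240x720, the same planes with the axes swapped.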
2nd February 2015, 23:31 | #90
Registered User
Join Date: Feb 2013
Location: France
Posts: 23
Hi,
I get a crash when feeding QTGMC a 1920x1080 clip with 16 bits per sample: vspipe crashes, reporting libmvtools.dll as the faulting module. When I open the script in VirtualDub, I get a "division by zero" error in libmvtools. Here is the crash report:
Code:
VirtualDub crash report -- build 35491 (release-AMD64)
--------------------------------------

Disassembly:
6ef94c00: 784a js 16ef94c4c
6ef94c02: 8914c7 mov [rdi+rax*8], edx
6ef94c05: 42890c80 mov [rax+r8*4], ecx
6ef94c09: 488b8424c00000 mov rax, [rsp+c0]
          00
6ef94c11: 49634b10 movsxd rcx, [r11+10h]
6ef94c15: 4c630c28 movsxd r9, [rax+rbp]
6ef94c19: 31c0 xor eax, eax
6ef94c1b: 4c39c9 cmp ecx, ecx
6ef94c1e: 7d22 jge 16ef94c42
6ef94c20: 4a8d0409 lea rax, [rcx+r9]
6ef94c24: 4c89ca mov edx, r9
6ef94c27: 4829ca sub edx, ecx
6ef94c2a: 480fafc2 imul eax, edx
6ef94c2e: 4d0fafc9 imul ecx, ecx
6ef94c32: 480fafc9 imul ecx, ecx
6ef94c36: 48c1e008 shl rax, 08h
6ef94c3a: 4899 cdq
6ef94c3c: 4c01c9 add ecx, ecx
6ef94c3f: 48f7f9 idiv eax, ecx
6ef94c42: 4983f801 cmp rax, 01h
6ef94c46: 43890487 mov [r15+r8*4], eax
6ef94c4a: 0f859b010000 jnz 16ef94deb
6ef94c50: 8b8c24f4010000 mov ecx, [rsp+1f4]
6ef94c57: 8b8424f0010000 mov eax, [rsp+1f0]
6ef94c5e: 41b900010000 mov ecx, 00000100
6ef94c64: 4c8bb424a00000 mov r14, [rsp+a0]
          00
6ef94c6c: 4c897c2438 mov [rsp+38h], r15
6ef94c71: 4883c614 add rsi, 14h
6ef94c75: 48897c2420 mov [rsp+20h], rdi
6ef94c7a: 448d8408010100 lea r8d, [rax+rcx+101]
          00
6ef94c82: c1e008 shl eax, 08h
6ef94c85: 99 cdq
6ef94c86: 41f7f8 idiv eax, eax <-- FAULT
6ef94c89: 4129c1 sub ecx, eax
6ef94c8c: 898424f0010000 mov [rsp+1f0], eax
6ef94c93: 89c8 mov eax, ecx
6ef94c95: c1e008 shl eax, 08h
6ef94c98: 488b8c24880000 mov rcx, [rsp+88]
          00
6ef94ca0: 99 cdq
6ef94ca1: 41f7f8 idiv eax, eax
6ef94ca4: 8b942408010000 mov edx, [rsp+108]
6ef94cab: 4129c1 sub ecx, eax
6ef94cae: 898424f4010000 mov [rsp+1f4], eax
6ef94cb5: 488b442478 mov rax, [rsp+78h]
6ef94cba: 44894c2430 mov [rsp+30h], r9d
6ef94cbf: 448b8c24d80000 mov r9d, [rsp+d8]
          00
6ef94cc7: 4889442428 mov [rsp+28h], rax
6ef94ccc: 488b8424c80000 mov rax, [rsp+c8]
          00
6ef94cd4: 4e8d0410 lea r8, [rax+r10]
6ef94cd8: 41ff16 call dword ptr [r14]
6ef94cdb: 4c8ba424b80000 mov r12, [rsp+b8]
          00
6ef94ce3: 8d0c1b lea ecx, [rbx+rbx]
6ef94ce6: 488b9424f80000 mov rdx, [rsp+f8]
          00
6ef94cee: 4c8b8c24f00000 mov r9, [rsp+f0]
          00
6ef94cf6: 4c8b8424880000 mov r8, [rsp+88]
          00
6ef94cfe: 4863 db 63h

Built on Althena on Sun Oct 27 16:00:02 2013 using compiler version 1400

Windows 6.1 (Windows 7 x64 build 7601) [Service Pack 1]

Memory status: virtual free 8386984M/8388608M, commit limit 49032M, physical total 24517M

RAX = fffffd00
RBX = 3a0
RCX = ffffff02
RDX = ffffffff
RSI = 6dfc4
RDI = 1540f9d0
RBP = 0
R8 = 0
R9 = 100
R10 = 3a0
R11 = 1873eb88
R12 = 9319ee0
R13 = 0
R14 = 118f7148
R15 = 1540f960
RSP = 1540f770
RIP = 6ef94c86
EFLAGS = 00010287

Crash reason: Integer Divide-by-Zero

Crash context:
An integer division by zero occurred in module 'libmvtools'.

Pointer dumps:
RDI 1540f9d0: 286bdce0 00000000 30b5e610 00000000 28fec020 00000000 293e2020 00000000
RSP 1540f770: 00000002 00000000 00000007 000007fe 30868020 00000000 00000f20 00000000
    1540f790: 1540f9d0 00000000 00000010 00000000 00000106 00000000 1540f960 00000000
    1540f7b0: 00000000 00000000 00000002 00000000 00000002 00000000 00000010 00000000
    1540f7d0: 1540fad0 00000000 1540f940 00000000 1540f9a0 00000000 1540f950 00000000
R11 1873eb88: 000001d0 000002f0 00000000 0000002a ffbc1e25 000001d8 000002f0 00000000
R12 09319ee0: 30868020 00000000 3372270c 88000050 08cc6d30 00000000 3372270d 8800ff50
R14 118f7148: 6f1d0690 00000000 6f1d2190 00000000 6f1d2190 00000000 6ef91df0 00000000
R15 1540f960: fffffffd ffffff02 038f4700 00000000 00000f00 00000780 00000780 00000000

Thread call stack:
6ef94c86: libmvtools!VapourSynthPluginInit [6ef80000+1b80+13106]
7fedb122bee: VapourSynth!getVapourSynthAPI [7fedb0e0000+2b2c0+1792e]
7fedb120424: VapourSynth!getVapourSynthAPI [7fedb0e0000+2b2c0+15164]
7fedb183c48: VapourSynth!getVapourSynthAPI [7fedb0e0000+2b2c0+78988]
7fedb1aca30: VapourSynth!getVapourSynthAPI [7fedb0e0000+2b2c0+a1770]
7fedb110439: VapourSynth!getVapourSynthAPI [7fedb0e0000+2b2c0+5179]
7fedb184477: VapourSynth!getVapourSynthAPI [7fedb0e0000+2b2c0+791b7]
7fedb1846c1: VapourSynth!getVapourSynthAPI [7fedb0e0000+2b2c0+79401]
7fedb11f19b: VapourSynth!getVapourSynthAPI [7fedb0e0000+2b2c0+13edb]
7fedb11bd93: VapourSynth!getVapourSynthAPI [7fedb0e0000+2b2c0+10ad3]
7fedb11d4c5: VapourSynth!getVapourSynthAPI [7fedb0e0000+2b2c0+12205]
7fedb192739: VapourSynth!getVapourSynthAPI [7fedb0e0000+2b2c0+87479]
7fedb183e15: VapourSynth!getVapourSynthAPI [7fedb0e0000+2b2c0+78b55]
7fedb157afb: VapourSynth!getVapourSynthAPI [7fedb0e0000+2b2c0+4c83b]
7fedb11fae8: VapourSynth!getVapourSynthAPI [7fedb0e0000+2b2c0+14828]
7fedb15b530: VapourSynth!getVapourSynthAPI [7fedb0e0000+2b2c0+50270]
7fedb1890f6: VapourSynth!getVapourSynthAPI [7fedb0e0000+2b2c0+7de36]
776959ed: kernel32!BaseThreadInitThunk [77680000+159e0+d]
778cc541: ntdll!RtlUserThreadStart [778a0000+2c520+21]

-- End of report
2nd February 2015, 23:50 | #91
unsigned int
Join Date: Oct 2012
Location: 🇪🇺
Posts: 760
It's already fixed in git. Until v7, you can avoid the crash by passing "isse=False" to Analyse when feeding it 16-bit video. Only 16-bit video is affected; 15 bits and lower are fine.
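If you call Analyse yourself, the workaround can be tied to the clip's bit depth so that 8-bit scripts keep the SSE2 code. A minimal sketch, assuming the v6 behaviour described in the post above; `analyse_isse` is a hypothetical helper, and it only helps where you control the Analyse call.

```python
def analyse_isse(bits_per_sample: int) -> bool:
    """Return the isse value to pass to mv.Analyse as a v6 workaround.

    Only 16-bit input triggers the crash in v6, so SSE2 stays enabled
    for 15 bits per sample and below.
    """
    return bits_per_sample < 16
```

Usage would look like `core.mv.Analyse(super, isb=True, delta=1, isse=analyse_isse(src.format.bits_per_sample))`, where `format.bits_per_sample` is the clip's VapourSynth format property.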
3rd February 2015, 14:39 | #93
Registered User
Join Date: Jun 2012
Location: Ibiza, Spain
Posts: 321
Yet another speed comparison for mvtools
Welp, more tests then:
CPU is an AMD FX-8150 at 3600 MHz, with Turbo CORE/Cool'n'Quiet/C6 disabled. Windows 7 Ultimate 64-bit / Gentoo Linux 64-bit. Input is 720×480 YUV420P8, MPEG-2, 2000 frames, decoded with lsmash-works. VapourSynth command used, with an additional "--requests 1" for the 1-thread tests:
Code:
vspipe test.py /dev/null --start 9501 --end 11500
Code:
AVSMeter.exe "tests.avs" -range=9501,11500
Code:
VapourSynth version = r26
vapoursynth-mvtools version = v6
avisynth vanilla mvtools version = v2.5.11.3
avisynth svp mvtools version = 2.5.11.9-svp
avisynth firesledge mvtools version = 2.6.0.5
lsmash-works version = r775
AviSynth version = 2.6.0 RC1
AVSMeter version = v1.9.4
No MT version of AviSynth was tested because it has proven unstable and somewhat useless nowadays: 4 GB of RAM divided by the number of threads is not enough for HD content.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Results for Degrain test:
Code:
8 threads:
VapourSynth Windows = 32.35 fps (100% cpu)
VapourSynth Linux = 37.50 fps (100% cpu)
AviSynth firesledge = 8.32 fps (35% cpu)
1 thread:
VapourSynth Windows = 5.32 fps (12% cpu)
VapourSynth Linux = 5.50 fps (12% cpu)
AviSynth Vanilla = 4.42 fps (12% cpu)
AviSynth SVP = 6.05 fps (12% cpu)
Code:
import vapoursynth as vs
core = vs.get_core()  # threads=1
# the source clip is named src because Super and Degrain3 below refer to it
src = core.lsmas.LWLibavSource(r'720x480 YUV420P8 mpeg2.mkv')
super = core.mv.Super(src)
# backward (isb=True) and forward vectors at temporal distances 3, 2 and 1
mvbw3 = core.mv.Analyse(super, isb=True, delta=3, overlap=4)
mvbw2 = core.mv.Analyse(super, isb=True, delta=2, overlap=4)
mvbw = core.mv.Analyse(super, isb=True, delta=1, overlap=4)
mvfw = core.mv.Analyse(super, isb=False, delta=1, overlap=4)
mvfw2 = core.mv.Analyse(super, isb=False, delta=2, overlap=4)
mvfw3 = core.mv.Analyse(super, isb=False, delta=3, overlap=4)
v = core.mv.Degrain3(clip=src, super=super, mvbw=mvbw, mvfw=mvfw, mvbw2=mvbw2, mvfw2=mvfw2, mvbw3=mvbw3, mvfw3=mvfw3)
v.set_output()
Code:
LWLibavVideoSource("720x480 YUV420P8 mpeg2.mkv")
super = MSuper(last)
mvbw3 = MAnalyse(super, isb=True, delta=3, overlap=4)
mvbw2 = MAnalyse(super, isb=True, delta=2, overlap=4)
mvbw = MAnalyse(super, isb=True, delta=1, overlap=4)
mvfw = MAnalyse(super, isb=False, delta=1, overlap=4)
mvfw2 = MAnalyse(super, isb=False, delta=2, overlap=4)
mvfw3 = MAnalyse(super, isb=False, delta=3, overlap=4)
MDeGrain3(last, super, mvbw, mvfw, mvbw2, mvfw2, mvbw3, mvfw3)
Results for BlockFPS test (change frame rate: 23.97->25):
Code:
8 threads:
VapourSynth Windows = 289.26 fps (99% cpu)
VapourSynth Linux = 312.38 fps (99% cpu)
AviSynth firesledge = 51.66 fps (30% cpu)
1 thread:
VapourSynth Windows = 55.98 fps (12% cpu)
VapourSynth Linux = 83.92 fps (12% cpu)
AviSynth Vanilla = 45.41 fps (12% cpu)
AviSynth SVP = 60.09 fps (12% cpu)
Code:
import vapoursynth as vs
core = vs.get_core()  # threads=1
v = core.lsmas.LWLibavSource(r'720x480 YUV420P8 mpeg2.mkv')
super = core.mv.Super(v)
mvbw = core.mv.Analyse(super, isb=True, delta=1, overlap=0)
mvfw = core.mv.Analyse(super, isb=False, delta=1, overlap=0)
# no num/den given, so the 25 fps target comes from the filter's defaults
v = core.mv.BlockFPS(clip=v, super=super, mvbw=mvbw, mvfw=mvfw)
# vspipe needs an output clip
v.set_output()
Code:
LWLibavVideoSource("720x480 YUV420P8 mpeg2.mkv")
super = MSuper()
mvbw = MAnalyse(super, isb=True, delta=1, overlap=0)
mvfw = MAnalyse(super, isb=False, delta=1, overlap=0)
MBlockFps(last, super, mvbw, mvfw)
Results for FlowFPS test (change frame rate: 23.97->25):
Code:
8 threads:
VapourSynth Windows = 29.83 fps (26% cpu)
VapourSynth Linux = 39.08 fps (26% cpu)
AviSynth firesledge = 19.38 fps (29% cpu)
1 thread:
VapourSynth Windows = 11.76 fps (12% cpu)
VapourSynth Linux = 13.81 fps (12% cpu)
AviSynth Vanilla = 11.75 fps (12% cpu)
AviSynth SVP = 15.73 fps (12% cpu)
Code:
import vapoursynth as vs
core = vs.get_core()  # threads=1
v = core.lsmas.LWLibavSource(r'720x480 YUV420P8 mpeg2.mkv')
super = core.mv.Super(v)
mvbw = core.mv.Analyse(super, isb=True, delta=1, overlap=4)
mvfw = core.mv.Analyse(super, isb=False, delta=1, overlap=4)
v = core.mv.FlowFPS(clip=v, super=super, mvbw=mvbw, mvfw=mvfw)
v.set_output()
Code:
LWLibavVideoSource("720x480 YUV420P8 mpeg2.mkv")
super = MSuper()
mvbw = MAnalyse(super, isb=True, delta=1, overlap=4)
mvfw = MAnalyse(super, isb=False, delta=1, overlap=4)
MFlowFps(last, super, mvbw, mvfw)
FlowFPS results were strange: not only was it unable to beat the AviSynth version, it also couldn't max out the cores when multithreading was used (only one thread is maxed out).
Last edited by Are_; 4th February 2015 at 13:47.
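To put numbers on the multithreading observation, the sketch below divides each filter's 8-thread fps by its 1-thread fps, using only the VapourSynth figures posted above. The names are illustrative. FlowFPS comes out with the smallest gain on both systems (roughly 2.5x to 2.8x, against about 6x for Degrain3).

```python
# VapourSynth fps figures from the results above: (1 thread, 8 threads)
RESULTS = {
    "Degrain3": {"Windows": (5.32, 32.35), "Linux": (5.50, 37.50)},
    "BlockFPS": {"Windows": (55.98, 289.26), "Linux": (83.92, 312.38)},
    "FlowFPS": {"Windows": (11.76, 29.83), "Linux": (13.81, 39.08)},
}


def thread_scaling(results):
    """Return {filter: {os: 8-thread fps / 1-thread fps}}, rounded to 2 places."""
    return {
        name: {os_name: round(eight / one, 2) for os_name, (one, eight) in per_os.items()}
        for name, per_os in results.items()
    }


scaling = thread_scaling(RESULTS)
for name, per_os in scaling.items():
    print(name, per_os)
```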
3rd February 2015, 15:20 | #94
unsigned int
Join Date: Oct 2012
Location: 🇪🇺
Posts: 760
Thanks for the comparison!
FlowFPS is the only filter that still runs on a single thread. It's due to the way it's written. The input frames it needs can be generated in parallel, which is why you see some speed-up with 8 threads.
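To illustrate which input frames a 23.976 -> 25 fps conversion needs, the sketch maps each output frame's timestamp back to a fractional source position; an interpolator then needs the two neighbouring source frames and the blend weight between them. This is an assumption-level model, not FlowFPS's actual request logic (which also needs the motion vectors for those frames), and the function name is hypothetical. Because each output frame's inputs are known in advance, they can be requested in parallel even if the interpolation itself runs on one thread.

```python
import math
from fractions import Fraction


def source_frames_for_output(n, src_fps, dst_fps):
    """Return (left, right, weight) for output frame n.

    The output timestamp n / dst_fps corresponds to source position
    n * src_fps / dst_fps; weight is how far that lies past the left frame.
    """
    pos = Fraction(n) * Fraction(src_fps) / Fraction(dst_fps)
    left = math.floor(pos)
    weight = pos - left
    return left, left + 1, weight
```

With `src_fps = Fraction(24000, 1001)` and `dst_fps = 25`, output frame 25 falls between source frames 23 and 24, at weight 977/1001.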
3rd February 2015, 15:31 | #95
I'm Siri
Join Date: Oct 2012
Location: void
Posts: 2,633
Is core.mv.Degrain3 equal to Expr([core.mv.Degrain1(clip, super, mvbw1, mvfw1).std.Lut("x / 3"), core.mv.Degrain1(clip, super, mvbw2, mvfw2).std.Lut("x / 3"), core.mv.Degrain1(clip, super, mvbw3, mvfw3).std.Lut("x / 3")], ["x y + z +"])?
If so, I think I can extend the temporal radius to any integer by scripting it. edit: typo
Last edited by feisty2; 3rd February 2015 at 15:45.
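One detail of the proposed expression can be checked without mvtools: dividing each clip by 3 and storing the result as an integer before summing rounds three times instead of once, so with truncation the sum can come out up to 2 lower than dividing the total. A single `Expr` over the three clips with `"x y + z + 3 /"` divides only once. The sketch uses truncating integer division and hypothetical helper names; it says nothing about whether Degrain3's motion-compensated weighting matches three Degrain1 calls.

```python
from itertools import product


def sum_of_thirds(x: int, y: int, z: int) -> int:
    """Divide each value by 3 (truncating), then add: three roundings."""
    return x // 3 + y // 3 + z // 3


def third_of_sum(x: int, y: int, z: int) -> int:
    """Add first, then divide once: one rounding."""
    return (x + y + z) // 3


# the error depends only on each value's remainder mod 3, so a small range covers every case
worst = max(third_of_sum(x, y, z) - sum_of_thirds(x, y, z)
            for x, y, z in product(range(6), repeat=3))
```

For pixel values (2, 2, 2), the per-clip division gives 0 while dividing the sum gives 2.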
3rd February 2015, 15:52 | #96
Registered User
Join Date: Jun 2012
Location: Ibiza, Spain
Posts: 321
Quote:
Btw, I updated it with Linux results. For some obscure reason, Linux is still faster than Windows; the power of -march=native?
Last edited by Are_; 3rd February 2015 at 15:55.
4th February 2015, 09:54 | #97
unsigned int
Join Date: Oct 2012
Location: 🇪🇺
Posts: 760
Oh, I forgot: comparing to 2.5.11.3 isn't exactly fair anymore. I imported a change from the SVP fork which makes it a bit faster, so the SVP version is what should be used in comparisons.
4th February 2015, 22:10 | #100
Registered User
Join Date: Jul 2011
Posts: 1,121
Quote:
Something seems off though; I was sure it used more CPU. But then again, in my quick tests the FPS difference is similar to yours. It's very impressive. The only downside is that I'm so used to AviSynth that it's hard to get going; luckily it's still a script, which makes it fairly easy to understand.