Doom9's Forum > Capturing and Editing Video > Avisynth Usage
24th August 2004, 16:42 #1
Leak
ffdshow/AviSynth wrangler
Join Date: Feb 2003
Location: Austria
Posts: 2,441
MMX optimized LeakKernelDeint 1.5.4

Well, the last version of KernelDeint came out about a year ago, so I decided to take a look at it since I've been heavily using it in my BlendBob plugin.

Much to my surprise, I found that the code was pure C++, with no SIMD optimizations in sight, so I went and optimized it (and took care of a handful of bugs while I was at it):

LeakKernelDeint 1.5.4 (with source)

I've revamped the motion mask code to use a combined motion mask for both chroma planes in YV12/YUY2, which gets rid of some stray chroma artefacts that the older versions of KernelDeint produced when it deinterlaced a spot on only one chroma plane instead of on both.
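Here is a minimal C sketch of the combined-mask idea. It assumes motion is detected per chroma sample by comparing the absolute difference against the previous frame with a threshold. The function and parameter names are hypothetical and are not taken from the LeakKernelDeint source. A sample is flagged as moving if either U or V changed, so both planes get deinterlaced at the same spots.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not the plugin's code): build ONE motion mask that is
 * shared by the U and V planes. cur_u/prev_u and cur_v/prev_v are chroma
 * planes of n samples each. A sample counts as moving if EITHER plane
 * differs from the previous frame by more than threshold, so U and V are
 * always deinterlaced at exactly the same positions. */
void combined_chroma_mask(const unsigned char *cur_u, const unsigned char *prev_u,
                          const unsigned char *cur_v, const unsigned char *prev_v,
                          unsigned char *mask, size_t n, int threshold)
{
    for (size_t i = 0; i < n; i++) {
        int du = abs((int)cur_u[i] - (int)prev_u[i]);
        int dv = abs((int)cur_v[i] - (int)prev_v[i]);
        mask[i] = (du > threshold || dv > threshold) ? 1 : 0;
    }
}
```

With separate masks, a spot where only V moved would be deinterlaced in V but woven in U. The shared mask avoids that mismatch.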

I've also made the code mirror the top and bottom 4 lines when deinterlacing them so the resolution at the top and bottom is kept intact - the older versions just duplicated every other line there.
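For illustration, here is one way to mirror line indices at the frame edges. This lets a vertical kernel that reads several lines above and below still land on real, distinct lines. It is a hypothetical helper, not the plugin's code. Reflecting about line 0 and line height-1 keeps the parity of the index, so lines of one field map back onto the same field.

```c
/* Hypothetical helper: reflect an out-of-range line index back into
 * [0, height-1]. Near the top, -1, -2, -3 map to 1, 2, 3; near the bottom,
 * height and height+1 map to height-2 and height-3. Each reflection keeps
 * the parity of y, so a line from one field is never replaced by a line
 * from the other field. */
int mirror_line(int y, int height)
{
    if (height <= 1)
        return 0;
    while (y < 0 || y >= height) {
        if (y < 0)
            y = -y;                   /* reflect about line 0 */
        if (y >= height)
            y = 2 * (height - 1) - y; /* reflect about line height-1 */
    }
    return y;
}
```

By contrast, duplicating lines near the edges would feed the kernel the same line several times, which loses vertical detail there.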

Additionally, I've added a LeakKernelBob function to the plugin that takes the same parameters as LeakKernelDeint but does full framerate deinterlacing (i.e. it returns a frame for every field).
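Full-framerate bobbing doubles the frame count: each source frame yields two output frames, one per field. Below is a sketch of that bookkeeping. It assumes a top_field_first flag; the name is hypothetical and stands in for whatever field-order setting the filter actually uses.

```c
/* Hypothetical bookkeeping for a full-framerate bob: output frame n is built
 * from source frame n / 2, keeping the first field for even n and the second
 * field for odd n (the other field's lines are interpolated). */
typedef struct {
    int source_frame;
    int use_top_field; /* 1 = keep the top field, 0 = keep the bottom field */
} BobSource;

BobSource bob_source(int n, int top_field_first)
{
    BobSource s;
    int second = n % 2; /* 0 = first field in time, 1 = second field */
    s.source_frame = n / 2;
    s.use_top_field = top_field_first ? !second : second;
    return s;
}

/* A bobbed clip has one output frame per field, i.e. twice as many frames. */
int bob_frame_count(int source_frames)
{
    return 2 * source_frames;
}
```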

And, of course, I've made the whole thing faster using MMX...

(I assume that my pure C++ implementations of the various functions perform a bit worse than Donald's code, but unless you force those to be used it'll use the MMX implementations - which, on my machine, are about 2.5 times as fast as the original KernelDeint in YV12 when measured using AvsTimer, and even more in YUY2 and RGB32)

Enjoy.

np: Funkstörung - Captured In Tones (ft. Sarah Jay) (Disconnected)

Last edited by Leak; 6th November 2006 at 12:39.
24th August 2004, 16:54 #2
Boulder
Pig on the wing
Join Date: Mar 2002
Location: Hollola, Finland
Posts: 4,448
Nice!

Thank you, I'll try to crash it
__________________
And if the band you're in starts playing different tunes
I'll see you on the dark side of the Moon...
24th August 2004, 19:02 #3
Chainmax
Huh?
Join Date: Sep 2003
Location: Uruguay
Posts: 3,103
Awesome, I was going to rip my Lion King DVD in a couple of days and I always use KD as a postprocessor for Telecide. Thanks a bunch, Leak.

As a side question: what do you guys use for IVTC postprocessing and/or straight up deinterlacing?
24th August 2004, 19:06 #4
Leak
ffdshow/AviSynth wrangler
Join Date: Feb 2003
Location: Austria
Posts: 2,441
Quote:
Originally posted by Chainmax
Awesome, I was going to rip my Lion King DVD in a couple of days and I always use KD as a postprocessor for Telecide. Thanks a bunch, Leak.

As a side question: what do you guys use for IVTC postprocessing and/or straight up deinterlacing?
Well, there's that BlendBob plugin I've written... *whistles innocently*

It's all I've been using lately.

np: Manual - Baja Nights (Until Tomorrow)
24th August 2004, 23:54 #5
Bogalvator
Registered User
Join Date: Jun 2003
Location: Northampton, England
Posts: 187
Nice work Leak.

A note about the KernelBob() - I notice you say it just does the same as the Scharfis' script. Do you think a "proper" bobbing option can be coded? The patent is still downloadable from the following thread:
http://neuron2.net/ipw-web/bulletin/...wtopic.php?t=9
25th August 2004, 00:20 #6
Leak
ffdshow/AviSynth wrangler
Join Date: Feb 2003
Location: Austria
Posts: 2,441
Quote:
Originally posted by Bogalvator
A note about the KernelBob() - I notice you say it just does the same as the Scharfis' script. Do you think a "proper" bobbing option can be coded? The patent is still downloadable from the following thread:
http://neuron2.net/ipw-web/bulletin/...wtopic.php?t=9
Well, I haven't yet read the PDF (it's waaay too late for that today...) but I was wondering about a few details of the algorithm itself, so thanks for that pointer.

But still - what do you mean by "proper bobbing"?

np: Lali Puna - Small Things (Faking The Books)
25th August 2004, 07:31 #7
sh0dan
Retired AviSynth Dev ;)
Join Date: Nov 2001
Location: Dark Side of the Moon
Posts: 3,480
Great! My favorite deinterlacer - now even faster!

__________________
Regards, sh0dan // VoxPod
25th August 2004, 11:09 #8
Leak
ffdshow/AviSynth wrangler
Join Date: Feb 2003
Location: Austria
Posts: 2,441
Quote:
Originally posted by sh0dan
Great! My favorite deinterlacer - now even faster!
Glad to be of service...

But still - where are the bug reports? Surely I must have messed up _somewhere_?

np: Underworld - Moaner (Beaucoup Fish)
25th August 2004, 12:45 #9
Boulder
Pig on the wing
Join Date: Mar 2002
Location: Hollola, Finland
Posts: 4,448
I can't get it to crash, but it is nice and fast

KernelBobbing the old way with v1.40, I get ~0.33RT in CCE with my regular TV capture script. With v1.50, I get ~0.42RT so there's a real nice improvement. I wonder what those SSE optimizations might provide
25th August 2004, 14:11 #10
Leak
ffdshow/AviSynth wrangler
Join Date: Feb 2003
Location: Austria
Posts: 2,441
Quote:
Originally posted by Boulder
I can't get it to crash, but it is nice and fast

KernelBobbing the old way with v1.40, I get ~0.33RT in CCE with my regular TV capture script. With v1.50, I get ~0.42RT so there's a real nice improvement. I wonder what those SSE optimizations might provide
I assume you're doing a lot more than just KernelBobbing in your script, right?

Yeah, it's probably not the slowest filter in most chains, so it can't do wonders, but it's still nice to have some speedup...

I've sent an email to Milan, as I guess a sped up KernelDeint in ffdshow wouldn't be out-of-place, either...

np: Jimi Tenor - Moon Goddess (Beyond The Stars)
25th August 2004, 14:23 #11
Boulder
Pig on the wing
Join Date: Mar 2002
Location: Hollola, Finland
Posts: 4,448
Quote:
Originally posted by Leak
I assume you're doing a lot more than just KernelBobbing in your script, right?
That's true. My standard chain for pure interlaced streams is KernelBob-RemoveGrain-Crop-ColorYUV-RemoveDirt-BicubicResize-Blockbuster-Limiter-AddBorders-SeparateFields-SelectEvery-Weave-ConverttoYUY2. The ~25% speedup is very much appreciated: it used to take about 2h15min to encode a 45-min episode and now it takes a little over 1h45min. Saves a decent amount of time a couple of times a week, I'd say.
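As a rough check on the numbers in this post and the earlier one, here is a simple sketch; the helper names are made up for the example. Going from ~0.33RT to ~0.42RT is about 27% more throughput. Going from 2h15min to about 1h45min is roughly 22% less encoding time, which is the same as about 29% more throughput.

```c
/* Throughput gain: how much faster the new setup runs, as a fraction
 * (0.27 means 27% more frames per second). Rates can be given as
 * "x realtime" figures or as 1 / encoding time. */
double throughput_gain(double old_rate, double new_rate)
{
    return new_rate / old_rate - 1.0;
}

/* Fraction of the old wall-clock encoding time that is saved. */
double time_saved(double old_minutes, double new_minutes)
{
    return 1.0 - new_minutes / old_minutes;
}
```

The calls behave as follows:

- throughput_gain(0.33, 0.42) gives about 0.27.
- time_saved(135, 105) gives about 0.22.
- throughput_gain(1.0 / 135, 1.0 / 105) gives about 0.29.

So "~25%" is a fair middle figure.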

Edited the chain a bit..

Last edited by Boulder; 25th August 2004 at 14:36.
25th August 2004, 16:50 #12
Si
Simply me
Join Date: Aug 2002
Location: Lancashire, England
Posts: 610
Just a small point.

Shouldn't you call it something else?

What happens if Donald wants to update his version - you've pinched his filter name and version sequence

Please ignore this if he's given his permission

regards
Simon
25th August 2004, 17:02 #13
scharfis_brain
brainless
Join Date: Mar 2003
Location: Germany
Posts: 3,607
@boulder: do converttoyuy2 BEFORE sepfields - selectevery - weave
__________________
Don't forget the 'c'!

Don't PM me for technical support, please.
25th August 2004, 17:07 #14
Boulder
Pig on the wing
Join Date: Mar 2002
Location: Hollola, Finland
Posts: 4,448
Quote:
Originally posted by scharfis_brain
@boulder: do converttoyuy2 BEFORE sepfields - selectevery - weave
Sorry for all this OT,

but is that due to getting correct chroma upsampling or is there some other point? I thought that using ConverttoYUY2(interlaced=true) after reinterlacing worked correctly.
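The re-interlacing tail of that chain can be sketched as index bookkeeping. The sketch assumes a top-field-first bobbed clip and SelectEvery(4,0,3). That is a common choice, but the exact arguments are not shown in the thread. Under those assumptions, woven frame k takes its top field from bobbed frame 2k and its bottom field from bobbed frame 2k+1. One plausible reading of the advice above is that the bobbed frames are still progressive before SeparateFields, so converting there can use plain progressive chroma upsampling.

```c
/* Hypothetical bookkeeping for SeparateFields().SelectEvery(4,0,3).Weave()
 * applied to a top-field-first bobbed clip (assumed parameters). */
typedef struct {
    int top_from_bobbed;    /* bobbed frame supplying the top field    */
    int bottom_from_bobbed; /* bobbed frame supplying the bottom field */
} WovenFrame;

WovenFrame reinterlace(int k)
{
    /* SeparateFields on a TFF clip: field f comes from bobbed frame f / 2,
     * and even f is the top field, odd f the bottom field. */
    int f_top = 4 * k + 0;    /* SelectEvery(4, 0, ...) keeps field 4k     */
    int f_bottom = 4 * k + 3; /* SelectEvery(4, ..., 3) keeps field 4k + 3 */
    WovenFrame w;
    w.top_from_bobbed = f_top / 2;
    w.bottom_from_bobbed = f_bottom / 2;
    return w;
}
```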
25th August 2004, 17:12 #15
Leak
ffdshow/AviSynth wrangler
Join Date: Feb 2003
Location: Austria
Posts: 2,441
Quote:
Originally posted by Si
Just a small point.

Shouldn't you call it something else?

What happens if Donald wants to update his version - you've pinched his filter name and version sequence

Please ignore this if he's given his permission
Well, I tried to ask him 2 weeks ago via PM in his forum, but I haven't heard from him - he's dropped off the planet again obviously...

I doubt he'll change his version much, as it hasn't been changed for a year, so I just went with the name and increased the minor version number, as it's mostly optimizations and small bugfixes...

np: Manual - Lunate (Until Tomorrow)
25th August 2004, 22:17 #16
Cyberia
Moderator
Join Date: Nov 2002
Location: Inside
Posts: 718
Don's away on personal matters. He'll be back when he can, no ETA.

I doubt if he'll mind keeping the same name for the program, but I almost would expect him to find a bug somewhere.

Are further SSE/SSE2/3DNOW/3DNOW2 optimizations possible?
25th August 2004, 22:28 #17
Leak
ffdshow/AviSynth wrangler
Join Date: Feb 2003
Location: Austria
Posts: 2,441
Quote:
Originally posted by Cyberia
Don's away on personal matters. He'll be back when he can, no ETA.
Well, I guess we'll hear from him then...

Quote:
I doubt if he'll mind keeping the same name for the program, but I almost would expect him to find a bug somewhere.
NOOOOOOOOOooooooo~!

Quote:
Are further SSE/SSE2/3DNOW/3DNOW2 optimizations possible?
I don't have an AMD CPU but a P4, so I couldn't really test 3DNOW, but SSE/SSE2 could be done. It's just that processing 8 pixels in parallel is easy in AviSynth, since the line pitch is a multiple of 8, whereas you have to be careful when processing 16 pixels in parallel so you don't read/write out of bounds...
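Here is the out-of-bounds concern made concrete. A loop that handles chunk bytes per step effectively processes the row width rounded up to a multiple of chunk. That is only safe if the rounded width still fits inside the pitch. The sketch below uses hypothetical helper names and made-up example sizes.

```c
/* Bytes a loop touches on one row when it always handles `chunk` bytes per
 * step: row_size rounded up to the next multiple of chunk. */
int bytes_touched(int row_size, int chunk)
{
    return ((row_size + chunk - 1) / chunk) * chunk;
}

/* Hypothetical check: the chunked loop stays inside the row's padded length
 * (the pitch) only if the rounded-up width is not larger than the pitch. */
int chunked_loop_is_safe(int row_size, int pitch, int chunk)
{
    return bytes_touched(row_size, chunk) <= pitch;
}
```

Consider a 356-byte row with a 360-byte pitch. It is fine for 8-byte steps, which touch 360 bytes. It is not fine for 16-byte steps, which touch 368 bytes.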

I'll probably try it, but I guess I'll go back to working on my BlendBob plugin first...

np: Radiohead - Kid A (Kid A)
26th August 2004, 01:21 #18
Cyberia
Moderator
Join Date: Nov 2002
Location: Inside
Posts: 718
Well I have an Athlon 3200 which has all those optimizations except SSE2. So even an SSE version would help.

Out of curiosity, does anyone know if AviSynth 3.0 will come in a 64-bit flavor?
26th August 2004, 09:23 #19
kassandro
Registered User
Join Date: May 2003
Location: Germany
Posts: 502
Quote:
Originally posted by Leak
Well, I tried to ask him 2 weeks ago via PM in his forum, but I haven't heard from him - he's dropped off the planet again obviously...

I doubt he'll change his version much, as it hasn't been changed for a year, so I just went with the name and increased the minor version number, as it's mostly optimizations and small bugfixes...

np: Manual - Lunate (Until Tomorrow)
Why don't you call it FastKernelDeint? After all, you did substantial work and a name of its own is well deserved.

Quote:

It's just that processing 8 pixels in parallel is easy in AviSynth, since the line pitch is a multiple of 8, whereas you have to be careful when processing 16 pixels in parallel so you don't read/write out of bounds...
You should not assume that the line pitch is a multiple of 8, but you nevertheless can handle an arbitrary line pitch with SSE and even SSE2 effectively (you need a minimum width, though). Look at trbarry's Undot or my RemoveGrain for examples.
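Below is a portable C sketch of one common way to handle an arbitrary width with wide registers. It is not necessarily how Undot or RemoveGrain do it. First run all full chunks. If bytes remain, run one more chunk that ends exactly at the row end and overlaps bytes already processed. That overlap is where the minimum width comes from. The 16-byte CHUNK and the invert operation are stand-ins for a real SSE2 step.

```c
#include <stddef.h>

#define CHUNK 16 /* bytes per (simulated) SIMD step, e.g. one SSE2 register */

/* Stand-in for one SIMD operation: a per-byte invert over CHUNK bytes. */
static void process_chunk(const unsigned char *src, unsigned char *dst)
{
    for (int i = 0; i < CHUNK; i++)
        dst[i] = (unsigned char)(255 - src[i]);
}

/* Process `width` bytes of a row without touching anything past
 * src[width-1] or dst[width-1]. Full chunks are done first; if bytes remain,
 * one extra chunk is run that ENDS exactly at the row end, redoing a few
 * bytes. This needs width >= CHUNK (the "minimum width"), an element-wise
 * operation (redoing a byte gives the same result) and non-aliasing src and
 * dst. Returns 0 if the row is too narrow for this scheme. */
int process_row(const unsigned char *src, unsigned char *dst, size_t width)
{
    if (width < CHUNK)
        return 0;
    size_t x = 0;
    for (; x + CHUNK <= width; x += CHUNK)
        process_chunk(src + x, dst + x);
    if (x < width)
        process_chunk(src + width - CHUNK, dst + width - CHUNK);
    return 1;
}
```

The overlap trick keeps a single code path for the vector step. The price is a few redundant bytes per row and the minimum-width restriction.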
26th August 2004, 10:28 #20
Leak
ffdshow/AviSynth wrangler
Join Date: Feb 2003
Location: Austria
Posts: 2,441
Quote:
Originally posted by kassandro
Why don't you call it FastKernelDeint? After all, you did substantial work and a name of its own is well deserved.
But it still does the same thing that Don's last version did, so I didn't really want to add yet another new name to the already long-enough list of filters...

Quote:
You should not assume that the line pitch is a multiple of 8,
http://www.avisynth.org/BensAviSynthDocs

Quote:
Buffers created by NewVideoFrame are always quadword (8-byte) aligned and always have a pitch that is a multiple of 8.
What other way than NewVideoFrame is there to create a frame, and why would anybody do something like that? And even then, my output buffer will be 8-byte aligned, so all I'd have to do is special-case the last line so I don't do an out-of-bounds read at the very end; if I happen to process some data from the next line at the line end that's not used it doesn't matter. Or I could just copy non-8-byte-aligned frames into a new frame; that's still faster than falling back to the C++-implementation.
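The two conditions discussed above can be sketched in C with hypothetical helper names:

- checking whether a plane meets the 8-byte alignment and pitch guarantee quoted from the AviSynth docs;
- limiting the last row's reads to its real width, so an 8-byte-step loop never reads past the end of the buffer.

```c
#include <stdint.h>

/* Hypothetical check: can this plane take an 8-bytes-per-step path? The base
 * pointer must be 8-byte aligned and the pitch a multiple of 8, which is what
 * the quoted docs promise for NewVideoFrame buffers. */
int fits_8byte_path(const void *base, int pitch)
{
    return ((uintptr_t)base % 8 == 0) && (pitch % 8 == 0);
}

/* Hypothetical: how many bytes an 8-bytes-per-step loop may read on row y.
 * Every row except the last may read up to row_size rounded up to a multiple
 * of 8; the extra bytes are padding or the start of the next row, still
 * inside the buffer, and their results are not used. The last row stops at
 * row_size (any remainder is handled separately), so nothing past the end of
 * the buffer is read. */
int readable_bytes(int y, int height, int row_size)
{
    int rounded = (row_size + 7) / 8 * 8;
    return (y == height - 1) ? row_size : rounded;
}
```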

Quote:
but you nevertheless can handle an arbitrary line pitch with SSE and even SSE2 effectively (you need a minimum width, though).
Yes, I know that. It's just that doing so (a loop for the length rounded down to 16 and another one for the beginning or end) would almost triple the size of my KernelDeint DLL compared to what it is now, when adding SSE2 routines alone would probably double it - and it's already over 10 times bigger than the old one.
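The structure described here is a main loop over the width rounded down to 16, plus a second loop for the rest. It might look like the sketch below. Both loops are plain C so the example stays portable; in a real plugin, the main loop's body would be SSE2 code. The function name and the averaging operation are placeholders. The rounding matches (a + b + 1) >> 1, which is what the pavgb instruction computes.

```c
#include <stddef.h>

/* Hypothetical example: average two lines into dst. The first loop covers
 * width rounded down to a multiple of 16 (one SSE2 register per step in a
 * real implementation); the second, scalar loop handles the 0..15 bytes that
 * are left over. Neither loop reads or writes past index width-1. */
void average_rows(const unsigned char *a, const unsigned char *b,
                  unsigned char *dst, size_t width)
{
    size_t main_len = width & ~(size_t)15; /* width rounded down to 16 */
    size_t x = 0;
    for (; x < main_len; x += 16)          /* would be one SSE2 step */
        for (size_t i = 0; i < 16; i++)
            dst[x + i] = (unsigned char)((a[x + i] + b[x + i] + 1) / 2);
    for (; x < width; x++)                 /* scalar tail */
        dst[x] = (unsigned char)((a[x] + b[x] + 1) / 2);
}
```

Every SIMD routine needs this second loop, which is part of why the DLL would grow so much.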

Also, I'm not convinced that the speedup I'll get from going from MMX to SSE/SSE2 (which mostly added floating point stuff I wouldn't use anyway) is as big as the one I got from going from pure C++ to MMX; I'll do it, but it's further down my To-Do list than working on my own plugin again...

np: Markus Guentner - So Well (Audio Island)
All times are GMT +1.