Old 26th August 2004, 12:27   #21  |  Link
kassandro
Registered User
 
Join Date: May 2003
Location: Germany
Posts: 502
Quote:
Originally posted by Leak
What other way than NewVideoFrame is there to create a frame, and why would anybody do something like that? And even then, my output buffer will be 8-byte aligned, so all I'd have to do is special-case the last line so I don't do an out-of-bounds read at the very end; if I happen to process some data from the next line at the line end that's not used it doesn't matter. Or I could just copy non-8-byte-aligned frames into a new frame; that's still faster than falling back to the C++-implementation.
You are right: external filters can only create frames with NewVideoFrame, but internal filters have other ways! For instance, frames generated by crop are usually not aligned, to avoid an unnecessary BitBlt. Only if you crop with align=true, which is not the default, do you get a properly aligned frame. You are also correct that it is nearly impossible to get a read access error if you read only a few bytes beyond the allowed range (a memory page is at least 4 KB), but it is simply dirty programming style to do so. If your mind is not sharpened for these kinds of problems, you will sooner or later end up in a mess, at least in larger projects.
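To make the over-read concrete, here is a minimal Python sketch (not code from KernelDeint; the function names are made up for illustration). It assumes one byte per sample, as in a YV12 luma plane, and shows how many bytes an 8-bytes-at-a-time loop would touch past the end of a row, plus a loop that handles the tail separately instead.

```python
CHUNK = 8  # bytes moved by one 64-bit MMX load


def overread_bytes(row_size: int, chunk: int = CHUNK) -> int:
    """Bytes touched beyond the row if it is processed only in whole chunks."""
    return (-row_size) % chunk


def process_row(row: bytes, op, chunk: int = CHUNK) -> bytes:
    """Apply op to whole chunks, then to the shorter tail, never reading past the row."""
    out = bytearray()
    full = len(row) - len(row) % chunk
    for start in range(0, full, chunk):
        out += op(row[start:start + chunk])  # stands in for the MMX path
    if full < len(row):
        out += op(row[full:])                # stands in for the scalar fallback
    return bytes(out)


def invert(block: bytes) -> bytes:
    """Example per-byte operation used to exercise process_row."""
    return bytes(255 - b for b in block)


row = bytes(range(20))
result = process_row(row, invert)
print(overread_bytes(716), overread_bytes(720))
```

A 716-byte row leaves 4 bytes over, which is exactly the case a crop to width 716 produces in this simplified model.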

Quote:

Also, I'm not convinced that the speedup I'll get from going from MMX to SSE/SSE2 (which mostly added floating point stuff I wouldn't use anyway)
SSE contains a lot more than some new floating point instructions. While the SSE registers only handle single precision floating point, SSE also adds a lot to what you can do in the MMX registers. For instance, the very useful instructions pminub and pmaxub are SSE only and can only be emulated with MMX, and the extremely powerful psadbw instruction cannot even be emulated. Though there are only very few new integer SSE instructions, they are very useful and fill a gap left by MMX. I could have implemented RemoveGrain (not RemoveDirt) for MMX only, because it doesn't use psadbw, but it would have been much slower. The fun starts with SSE and not with MMX.
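As an illustration of what such an emulation can look like, here is a small Python model of one commonly used trick (not necessarily the one kassandro has in mind): unsigned byte min and max built from a saturating subtract, which plain MMX offers as PSUBUSB, plus ordinary wrapping add and subtract. The helper names are hypothetical; only the arithmetic is the point.

```python
def sat_sub(a, b):
    """Per-byte unsigned saturating subtract (what PSUBUSB computes)."""
    return [max(x - y, 0) for x, y in zip(a, b)]


def wrap_sub(a, b):
    """Per-byte wrapping subtract (what PSUBB computes)."""
    return [(x - y) & 0xFF for x, y in zip(a, b)]


def wrap_add(a, b):
    """Per-byte wrapping add (what PADDB computes)."""
    return [(x + y) & 0xFF for x, y in zip(a, b)]


def emulated_min(a, b):
    # a - max(a - b, 0): yields b where a > b, otherwise a
    return wrap_sub(a, sat_sub(a, b))


def emulated_max(a, b):
    # b + max(a - b, 0): yields a where a > b, otherwise b
    return wrap_add(b, sat_sub(a, b))


a = [0, 10, 200, 255, 128, 7, 64, 1]
b = [255, 10, 100, 0, 129, 3, 64, 2]
print(emulated_min(a, b))
print(emulated_max(a, b))
```

Each emulated operation costs a register copy plus two instructions instead of one, which is in line with the "much slower" remark above.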
Old 26th August 2004, 13:04   #22  |  Link
sh0dan
Retired AviSynth Dev ;)
 
 
Join Date: Nov 2001
Location: Dark Side of the Moon
Posts: 3,480
As a side note, the MMX extensions kassandro mentions are also referred to as Integer SSE, which is present on 95% of all processors running AviSynth today.

However, I didn't find any obvious places where it would make sense to apply it.

A thing you use a lot:
Code:
mov ebx,043544354h ; 32768*0.526
movd mm2,ebx
G.P. register <-> MMX register transfers are bad (= slow). You should either 1) read it from memory into MMX, or 2) store the GPR in memory and read it from memory with MMX (yes, this is faster - the CPU will be able to do a store->load forward if the memory is aligned).
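For reference, the constant in that snippet can be reproduced with a few lines of Python. This is just the arithmetic from the comment, independent of how the filter uses the value: 0.526 scaled by 32768 rounds to 0x4354, and the same 16-bit word sits in both halves of the 32-bit immediate.

```python
SCALE = 32768  # 2**15, the scale factor named in the assembler comment


def to_fixed(value: float, scale: int = SCALE) -> int:
    """Convert a fraction to a scaled integer, rounding to nearest."""
    return round(value * scale)


word = to_fixed(0.526)
packed = (word << 16) | word  # the same word in the high and low 16 bits
print(hex(word), hex(packed))
```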


__________________
Regards, sh0dan // VoxPod

Last edited by sh0dan; 26th August 2004 at 13:15.
Old 26th August 2004, 15:24   #23  |  Link
Leak
ffdshow/AviSynth wrangler
 
 
Join Date: Feb 2003
Location: Austria
Posts: 2,441
Quote:
Originally posted by sh0dan
A thing you use a lot:
Code:
mov ebx,043544354h ; 32768*0.526
movd mm2,ebx
G.P. register <-> MMX register transfers are bad (= slow). You should either 1) read it from memory into MMX, or 2) store the GPR in memory and read it from memory with MMX (yes, this is faster - the CPU will be able to do a store->load forward if the memory is aligned).
Well, that might be because when I started dabbling around with assembler 486s were the top of the line and accessing registers was much less expensive than accessing memory - guess I haven't gotten over that yet...

I guess I'll fix that and do a comparison; are you sure the difference will be really noticeable?

(EDIT: Okay, so my test script went from 338 FPS to 342 FPS with this change; good to have, but still hardly noticeable...)

Then again, I also made the mistake of using MOVNTQ for writing to the target buffer after thoroughly reading its description in Intel's docs, only to discover that it tears the filter's performance to shreds...

np: Komeit - When The Sun Hits (Blue Skied An' Clear comp.)

Last edited by Leak; 26th August 2004 at 16:19.
Old 26th August 2004, 15:31   #24  |  Link
Leak
ffdshow/AviSynth wrangler
 
 
Join Date: Feb 2003
Location: Austria
Posts: 2,441
Quote:
Originally posted by kassandro
You are right: external filters can only create frames with NewVideoFrame, but internal filters have other ways! For instance, frames generated by crop are usually not aligned, to avoid an unnecessary BitBlt.
Ugh. Didn't think about crop there...

Quote:
Only if you crop with align=true, which is not the default, do you get a properly aligned frame. You are also correct that it is nearly impossible to get a read access error if you read only a few bytes beyond the allowed range (a memory page is at least 4 KB), but it is simply dirty programming style to do so. If your mind is not sharpened for these kinds of problems, you will sooner or later end up in a mess, at least in larger projects.
I know that. As I said, it didn't occur to me that crop would break the assumption I had made; but still, all I need to special-case is the last line of each field (assuming the line is at least 8 bytes long, which would IMHO be a sensible constraint to check in the constructor) - still a bit messy, but it takes a lot less code duplication. Overreading from one of the other lines into the next one is totally harmless.

Quote:
SSE contains a lot more than some new floating point instructions. While the SSE registers only handle single precision floating point, SSE also adds a lot to what you can do in the MMX registers. For instance, the very useful instructions pminub and pmaxub are SSE only and can only be emulated with MMX, and the extremely powerful psadbw instruction cannot even be emulated.
Yeah, it could be emulated, but it'd need a lot of effort... still, I can't really use it as I need the absolute difference for each pixel, not the sum of them - and that's something that's doable with reasonable effort in MMX. If you add some shuffling and unpacking you can emulate PSADBW...
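The per-pixel absolute difference mentioned here is usually built from two saturating subtracts and an OR. The following Python model is a sketch of that general trick, not the actual KernelDeint code, and its helper names are made up; summing the per-byte results gives the value PSADBW would deliver in one instruction.

```python
def sat_sub(a, b):
    """Per-byte unsigned saturating subtract (what PSUBUSB computes)."""
    return [max(x - y, 0) for x, y in zip(a, b)]


def abs_diff(a, b):
    """|a - b| per byte: one of the two saturating differences is always 0."""
    return [x | y for x, y in zip(sat_sub(a, b), sat_sub(b, a))]


def sad(a, b):
    """Sum of absolute differences, i.e. what PSADBW returns for 8 bytes."""
    return sum(abs_diff(a, b))


a = [0, 10, 200, 255, 128, 7, 64, 1]
b = [255, 10, 100, 0, 129, 3, 64, 2]
print(abs_diff(a, b), sad(a, b))
```

In real MMX code the summing step is where the extra unpacking and adding comes in, since the eight byte results have to be widened to words before they can be added up without overflow.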

Quote:
Though there are only very few new integer SSE instructions, they are very useful and fill a gap left by MMX. I could have implemented RemoveGrain (not RemoveDirt) for MMX only, because it doesn't use psadbw, but it would have been much slower. The fun starts with SSE and not with MMX.
As I said, I'll do an SSE/SSE2 version as well, but it's further down my to-do list; I started with an MMX version as I wanted it to run on every machine that's capable of running AviSynth.

It's already a lot faster than the last version, and I'm quite happy about that, and to be totally honest I did it to be able to integrate part of it into BlendBob...

np: The Notwist - Trashing Days (Neon Golden)
Old 26th August 2004, 23:24   #25  |  Link
Leak
ffdshow/AviSynth wrangler
 
 
Join Date: Feb 2003
Location: Austria
Posts: 2,441
New Version 1.5.1

KernelDeint 1.5.1 (with source) (Old version; see first post for newest version)

This version includes the changes sh0dan suggested (which resulted in a small speedup) and doesn't read beyond the end of frames anymore if your pitch doesn't happen to be a multiple of 8.

EDIT: There's one other change I forgot - in 1.5.0 the order parameter was inverted for RGB32 video; is anybody even using it for that?

Still, if your pitch isn't evenly divisible by 8, you'll get a minor drop in speed as long as the frame is still aligned on an 8-byte boundary (for example if you just crop off something on the right). You'll get a bigger drop if it isn't (which can happen when you cut off stuff on the left with crop), as unaligned memory access just plain takes longer - in that case, always use "align=true" with crop.

Or, in numbers:

Code:
Testclip                       FPS
=============================  ===
Normal                         332
Crop(0,0,716,480,align=false)  329
Crop(2,0,716,480,align=false)  282
Crop(2,0,716,480,align=true)   325
My testclip is a 720x480 VOB read via MPEG2Source, and the FPS of doing a KernelBob(1,8,linked=false) were measured using AvsTimer.
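The four rows of the table can be told apart with two checks. Here is a rough Python sketch under stated assumptions: the uncropped luma plane starts on an 8-byte boundary, there is one byte per luma pixel, and align=true makes crop copy the result into a freshly aligned frame. The function name is hypothetical.

```python
ALIGN = 8  # bytes handled per MMX load/store


def crop_alignment(left: int, width: int, align: bool = False,
                   bytes_per_pixel: int = 1):
    """Return (row_start_aligned, row_size_multiple_of_8) for a cropped plane."""
    start_aligned = align or (left * bytes_per_pixel) % ALIGN == 0
    size_multiple = (width * bytes_per_pixel) % ALIGN == 0
    return start_aligned, size_multiple


cases = {
    "Normal": crop_alignment(0, 720),
    "Crop(0,0,716,480,align=false)": crop_alignment(0, 716),
    "Crop(2,0,716,480,align=false)": crop_alignment(2, 716),
    "Crop(2,0,716,480,align=true)": crop_alignment(2, 716, align=True),
}
for name, flags in cases.items():
    print(name, flags)
```

Only the third case has an unaligned row start, and it is the one with the large drop (282 FPS); the cases that merely have a row size that isn't a multiple of 8 stay within a few FPS of the normal clip.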

np: Markus Guentner - Sleep Well (Audio Island)

Last edited by Leak; 15th January 2005 at 19:47.
Old 27th August 2004, 00:26   #26  |  Link
Bogalvator
Registered User
 
Join Date: Jun 2003
Location: Northampton, England
Posts: 187
Quote:
Originally posted by Leak
But still - what do you mean by "proper bobbing"?
In KernelBob's current state it is working as a single rate deinterlacer, shifted by a field and then simply reinterleaved.

It seems to me that the aim of a "proper" bobber would be to return each field to its full resolution - which may involve different strategies than those of a deinterlacer that's aiming to give the best results for 25 fps or 29.97 fps progressive output.

I hope that made some sort of sense.
Old 27th August 2004, 13:59   #27  |  Link
Xesdeeni
Registered User
 
Join Date: Aug 2002
Posts: 467
Quote:
Originally posted by Bogalvator
In KernelBob's current state it is working as a single rate deinterlacer, shifted by a field and then simply reinterleaved.

It seems to me that the aim of a "proper" bobber would be to return each field to its full resolution - which may involve different strategies than those of a deinterlacer that's aiming to give the best results for 25 fps or 29.97 fps progressive output.

I hope that made some sort of sense.
As I understand it, bob was created for displaying interlaced video on a progressive display, normally creating the same number of progressive frames as input fields. I.e., PAL would result in 50 progressive fps, while NTSC would result in 59.94 progressive fps (obviously then using frame duplication for display at 70, 72, 75, 85 etc. Hz).

Deinterlacing is basically the process of creating the progressive frames from the interlaced input. But the term also seems to have evolved into the process that outputs progressive frames at the input frame rate (i.e. PAL would result in 25 progressive fps, while NTSC would result in 29.97 progressive fps), normally to improve video encoding by feeding progressive instead of interlaced frames to the codec.

In my book, technically inverse telecine is deinterlacing as well, but it's so specific to film (and the output for NTSC is not the input frame/field rate) I think it's a class of its own.

So at the risk of being pedantic, I guess I'd say a "proper deinterlacer" would use any technique possible (including inverse telecine, field matching, etc.) to create a progressive output at the input frame rate (25 or 29.97), while "proper bob" would use the same techniques to create a progressive output at the input field rate (50 or 59.94).
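In numbers, the distinction boils down to whether the output keeps the field rate or halves it. A small Python sketch of that arithmetic (the mode names are just labels for this example, and 60000/1001 is the exact NTSC field rate usually rounded to 59.94):

```python
from fractions import Fraction

# Field rates of the two broadcast standards discussed above
FIELD_RATES = {"PAL": Fraction(50), "NTSC": Fraction(60000, 1001)}


def output_rate(standard: str, mode: str) -> Fraction:
    """Progressive output rate for 'bob' (field rate) or 'deinterlace' (frame rate)."""
    field_rate = FIELD_RATES[standard]
    if mode == "bob":
        return field_rate
    if mode == "deinterlace":
        return field_rate / 2
    raise ValueError(f"unknown mode: {mode}")


for standard in FIELD_RATES:
    for mode in ("deinterlace", "bob"):
        print(standard, mode, round(float(output_rate(standard, mode)), 2))
```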

BTW, my interest in deinterlacing is for standards conversion, so bob is much more useful to me.

Oh, and just to throw out a little controversy, the better the deinterlacer, the harder the image will be to compress. So saying that the output of a particular deinterlacer is "harder to compress" can mean the quality is better! [To prove this, you'd need perfect deinterlacing. The closest we have is inverse telecine. So if you don't believe me, take a telecined video and IVTC it, and compare compressing this to a video deinterlaced using your favorite deinterlacer. (Be sure to match frame rates.) Barring out-and-out failures of the deinterlacer (combing left in), the IVTC should be at least as difficult to compress, and usually more difficult.]

Xesdeeni
Old 28th August 2004, 15:21   #28  |  Link
Nicholi
Registered User
 
Join Date: Apr 2003
Location: Lancaster, CA
Posts: 89
The shiny packages reference to the chroma artifacts possibly being lessened doesn't seem to hold true for all the things I've tried so far.

Wondering if anyone else has though? And what type of source?
Dealing mostly with anime here, and the 1.4.0 and 1.5.1 images look exactly the same.

Last edited by Nicholi; 28th August 2004 at 16:06.
Old 28th August 2004, 15:54   #29  |  Link
Leak
ffdshow/AviSynth wrangler
 
 
Join Date: Feb 2003
Location: Austria
Posts: 2,441
Quote:
Originally posted by Nicholi
The shiny packages reference to the chroma artifacts possibly being lessened doesn't seem to not hold true for all the things I've tried so far.
Whoa there - could you please use fewer double negatives? I'm having a hard time figuring out whether you get more or fewer chroma artifacts.

Quote:
Wondering if anyone else has though? And what type of source?
Dealing mostly with anime here, and the 1.4.0 and 1.5.1 images look exactly the same.
Could you please post some images then?

Also, what I've fixed was one kind of chroma artifact (which produced some faint chroma ghosting when one plane was deinterlaced and the other wasn't); I didn't say it'd fix _ALL_ possible chroma artifacts. If you mean the slight ghosting KernelDeint causes now and then, that's something inherent in the algorithm that can't be avoided.

Try using a threshold of 0 on the images you get artifacts on; if they stay, they're not of the kind that my change fixes.

np: Sole - Teepee On A Highway Blues (Selling Live Water)
Old 28th August 2004, 16:11   #30  |  Link
Nicholi
Registered User
 
Join Date: Apr 2003
Location: Lancaster, CA
Posts: 89
Err yeah sorry. Edited accordingly. I suppose we are talking about different things, my mistake. I speak of the usual "ghosted" lines from the deinterlaced movement. Already on thresh=0 also.

I would upload the pics if I could, but alas I have no webhost or such. So it seems what I was looking for is not actually here, heh. And also unavoidably removed...oh well. There is always Sangnom to alternate with on occasion.

Thank you for continuing work on an already great filter however.
KernelDeint is my default choice for deinterlacing, and mayhaps when DG returns he will hopefully implement your hard work.
Old 28th August 2004, 16:56   #31  |  Link
DDogg
Retired, but still around
 
 
Join Date: Oct 2001
Location: Lone Star
Posts: 3,058
Nicholi, - For image hosting try this. It is great. You don't even have to register.
Old 6th September 2004, 16:30   #32  |  Link
Boulder
Pig on the wing
 
 
Join Date: Mar 2002
Location: Hollola, Finland
Posts: 4,475
There seems to be something wrong with KernelBob.

With this script I get a bad result:
MPEG2Source("c:\temp\captures\startrek.d2v",idct=7)
KernelBob(order=1,sharp=true,threshold=7)
SeparateFields()
SelectEvery(4,1,2)
Weave()

With this script it's OK:
MPEG2Source("c:\temp\captures\startrek.d2v",idct=7)
KernelBob(order=1,sharp=true,threshold=7)
ConverttoYUY2()
SeparateFields()
SelectEvery(4,1,2)
Weave()

Here are the screenshots:

YV12


YUY2


The original material


If I replace KernelBob with a simple Bob(), both scripts give a proper result. I didn't test whether the v1.4.0 with scharfis_brain's function would behave the same way.

The screenshots are not exactly at the same frame so don't pay attention to that. Just pay attention to the amount of combing in the subtitles in the YV12 version.

The funny thing is that this weird behaviour is not seen throughout the whole clip, it's just a small portion in the middle. I also tried order=0 and SelectEvery(4,0,3) but it didn't help.
__________________
And if the band you're in starts playing different tunes
I'll see you on the dark side of the Moon...
Old 6th September 2004, 17:31   #33  |  Link
Boulder
Pig on the wing
 
 
Join Date: Mar 2002
Location: Hollola, Finland
Posts: 4,475
Just tried it with KernelDeint v1.5.1 and scharfis_brain's function from the Restore24 package, and it gives a correct output without a need to convert to YUY2. So there's something wrong with Leak's implementation of the function.

Code:
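function kernelbob(clip a, int "th",bool "mask")
{	mask=default(mask,false)
	th=default(th,5)
	# 1 if the clip is flagged top field first, otherwise 0
	ord = getparity(a) ? 1 : 0
	# deinterlace the frames as stored
	f=a.kerneldeint(order=ord, sharp=true, twoway=false, threshold=th,map=mask) 
	# drop the first field, re-weave so each frame starts one field later, deinterlace with the opposite order
	e=a.separatefields.trim(1,0).weave.kerneldeint(order=1-ord, sharp=true, twoway=false, threshold=th,map=mask)
	# alternate both results to get one output frame per input field
	interleave(f,e).assumeframebased
}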
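To see which fields end up in which output frame, the function above can be mimicked with plain field indices. This Python toy model makes no attempt at the actual deinterlacing and simply ignores the incomplete pair at the end of the clip, which is a simplification; the function name is invented for the example.

```python
def field_shifted_bob(num_fields: int):
    """Return, per output frame, the pair of input field indices it is woven from."""
    fields = list(range(num_fields))
    # f: frames as stored, i.e. fields (0,1), (2,3), ...
    f = [tuple(fields[i:i + 2]) for i in range(0, num_fields - 1, 2)]
    # e: first field dropped, then re-woven, i.e. fields (1,2), (3,4), ...
    shifted = fields[1:]
    e = [tuple(shifted[i:i + 2]) for i in range(0, len(shifted) - 1, 2)]
    # interleave(f, e): alternate frames from both clips
    out = []
    for pair in zip(f, e):
        out.extend(pair)
    return out


frames = field_shifted_bob(8)
print(frames)
```

Every output frame starts one input field later than the previous one, which is what makes the result double rate.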
__________________
And if the band you're in starts playing different tunes
I'll see you on the dark side of the Moon...
Old 6th September 2004, 18:15   #34  |  Link
Leak
ffdshow/AviSynth wrangler
 
 
Join Date: Feb 2003
Location: Austria
Posts: 2,441
Quote:
Originally posted by Boulder
There seems to be something wrong with KernelBob.

...

If I replace KernelBob with a simple Bob(), both scripts give a proper result. I didn't test whether the v1.4.0 with scharfis_brain's function would behave the same way.

The screenshots are not exactly at the same frame so don't pay attention to that. Just pay attention to the amount of combing in the subtitles in the YV12 version.

The funny thing is that this weird behaviour is not seen throughout the whole clip, it's just a small portion in the middle. I also tried order=0 and SelectEvery(4,0,3) but it didn't help.
Hmmm... yeah, taking a close look at KernelBob's output it seems something isn't totally right; could you try setting the threshold to 0 and have a look at the b0rked segment again? I have the nagging feeling that I've got an error in my motion mask code, so if there are no artifacts using a threshold of 0 (which turns KernelDeint into something closer to a regular Bob()) I know where to look.

Also, could you cut out a short part of that sequence and upload it somewhere?

I'm pretty sure it's got something to do with the order of the fields getting passed into MotionMask, but I can't put my finger on it...

np: Plaid - Crumax Rins (Spokes)
Old 6th September 2004, 18:17   #35  |  Link
erratic
member
 
 
Join Date: Oct 2003
Location: Belgium
Posts: 106
I have noticed that with Leak's KernelBob I have to use SelectEvery(4,0,3) to maintain the field order. If I use SelectEvery(4,1,2) the field order is reversed. This happens with both TFF and BFF sources.

No matter what the source is, with scharfis_brain's kernelbob function I have to use SelectEvery (4,1,2) to get TFF, and SelectEvery(4,0,3) to get BFF.
Old 6th September 2004, 18:28   #36  |  Link
Leak
ffdshow/AviSynth wrangler
 
 
Join Date: Feb 2003
Location: Austria
Posts: 2,441
Quote:
Originally posted by erratic
I have noticed that with Leak's KernelBob I have to use SelectEvery(4,0,3) to maintain the field order. If I use SelectEvery(4,1,2) the field order is reversed. This happens with both TFF and BFF sources.

No matter what the source is, with scharfis_brain's kernelbob function I have to use SelectEvery (4,1,2) to get TFF, and SelectEvery(4,0,3) to get BFF.
That might be because I do an AssumeTFF() internally before calling SeparateFields in my filter to get a fixed field order. Maybe this problem also crops up because I'm not doing an AssumeFrameBased() at the end of KernelBob - Boulder, could you check whether adding that after KernelDeint helps?

np: Plaid - Assault On Precinct Zero (Double Figure)
Old 6th September 2004, 18:35   #37  |  Link
Boulder
Pig on the wing
 
 
Join Date: Mar 2002
Location: Hollola, Finland
Posts: 4,475
OK, I'll do the test and get you a small sample tomorrow - unfortunately I probably won't have the time today.
__________________
And if the band you're in starts playing different tunes
I'll see you on the dark side of the Moon...
Old 6th September 2004, 18:39   #38  |  Link
erratic
member
 
 
Join Date: Oct 2003
Location: Belgium
Posts: 106
I just ran a short test and with AssumeFrameBased after KernelBob it behaves like scharfis_brain's kernelbob function: SelectEvery(4,1,2) results in TFF, SelectEvery(4,0,3) results in BFF.

EDIT: as far as the field order is concerned, Avisynth's internal Bob() command works like scharfis_brain's kernelbob function.
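The effect of the parity flag on the re-interlacing step can be reproduced with a toy model in Python. It assumes that SeparateFields emits the top field first for a clip flagged TFF and the bottom field first otherwise, and that AssumeFrameBased() leaves the clip with bottom-field-first parity; the helper names below are stand-ins, not AviSynth API.

```python
def separate_fields(num_frames: int, top_first: bool):
    """List (parity, frame_index) in the order SeparateFields would emit them."""
    order = ("T", "B") if top_first else ("B", "T")
    return [(p, n) for n in range(num_frames) for p in order]


def select_every(items, step, *offsets):
    """Model of SelectEvery(step, offsets...): pick the given offsets from each group."""
    return [items[base + off]
            for base in range(0, len(items) - step + 1, step)
            for off in offsets]


bobbed = 4  # number of bobbed frames in the example

# Clip still flagged TFF (no AssumeFrameBased after bobbing)
tff_12 = select_every(separate_fields(bobbed, top_first=True), 4, 1, 2)
tff_03 = select_every(separate_fields(bobbed, top_first=True), 4, 0, 3)
# Clip after AssumeFrameBased(), i.e. bottom field first in this model
bff_12 = select_every(separate_fields(bobbed, top_first=False), 4, 1, 2)

print(tff_12, tff_03, bff_12)
```

In this model SelectEvery(4,0,3) on the TFF-flagged clip picks the same fields as SelectEvery(4,1,2) after AssumeFrameBased(), which is consistent with the swapped behaviour reported above.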

Last edited by erratic; 6th September 2004 at 19:47.
Old 6th September 2004, 18:39   #39  |  Link
Mug Funky
interlace this!
 
 
Join Date: Jun 2003
Location: i'm in ur transfers, addin noise
Posts: 4,547
that's bizarre, because i haven't encountered this at all and i've been using the leak version since it was released.

maybe i should check my plugin directory for avsi files with kernelbob in them?
__________________
sucking the life out of your videos since 2004
Old 6th September 2004, 18:55   #40  |  Link
Leak
ffdshow/AviSynth wrangler
 
 
Join Date: Feb 2003
Location: Austria
Posts: 2,441
Quote:
Originally posted by Mug Funky
that's bizarre, because i haven't encountered this at all and i've been using the leak version since it was released.

maybe i should check my plugin directory for avsi files with kernelbob in them?
Nah, it's true. My KernelBob is not doing the AssumeFrameBased() after bobbing, so the frames seem to keep the TFF flag I force on them in my filter when another SeparateFields() follows. That was of course something I didn't do when testing, as I'm always going for a progressive result, and it won't cause havoc until you go back to field-based processing afterwards (which I hardly ever do)...

Try this: add a call to Info() after KernelBob(), then add an AssumeFrameBased() in between and compare the parity...

Still, an AssumeFrameBased() after KernelBob() should do the trick until I release the next version; it's just that I currently don't have much time to work on AviSynth filters...

Now I'm just wondering if that was what bit Boulder as well...

np: The Black Dog - Frisbee Skip (Spanners)

All times are GMT +1. The time now is 20:02.

