View Full Version : General-purpose masking filters?


primitive
4th February 2003, 01:34
It seems like we have a lot of neat filters that calculate a mask for the video clip and then process things that fall under / don't fall under that mask.

For example, we have MSmooth / MSharpen which detect edges, BlockBuster which detects areas most likely to be afflicted with DCT blocks, etc.

My question: would it be simple to convert these masking algorithms into filters that just apply the masks and return the relevant clips? For example, it would make something like this possible:

clip = MPEG2Source("test.d2v")

#split into edges and not edges
clip_edges = clip.EdgeMask(strength=5)
clip_notedges = clip.EdgeMask(inverse=true,strength=5)

#process edges
...

#process parts of the clip which aren't edges
clip_notedges_block1 = clip_notedges.DCTBlockMask(block_size=4,detail_min=1,detail_max=20)
clip_notedges_block2 = clip_notedges.DCTBlockMask(block_size=4,detail_min=20,detail_max=100)

#more processing here
...

#splice the clips back together
clip_notedges_final = Layer(clip_notedges_block1,clip_notedges_block2)
clip_final = Layer(clip_edges,clip_notedges_final)

return clip_final

---

It seems to me that these masking functions could be more useful than they already are if they are broken out of their respective filters so we can play with them more.

-p
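
The split / process / splice idea in the script above can be sketched outside AviSynth. This is a minimal Python illustration, not AviSynth code: the names merge_by_mask and map_pixels are hypothetical, the frame is a plain list of rows, and the two lambdas are arbitrary stand-ins for whatever processing the edge and non-edge clips would receive.

```python
# Sketch only: recombine two differently processed frames through a mask.
# None of these names exist in AviSynth; they are illustrative.

def map_pixels(frame, fn):
    """Apply fn to every pixel of a frame stored as a list of rows."""
    return [[fn(pixel) for pixel in row] for row in frame]


def merge_by_mask(mask, where_set, where_clear):
    """Per pixel: take where_set if the mask is non-zero, else where_clear."""
    return [
        [a if m else b for m, a, b in zip(mask_row, row_a, row_b)]
        for mask_row, row_a, row_b in zip(mask, where_set, where_clear)
    ]


frame = [[10, 20, 30],
         [40, 50, 60]]
edge_mask = [[0, 255, 0],
             [255, 0, 0]]

# Stand-ins for "process edges" and "process parts which aren't edges".
edges_processed = map_pixels(frame, lambda p: min(p + 100, 255))
rest_processed = map_pixels(frame, lambda p: p // 2)

final = merge_by_mask(edge_mask, edges_processed, rest_processed)
```

Because the mask is computed once and passed around as ordinary data, any processing step can be slotted in on either side of the merge.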

Richard Berg
4th February 2003, 05:19
This is already possible. I know MSharpen, for instance, has a parameter along the lines of "showmask=true" that can then be fed to Mask/Layer -- I've done it myself. You can also use GeneralConvolution to simulate edge masks via Sobel matrices; there's an example of this somewhere on avisynth.org.
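
For readers unfamiliar with the Sobel approach mentioned here, the following is a rough Python sketch of how a pair of 3x3 Sobel matrices can yield an edge mask. It is an illustration under assumptions: the function name, the threshold parameter, and the use of |gx| + |gy| as the magnitude are choices made for this example, and it is not the example from avisynth.org nor the algorithm used by MSharpen.

```python
# Sketch only: a Sobel edge mask on a grayscale frame stored as a list of rows.
# The function name and threshold parameter are illustrative, not AviSynth API.

SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def sobel_edge_mask(frame, threshold):
    """Return a mask (255 = edge, 0 = not edge) the same size as frame."""
    height, width = len(frame), len(frame[0])
    mask = [[0] * width for _ in range(height)]
    # Border pixels stay 0 because the 3x3 window does not fit there.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    pixel = frame[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * pixel
                    gy += SOBEL_Y[j][i] * pixel
            # |gx| + |gy| is a cheap stand-in for the gradient magnitude.
            if abs(gx) + abs(gy) > threshold:
                mask[y][x] = 255
    return mask


# A 5x5 frame with a vertical step from dark (10) to bright (200).
frame = [[10, 10, 200, 200, 200] for _ in range(5)]
mask = sobel_edge_mask(frame, threshold=100)
```

In this toy frame only the two columns on either side of the brightness step are flagged; the flat region to the right is not.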

primitive
4th February 2003, 06:46
Originally posted by Richard Berg
This is already possible. I know MSharpen, for instance, has a parameter along the lines of "showmask=true" that can then be fed to Mask/Layer -- I've done it myself. You can also use GeneralConvolution to simulate edge masks via Sobel matrices; there's an example of this somewhere on avisynth.org.

Does Layer() support YV12?

Richard Berg
4th February 2003, 07:19
Not to my knowledge.

primitive
4th February 2003, 19:34
Here's the problem I think needs to be addressed here: too many filters duplicate concerns that would be better served by a single function that all the filters can call.

For example, take MSmooth and MSharpen. They both use an edge masking algorithm to apply a Blur or a Sharpen filter to specific parts of the image. From a high level, this approach has several faults:

- A change to the edge masking algorithm must be manually applied to both filters. Faults can crop up here.
- Assuming that the code for blurring and sharpening in MSmooth and MSharpen is reimplemented within those filters rather than called from the Avisynth core, bugfixes and speed improvements for Avisynth's internal routines do not carry over to the filter.
- It is conceivable that someone would want to apply a filter such as Con3d to a clip which has been edge-masked. As it is right now, this is not possible unless that filter's author reimplements Con3d's routines and adds another argument to his filter: "con3d=[true|false]". But what if a user wants to run Con3d on the edge-masked clip, and then, say, WarpSharp? That is not possible either.

My point in brief: Avisynth is a piece of software that supports complex and robust scripting. In this way, it is philosophically a very UNIX-like piece of software. This point is supported by the internal filters present in Avisynth: each of the internal filters does one thing, and does it well.

Many user-coded plugins, on the other hand, are monolithic, like Windows applications tend to be. Rather than each plugin doing one thing and doing it well, many plugins in some way reimplement each other at some level. This is not optimal from a software engineering standpoint, and the end-user has less power using monolithic filters than he would have if he were able to use atomic filters (as justified above).

-p

*edit* word choice

neuron2
4th February 2003, 19:52
Please accept my apologies for robbing you. I can avoid it by not releasing any more filters.

primitive
4th February 2003, 20:08
Originally posted by neuron2
Please accept my apologies for robbing you. I can avoid it by not releasing any more filters.

This is not a request, simply an observation. And I didn't single out your filters for any reason other than the fact that they are the ones I use the most often and find the most useful. Off the top of my head, Blockbuster also uses the mask + process approach, so I could level the same "criticism" at that filter as well (which isn't really criticism because all these filters are quite useful as they are).

Open source development is characterized by groups of developers that band together to scratch a collective itch. I believe that more atomic plugins will increase Avisynth's power as an editing tool. I'm a Linux guy, so I appreciate the power of having a collection of atomic tools that are easily bound together by a robust scripting language. Avisynth provides this robust scripting environment, so why not use it to its fullest?

-p

primitive
4th February 2003, 23:14
*edit* removed

neuron2
4th February 2003, 23:14
I wanna cut some boards. I better pull out my generic motor and my generic blade and my generic power supply and my generic handle assembly and spend hours assembling them together. Of course, having a separate motor in all my power tools and my vacuum cleaner, etc., is really ugly and suboptimal, so I wouldn't even consider it. Of course, since I have to have a common motor I can't specialize the tools, so they'll all have to sacrifice power and functionality to use the common motor, but hey, I'm Unix-like!

And by the way, MSharpen and MSmooth cannot easily use a common atomic edge mask algorithm.

Performance is probably the most important criterion for video work. Atomicity works against that.

I already deleted my ill-chosen words before your response. Remember you cannot be robbed of something you never had.

primitive
4th February 2003, 23:38
Originally posted by neuron2
I wanna cut some boards. I better pull out my generic motor and my generic blade and my generic power supply and my generic handle assembly and spend hours assembling them together. Of course, having a separate motor in all my power tools and my vacuum cleaner, etc., is really ugly and suboptimal, so I wouldn't even consider it. Of course, since I have to have a common motor I can't specialize the tools, so they'll all have to sacrifice power and functionality to use the common motor, but hey, I'm Unix-like!

Two observations.
- An improvement to any one part of the generic setup should improve whatever you combine the generic components to make. If most operations are atomic, small optimizations in one component can make big gains across the board.
- Software is not a material good. Though it may take you a while to piece together a script function for the first time, it's not like everyone who ever uses that function has to make that kind of time investment, and the function itself can be distributed for free.

Originally posted by neuron2
And by the way, MSharpen and MSmooth cannot easily use a common atomic edge mask algorithm. Performance is probably the most important criterion for video work. Atomicity works against that.

I was of the impression that "show=true" for both MSmooth and MSharpen would both produce the same output for a particular clip at any particular strength, though I made this assertion without testing to see if it was true.

Also, is there something about the Avisynth architecture that would make clip.EdgeMask(inverse=true,strength=5).Convolution3d(preset="AnimeHQ") slower than clip.MSmooth(strength=5,Con3d=true,preset="AnimeHQ") assuming that Con3d was integrated into MSmooth?

-p

neuron2
4th February 2003, 23:50
Originally posted by primitive
I was of the impression that "show=true" for both MSmooth and MSharpen would both produce the same output for a particular clip at any particular strength, though I made this assertion without testing to see if it was true.

You didn't carefully read what I said.

I don't know why you chose me to rag on. MSharpen and MSmooth already have options to output their edge masks. Are you ragging on me because I didn't code them to read other (nonexistent) ones, or because I had the audacity to want an integrated application? Tell me exactly what you are saying I should do with respect to these filters that you singled out, presumably as paradigmatic of your case. Thank you.

Extra copying will often be required to make things atomic. There will also be an additional load on the Avisynth frame cache.
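
The copying concern can be made concrete with a toy Python comparison. This sketch only counts full-frame buffers; the mask test and the 3-tap blur are placeholders rather than MSmooth's algorithms, all names are hypothetical, and it says nothing about how the real AviSynth frame cache behaves.

```python
# Sketch only: counts the full-frame buffers each design allocates.
# The toy mask and blur below are stand-ins, not MSmooth's algorithms.

def new_frame(height, width, counter):
    """Allocate a blank frame and record the allocation in counter."""
    counter["frames"] += 1
    return [[0] * width for _ in range(height)]


def blur_at(frame, y, x):
    """Average a pixel with its left and right neighbours (clamped at borders)."""
    row = frame[y]
    left = row[max(x - 1, 0)]
    right = row[min(x + 1, len(row) - 1)]
    return (left + row[x] + right) // 3


def atomic_chain(src, threshold, counter):
    """Three separate stages, each writing a full intermediate frame."""
    height, width = len(src), len(src[0])
    mask = new_frame(height, width, counter)
    for y in range(height):
        for x in range(width):
            mask[y][x] = 255 if src[y][x] >= threshold else 0
    blurred = new_frame(height, width, counter)
    for y in range(height):
        for x in range(width):
            blurred[y][x] = blur_at(src, y, x)
    out = new_frame(height, width, counter)
    for y in range(height):
        for x in range(width):
            out[y][x] = src[y][x] if mask[y][x] else blurred[y][x]
    return out


def fused_filter(src, threshold, counter):
    """One pass: decide and blur per pixel, writing only the output frame."""
    height, width = len(src), len(src[0])
    out = new_frame(height, width, counter)
    for y in range(height):
        for x in range(width):
            keep = src[y][x] >= threshold
            out[y][x] = src[y][x] if keep else blur_at(src, y, x)
    return out


src = [[10, 200, 30, 40]]
atomic_count = {"frames": 0}
fused_count = {"frames": 0}
atomic_out = atomic_chain(src, 100, atomic_count)
fused_out = fused_filter(src, 100, fused_count)
```

Both versions produce the same pixels here; the atomic chain simply allocates three frames where the fused filter allocates one. Whether that difference matters in practice depends on frame size, caching, and how much work each stage does.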

primitive
5th February 2003, 01:12
Originally posted by neuron2
You didn't carefully read what I said.

So there is an algorithmic concern that makes it inappropriate to use MSharpen's edge-detection algorithm with MSmooth's smoothing method and vice-versa. Particular optimizations are taken in MSharpen's edge-masking algorithm that are inappropriate / invalid to take in MSmooth's edge-masking algorithm. A general edge-masking algorithm would be so slow as to be mostly useless.

Originally posted by neuron2
I don't know why you chose me to rag on. MSharpen and MSmooth already have options to output their edge masks. Are you ragging on me because I didn't code them to read other (nonexistent) ones, or because I had the audacity to want an integrated application? Tell me exactly what you are saying I should do with respect to these filters that you singled out, presumably as paradigmatic of your case.

I "picked on" your filters (MSmooth in particular) because I am familiar enough with their use to fully comprehend both their advantages and limitations. MSmooth produces fantastic results in my experience (anime in particular), but I couldn't help wondering what would happen if a spatio-temporal smoothing algorithm (like Con3d) were used for the smoothing routine rather than a comparatively-simple 3x3 blur. Then I started thinking about downloading the source to MSmooth and Convolution3d and making the code work together. I decided this would be doable, but would probably be a bad way to solve the problem because:

- I'd have to manually update my MSmooth/Con3d hybrid every time a new version of either filter is released. That'd be a pain in the ass, and I'd probably find a strange and obscure way to introduce faults.
- I'd have to add more arguments to MSmooth to make it work with the Con3d smoothing method, including a modality trigger (mode="3x3" or mode="Con3d"). This type of tying-together of functionality is called "logical cohesion" and is highly undesirable. Reference: http://www.it.lut.fi/opetus/99-00/010758000/Lectures/lecture7.html

The other option is to break the edge-masking functionality into its own plugin and then let the user specify the smoothing method. This seems attractive because the MSmooth documentation implies that MSmooth's smoothing method is similar to Avisynth's internal Blur(). This also avoids the cohesion problem referred to earlier by requiring neither a modality switch nor separately compiled versions of what is essentially the same filter.

Originally posted by neuron2
Extra copying will often be required to make things atomic. There will also be an additional load on the Avisynth frame cache.

What percentage overhead, in terms of speed, do you foresee this extra copying imposing? When you say it would impose an additional load on the Avisynth frame cache, is this something that could be relieved by increasing the amount of memory Avisynth uses?

-p

neuron2
5th February 2003, 01:52
Specifically, what are you asking me to do with MSmooth and MSharpen? As I said, I already output the edge map and if someone else wants to write a different back end filter that reads the edge map, there's nothing to stop them.

I'm really not interested in general tools versus applications discussions or academic arguments about software "purity". If you don't have any specific requests for me I will bow out.

Richard Berg
5th February 2003, 05:35
I like what you're saying, primitive, but am as mystified as neuron2 WRT what the Avisynth core team should do (in your opinion). Are you proposing any backend changes, or can your ideas be implemented adequately in the current system?

primitive
5th February 2003, 06:33
Originally posted by Richard Berg
I like what you're saying, primitive, but am as mystified as neuron2 WRT what the Avisynth core team should do (in your opinion). Are you proposing any backend changes, or can your ideas be implemented adequately in the current system?

- The Mask() family of functions is only implemented in the core for RGB32, and Layer() is only implemented for RGB32 and YUY2. If we start using Mask() and Layer() often when processing video, these color-space conversions will start to hurt, both speed-wise and color integrity-wise. These functions would need to be brought up to speed to support YV12. (I thought I read something by sh0dan in another thread that said this was already underway.)

- Right now, it looks like it's possible to extract a mask using MSmooth in conjunction with ColorKeyMask(). You should be able to invert that mask as well with creative use of Subtract(). However, using this method hurts, and using it to get an inverted mask really hurts (because you have to mask the MSmooth-generated frame, then use that masked frame again as a mask against that same frame in its unprocessed form). Rather than go through this runaround, it'd be cleaner (and faster) for the filter to output masked pixels directly; this would eliminate the need to call any of the Mask() functions altogether.

(primitive)
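
The color-key workaround described in the second bullet can be sketched in Python as follows. This is an illustration under assumptions: the key color, the function names, and the tiny frames are invented for the example, and the real ColorKeyMask(), Mask(), and Subtract() calls have their own parameters that are not modelled here.

```python
# Sketch only: derive an alpha mask from a key color, invert it, composite.
# KEY_GREEN is an assumed key color; the thread does not give the real value.

KEY_GREEN = (0, 255, 0)


def color_key_alpha(frame, key):
    """Alpha 0 (transparent) where the pixel equals key, 255 elsewhere."""
    return [[0 if pixel == key else 255 for pixel in row] for row in frame]


def invert_alpha(alpha):
    """Swap transparent and opaque regions."""
    return [[255 - a for a in row] for row in alpha]


def composite(alpha, top, bottom):
    """Show top where alpha is opaque, bottom where it is transparent."""
    return [
        [t if a == 255 else b for a, t, b in zip(alpha_row, top_row, bottom_row)]
        for alpha_row, top_row, bottom_row in zip(alpha, top, bottom)
    ]


# A 2x2 "show mask" frame in which some pixels carry the key color.
mask_frame = [[KEY_GREEN, (9, 9, 9)],
              [(9, 9, 9), KEY_GREEN]]
alpha = color_key_alpha(mask_frame, KEY_GREEN)
inverse = invert_alpha(alpha)

original = [[1, 2], [3, 4]]
processed = [[10, 20], [30, 40]]
normal_result = composite(alpha, processed, original)
inverted_result = composite(inverse, processed, original)
```

A filter that returned the alpha plane directly would make the key-color step unnecessary, which is the simplification being requested.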

Richard Berg
5th February 2003, 07:18
Layer() et al. will definitely support YV12 in the near future -- problem is, most of the transparency operations have no effect when you don't have an alpha channel.

vlad59
5th February 2003, 08:53
Hi,

I have to admit I mainly agree with Primitive (although I'm not a Linux guy).

I have sometimes dreamed of an internal integrated toolbox in addition to env->BitBlt, for example:
env->EdgeMask(Method=[Fast|Slow|...])
env->MotionMask(...)
etc.

The same methods are used in many filters. I know that if I wanted to use an edge mask, my first action would be to ask Donald's permission to use his edge detection algorithm for initial tests, and then, if the results were good, I'd try to find a better one or optimize Donald's.
It would be cool to have an integrated toolbox.

But after reading Primitive's post, it's true that it would be even better to let the end user use the same trick.
I have not thought about it long enough to say whether it's possible or not.

I'll have to think about it.

neuron2
5th February 2003, 14:20
I release my source code under GPL so that others may use it. As long as GPL is followed there is never any need to ask my permission for anything.

vlad59
5th February 2003, 14:36
Originally posted by neuron2
I release my source code under GPL so that others may use it. As long as GPL is followed there is never any need to ask my permission for anything.

It's not my fault ;)
My mother always told me to ask even if I knew the answer would be yes ;) .

neuron2
5th February 2003, 17:34
Vlad59, you are a gentleman and a scholar, a true model of integrity and propriety for us all. I salute you.

vlad59
5th February 2003, 17:39
Originally posted by neuron2
Vlad59, you are a gentleman and a scholar, a true model of integrity and propriety for us all. I salute you.

I'll forward this to my mother immediately, I hope she'll believe you ;) :D ;) :D

Sorry Primitive for being OT ;)

primitive
5th February 2003, 20:08
Originally posted by vlad59
I'll forward this to my mother immediately, I hope she'll believe you ;) :D ;) :D

Sorry Primitive for being OT ;)

Be as off-topic as you like; simply knowing that someone other than myself thinks the idea deserves exploration is a relief.

@Richard Berg

Let me see if I understand the problem. For RGB32, ColorKeyMask() turns all pixels of a certain color (specifiable by the user) transparent by manipulating the alpha channel of those pixels of that color. To use this with dgraft's MSmooth and MSharpen, you'd have to set the color key to the particular green he uses in his masks; those pixels would then be treated as transparent for the rest of the process.

YV12 and YUY2, OTOH, don't keep any data that's directly analogous to an alpha channel, so the simple method of setting the pixels to be masked to transparent no longer works.

Would it not be possible to flag a certain pattern of pixel data as "transparent" a la transparent .gif images? This flag would have to be different for every image (because it'd have to use a pattern which isn't present in the original frame), so the flag for EdgeMask() and EdgeMask(invert=true) would be different in some cases. Or would it be wiser to choose a flag for the entire frame and use that as "transparent" through both the masking operations? What would happen if during supplemental processing some pixels were changed to this pattern accidentally? Is there another method for dealing with "transparency" in YUY2 and/or YV12?

(primitive)
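
The "transparent flag" idea and its main hazard can be sketched in Python. The sketch assumes a single 8-bit plane stored as a list of rows; pick_unused_key and contains are hypothetical helpers, and the "+1" step is an arbitrary stand-in for later processing.

```python
# Sketch only: choose a per-frame "transparent" value that the frame does not
# already contain, then show how later processing can collide with it.

def pick_unused_key(frame):
    """Return a value in 0..255 absent from frame, or None if all are used."""
    used = {pixel for row in frame for pixel in row}
    for candidate in range(256):
        if candidate not in used:
            return candidate
    return None


def contains(frame, value):
    """True if any pixel in frame equals value."""
    return any(value in row for row in frame)


frame = [[0, 1, 2],
         [4, 5, 6]]
key = pick_unused_key(frame)

# Stand-in for supplemental processing: brighten every pixel by one.
processed = [[pixel + 1 for pixel in row] for row in frame]
collision = contains(processed, key)
```

Here the key is safe for the original frame but a processed pixel lands on it, so that pixel would wrongly be treated as transparent. A frame that uses all 256 values has no free key at all, which is one argument for a real alpha plane.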

Bidoche
6th February 2003, 00:38
@Primitive

Methods for attaching tags to a VideoFrame are being considered for addition to the 3.0 framework.
With them, marking frames with a transparent color will be possible, but it will probably be simpler to attach an alpha channel directly anyway (and less bothersome for filters that use it afterwards).

But maybe we could still use it and generate the alpha channel on the fly from this transparent color too.

Hmm, well, that may be smart: setting the alpha channel as a frame tag (internally, users won't see that) would allow using it when needed, without the space cost when not.

Edit: Or, thanks to VideoFrameBuffer sharing, we allocate one big buffer full of 255 for all the alpha channels to point to...

Richard Berg
6th February 2003, 04:22
I think the best way to add transparency support to the YUV colorspaces is to define our own pixel format that has it "built in" so to speak. There was some discussion of this in another thread where we were considering adding a YUV 4:4:4 format -- so long as it's nonstandard already, there's no reason we couldn't make it YUVA. (In fact, barring planar formats, it wouldn't hurt speed at all since it aids alignment.)

Like you say there are other options (private/protected fields in VideoFrame, centralized buffer...), but by comparison they sound like they'd cause way more coupling than is necessary.
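
The alignment remark can be illustrated with a small Python sketch. It assumes a packed (non-planar) layout with one byte per component and a Y, U, V, A byte order; the order and the helper names are assumptions made for this example, not a proposed format definition.

```python
# Sketch only: byte offsets of packed pixels with and without an alpha byte.
import struct


def pack_yuva(y, u, v, a=255):
    """Pack one pixel as four unsigned bytes in Y, U, V, A order (assumed)."""
    return struct.pack("4B", y, u, v, a)


def pixel_offsets(count, bytes_per_pixel):
    """Byte offset of each of the first count pixels in a packed row."""
    return [index * bytes_per_pixel for index in range(count)]


def all_word_aligned(offsets, word=4):
    """True if every offset falls on a word boundary."""
    return all(offset % word == 0 for offset in offsets)


packed_yuv = pixel_offsets(4, 3)   # 3 bytes per pixel: 0, 3, 6, 9
packed_yuva = pixel_offsets(4, 4)  # 4 bytes per pixel: 0, 4, 8, 12
pixel = pack_yuva(16, 128, 128)    # alpha defaults to fully opaque
```

With three bytes per pixel most pixels straddle a 4-byte boundary, while the fourth byte puts every pixel on one, at the cost of a third more memory per frame.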

Acaila
6th February 2003, 11:13
@primitive

I just want to say thank you for bringing this up. Your ideas about a general toolbox coincide with something I have wondered about for a while now, but I lack the programming experience to put it into words adequately.

Bidoche
6th February 2003, 12:48
@Richard Berg

I agree that using internal flags like I said before may be a bit too acrobatic.

I'd rather add an alpha plane to every VideoFrame, with the centralised buffer used to initialise those planes. Thus we avoid countless creations of alpha planes that will be unused most of the time, while retaining the possibility to read and change them.

It has a structural benefit too: since all videoframes have alpha, we don't have to worry about distinction between them, we know all of them can be used for alpha work.
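
One way to picture the shared alpha plane is a copy-on-write scheme, sketched here in Python. This is a guess at the general shape of the idea, not the Avisynth 3.0 design: the Frame class, its attributes, and the per-size cache are all hypothetical.

```python
# Sketch only: every frame starts out pointing at one shared, fully opaque
# alpha plane and takes a private copy the first time it writes to it.

class Frame:
    _shared_opaque = {}  # (height, width) -> immutable all-255 plane

    def __init__(self, height, width):
        self.height = height
        self.width = width
        size = (height, width)
        if size not in Frame._shared_opaque:
            Frame._shared_opaque[size] = bytes([255]) * (height * width)
        self._alpha = Frame._shared_opaque[size]
        self._owns_alpha = False

    def alpha_at(self, y, x):
        """Read one alpha value; works on the shared or the private plane."""
        return self._alpha[y * self.width + x]

    def set_alpha(self, y, x, value):
        """Write one alpha value, copying the shared plane on first write."""
        if not self._owns_alpha:
            self._alpha = bytearray(self._alpha)
            self._owns_alpha = True
        self._alpha[y * self.width + x] = value


a = Frame(2, 2)
b = Frame(2, 2)
shared_before_write = a._alpha is b._alpha
a.set_alpha(0, 0, 0)
```

Frames that never touch alpha cost no extra memory, and a filter that does write alpha cannot disturb other frames because the shared plane is immutable.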

Richard Berg
7th February 2003, 02:54
Glad we agree. I'm itching to start putting all our discussions into code...I think I'll tackle merging the CVS tonight...

Richard Berg
7th February 2003, 19:04
Done...sort of (http://www.avisynth.org/forum/viewtopic.php?p=49#49). Gotta run, catch you guys later tonight or tomorrow...

primitive
19th February 2003, 00:55
Sorry to bring this thread back from the dead...

@Richard Berg

Is it safe to say that the feature we've been discussing, if it's implemented, would be targeted for Avisynth 3.0? Is there an Avisynth roadmap?

Richard Berg
19th February 2003, 03:08
A new colorspace could be added to 2.5 without much difficulty, but anything more advanced (per-frame tags) probably requires the 3.0 framework. The closest thing we have to a roadmap is the set of threads in the avisynth.org developer forum.