#1
Noise is your friend
Join Date: Sep 2002
Location: Ontario, Canada
Posts: 554
A "psychovisual" approach to noise reduction
I've been using Avisynth (and lurking in this forum) for quite a while, and experimenting with the various noise reduction filters, during which time some thoughts have occurred to me. I don't claim these are original or even totally on-topic (my excuse being that I've been testing various algorithms as Avisynth filters), but here they are anyway.
From my tests (mainly DVD rips, but also some captures of varying-quality VHS tapes) I've gradually settled on Convolution3D as a good all-round noise reduction filter, especially when applied prior to resizing and with higher values than the recommended 8 for chroma. A 3x3 average blur seems to be the most effective at increasing compressibility (or, in my case, decreasing the Q level of a CBR MPEG-1), but it is obviously also the most destructive. Selective filters such as msmooth do a good job, and I believe are, theoretically speaking, on the right track as far as noise reduction goes. That said, smoothing between edges is a fairly simplistic approach - again, from a conceptual and not a practical point of view - and most of my thoughts have been about other factors, besides edges, that could be used to influence a smoother. The factors I've identified are:

1) Detection of "focus." From my tests with various home-grown algorithms, this sounds a lot simpler than it is. While SLR cameras are able to auto-focus very effectively by, apparently, comparing tonal contrast levels at various hotspots in the frame, I'm not sure how easily this translates to a flat image (more sophisticated cameras use distance detection to aid in determining focus, something that obviously cannot be done with a 2D array of pixels). It would seem to me that objects out of focus can receive more aggressive noise reduction. Perhaps tonal contrast might be combined with judging the "sharpness" of existing edges. From what I've read this has something to do with the high-frequency parts of the picture, but I've yet to find a book on image processing locally, so my knowledge ends here. I'm hoping someone can pick this up and run with it, or explain to me why this is an exercise in futility.

2) Detection of motion. I've dabbled with absolute luminance difference, which does a fairly good job at finding parts of the frame in motion but seems to be attracted to the edges of moving objects rather than the entire object (see the sketch at the end of this post). My thinking here is that once moving objects are identified, anything else must be fairly stationary and can perhaps receive slightly more aggressive smoothing. It also might be the case that scenes with very low motion could handle stronger temporal smoothing than, for example, pans, which tend to produce "trails" with over-aggressive temporal blurring.

3) Framing. While not always (or even mostly) true, the centre of the frame tends to contain the most important information, and is certainly where the eyes seem to "return" upon a scene change. This would imply that smoothing can often be weighted towards the edges of the frame when other factors (such as focus) determine that nothing important resides there.

4) Scene changes. Since it takes a finite - possibly relatively substantial - amount of time for the brain to register a scene change, and there is a momentary "confusion" while you reorientate yourself to the new angle, etc., could the frame or two following a scene change also be candidates for more aggressive smoothing?

5) Dissolves. Somewhat related to the previous idea: during the course of a dissolve, stronger blurring might be less noticeable to the viewer.

6) Intrinsic detail levels. Though possibly very difficult to measure, it might make sense that areas containing relatively little detail compared to the rest of the frame (e.g. an out-of-focus background to a headshot) could receive more attention from a smoothing filter.

7) Edge detection.
While on its own edge detection can lead to over-smoothing of important details (and, I would submit, is more suited to anime), when combined with other "psychovisual" factors it may be more useful for live-action footage. For example, an out-of-focus background could be further blurred between edges, as could areas already identified as slow-moving.

While each of these factors individually might have only a small effect on compressibility, perhaps in combination they could provide a fairly significant improvement. Whether the processing time involved would make it unrealistic is another matter.

I apologise for the length of this post (I had no idea it was going to turn out this long), but I've yet to find any other outlet for my ideas - certainly none with such knowledgeable participants. Waiting five days to post after registering was very frustrating! Please feel free to do with these ideas what you will. For all I know I may just be rehashing old and rejected concepts, but a fairly exhaustive reading of the Doom9 forums turned up very few posts regarding psychovisual processing (that is, unrelated to DivX 5), except for the smooth-edges-of-frame idea, which has been briefly discussed before but with no real conclusion. Also feel free to roll your eyes and point me in the direction of reading material that will set me straight. Image processing is a topic somewhat neglected by my local bookstores, and math class for me was mainly a time to fantasize about the teacher.

Thanks for your attention.

Regards, SansGrip
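P.S. The motion test from point 2 boils down to something like this (a minimal C++ sketch of what I've been playing with, not a finished filter; the packed 8-bit luma layout and the threshold value are assumptions):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Per-pixel motion evidence from the absolute luma difference between
// the current and previous frames. Pixels whose difference exceeds the
// threshold are marked "moving" (255); everything else (0) becomes a
// candidate for stronger smoothing.
std::vector<uint8_t> motionMask(const uint8_t* cur, const uint8_t* prev,
                                int width, int height, int threshold)
{
    std::vector<uint8_t> mask(static_cast<std::size_t>(width) * height);
    for (std::size_t i = 0; i < mask.size(); ++i)
        mask[i] = std::abs(cur[i] - prev[i]) > threshold ? 255 : 0;
    return mask;
}
```

This also shows why the test is "attracted to the edges": the flat interior of a moving object often changes little from frame to frame, while its high-contrast edges produce large differences.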
#2
Kilted Yaksman
Join Date: Oct 2001
Location: South Carolina
Posts: 1,303
Re: A "psychovisual" approach to noise reduction
Quote:
2) Detection of motion. I've dabbled with absolute luminance difference which does a fairly good job at finding parts of the frame in motion, but seems to be attracted to the edges of moving objects rather than the entire object. My thinking here is that once moving objects are identified, anything else must be fairly stationary and can perhaps receive slightly more aggressive smoothing. It also might be the case that scenes with very low motion could handle stronger temporal smoothing than, for example, pans, which tend to produce "trails" with over-aggressive temporal blurring.
I'm working on a temporal smoother that keeps track of where blocks move over time, hopefully eliminating the trails that plague a smoother that assumes a still image. Applying aggressive smoothing to areas in motion is a nice idea, though I'd rather leave that to the codec. We'll see. -h |
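Roughly, the block-tracking idea looks like this (a sketch of the shape of it, not my actual filter; 8x8 blocks, a caller that keeps the search window inside the frame, and a fixed 50/50 blend are all assumptions):

```cpp
#include <climits>
#include <cstdint>
#include <cstdlib>

// Find the best-matching 8x8 block in the previous frame within a small
// search radius (SAD metric), then average the current block with its
// tracked counterpart instead of the co-located block. Following the
// motion is what avoids the trails a still-image assumption produces.
void temporalSmoothBlock(uint8_t* cur, const uint8_t* prev,
                         int stride, int bx, int by, int radius)
{
    const int N = 8;
    int bestSad = INT_MAX, bestDx = 0, bestDy = 0;
    for (int dy = -radius; dy <= radius; ++dy)
        for (int dx = -radius; dx <= radius; ++dx) {
            int sad = 0;
            for (int y = 0; y < N; ++y)
                for (int x = 0; x < N; ++x)
                    sad += std::abs(cur[(by + y) * stride + bx + x]
                                  - prev[(by + dy + y) * stride + (bx + dx + x)]);
            if (sad < bestSad) { bestSad = sad; bestDx = dx; bestDy = dy; }
        }
    // Blend 50/50 with the motion-tracked block from the previous frame.
    for (int y = 0; y < N; ++y)
        for (int x = 0; x < N; ++x) {
            uint8_t& c = cur[(by + y) * stride + bx + x];
            uint8_t p = prev[(by + bestDy + y) * stride + (bx + bestDx + x)];
            c = static_cast<uint8_t>((c + p + 1) / 2);
        }
}
```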
#3
Noise is your friend
Join Date: Sep 2002
Location: Ontario, Canada
Posts: 554
Re: Re: A "psychovisual" approach to noise reduction
Quote:
#4
Asker of Questions
Join Date: Oct 2001
Location: Florida
Posts: 433
MPEG-1 barely defines the encoder at all, only the decoder requirements. You could have a valid MPEG-1 encoder that did what you're discussing here. Someone would just have to write it.
__________________
"The real trick to optimizing color space conversions is of course to not do them." --trbarry, April 2002 |
#5
Noise is your friend
Join Date: Sep 2002
Location: Ontario, Canada
Posts: 554
Quote:
@all: It also occurred to me that softening could be more aggressive in dark areas of the screen, and I set about writing a filter to do that. Then I realized that it's pointless making Yet Another Softener, so instead I'm working on a filter to selectively "mix" two clips based on luma thresholds. E.g.:

```
orig = AviSource(blah)
blur = orig.Blur(1).Blur(1).Blur(1)
orig.ThresholdMix(blur, 16, 35)
```

would replace dark areas (16-35 luma) with a heavily blurred version. I've got something working along these lines and the results seem promising, with the blurring hardly noticeable (whether this will remain true when viewed on a TV screen is something I'll have to test).

Right now all the filter does is a straight pixel-for-pixel replacement, but ideally it would "feather" the edges to be less obvious (one scene with a neon sign surrounded by darkness didn't look too good, and nor do high-contrast titles). Can anyone suggest an effective way to do this? I thought of checking a 3x3 block around the pixel in question and only "mixing in" the alternate frame if *all* the pixels in the block fall within the threshold, but this seems to slow it down a great deal (at least my implementation of it does).

Any suggestions (even "look at the source for xxx") would be greatly appreciated.
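For reference, the straight replacement the filter does now amounts to this inner loop (a minimal C++ sketch; the function name and flat 8-bit plane layout are just for illustration):

```cpp
#include <cstddef>
#include <cstdint>

// Wherever the source luma falls inside [lo, hi], take the pixel from the
// blurred clip instead. This hard cut is what produces the visible seams
// around things like neon signs; feathering would soften the transition.
void thresholdMix(uint8_t* dst, const uint8_t* src, const uint8_t* blurred,
                  std::size_t count, int lo, int hi)
{
    for (std::size_t i = 0; i < count; ++i)
        dst[i] = (src[i] >= lo && src[i] <= hi) ? blurred[i] : src[i];
}
```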
#6
Registered User
Join Date: Jan 2002
Posts: 283
@SansGrip,
Sounds like lots of good ideas to me. Here are some thoughts on the points in your original post...

1) Focus — Like you say, areas with a lot of detail are going to be less suited to spatial smoothing.

2) Motion detection — This clearly is the key to noise reduction.

3) I'm sort of skeptical about framing. It would have to be subtle, and would probably be more trouble than it's worth.

4&5) Scene changes and dissolves — These should be taken care of by good motion and detail detection without any need for special cases. Scene changes will be detected as motion, but with low detail as the scene fades in; likewise for dissolves. That's a case where you would already want to use spatial smoothing.

6&7) Edge detection and detail level — I think these are the same problem as focus detection. All of them are figuring out the amount of detail in the picture — the more detail, the less spatial averaging you want.

I think there are three tasks for a good noise reduction algorithm: noise detection, motion detection, and detail detection. Noise detection allows you to distinguish motion and detail. Then motion detection is used — if a pixel is completely stationary, then the best estimate of its value is a temporal average of the values at that pixel. If it isn't stationary, then the best estimate is a compromise between spatial smoothing (to reduce noise) and just keeping the pixel's value (to maintain detail).

You can get a lot of mileage for motion detection [Edit: This used to read "noise detection."] by saving the amount of detected color difference to an array, and blurring that amount for use as an estimate of motion. (That's because moving pixels tend to be next to other moving pixels. There's a sketch of this at the end of the post.)

> ...math class for me was mainly a time to fantasize about the teacher

Maybe that explains your interest in visual mathematical problems?

Most smoothing algorithms will automatically average more in dark regions! There are two causes for this — one is that noise in dark regions is reduced (since the pixel value is clipped to a minimum of 0), so any algorithm which averages based on color differences will tend to average more in dark areas. The second reason is that video signals are gamma corrected, and therefore the same pixel difference corresponds to a much greater perceived difference in dark areas. This is also the reason why it's a good idea to average more in dark areas — although the real amount of noise is reduced, the perceived noise is higher than elsewhere in the picture.
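Here's what I mean by blurring the evidence (a bare-bones C++ sketch; the 3x3 box blur and the flat 8-bit luma layout are arbitrary choices):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Store per-pixel frame differences in a map, then box-blur the map so a
// pixel surrounded by moving pixels counts as moving even when its own
// difference happens to be small (and vice versa for stationary areas).
std::vector<int> blurredMotionEvidence(const uint8_t* cur, const uint8_t* prev,
                                       int width, int height)
{
    std::vector<int> diff(static_cast<std::size_t>(width) * height);
    for (std::size_t i = 0; i < diff.size(); ++i)
        diff[i] = std::abs(cur[i] - prev[i]);

    std::vector<int> evidence(diff.size());
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            int sum = 0, n = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    int yy = y + dy, xx = x + dx;
                    if (yy >= 0 && yy < height && xx >= 0 && xx < width) {
                        sum += diff[yy * width + xx];
                        ++n;
                    }
                }
            evidence[y * width + x] = sum / n;  // 3x3 mean of the difference map
        }
    return evidence;
}
```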
__________________
9:) Lindsey Dubb Last edited by High Speed Dubb; 7th October 2002 at 07:10.
#7
Noise is your friend
Join Date: Sep 2002
Location: Ontario, Canada
Posts: 554
Quote:
While this might be theoretically possible to do with Layer() and Mask(), these - I think - only work in ARGB space, and I like to keep everything YUY2. Hacks like my "ThresholdMix" filter work, but involve blurring the whole frame even if only a few pixels are affected. I think what's needed is an extension to the core to allow new filters to "know" which parts of the frame they should ignore, and old filters to be "clipped" automatically to within the bounds of the mask.

Well, I'm rambling again. I'm going to try to figure out how to blend two images together given arbitrary weighting. I think it might be time to order a book on image processing from Amazon...
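The blend itself should be simple enough; I'm picturing something like this (a first C++ sketch, with an 8-bit per-pixel weight mask as my working assumption):

```cpp
#include <cstddef>
#include <cstdint>

// Per-pixel weighted blend of two planes: weight 0 takes pixel a,
// weight 255 takes pixel b, anything between mixes proportionally.
// The mask could come from any of the detectors discussed in this thread.
void weightedBlend(uint8_t* dst, const uint8_t* a, const uint8_t* b,
                   const uint8_t* weight, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i) {
        int w = weight[i];
        dst[i] = static_cast<uint8_t>((a[i] * (255 - w) + b[i] * w + 127) / 255);
    }
}
```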
#8
Registered User
Join Date: Jan 2002
Posts: 283
Quote:
So far my attempts have been to check for the difference between the pixel and its nearby neighbors. That does reasonably well at classifying textures as “high detail.” And that’s important, since spatial smoothing on textures looks really bad. Quote:
- Is it stationary? If so, use temporal smoothing. Otherwise...
- Is it low detail? If so, use spatial smoothing.
- Is it high detail and in motion? Then just use the pixel as is, since there's nothing useful (short of block motion detection) you can do to improve it.

The noise level would be important when deciding whether a pixel is in motion, and how much detail there is. But even with noise detection, you don't know for sure how much motion or detail there is at a pixel. So optimally it makes sense to use each of these possibilities, weighted by the amount of evidence for motion and detail.

About my odd "just blur the motion evidence" comment —
Quote:
Like you said, difference in Y can be used as an indicator of motion. Problem is, there’s noise at each pixel, so it’s possible that a stationary pixel will have a big change in Y — and it’s also possible that a moving pixel won’t have a large change. But moving pixels tend to be next to moving pixels, and stationary next to stationary. So you can reduce the chance of being fooled by using the information from nearby pixels. For example, if all nearby pixels show evidence for motion, then that pixel is probably in a moving area, even if the change in Y is small. About averaging in dark areas — Quote:
For esoteric reasons (because of the way tube TV sets work), the color difference between Y=20 and Y=21 is much greater than the color difference between Y=200 and Y=201. The function which maps differences to a ~linear scale is called “gamma.” Gamma correction is actually done on RGB rather than YUV color space, but you’re better off reading that level of explanation from a graphics reference. The Google search will turn up lots of stuff on it. Stay away from anything talking about the “gamma function” or “gamma distribution,” which are completely unrelated. Um... there’s more stuff to comment on, but I’m getting tired, so I’ll leave it for tomorrow.
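To put a number on the gamma point (a toy C++ illustration; the plain 2.2 power law is an approximation — real video transfer curves differ slightly and are applied per RGB channel):

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>

// Map an 8-bit code value to linear light with an assumed gamma of 2.2.
double toLinear(uint8_t code) { return std::pow(code / 255.0, 2.2); }

int main()
{
    // The same one-step code difference is a ~11% relative change in
    // linear light near black, but only ~1% near white -- which is why
    // identical noise amplitudes look much worse in dark areas.
    double dark   = (toLinear(21)  - toLinear(20))  / toLinear(20);
    double bright = (toLinear(201) - toLinear(200)) / toLinear(200);
    std::printf("relative step near black: %.1f%%\n", dark * 100.0);
    std::printf("relative step near white: %.1f%%\n", bright * 100.0);
    return 0;
}
```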
__________________
9:) Lindsey Dubb
#9
Noise is your friend
Join Date: Sep 2002
Location: Ontario, Canada
Posts: 554
Quote:
You know, using the word "fuzzy" above has triggered my memory. I've also been thinking in very vague ways about applying fuzzy logic to all this. From the little I know of it, it seems very applicable to image processing.

Another area I'm interested in investigating is the possibility of using genetic algorithms, which can be extremely powerful. I've not come up with a concrete example of how they might be applied yet (perhaps "learning" the noise patterns of a particular source material), but it's something else to keep me up at night.
#10
Kilted Yaksman
Join Date: Oct 2001
Location: South Carolina
Posts: 1,303
Quote:
Another area I'm interested in investigating is the possibility of using genetic algorithms, which can be extremely powerful. I've not come up with a concrete example of how they might be applied yet (perhaps "learning" the noise patterns of a particular source material), but it's something else to keep me up at night.
I have no idea what a genetic algorithm is, but I've detected the noise content by using the mean of the sum of absolute differences between a block and its temporal neighbour as a threshold value. That is, if motion estimation has found (a series of) good matches in the reference frame, it's a safe bet that the difference between the current and reference blocks is the noise content, which can be filtered out by thresholding. Of course it pays to be conservative, as it's not only noise that alters a block over time, even when its motion is tracked.

-h
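Per matched block, the idea is roughly this (a sketch of the concept rather than my actual filter; 8x8 blocks in contiguous buffers and the threshold choice are assumptions):

```cpp
#include <cstdint>
#include <cstdlib>

// Once motion estimation has found a well-matched reference block, treat
// the mean absolute difference as an estimate of the noise level, and
// smooth only the pixels whose difference stays under that estimate.
void denoiseMatchedBlock(uint8_t* cur, const uint8_t* ref)
{
    const int N = 64;  // 8x8 block
    int sad = 0;
    for (int i = 0; i < N; ++i)
        sad += std::abs(cur[i] - ref[i]);
    int noiseLevel = sad / N;  // mean absolute difference = noise estimate

    for (int i = 0; i < N; ++i) {
        int d = std::abs(cur[i] - ref[i]);
        if (d <= noiseLevel)  // small difference: likely noise, average it out
            cur[i] = static_cast<uint8_t>((cur[i] + ref[i] + 1) / 2);
    }
}
```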
#11
Asker of Questions
Join Date: Oct 2001
Location: Florida
Posts: 433
One of the requirements for writing a genetic algorithm is having a way to evaluate the "fitness" of a particular configuration. Unfortunately, there's no reliable way to determine whether an output frame has had noise removed, but only minimal detail loss. If there were, you could probably just reverse that process and come up with the perfect technique.
You could "evolve" an algorithm, but it wouldn't necessarily be more fit than the starting point, because you'd have no way (other than looking at the image yourself) to determine whether it had done a good job or not. You can't just seek out maximum compressibility, because you'd just end up with a grey screen.
__________________
"The real trick to optimizing color space conversions is of course to not do them." --trbarry, April 2002 |
#12
Noise is your friend
Join Date: Sep 2002
Location: Ontario, Canada
Posts: 554
Quote:
Noise detection isn't the only application for this. It might also work for difficult IVTC/deinterlacing too. Of course, I might be talking out of my hat (or something else round and inappropriate).
#14
Registered User
Join Date: Jan 2002
Posts: 283
Quote:
Since you’re just starting up, my advice on genetic algorithms (or neural nets) would be to avoid them for the moment. You can do just as well (and with a lot less bother) by just moving sliders until you like the output. For ease of use, “which of these looks better” parameter selection does have a lot of appeal. But you should be aware that “mixing” — the time it takes to search the reasonable parameters — can take a while. A simple hill climber would be easier and possibly more effective than a genetic algorithm. Fuzzy logic isn’t my favorite method, but it should work better than simple thresholds for a lot of things. It’s also reasonably easy to implement. My favorite buzzword, if you’re looking for esoteric algorithms, is Hidden Markov Modelling. That has the benefit that its parameters can be optimized without user input (using maximum likelihood estimates), and is conceptually a very good match to some video problems.
__________________
9:) Lindsey Dubb
#15
Registered User
Join Date: Jan 2002
Posts: 283
@h,
Yeah, that’s pretty much how I do it, too. The tough part has been the motion detection — I’ve only used local measures of motion, without trying to figure out vectors.
__________________
9:) Lindsey Dubb
#16
Registered User
Join Date: Apr 2002
Posts: 67
But the way I would have done it is to have 2 thresholds, below the low threshold becomes 100% the blurred pixel, above the high threshold 100% the source, and between an alpha blend of the two. This is pretty cheap computationally, you can use a lookup table even if you like, and should do what you want.
#17
Noise is your friend
Join Date: Sep 2002
Location: Ontario, Canada
Posts: 554
@Defiler
That sounds like a cool application. Go ahead and write it.

What I'm thinking at the moment (though bear in mind that it's 02:32 and I have a very bad cold) is that it might not be necessary to have an actual plugin to generate the variable part of the script. Perhaps the user could simply specify a template (e.g. SmoothHiQ(a, b, c, d, e)) and valid ranges for each parameter. The fun part would be in the "mutating" of the parameters each iteration. I'm not sure how this would work yet.

@High Speed Dubb:
Quote:
Masks (and blocking) are good, powerful tools. You do have to be careful that the boundaries of the mask don't look weird, though.
That's my problem with the "ThresholdMix" filter right now. But alpha blending (now I know the name for it, thanks to Leuf) might fix that. I shall add fuzzy logic and Hidden Markov Modelling to my list of things to ask Google about.

@Leuf:
Quote:
But the way I would have done it is to have 2 thresholds, below the low threshold becomes 100% the blurred pixel, above the high threshold 100% the source, and between an alpha blend of the two. This is pretty cheap computationally, you can use a lookup table even if you like, and should do what you want.
Blending from a lookup table sounds good. Previously I was playing with an only-replace-this-pixel-if-all-its-neighbours-will-also-be-replaced algorithm, but it was slow. Mind you, none of it was optimized in any way.
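Something like this for the lookup-table blend, maybe (a quick C++ sketch of Leuf's scheme as I understand it; the function names and flat 8-bit plane layout are mine):

```cpp
#include <cstddef>
#include <cstdint>

// Build a 256-entry table mapping source luma to a blend weight: 0 below
// the low threshold (use the blurred pixel), 255 above the high one (keep
// the source), ramping linearly in between. Assumes lo < hi.
void buildBlendLut(uint8_t lut[256], int lo, int hi)
{
    for (int v = 0; v < 256; ++v) {
        if (v <= lo)      lut[v] = 0;
        else if (v >= hi) lut[v] = 255;
        else              lut[v] = static_cast<uint8_t>((v - lo) * 255 / (hi - lo));
    }
}

// Alpha-blend the source and blurred clips per pixel using the table,
// which feathers the transition instead of cutting hard at one threshold.
void thresholdBlend(uint8_t* dst, const uint8_t* src, const uint8_t* blurred,
                    std::size_t count, const uint8_t lut[256])
{
    for (std::size_t i = 0; i < count; ++i) {
        int w = lut[src[i]];  // 0 = fully blurred, 255 = fully source
        dst[i] = static_cast<uint8_t>((blurred[i] * (255 - w) + src[i] * w + 127) / 255);
    }
}
```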