Isochroma
30th December 2006, 08:08
Lossless Noise Reduction via Frequency Doubling

Tonight I was contemplating noise reduction, as I so often do. Ruminating on the various methods, it struck me that all of them without exception do the same thing: they reduce the amplitude of noise.

This can be done by decreasing the content of high frequencies in the image, and also by averaging pixels across frames. Motion compensation makes the averaging more precise, in that the non-noise motion-correlated image components are less affected by the averaging.

Unfortunately, methods which remove certain frequencies by interframe or intraframe averaging, or other methods, inevitably blur non-noise detail in the image.
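A minimal sketch of the blurring described here, assuming a single row of pixels and no motion compensation. The function name `temporal_average` and the pixel values are made up for illustration.

```python
def temporal_average(frames):
    """Average a list of equal-length pixel rows, pixel by pixel."""
    count = len(frames)
    return [sum(column) / count for column in zip(*frames)]

# A hard edge that moves one pixel to the right per frame.
frames = [
    [0, 0, 100, 100, 100, 100],
    [0, 0, 0, 100, 100, 100],
    [0, 0, 0, 0, 100, 100],
]

blurred = temporal_average(frames)
print(blurred)
```

The averaged row no longer jumps straight from 0 to 100; the edge is spread over intermediate values, which is the loss of detail that interframe averaging causes when motion is not compensated.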

There is another way, not to remove or decrease noise, but to increase its frequency until it is invisible to the eye (more than 2x the retina's response rate) or undisplayable (liquid crystal transition times, in an LCD). The process has the unique ability to make all noise invisible, without in any way removing the original frame's detail.

This method of course will increase storage requirements (bitstream datarate), and also playback processor usage. However, both of those are increasing anyway, so we might as well make use of them.

The advantage of this method is that the original frames are never modified, so no details (frequencies) in the original are removed. The method is analogous to increasing the spatial resolution of a dithered image (think 256-color GIF). Some might remember the failed Philips SACD, which uses 1-bit quantization...

The process works by inserting new frames. The new frames are made by a process which starts by examining a number (maybe 4) of frames backward and forward. The examined frames are first motion-compensated to remove as much motion-correlated data as possible.

Next, a spatial FFT is done on each frame; only pixels within the selected spatial frequency band(s) are included in the next steps; the others are simply duplicated. This prevents artifacts from, for example, large areas whose interframe transition frequencies fall within the temporal doubling region but which are not actually noise. An example might be a filmed TV screen: the in-frame region where temporally aliased 60 Hz frequencies are present.

Then, rather than averaging the pixel values, a frequency analysis is done. The user-selected, default or auto-detected frequency band which is to be frequency-shifted is singled out.

This band is then split into segments via FFT, and the energy in each segment is frequency-doubled, tripled, or more as required to reach the selected output framerate, which must be an integer multiple of the source framerate. New frames are synthesized by sampling the frequency-multiplied output at the new frequency.

Assuming a narrow band of high frequencies is selected, interframe pixel amplitude deltas in the synthetic output stream will be identical to the source stream, except that the select band's energy's frequency will be multiplied.
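One rough way to picture the frequency multiplication described above is to use the rate of sign changes of a zero-mean pixel series as a crude stand-in for its temporal frequency. The helper `sign_flips_per_second` and the sample values are assumptions for illustration, and nothing here performs the FFT band splitting.

```python
def sign_flips_per_second(samples, fps):
    """Count sign changes between consecutive samples, per second of playback."""
    flips = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0)
    duration = (len(samples) - 1) / fps
    return flips / duration

# Zero-mean noise component of one pixel at 24 fps, flipping sign every frame.
source = [5, -5] * 12

# Plain frame duplication to 48 fps: every value is shown twice.
duplicated = [value for value in source for _ in range(2)]

# Frequency-doubled series at 48 fps: same amplitude, flipping every output frame.
doubled = [5, -5] * 24

source_rate = sign_flips_per_second(source, 24)
duplicated_rate = sign_flips_per_second(duplicated, 48)
doubled_rate = sign_flips_per_second(doubled, 48)
print(source_rate, duplicated_rate, doubled_rate)
```

Duplication alone leaves the flip rate at roughly the source value; only the synthesized series, with the same amplitude deltas, reaches twice the rate.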

Only two doublings should be sufficient to render the majority of noise invisible to the eye assuming a fast display, or undisplayable with a slow one.

It seems audio tools are very well developed in this area. Here are some links of interest:

FreqTweak (http://freqtweak.sourceforge.net/)
Warp -- This one is a little different, both axes represent frequency, and the identity matrix is unaltered audio. Changing the value (height) of a bin, reallocates the energy at that frequency to the new frequency bin represented by the height of the bar. For instance, if all bins are the same height, all the frequency energy is added to a single bin. This is a sensitive filter, the Log frequency scale is helpful here (it affects both axes).

An alternate method makes use of a software synthetic feedback oscillator, which is fed a looped signal train of one pixel's value in the previous n, current and next n motion-compensated frames. The train is first filtered by FFT to remove all but the selected band(s) [noise frequencies] and then split into frequency bands whose width and number are auto-determined based on relative component strength within the signal, with the primary selection criterion being to capture the significant components without wasting extraneous cycles.

The oscillator's resonant frequency is tuned to each auto-detected band in turn; it produces harmonics at the selected even multiple of the base framerate (ex. 1/2, 2x, 4x the original frequencies). The synthetic oscillator's output is filtered to remove all frequencies except the even harmonic which is equal to the target framerate.

The output impulse trains for the n auto-selected frequency bands are then summed, normalized to match the average amplitude of the previous and next frames (considering that the current frame to be generated at the new framerate is n2), amplitude-offset to match the source amplitude offset, and finally either inverted such that the alternate (interpolated) frames have pixel amplitudes offset by the output data in the opposite direction as the original amplitude or phase-shifted such that the phase is 180 degrees offset.

The processes described may be executed upon the luma channel only, or a combination of luma and chroma channels. Data in unselected channels is duplicated without modification.

An example: the source is film at 24fps. The output is quadrupled to 96 fps. Each of the three synthetic frames in every four in the output stream has the same amount of noise as a single source frame. The amplitude and spatial frequency distributions are identical as well. But since the playback frequency is now 96fps, and assuming the source noise primary frequency was equal to the framerate, the new noise frequency is 96Hz, which is invisible to the human eye.
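The arithmetic of this example can be sketched as follows. The function `rate_plan` and its field names are hypothetical. Note that a frame sequence can only represent temporal frequencies up to half its frame rate, which the sketch reports as `max_temporal_hz`.

```python
def rate_plan(source_fps, output_fps):
    """Summarise an integer-multiple frame rate conversion."""
    if output_fps % source_fps != 0:
        raise ValueError("output rate must be an integer multiple of the source rate")
    multiplier = output_fps // source_fps
    return {
        "multiplier": multiplier,                # output frames per source frame
        "synthetic_per_source": multiplier - 1,  # inserted frames per source frame
        "max_temporal_hz": output_fps / 2,       # sampling limit of the output stream
    }

plan = rate_plan(24, 96)
print(plan)
```

For 24 fps to 96 fps this gives a multiplier of 4 and three synthetic frames per source frame, matching the example, with a representable temporal limit of 48 Hz.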

Tomorrow I will clarify with some sample images, which will hopefully illustrate better the desired result.

tsp
30th December 2006, 12:48
And how do you know which frequency bands need to be increased, and how do you intend to preserve the non-noise part of these frequency bands?

davidhorman
30th December 2006, 13:10
I'm probably missing something, but why couldn't you then average your 4 96fps frames to create a single frame with 4 times less noise? Then you could do this at encode time and not worry about playback performance.
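A quick numerical check of what averaging four frames buys, assuming the noise in each frame is independent Gaussian noise (an assumption for this sketch; the scheme above inserts phase-inverted noise rather than independent noise). The constants and variable names are made up for illustration.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable
signal = 128.0           # one constant "true" pixel value
sigma = 10.0             # assumed noise standard deviation
pixels = 20000

# One noisy frame versus the average of four independently noisy frames.
single = [signal + rng.gauss(0, sigma) for _ in range(pixels)]
averaged = [
    sum(signal + rng.gauss(0, sigma) for _ in range(4)) / 4
    for _ in range(pixels)
]

noise_single = statistics.pstdev(single)
noise_averaged = statistics.pstdev(averaged)
ratio = noise_single / noise_averaged
print(round(ratio, 2))
```

Under that assumption the noise amplitude drops by a factor of about 2 (the square root of 4), which corresponds to about 4 times less noise power, while the signal value is unchanged.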

And if your display only runs at 60fps (as most LCDs do) does that mean you can only double once?

David

MfA
30th December 2006, 13:30
This is either one of the best Markov chain text generators ever or some of the fuzziest reasoning ever.

jmac698
30th December 2006, 17:55
I think that inventions often come about this way; thinking of a problem in certain terms that seem to make sense, only to realize later that what you thought of is far simpler instead. I think one can be fooled by the statement of a problem in equivalent terms that seems to offer a solution.

So, what you seem to be doing (if I follow), is increasing the frame rate, and saying noise is also a high frame rate therefore you can't see it. In reality it doesn't just go away, it gets averaged by the eye, the LCD, or may simply be dropped frames never displayed on the LCD (depending on your refresh rate).
You can achieve the same effect mathematically then, by temporal denoising of frame rate interpolated video.
That doesn't gain you anything, I think; in frame rate interpolation, static areas are just duplicated, so there is nothing extra to average out. Motion areas are ideally perfectly compensated, with the same result.
Think of 2 frames of a moving block; a search correlates the block in the 1st frame to its position in the 2nd. Motion compensation interpolates an in-between frame with a copy of the block's pixels in the in-between position. There are now 3 copies of the same pixels in three frames, plus the original backgrounds. The only thing new is the displacement of pixels; there are no new pixels.
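The moving-block case can be sketched in one dimension. This is a toy under stated assumptions: a single row of pixels, a background of 0, whole-pixel motion, and hypothetical helper names `find_shift` and `interpolate_midframe`. It is not how any particular tool implements motion search.

```python
def find_shift(prev_row, next_row, max_shift):
    """Exhaustive search for the shift with the smallest sum of absolute differences."""
    best_shift, best_cost = 0, None
    for shift in range(-max_shift, max_shift + 1):
        cost = 0
        for x, value in enumerate(prev_row):
            target = x + shift
            # Pixels shifted outside the row are compared against background (0).
            other = next_row[target] if 0 <= target < len(next_row) else 0
            cost += abs(value - other)
        if best_cost is None or cost < best_cost:
            best_shift, best_cost = shift, cost
    return best_shift

def interpolate_midframe(prev_row, shift):
    """Copy non-background pixels to the position halfway along the motion."""
    middle = [0] * len(prev_row)
    for x, value in enumerate(prev_row):
        target = x + shift // 2
        if value != 0 and 0 <= target < len(middle):
            middle[target] = value
    return middle

prev_row = [0, 9, 7, 0, 0, 0, 0, 0]
next_row = [0, 0, 0, 0, 0, 9, 7, 0]

shift = find_shift(prev_row, next_row, max_shift=5)
middle = interpolate_midframe(prev_row, shift)
print(shift, middle)
```

The in-between row contains exactly the same pixel values as the first frame, only displaced, so there is nothing new for a temporal average to work with.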
In reality there are errors in the moved pixels; these are fixed by blending, I suppose. I mean, the moved pixels are a close match but not exact, so the colors are morphed into the new ones.

tsp
30th December 2006, 18:00
and if you can isolate the noise in 1 frame why not remove the entire frame thereby removing all the noise.
I mean you wouldn't notice if there are 96 or 95 fps :)

Trixter
30th December 2006, 18:09
Isochroma, check out fft3dfilter. It tackles filtering noise frequencies without your muddled thinking :-)

Isochroma
30th December 2006, 21:10
Averaging takes out noise, but also blurs detail. It doesn't change the frequency of noise, only reduces its amplitude.

Hidden under the impulsivity of the noise, is the signal. In order to fully preserve the signal, the impulsivity must also be preserved.

Last night was late, but this morning I have a simpler description of the process:

1. original video, has rate f (fps), the requested video has framerate f2
2. motion-compensate frames n-3, n-2, n-1, n+1, n+2, n+3 (number of frames needed depends on which frequencies we want to increase; lower frequencies, more +- frames, this value will be called w)
3. FFT: select frequency bands marked by user as noise
4. split bands into p highest-amplitude parts
5. to each band in turn, further split if necessary until we are down to q single frequencies
6. take each single frequency, and using any of a number of simple methods, multiply its frequency by the integer factor required to get to output framerate. Filter to remove non-output harmonic frequencies, gain to match range of source pixels, offset to match source pixel average offset, and finally phase-shift or invert such that the pixel's value is maximally offset from the source, within the amplitude range of the noise component only.
7. when each single source frequency within the selected range has been doubled and postprocessed, sum them to obtain total offset for the pixel, at each of the output frames' phase offsets. store the values in a cubic array (x,y,w).
8. Generate synthetic frames by first copying source frames to destination frames (just duplication), then for each pixel add the value generated for that pixel in step 7.
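A heavily simplified sketch of steps 1 and 8 for a single pixel and a 2x output rate. It is not the full procedure: the FFT band selection and harmonic generation of steps 3 to 6 are replaced by a local temporal mean, so "noise" here simply means the deviation from that mean. The function name `double_with_inverted_residual` and the `radius` parameter are assumptions for illustration.

```python
def double_with_inverted_residual(samples, radius=1):
    """Return a 2x-rate series: each source sample followed by a synthetic one.

    The synthetic sample mirrors the source sample around a local temporal
    mean, so the estimated noise residual keeps its size but flips sign.
    """
    output = []
    for n, value in enumerate(samples):
        lo = max(0, n - radius)
        hi = min(len(samples), n + radius + 1)
        window = samples[lo:hi]
        local_mean = sum(window) / len(window)
        residual = value - local_mean          # stand-in for the selected noise band
        output.append(value)                   # source frame copied unmodified
        output.append(local_mean - residual)   # inserted frame, residual inverted
    return output

pixel = [100, 104, 98, 103, 99]   # one pixel's value over five source frames
doubled = double_with_inverted_residual(pixel)
print(doubled)
```

The source values appear untouched at every second position, and each inserted value deviates from the local mean by the same amount in the opposite direction. Averaging each source/synthetic pair returns the local mean, which is also what a temporal denoiser with the same window would have produced.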

-------------

More explanation:

Averaging the output frames would indeed work to remove noise, just like averaging the source frames. However, we'd be back to the methods and disadvantages of other denoisers.

The goal here is to shift selected frequency bands above the visible range. In order to do that, we first must increase the framerate, in order to provide frequency headroom - a place to put those frequencies. Of course, noise frequencies could be shifted down rather than up, but that has two problems. One, the human eye has no reasonable lower limit to the frequencies which are visible. Two, we'd be shifting the noise into an area already containing signal.

The magic is shifting the noise into a frequency range that is not visible at the output framerate, and that does not already contain signal. By frame duplication and playback rate multiplication, we create an area at least 4x larger on the frequency-vs-power graph. That area is totally empty, and better yet most of its upper reaches are outside the temporal response rate of the human eye. Because most noise tends to be at framerate frequency, multiplying the framerate by, for example, 4 quadruples the available frequency space to shift energy bands.

Then, it is only a matter of selecting the frequencies and shifting them. The shift methods can vary from logic-based deterministic methods, to sequentially-tuned synthetic oscillators which generate the required harmonics.

The important thing to remember is this: for traditional noise reduction to work, the amplitude of noise must be decreased, which also decreases the amplitude of real signal.

The proposed method does not modify the amplitude of original data, but inserts new data with equal but inverse amplitude. The generated values have the same quantity of spatial and temporal noise component as the source, only it is phase- or amplitude-offset such that the final sequence of frames contains values that sequentially maximally differ within the interframe amplitude range of the original.

Thus by adding new frames that contain unique noise, and playing back at a higher rate, the selected noise bands are shifted to the output frequency. It might then be proposed to instead use a frame-doubler and ffdshow's noise adding filter.

The problem with such methods is they increase the amount of noise by adding it to every output frame, including original source frames, thus decreasing the SNR. Also, the added noise will not likely have the same amplitude, offset or frequency distribution as the existing noise.

The proposed method creates new frames with the same total amount, frequency distribution, and pixel-unique source-interframe stride amplitude delta ranges, but phase-shifted or amplitude inverted. All energy at each frequency range in the original, is preserved in the output.

With a fast enough multicore processor, the entire process can occur during playback, rather than before compression. However, the entire original frequency range must be present in the compressed source, otherwise the process will be only marginally effective.

rfmmars
30th December 2006, 21:29
Isochroma, check out fft3dfilter. It tackles filtering noise frequencies without your muddled thinking :-)

I like this guy's "muddled" thinking, he's exercising his brain!

Richard

davidhorman
30th December 2006, 23:12
Averaging takes out noise, but also blurs detail.

Yes, if you're averaging different frames. I'm talking about averaging your 4 same frames that you've added different noise to. The signal would be in the same place every time, so how could it be blurred by averaging?

David

Isochroma
31st December 2006, 00:52
It still blurs - you can try fft3d and find out for yourself. That is because there is non-noise data that changes between frames and is not motion-correlated, so it cannot be nulled by motion-compensating the frames to be averaged.

During the capture process, spatial data becomes mixed with noise. The high-frequency portion of the desired signal is difficult to separate from both spatial and temporal noise components.

It is almost as if the noise itself encodes the sharpness. This is not true, but yet it seems right. Thus by not decreasing the noise amplitude, but increasing its frequency, sharpness can be preserved.

davidhorman
31st December 2006, 01:37
Because there is non-noise data that changes between frames

Are you talking about generally averaging multiple frames from a video? I'm talking about averaging your quadrupled frames into one - since you said you were doing simple duplication, then adding this special "noise".

If on the other hand they've been motion-compensated, then that's going to change the whole look of the video, which is a Bad Thing.

David

Didée
31st December 2006, 02:44
In math class, did you hear about the proof method "complete induction"? There are those funny cases where you have a wonderfully working induction step, but then stumble over having no first valid element for which the induction's assertion is true ...

As I see it, one fundamental implication has been forgotten here.

You mistook a side effect for its cause, and in the effort to circumvent the mistaken "cause", you got trapped by the real cause, which is still standing.

It does not matter by which method you try to "vanish" the noise. The basic problem stays the same: what is noise, and what is not.

From the aspect of visual experience, it does not matter if you make the noise invisible by reducing its amplitude, or by temporal superposition of phase-inverted echoes.
The point is that either method will make a certain subset of the raw image data "invisible". And the eternal problem still is to make the right things invisible. If either method can't manage to make the distinction between signal and noise, then the result is either "noise gone, detail weakened" or "detail intact, noise removal insufficient".

All that you have described is another approach to "what to do with what a given method considers as noise". There is not one iota about "how to decide better what is detail or noise".

Realize this: everything lies with the quality of the method used to recognize "noise". If noise recognition is bad, then some detail will be misjudged as "noise", and your method, too, will make it "invisible". If, on the other hand, noise recognition is good, then your method brings no benefit, because in that case Averaging&Co. do a very good job.

And from another angle ... it is not like "traditional" mo-comp'ed noise reduction would necessarily reduce detail. There are methods like "temporal superresolution" that even try to increase frame detail, and, be astounded or not, the basic methods of TNR and TSR are quite similar.

Lastly, a quick guess about space efficiency. I've had grainy cases where some thorough noise reduction improved compression by 70%: compressed filesize not-filtered=100% vs. filtered=30% ... whereas your approach of quadrupling without removing noise is obviously aiming at 400%.
400/30 is 13.33. Let's round that down, and estimate that your approach will potentially need 10 times more storage space than old-fashioned noise removal. :)
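The estimate can be written out as a small calculation. It assumes, as the post does, that compressed size scales linearly with the number of frames; the function `storage_ratio` and its parameter names are made up for illustration.

```python
def storage_ratio(frame_multiplier, unfiltered_percent=100.0, filtered_percent=30.0):
    """Size of the frame-multiplied, unfiltered stream relative to the filtered one."""
    multiplied_percent = unfiltered_percent * frame_multiplier
    return multiplied_percent / filtered_percent

ratio = storage_ratio(4)
print(round(ratio, 2))
```

With a multiplier of 4 this reproduces the 400/30 figure; a codec that exploits the similarity between the inserted frames could do better than linear scaling, so the number is an upper-end guess rather than a measurement.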

PaulKroll
31st December 2006, 05:07
First, Isochroma, the place you want to start with is MVTools (http://forum.doom9.org/showthread.php?t=84770&highlight=mvtools) , which will let you experiment.

Second, there's no frame rate that is "invisible to the human eye." For one thing, people are different. For another, an enormous amount depends on the exact video: flashing a word in black, on a white background, for 1/1000 of a second, is probably going to be caught by most people, even though that's "1000 frames per second." Flashing 4 words in the same place one right after another, at that rate, that's going to be a blur.

Which brings us to a seeming misconception, "Averaging takes out noise, but also blurs detail." That's technically incorrect in both ways. If you take two identical frames, and create a third frame based on the average of the two, well, you're going to end up with exactly the same frame, noise and details alike. Averaging pixels based on surrounding pixels, or averaging frames based on surrounding, non-identical frames, that's when you start seeing noise, and details, blur. There have been discussions here (here's one (http://forum.doom9.org/showthread.php?s=&threadid=28438) I could find on short notice) about capturing from video multiple times and then blending the multiple captures, to reduce the noise but not destroy detail. This works well, and I've done this a lot, but it's extremely tedious and really works best when you have two separate sources (in the original examples, there were multiple captures from off-air. I've tried with a single VHS recording, and you do reduce the noise from the VCR this way, but the best results come from having two copies of a tape). In these cases you really do average out the noise (because it changes from capture to capture) and not the details (because they don't).
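Both points can be checked on a toy row of pixels. The values, the two noise patterns, and the helper names `blend` and `worst_error` are invented for this sketch; real captures would of course have far more pixels and less tidy noise.

```python
def blend(*captures):
    """Average several equal-length pixel rows, pixel by pixel."""
    count = len(captures)
    return [sum(column) / count for column in zip(*captures)]

def worst_error(row, reference):
    """Largest absolute deviation of a row from the reference detail."""
    return max(abs(a - b) for a, b in zip(row, reference))

detail = [10, 200, 10, 200]    # the "true" picture content
noise_a = [4, -2, 3, -5]       # noise in the first capture
noise_b = [-2, 4, -1, 1]       # different noise in the second capture

capture_a = [d + n for d, n in zip(detail, noise_a)]
capture_b = [d + n for d, n in zip(detail, noise_b)]

same = blend(capture_a, capture_a)    # two identical frames
mixed = blend(capture_a, capture_b)   # two captures of the same content

print(same == capture_a)
print(worst_error(capture_a, detail), worst_error(mixed, detail))
```

Blending a frame with itself changes nothing, noise included. Blending two captures keeps the sharp 10/200 alternation while the worst deviation from the true values shrinks, because only the noise differs between the captures.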

Thing is, if you play 96 frames/sec in front of someone, their eyes/brain will do the averaging of the frames instead of the computer doing it. The noise won't be gone because "it's invisible" past a certain frame rate, it'll be gone because your eyes are doing the averaging job, whether at 96 or 1000 fps. In fact, most of the DLP projectors out now, generally do several thousand "frames" per second of each color (red/green/blue, some use variants), since they can only DO one color at a time, and your eyes merge them into a solid, multi-color image.

You can see the effect of high frame rates with MVTools, which I've used to create 60 FPS versions of some action scenes. In that case, my primary goal was to see what a higher framerate would do to The Incredibles, or Road Warrior. Aside from looking considerably more "smooth" or "live," noise is also reduced. (Since there's no way for MVTools to be perfect, though, there are also sequences of frames that don't work. The 100 Mile Dash scene in The Incredibles looks great at 60 fps, sometimes, and some of the running-in-the-jungle parts look completely abstract, as creating interstitial versions of blurred motion frames is Not So Easy.)

Isochroma
31st December 2006, 06:23
Thanks everyone for the feedback! Yes, it was just a dream, one that someday I might test...

It is true that the eye will average the frames out, but the eye's average will be much higher frequency than the average generated by combining frames at 24fps. Still, the detail will undoubtedly be obscured by the added noise, even though the generated frames have the same noise magnitude, because in generating synthetic frames, the added noise will decrease their SNR.

708145
31st December 2006, 18:22
Thanks everyone for the feedback! Yes, it was just a dream, one that someday I might test...

It is true that the eye will average the frames out, but the eye's average will be much higher frequency than the average generated by combining frames at 24fps. Still, the detail will undoubtedly be obscured by the added noise, even though the generated frames have the same noise magnitude, because in generating synthetic frames, the added noise will decrease their SNR.

I guess Didée is right in principle. But the eye will average out differently depending on brightness, on which parts of the screen you concentrate - even how awake the viewer is.

I think it is worthwhile to try this approach. And using a clever encoding technique the bitrate penalty won't be huge. Watch out for a codec called "n0153" (noise) which is particularly suited for this kind of compression.

bis besser,
T0B1A5