Isochroma
30th December 2006, 08:08
Lossless Noise Reduction via Frequency Doubling
Tonight I was contemplating noise reduction, as I so often do. Ruminating on the various methods, it struck me that all of them without exception do the same thing: they reduce the amplitude of noise.
This can be done by decreasing the content of high frequencies in the image, and also by averaging pixels across frames. Motion compensation makes the averaging more precise, in that the non-noise motion-correlated image components are less affected by the averaging.
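To put a number on the interframe averaging mentioned above, here is a minimal sketch. It assumes the frames are already motion-aligned and that the noise is independent Gaussian noise per frame; the frame size, the noise level `sigma` and the helper name `temporal_average` are made up for illustration.

```python
import random
import statistics

def temporal_average(frames):
    """Average co-located pixels across a list of equally sized frames."""
    count = len(frames)
    return [sum(pixels) / count for pixels in zip(*frames)]

random.seed(0)
clean = [100.0] * 2000   # static content, standing in for motion-aligned frames
sigma = 10.0             # assumed noise standard deviation
frames = [[p + random.gauss(0.0, sigma) for p in clean] for _ in range(9)]

single_noise = statistics.pstdev(frames[0])
averaged_noise = statistics.pstdev(temporal_average(frames))
print(round(single_noise, 2), round(averaged_noise, 2))
```

With 9 frames the noise amplitude should drop to roughly a third (1/sqrt(9)) of a single frame's. Any real detail that differs between the frames would be averaged down in the same way.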
Unfortunately, methods which remove certain frequencies by interframe or intraframe averaging, or other methods, inevitably blur non-noise detail in the image.
There is another way, not to remove or decrease noise, but to increase its frequency until it is invisible to the eye (more than 2x the retina's response rate) or undisplayable (liquid crystal transition times, in an LCD). The process has the unique ability to make all noise invisible, without in any way removing the original frame's detail.
This method of course will increase storage requirements (bitstream datarate), and also playback processor usage. However, both of those are increasing anyway, so we might as well make use of them.
The advantage of this method is that the original frames are never modified, so no details (frequencies) in the original are removed. The method is analogous to increasing the spatial resolution of a dithered image (think 256-color GIF). Some might remember the failed Philips SACD, which uses 1-bit quantization...
The process works by inserting new frames. The new frames are made by a process which starts by examining a number of frames (maybe 4) backward and forward. The examined frames are first motion-compensated to remove as much motion-correlated data as possible.
Next, a spatial FFT is done on each frame; only pixels within the selected spatial frequency band(s) are included in the next steps; the others are simply duplicated. This is done to prevent artifacts from, for example, large areas whose interframe transition frequencies fall within the temporal doubling region but which are not actually noise. An example might be a filmed TV screen: the in-frame region where temporally aliased 60 Hz frequencies are present.
Then, rather than averaging the pixel values, a frequency analysis is done. The user-selected, default or auto-detected frequency band which is to be frequency-shifted is singled out.
This band is then split into segments via FFT, and the energy in each segment is frequency-doubled, tripled, or more as required to reach the selected output framerate, which must be an integer multiple of the source framerate. New frames are synthesized by sampling the frequency-multiplied output at the new framerate.
Assuming a narrow band of high frequencies is selected, interframe pixel amplitude deltas in the synthetic output stream will be identical to the source stream, except that the frequency of the selected band's energy will be multiplied.
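A rough sketch of the band-multiplication step for a single pixel's values over time. This is only one possible reading of the idea, not a tested filter: it uses a plain DFT, treats the selected band as a set of bin indices, and resynthesizes every output sample as a sum of cosines. The names `dft`, `multiply_band`, `band` and `factor` are invented here, and the sketch does not deal with motion compensation or with keeping the source frames bit-identical.

```python
import cmath
import math

def dft(x):
    """Plain O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def multiply_band(x, band, factor):
    """Resample x at `factor` times its rate, multiplying the frequency
    of every DFT bin listed in `band` by `factor`; other bins keep theirs."""
    n = len(x)
    spectrum = dft(x)
    out = []
    for m in range(n * factor):
        t = m / factor                 # time in units of source frames
        y = spectrum[0].real / n       # DC term
        for k in range(1, n // 2 + 1):
            # The Nyquist bin (even n only) counts once; every other bin
            # stands for itself plus its mirrored negative frequency.
            weight = 1.0 if 2 * k == n else 2.0
            amp = weight * abs(spectrum[k]) / n
            phase = cmath.phase(spectrum[k])
            gain = factor if k in band else 1
            y += amp * math.cos(2 * math.pi * gain * k * t / n + phase)
        out.append(y)
    return out

samples = 24  # one second of one pixel at 24 fps
pixel = [50 + 20 * math.cos(2 * math.pi * t / samples)             # slow detail, 1 Hz
         + 5 * math.cos(2 * math.pi * 10 * t / samples + 0.3)      # "noise", 10 Hz
         for t in range(samples)]

result = multiply_band(pixel, band=range(8, 13), factor=4)
out_spectrum = dft(result)

def amplitude(k):
    """Amplitude of the cosine at bin k (k Hz here) in the output."""
    return 2 * abs(out_spectrum[k]) / len(result)

print(len(result), round(amplitude(1), 3), round(amplitude(10), 3), round(amplitude(40), 3))
```

In this toy case the 1 Hz detail stays at 1 Hz while the 10 Hz component ends up at 40 Hz in the 96-sample output. The shifted component takes the same sequence of sample values as in the source, only played four times as fast, which is the "identical deltas" property described above.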
Only two doublings should be sufficient to render the majority of noise invisible to the eye assuming a fast display, or undisplayable with a slow one.
It seems audio tools are very well developed in this area. Here are some links of interest:
FreqTweak (http://freqtweak.sourceforge.net/)
Warp -- This one is a little different, both axes represent frequency, and the identity matrix is unaltered audio. Changing the value (height) of a bin, reallocates the energy at that frequency to the new frequency bin represented by the height of the bar. For instance, if all bins are the same height, all the frequency energy is added to a single bin. This is a sensitive filter, the Log frequency scale is helpful here (it affects both axes).
An alternate method makes use of a software synthetic feedback oscillator, which is fed a looped signal train of one pixel's value in the previous n, current, and next n motion-compensated frames. The train is first filtered by FFT to remove all but the selected band(s) [noise frequencies] and then split into frequency bands whose width and number are auto-determined based on relative component strength within the signal, with the primary selection criterion being to capture the significant components without wasting extraneous cycles.
The oscillator's resonant frequency is tuned to each auto-detected band in turn; it produces harmonics at the selected even multiple of the base framerate (e.g. 1/2, 2x, 4x the original frequencies). The synthetic oscillator's output is filtered to remove all frequencies except the even harmonic which is equal to the target framerate.
The output impulse trains for the n auto-selected frequency bands are then summed, normalized to match the average amplitude of the previous and next frames (considering that the current frame to be generated at the new framerate is n2), amplitude-offset to match the source amplitude offset, and finally either inverted such that the alternate (interpolated) frames have pixel amplitudes offset by the output data in the opposite direction as the original amplitude or phase-shifted such that the phase is 180 degrees offset.
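On the last step, a quick check that the two options coincide for a pure sinusoid: inverting the samples and shifting the phase by 180 degrees give the same values. The 12 Hz tone, the 96 fps rate and the helper `tone` are just illustrative choices.

```python
import math

def tone(freq_hz, rate_hz, count, phase=0.0):
    """Sample a unit-amplitude sine at the given rate."""
    return [math.sin(2 * math.pi * freq_hz * i / rate_hz + phase)
            for i in range(count)]

original = tone(12.0, 96.0, 16)
inverted = [-v for v in original]
shifted = tone(12.0, 96.0, 16, phase=math.pi)

max_diff = max(abs(a - b) for a, b in zip(inverted, shifted))
residual = max(abs(a + b) for a, b in zip(original, shifted))
print(max_diff < 1e-12, residual < 1e-12)
```

The same holds for a sum of bands as long as the 180 degree shift is applied to every component. A fixed time delay is a different operation, because half a period is a different length of time for each frequency.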
The processes described may be executed upon the luma channel only, or a combination of luma and chroma channels. Data in unselected channels is duplicated without modification.
An example: the source is film at 24 fps. The output is quadrupled to 96 fps. Each of the three synthetic frames in every four in the output stream has the same amount of noise as a single source frame, and the amplitude and spatial frequency distributions are identical as well. But since the playback frequency is now 96 fps, and assuming the source noise primary frequency was equal to the framerate, the new noise frequency is 96 Hz, which is invisible to the human eye.
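The arithmetic of the example, as a small sketch. The helper `plan` and its dictionary keys are invented for illustration. The first call uses the assumption stated above (noise at the frame rate). The second is an extra case: since a 24 fps sequence can only represent temporal frequencies up to 12 Hz, it shows noise that alternates on every frame.

```python
def plan(source_fps, doublings, noise_hz):
    """Output rate, inserted frames per source frame, and shifted noise
    frequency after the given number of frame-rate doublings."""
    multiple = 2 ** doublings
    return {
        "output_fps": source_fps * multiple,
        "synthetic_per_source": multiple - 1,
        "noise_hz": noise_hz * multiple,
    }

film = plan(24, 2, 24)               # assumption from the example above
per_frame_flicker = plan(24, 2, 12)  # noise alternating every source frame
print(film)
print(per_frame_flicker)
```

Either way, two doublings turn 24 fps into 96 fps with three synthetic frames per source frame, and the noise frequency is multiplied by four.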
Tomorrow I will clarify with some sample images, which will hopefully illustrate better the desired result.