John, thanks for the input. I had some time this weekend to think a little about autolevels, and one of the ideas is along the lines of your first comment -- a clustered group of N pixels with luma=255 should matter more than N isolated luma=255 pixels scattered around the frame. My thought was to pre-process each frame with something like a small-radius median filter (radius of maybe just 1) to get rid of those specks; perhaps one of the more sophisticated speckle-removal filters would be enough. Stats would be taken on this munged frame and then applied to the original frame's pixels.
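To make that concrete, here is a rough sketch in plain C++ (not the actual plugin code) of gathering min/max luma from a 3x3-median-filtered copy of the luma plane while leaving the source pixels alone:

[code]
#include <algorithm>
#include <cstdint>

struct LumaStats { std::uint8_t minluma; std::uint8_t maxluma; };

// Take min/max luma from a 3x3-median-filtered copy of the luma plane.
// The median kills isolated specks, so a lone luma=255 pixel no longer
// drives maxluma; the correction itself would still be applied to the
// original (unfiltered) frame later on.
LumaStats StatsOnDespeckledLuma(const std::uint8_t* luma, int width, int height, int pitch)
{
    LumaStats s = { 255, 0 };
    std::uint8_t window[9];
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            int k = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    window[k++] = luma[(y + dy) * pitch + (x + dx)];
            std::nth_element(window, window + 4, window + 9);  // median of the 3x3 block
            std::uint8_t m = window[4];
            s.minluma = std::min(s.minluma, m);
            s.maxluma = std::max(s.maxluma, m);
        }
    }
    return s;  // stats only; the source pixels are never modified
}
[/code]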

Warning: rambling thoughts ahead.

Another fundamental problem is this: the statistics take the mean histogram high/low values in a window of +/-N frames around the current frame (ignoring frames judged to be from a different scene). The problem is that if the histogram changes suddenly, say due to the scene panning and something glaring in the sun popping into frame, the bright area is present in at most N+1 of the window's frames by the time that bright spot reaches the center of the averaging window, yet it gets averaged together with all 2N+1 frames. To be more specific, say maxluma=200 for a good long time, then this bright object suddenly appears and stays in frame, and shows up as a spike near luma=240. maxluma averaged over the 2N+1 frame window will be (N*200 + (N+1)*240) / (2N+1), or roughly 220 for larger values of N. This will cause all the pixels with luma > 220 to get saturated, wiping out all detail in the bright regions.
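A toy check of that arithmetic -- just the plain windowed mean written out for a few values of N:

[code]
#include <cstdio>

// The plain mean of maxluma over a 2N+1 frame window once the bright
// frames fill the right half of the window: N frames at 200, N+1 at 240.
int main()
{
    for (int N : { 5, 15, 30 }) {
        double avg = (N * 200.0 + (N + 1) * 240.0) / (2 * N + 1);
        std::printf("N=%2d  avg_maxluma=%.1f\n", N, avg);  // settles near ~220
    }
    return 0;
}
[/code]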

I think I should take this into account and do a smooth limiting function when calculating the average maxluma. Doing something simple like letting the window maximum dominate, e.g. avg_maxluma = max( maxluma(frame=current-N to current+N) ), would fail -- the first time the bright frame appeared in the averaging window it would cause a sudden drop in brightness. Instead, each frame's contribution should be weighted by its distance from the center of the window.
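Here is one shape that weighting could take. This is only a sketch of the idea, and the triangular kernel is an arbitrary choice of mine, not anything the plugin does today:

[code]
#include <cstddef>
#include <vector>

// Weight each frame's maxluma by a triangular kernel centered on the
// current frame: a spike entering at the edge of the window gets almost
// no weight, and its influence grows smoothly as it nears the center.
double WeightedMaxluma(const std::vector<int>& maxluma,  // per-frame maxluma values
                       std::size_t center, int N)
{
    double sum = 0.0, wsum = 0.0;
    for (int d = -N; d <= N; ++d) {
        long long f = static_cast<long long>(center) + d;
        if (f < 0 || f >= static_cast<long long>(maxluma.size()))
            continue;
        int ad = d < 0 ? -d : d;
        double w = 1.0 - static_cast<double>(ad) / (N + 1);  // triangular weight, >0 inside window
        sum  += w * maxluma[static_cast<std::size_t>(f)];
        wsum += w;
    }
    return sum / wsum;
}
[/code]

Any kernel that falls to zero at the edge of the window would do; the point is just that the spike ramps in over N frames instead of jumping the moment it enters the window.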

I've downloaded and perused a number of PDFs from people thinking about this same problem. One takes a very sensible approach: figure out minluma and maxluma for each scene, then base the correction for each frame on a combination of the minluma & maxluma of the current frame and of the current scene. This algorithm isn't directly suitable for an avisynth filter, as a literal implementation would require searching forward and backward for a potentially unbounded number of frames until a scene boundary is detected. In real life, scene lengths are typically under a minute, but that can still be more than 1000 frames. It might make sense to approximate the algorithm by having a wide "scene window" plus a smaller averaging window like the one that currently exists.
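Roughly what I have in mind, as a sketch -- the per-frame stats array, the scene-cut flags, and the blend factor k are all assumptions of mine, not the paper's exact method:

[code]
#include <algorithm>
#include <cstddef>
#include <vector>

struct FrameStats { int minluma, maxluma; bool scene_cut_before; };

// Gather min/max over a wide but bounded "scene window", stopping at scene
// cuts, then blend with the current frame's own min/max.  k=1.0 leans fully
// on the scene stats, k=0.0 uses only the current frame.
void SceneBlendedLimits(const std::vector<FrameStats>& stats,
                        std::size_t cur, int scene_window, double k,
                        int* out_min, int* out_max)
{
    int lo = stats[cur].minluma, hi = stats[cur].maxluma;
    // walk backward until a cut or the edge of the scene window
    for (long f = static_cast<long>(cur) - 1;
         f >= 0 && static_cast<long>(cur) - f <= scene_window; --f) {
        if (stats[f + 1].scene_cut_before) break;  // frame f belongs to the previous scene
        lo = std::min(lo, stats[f].minluma);
        hi = std::max(hi, stats[f].maxluma);
    }
    // walk forward until a cut or the edge of the scene window
    for (std::size_t f = cur + 1;
         f < stats.size() && f - cur <= static_cast<std::size_t>(scene_window); ++f) {
        if (stats[f].scene_cut_before) break;      // a new scene starts here
        lo = std::min(lo, stats[f].minluma);
        hi = std::max(hi, stats[f].maxluma);
    }
    *out_min = static_cast<int>(k * lo + (1.0 - k) * stats[cur].minluma + 0.5);
    *out_max = static_cast<int>(k * hi + (1.0 - k) * stats[cur].maxluma + 0.5);
}
[/code]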

Another failure mode of autolevels() which I had never thought about is fades and blended transitions. I was unaware of it because all of my work has been on 8mm home movies, where every cut is a jump cut. I think it would be relatively easy to look for a smooth transition to/from near-black and preserve it rather than attempting to boost those dark frames. Having that larger "scene" window would also make it easy to see such transitions coming.
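Something along these lines could spot a fade-out from per-frame stats that are already being gathered; the span and the near-black threshold below are just guesses on my part:

[code]
#include <cstddef>
#include <vector>

// A fade-out candidate: mean luma falls steadily over the next 'span'
// frames and ends up near black.  A fade-in is the same test run backward.
// Frames inside a detected fade would be passed through rather than boosted.
bool LooksLikeFadeOut(const std::vector<double>& meanluma,  // per-frame mean luma
                      std::size_t cur, int span, double black_thresh)
{
    if (cur + static_cast<std::size_t>(span) >= meanluma.size())
        return false;
    for (int i = 0; i < span; ++i)
        if (meanluma[cur + i + 1] > meanluma[cur + i] - 0.5)  // not clearly dropping
            return false;
    return meanluma[cur + static_cast<std::size_t>(span)] <= black_thresh;  // e.g. 16.0
}
[/code]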

I had also long thought of some ways to improve autolevels' scene boundary detection logic using just the already-gathered histogram data, but before charging off to try it, I did a google search. It turns out that detecting scene boundaries is a ripe research area. Google on "shot boundary detection" and you'll find dozens of papers. There is even an annual shoot-out where researchers pit their latest algorithms against a battery of videos to see which works best. Many of these algorithms are not suitable for avisynth for the same reason stated before: they assume that the entire video will be processed in order, perhaps with two passes. avisynth filters can do this, but even with caching, the first frame out of the filter may take many seconds/minutes to produce.
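For reference, the simplest member of that family is just a histogram-difference test between consecutive frames -- a sketch with an arbitrary threshold, not what autolevels does today:

[code]
#include <array>
#include <cmath>
#include <cstddef>

// Flag a cut when the sum of absolute differences between consecutive
// frames' normalized luma histograms (each summing to 1.0) exceeds a
// threshold.  The metric ranges from 0 (identical) to 2 (disjoint).
bool IsShotBoundary(const std::array<double, 256>& hist_prev,
                    const std::array<double, 256>& hist_cur,
                    double threshold)
{
    double diff = 0.0;
    for (std::size_t i = 0; i < 256; ++i)
        diff += std::fabs(hist_cur[i] - hist_prev[i]);
    return diff > threshold;  // e.g. threshold = 0.5
}
[/code]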

One final observation -- none of the photo and video equalization algorithms I've seen operate in YUV space; they use either RGB or, better yet, HSV. It wouldn't be hard to change autolevels to convert from YUV to HSV, make the adjustments, then convert back, but it would be a performance hit, and I know that for a great number of avisynth users speed is more important than quality (well, quality matters, but not if it drops processing to sub-realtime).
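For a sense of the per-pixel work involved, here is the forward half of that round trip (8-bit YUV -> RGB -> HSV) assuming BT.601 full-range coefficients -- the real filter would have to honor the clip's actual matrix and range, and the return trip costs about the same again:

[code]
#include <algorithm>
#include <cmath>

struct HSV { double h, s, v; };  // h in degrees, s and v in 0..1

// 8-bit Y/Cb/Cr -> RGB -> HSV for one pixel.  The reverse (HSV -> RGB ->
// YUV) is a similar amount of math, so a full round trip per pixel is the
// performance concern mentioned above.
HSV Yuv8ToHsv(int y, int cb, int cr)
{
    // YUV -> RGB, BT.601 full-range coefficients
    double r = y + 1.402    * (cr - 128);
    double g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128);
    double b = y + 1.772    * (cb - 128);
    r = std::min(255.0, std::max(0.0, r)) / 255.0;
    g = std::min(255.0, std::max(0.0, g)) / 255.0;
    b = std::min(255.0, std::max(0.0, b)) / 255.0;

    // RGB -> HSV
    double mx = std::max(r, std::max(g, b));
    double mn = std::min(r, std::min(g, b));
    double d  = mx - mn;
    HSV out;
    out.v = mx;
    out.s = (mx > 0.0) ? d / mx : 0.0;
    if (d == 0.0)        out.h = 0.0;
    else if (mx == r)    out.h = 60.0 * std::fmod((g - b) / d, 6.0);
    else if (mx == g)    out.h = 60.0 * ((b - r) / d + 2.0);
    else                 out.h = 60.0 * ((r - g) / d + 4.0);
    if (out.h < 0.0) out.h += 360.0;
    return out;
}
[/code]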