The difference between those two frames is so microscopic that I have to ask whether it's really worth it, or whether other scenes show a much bigger difference in quality. Presumably you're going to be re-encoding, so if the difference is this small throughout, you may find your re-encode looks just as "bad" as the one you've called bad.
You'll need to do some more tweaking anyway, since your "tweaked" difference frame doesn't include the drop shadow. Maybe try invert() and then overlay with mode="lighten".
binarize in Masktools will turn the difference frame into a pure black/white mask, or you can probably achieve the same thing with careful use of levels(). But you'll also need to spread out the mask a bit, otherwise the subtitle edges won't be fully covered and you'll get some ghostly subtitle outlines.
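
To make the threshold-then-spread idea concrete, here's a rough sketch in plain Python rather than AviSynth. The function names, the threshold of 128, the radius and the toy 5x5 frame are all made up for illustration; this is not how Masktools implements it, just the general idea on a grid of luma values.

```python
def binarize(frame, threshold):
    """Return a mask: 255 where the pixel is above the threshold, else 0."""
    return [[255 if px > threshold else 0 for px in row] for row in frame]


def expand(mask, radius=1):
    """Grow the white areas: each output pixel is the max of its neighbourhood."""
    height, width = len(mask), len(mask[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            ys = range(max(0, y - radius), min(height, y + radius + 1))
            xs = range(max(0, x - radius), min(width, x + radius + 1))
            row.append(max(mask[j][i] for j in ys for i in xs))
        out.append(row)
    return out


# Toy "difference frame": one bright subtitle pixel with a faint halo around it.
diff = [
    [0,  0,   0,  0, 0],
    [0, 10,  20, 10, 0],
    [0, 20, 200, 20, 0],
    [0, 10,  20, 10, 0],
    [0,  0,   0,  0, 0],
]

mask = binarize(diff, threshold=128)   # only the bright centre pixel survives
grown = expand(mask, radius=1)         # mask now also covers the faint halo

for row in grown:
    print("".join("#" if px else "." for px in row))
```

Without the expand step the mask covers only the centre pixel, so the faint halo (the soft edge of the subtitle) would be left behind in the picture, which is where the ghostly outlines come from.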