I have added support for processing at standard 8 bits per channel instead of 16 bits per channel. Simply add the parameter "precision=1" to each conversion and shader call. Quality is considerably lower.
With MT=8 and
SuperRes(1, 1, 0, true, """nnedi3_rpow2(2, cshift="Spline16Resize", Threads=2)""")
I get 24 fps at 78% CPU.
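
To give a rough idea of why 8 bits per channel costs quality, here is a small Python sketch of the rounding error from storing a value at 8 bits versus 16 bits. This is not the plugin's code. The function names are made up for illustration, and it assumes values are normalized to the 0.0-1.0 range and rounded to the nearest level.

```python
def quantize(value, bits):
    """Round a value in [0.0, 1.0] to the nearest level storable in `bits` bits."""
    levels = (1 << bits) - 1          # 255 for 8-bit, 65535 for 16-bit
    return round(value * levels) / levels


def max_roundtrip_error(bits, samples=10001):
    """Largest error seen when storing evenly spaced test values at `bits` bits."""
    worst = 0.0
    for i in range(samples):
        value = i / (samples - 1)
        worst = max(worst, abs(quantize(value, bits) - value))
    return worst


if __name__ == "__main__":
    for bits in (8, 16):
        print(bits, "bits: max rounding error", max_roundtrip_error(bits))
```

The error happens again every time an intermediate result is stored, so a chain of several steps at 8 bits can lose more than a single rounding would suggest.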