I've tried converting NNEDI3 to DirectCompute, but it performed *much* slower than with OpenCL, unlike error diffusion, which was actually slightly faster with DirectCompute. So nevcairiel is right: NNEDI3 is done in OpenCL, error diffusion in DirectCompute.
@Orf, unfortunately I don't have any time atm. But I know that it was 100% reproducible for me when I reported the problem.