Well, as long as the partition size is a power of 2, the slowdown shouldn't be that bad. On a modern computer there should be more than enough processing power to do the FFT in real time. Have a look at my
variableblur source code (gaussian.cpp/.h) to see how FFTW can be used to do a 2D convolution (with audio it is only 1D, so it is faster).
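
To give a rough idea of the 1D case, here is a minimal sketch of FFT-based convolution. This is not the code from gaussian.cpp and it does not use FFTW: to keep it self-contained it uses a small hand-written radix-2 FFT, which is also why the buffer length has to be a power of 2. The names `fft` and `fftConvolve` are made up for this example.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cd = std::complex<double>;

// In-place iterative radix-2 FFT. a.size() must be a power of two.
// invert = true computes the inverse transform, including the 1/n scaling.
// (Hypothetical helper for this sketch; FFTW would replace this function.)
void fft(std::vector<cd>& a, bool invert) {
    const std::size_t n = a.size();

    // Bit-reversal permutation so the butterflies can run in place.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }

    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const double ang = 2.0 * pi / static_cast<double>(len) * (invert ? 1.0 : -1.0);
        const cd wlen(std::cos(ang), std::sin(ang));
        for (std::size_t i = 0; i < n; i += len) {
            cd w(1.0, 0.0);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const cd u = a[i + k];
                const cd v = a[i + k + len / 2] * w;
                a[i + k] = u + v;
                a[i + k + len / 2] = u - v;
                w *= wlen;
            }
        }
    }

    if (invert) {
        for (cd& x : a) x /= static_cast<double>(n);
    }
}

// Linear convolution of two real sequences via the FFT.
// Both inputs are zero-padded to the next power of two that can hold
// the full result (signal.size() + kernel.size() - 1 samples).
std::vector<double> fftConvolve(const std::vector<double>& signal,
                                const std::vector<double>& kernel) {
    if (signal.empty() || kernel.empty()) return {};

    const std::size_t outLen = signal.size() + kernel.size() - 1;
    std::size_t n = 1;
    while (n < outLen) n <<= 1;

    std::vector<cd> fa(n), fb(n);  // value-initialized to zero
    for (std::size_t i = 0; i < signal.size(); ++i) fa[i] = signal[i];
    for (std::size_t i = 0; i < kernel.size(); ++i) fb[i] = kernel[i];

    fft(fa, false);
    fft(fb, false);
    for (std::size_t i = 0; i < n; ++i) fa[i] *= fb[i];  // multiply spectra
    fft(fa, true);

    std::vector<double> out(outLen);
    for (std::size_t i = 0; i < outLen; ++i) out[i] = fa[i].real();
    return out;
}
```

For example, `fftConvolve({1, 2, 3}, {0, 1, 0.5})` gives 0, 1, 2.5, 4, 1.5 up to rounding error. For real-time audio you would run something like this once per partition and overlap-add the tails of consecutive blocks, and the kernel's spectrum only needs to be computed once up front.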