Opinions needed for a multi-threading library


cretindesalpes
29th January 2012, 22:13
I'm currently writing a small multi-threading library, mainly for Avisynth plug-in development (but not restricted to it). I have a few questions and would like to get advice from people here before going further.

The library works as a thread pool, meaning that you just have to push parallel tasks into the queue and wait for their completion.
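
To make the "push tasks, then wait for completion" idea concrete, here is a minimal sketch of such a pool in portable C++11. It is not the library's implementation: the names `SimplePool`, `enqueue`, `wait_completion` and `sum_of_squares_demo` are hypothetical and chosen only for illustration.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal thread pool (illustration only, not the avstp code).
class SimplePool {
public:
    explicit SimplePool(unsigned n_threads) {
        if (n_threads == 0) n_threads = 1;
        for (unsigned i = 0; i < n_threads; ++i)
            workers_.emplace_back([this] { worker_loop(); });
    }

    ~SimplePool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        wake_workers_.notify_all();
        for (auto& t : workers_) t.join();
    }

    SimplePool(const SimplePool&) = delete;
    SimplePool& operator=(const SimplePool&) = delete;

    // Push one task into the queue; any idle worker may pick it up.
    void enqueue(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
            ++pending_;
        }
        wake_workers_.notify_one();
    }

    // Block until every task enqueued so far has finished running.
    void wait_completion() {
        std::unique_lock<std::mutex> lock(mutex_);
        all_done_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_workers_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stop requested and queue drained
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so workers do not serialize
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (--pending_ == 0) all_done_.notify_all();
            }
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_workers_;
    std::condition_variable all_done_;
    std::queue<std::function<void()>> tasks_;
    std::size_t pending_ = 0;  // tasks queued or currently running
    bool stop_ = false;
    std::vector<std::thread> workers_;
};

// Usage: eight independent tasks, each writing to its own slot.
long sum_of_squares_demo() {
    SimplePool pool(4);
    std::vector<long> results(8, 0);
    for (std::size_t i = 0; i < results.size(); ++i)
        pool.enqueue([&results, i] { results[i] = static_cast<long>(i * i); });
    pool.wait_completion();
    long total = 0;
    for (long r : results) total += r;
    return total;
}
```

Each task writes to a distinct element, so the tasks need no locking of their own; the pool's mutex only protects the queue and the pending counter.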

Now I would like to make all the plug-ins implementing the library share a single pool of threads per process, in order to reduce the resource load, particularly when using the MT modes. So instead of having 50+ threads trying to access the computer's resources more or less simultaneously, the plug-ins would share the N threads available from the pool (plus the original MT threads). This would help to reduce the number of MT threads needed to reach a full CPU load with a single Avisynth process.

The main drawback is that the common code would have to be in a shared DLL, installed in the System32/SysWOW64 folder. This probably sounds a bit complicated for most users (remember the libfftw*.dll issues?). It also means one more DLL in the Avisynth plugin folder if you want the possibility to set the global number of threads from the script. So I'm wondering if the shared stuff is really worth it. Or is there any alternative for sharing the thread pool?

At the moment, the header looks like this:
#ifdef __cplusplus
extern "C"
{
#endif


#ifdef __cplusplus
namespace avstp { class TaskDispatcher; }
typedef avstp::TaskDispatcher avstp_TaskDispatcher;
#else
typedef struct avstp_TaskDispatcher avstp_TaskDispatcher;
#endif // __cplusplus

typedef void (__cdecl *avstp_TaskPtr) (avstp_TaskDispatcher *td_ptr, void *user_data_ptr);

enum
{
avstp_Err_OK = 0,

avstp_Err_EXCEPTION = -999,
avstp_Err_INVALID_ARG
};

enum { avstp_INTERFACE_VERSION = 1 };



__declspec (dllexport) int __stdcall avstp_get_interface_version ();
__declspec (dllexport) avstp_TaskDispatcher * __stdcall avstp_create_dispatcher ();
__declspec (dllexport) void __stdcall avstp_destroy_dispatcher (avstp_TaskDispatcher *td_ptr);

__declspec (dllexport) int __stdcall avstp_get_nbr_threads ();
__declspec (dllexport) int __stdcall avstp_enqueue_task (avstp_TaskDispatcher *td_ptr, avstp_TaskPtr task_ptr, void *user_data_ptr);
__declspec (dllexport) int __stdcall avstp_wait_completion (avstp_TaskDispatcher *td_ptr);


#ifdef __cplusplus
}
#endif

Any comment appreciated.

SAPikachu
7th February 2012, 01:13
I think we don't actually have to put the DLL into the system folder, we can implement a loader function like this:


__inline avstp_FunctionTable* avstp_get_function_table(void)
{
    TCHAR* memory_mapping_name = <generate a name based on ID of the current process>;
    TCHAR* mutex_name = <similar to above>;
    HANDLE map_handle = CreateFileMapping( ... );
    // Save the error code now: the CreateMutex call below overwrites it
    DWORD map_error = GetLastError();
    HANDLE mutex_handle = CreateMutex( ... );

    // mutex-related code snipped

    if (map_handle && map_error != ERROR_ALREADY_EXISTS)
    {
        // Get path of the running DLL,
        // load avstp DLL in the same directory or AVS plugin directory,
        // call an exported function to get function table,
        // store the function table to the memory mapping
    } else if (map_handle) {
        // Initialized in another place, just use the stored table
    } else {
        // Error
    }
    return <pointer to the function table>;
}

cretindesalpes
7th February 2012, 08:35
Yes, this is more or less the solution I came up with in the meantime. Current results are encouraging. With a few utility classes wrapping the main API, I could multi-thread almost all the dither.dll functions in about a day, and the simplest functions took just 5 minutes to convert. I'll do more testing and will publish something.

SAPikachu
7th February 2012, 08:41
Sounds great, looking forward to your results. :)

PhrostByte
11th February 2012, 19:25
Windows has a built-in QueueUserWorkItem API that automatically manages a thread pool shared by the whole process. You could build a header-only library around that without any DLL troubles.

If you want ease of use, Intel's TBB and VC++'s TBB-compatible PPL are very well designed and will share a thread pool so long as you dynamically link the DLL.

But, before you get too far:

In developing ResampleHQ I spent a lot of time optimizing for multi-threading. I got it as efficient as possible but in the end I still ripped it all out in favor of SetMtMode(2) compatibility. Fine-grained threading is simply not very performant when you have the option of threading at a macro level.

It might be better to make something for devs to get SetMtMode(2) compatible.

leiming2006
15th February 2012, 10:20
As PhrostByte said, Windows has a built-in thread pool feature.
Also, how about boost::asio? It helps maintain a job queue too.

cretindesalpes
15th February 2012, 22:41
OK, thanks for your input. I wasn't aware of this Windows function. Anyway my thread pool is working as desired and is efficient even for fine-grained tasks (minus the task begin/end overhead). The base code was originally designed to multithread real-time low-latency audio processing. But of course, some tasks can't be multi-threaded very well and the goal here is to maximize the CPU use while keeping the Avisynth MT threads (hence resource consumption) as low as possible.

PhrostByte, you mean SetMtMode(1), not 2? MT mode 2 compatibility is quite easy to achieve if you don't do silly things like storing frame-related data in global variables or using an old unpatched avisynth.h header.
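
A rough sketch of the "no frame-related globals" rule, assuming (as MT mode 2 is usually described) that each worker thread gets its own filter instance. `InvertFilter` and `per_instance_state_demo` are hypothetical stand-ins, not the real Avisynth interface: the scratch buffer is a member of the instance, so concurrent instances never touch each other's data, whereas a single global buffer would be shared by all of them.

```cpp
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for a filter; not the real Avisynth API.
class InvertFilter {
public:
    explicit InvertFilter(std::size_t width) : scratch_(width) {}

    // Frame-related working data lives in the instance, not in a global,
    // so it is private to whichever thread owns this instance.
    void process_row(const std::uint8_t* src, std::uint8_t* dst) {
        for (std::size_t i = 0; i < scratch_.size(); ++i)
            scratch_[i] = static_cast<std::uint8_t>(255 - src[i]);
        for (std::size_t i = 0; i < scratch_.size(); ++i)
            dst[i] = scratch_[i];
    }

private:
    std::vector<std::uint8_t> scratch_;  // per-instance scratch buffer
};

// Runs one filter instance per thread and checks every output row.
bool per_instance_state_demo() {
    const std::size_t width = 64;
    const int n_threads = 4;
    std::vector<std::vector<std::uint8_t>> src(n_threads, std::vector<std::uint8_t>(width));
    std::vector<std::vector<std::uint8_t>> dst(n_threads, std::vector<std::uint8_t>(width));
    for (int t = 0; t < n_threads; ++t)
        for (std::size_t i = 0; i < width; ++i)
            src[t][i] = static_cast<std::uint8_t>(t * 10 + i);

    std::vector<std::thread> threads;
    for (int t = 0; t < n_threads; ++t) {
        threads.emplace_back([&src, &dst, t, width] {
            InvertFilter filter(width);  // one instance per thread
            for (int pass = 0; pass < 1000; ++pass)
                filter.process_row(src[t].data(), dst[t].data());
        });
    }
    for (auto& th : threads) th.join();

    for (int t = 0; t < n_threads; ++t)
        for (std::size_t i = 0; i < width; ++i)
            if (dst[t][i] != static_cast<std::uint8_t>(255 - src[t][i]))
                return false;
    return true;
}
```

If `scratch_` were a single global vector instead, the threads would overwrite each other's intermediate rows and the final check could fail intermittently.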

PhrostByte
16th February 2012, 16:34
my thread pool is working as desired and is efficient even for fine-grained tasks (minus the task begin/end overhead).

I meant the efficiency of fine-grained concurrency in general, not necessarily anything library-specific. A CPU's prefetching logic is easy to take for granted, and fine-grained concurrency can defeat it even more easily.

You operate on large chunks to diminish this, but macro-level threading will always be easier and more performant.
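
One way to picture "operating on large chunks" is to give each task one contiguous band of rows rather than interleaved lines, so that every thread walks memory sequentially. The helper below is only a sketch; `RowBand` and `split_rows` are hypothetical names, not part of any library mentioned in this thread.

```cpp
#include <vector>

// Half-open row range [first, last) handled by one task.
struct RowBand {
    int first;
    int last;
};

// Split `height` rows into at most `n_bands` contiguous bands whose
// sizes differ by at most one row.
std::vector<RowBand> split_rows(int height, int n_bands) {
    std::vector<RowBand> bands;
    if (height <= 0 || n_bands <= 0) return bands;
    if (n_bands > height) n_bands = height;  // never create empty bands

    const int base = height / n_bands;   // rows every band gets
    const int extra = height % n_bands;  // the first `extra` bands get one more
    int y = 0;
    for (int b = 0; b < n_bands; ++b) {
        const int rows = base + (b < extra ? 1 : 0);
        bands.push_back({y, y + rows});
        y += rows;
    }
    return bands;
}
```

For a 1080-row frame and 4 tasks this yields four bands of 270 rows each; each band would then be enqueued as one task.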

PhrostByte, you mean SetMtMode(1), not 2? MT mode 2 compatibility is quite easy to achieve if you don't do silly things like storing frame-related data in global variables or using an old unpatched avisynth.h header.

SetMtMode(2). Many filters don't have state, but some do. I've considered writing some code to allow these filters to pool their state (so it's not allocated per-frame) in a NUMA-aware fashion, but I doubt it would result in a noticeable perf gain.