17th August 2010, 15:27   #65
hust_xcl
Quote:
Originally Posted by royia
Crossing my fingers for you.
Just for knowledge, are you aiming for Open CL or CUDA?

I wish I could help :-).
royia, I have to suspend the work because the emulation mode does not support shared memory, and my old computer can only run in that mode. Once I buy a new NVIDIA card, I will resume.
IMHO:
1. Interpolation can be optimized with CUDA.
2. Full search is better suited than diamond search for offloading ME (motion estimation) to the GPU.
My algorithm is:
(1) Transfer an original frame and a reference frame to the GPU.
(2) For each of the 7 search modes (16x16, 16x8, 8x16…), run a full search on the GPU over an 8x8 search range around the origin (0,0).
(3) Transfer all the MVs back to the CPU.
(4) In the analysis process, the CPU makes use of the MVs calculated by the GPU (the predicted MV should be near 0).
If an MV lies on the border of the 8x8 search range, a refinement search should be run to improve the result.
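To make steps (2) to (4) concrete, here is a minimal CPU-side reference sketch in Python, not the CUDA code itself. It assumes SAD as the cost metric and reads the "8x8 search range" as offsets within ±4 of (0,0); both are assumptions on my part, and the names `sad`, `full_search`, `on_border`, `make_frame` and `shifted` are made up for illustration. Every candidate offset is evaluated independently of the others, which is why full search maps naturally onto one GPU thread per candidate.

```python
def sad(cur, ref, bx, by, bw, bh, dx, dy):
    """Sum of absolute differences for one candidate offset (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
        for y in range(bh)
        for x in range(bw)
    )


def full_search(cur, ref, bx, by, bw, bh, r=4):
    """Test every offset within +/-r of (0, 0); return (best_mv, best_sad)."""
    height, width = len(ref), len(ref[0])
    best_mv, best_sad = (0, 0), sad(cur, ref, bx, by, bw, bh, 0, 0)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            # Skip candidates whose reference block would leave the frame.
            if not (0 <= bx + dx and bx + dx + bw <= width):
                continue
            if not (0 <= by + dy and by + dy + bh <= height):
                continue
            cost = sad(cur, ref, bx, by, bw, bh, dx, dy)
            # Strict "<" keeps the earlier candidate on ties, starting from (0, 0).
            if cost < best_sad:
                best_mv, best_sad = (dx, dy), cost
    return best_mv, best_sad


def on_border(mv, r=4):
    """True if the MV sits on the edge of the search window (needs refinement)."""
    return abs(mv[0]) == r or abs(mv[1]) == r


def make_frame(w=24, h=24):
    """Deterministic synthetic test frame (hypothetical data, not real video)."""
    return [[(3 * x * x + 5 * y * y + 7 * x * y) % 256 for x in range(w)]
            for y in range(h)]


def shifted(ref, dx, dy):
    """Frame whose pixel (x, y) equals ref at (x + dx, y + dy), clamped at edges."""
    h, w = len(ref), len(ref[0])
    return [[ref[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
             for x in range(w)] for y in range(h)]


if __name__ == "__main__":
    ref = make_frame()
    cur = shifted(ref, 4, 0)
    mv, cost = full_search(cur, ref, bx=8, by=8, bw=8, bh=8)
    print(mv, cost, "refine" if on_border(mv) else "keep")
```

In the real encoder, the CPU would call something like `on_border` on each MV returned by the GPU and run its own refinement search only for those blocks; the block size arguments `bw` and `bh` stand in for whichever partition mode is being searched.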