25th January 2007, 19:00   #4
ARDA
Registered User
 
Join Date: Nov 2001
Posts: 291
Quote:
Originally Posted by IanB
GreyScale is nice and simple to analyse but is not a good practical target to optimise,
even in C++ code it is fast so your return on investment is small.
i.e. If greyscale takes 5ms per frame (200 fps) and you optimise it to be 10 times faster,
0.5ms per frame (2000 fps) but it is part of a filter chain that takes 100ms per frame (10fps)
then the improvement is only 4.5 ms, giving 95.5 ms per frame (10.47 fps).
This new way of doing greyscale is probably not very important in itself; this thread is
mainly about memset:
Quote:
Originally Posted by ARDA
At first I was just studying greyscale code. I started looking for assembler improvements
and ended up thinking about how to write a kind of memset for modern architectures, taking advantage
of SSE2 instructions etc.
Finally I arrived at code that is already usable and fast enough in this plugin, GreyYV12.

Meanwhile, one goal can already be achieved: some steps towards alternative code for memset.
Besides, the fact that greyscale under YV12 was already so fast with simple C++ code
encouraged me to take it up as a challenge.
Quote:
Originally Posted by IanB
Spend your time optimizing the filter that takes 50+ms per frame!
Thanks for trusting my skills and knowledge! I know you are pointing in the right direction.
Maybe in the near future I'll take on more complex tasks; for now I am sharing what I can
manage with my abilities and time.

Coming back to develop code:
Quote:
Originally Posted by IanB
A better implementation would use IsWriteable so the two outcomes become :-
1. Blit just the Luma plane, then memset both chroma planes.
or
2. Keep the Luma plane and memset both chroma planes.
The saving is the blits of the 2 chroma planes in case 1.
As always, you have pointed out a good way to develop this plugin.

I have tested it and can confirm your predictions. I'll post some benchmark results soon.
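
The two outcomes can be sketched in plain C++ roughly as follows. This is only an illustration under stated assumptions, not the plugin's code: Yv12Frame, its writable flag and grey_yv12 are made-up names standing in for an AviSynth frame and its writability check, the planes are assumed to be tightly packed, and 128 is the neutral chroma value for 8-bit YV12.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a YV12 frame; this is NOT the AviSynth VideoFrame class.
struct Yv12Frame {
    int width, height;                  // luma dimensions, assumed even
    std::vector<std::uint8_t> y, u, v;  // tightly packed planes (pitch == width)
    bool writable;                      // stand-in for "no one else references this frame"
    Yv12Frame(int w, int h, bool wr)
        : width(w), height(h), y(w * h), u((w / 2) * (h / 2)),
          v((w / 2) * (h / 2)), writable(wr) {}
};

// Returns the frame that holds the greyscale result.
// 'scratch' is assumed to have the same dimensions as 'src'.
Yv12Frame* grey_yv12(Yv12Frame& src, Yv12Frame& scratch) {
    Yv12Frame* out = &src;               // case 2: keep the luma plane where it is
    if (!src.writable) {
        // case 1: copy only the luma plane; the chroma planes are never copied
        std::memcpy(scratch.y.data(), src.y.data(), src.y.size());
        out = &scratch;
    }
    // both cases: fill the chroma planes with neutral grey
    std::memset(out->u.data(), 128, out->u.size());
    std::memset(out->v.data(), 128, out->v.size());
    return out;
}
```

Either way the chroma planes are only written, never read, which is why the speed of memset matters here.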

Finally, I want to point out the real objective of this thread, an approach to an open source fast memset in ISSE and SSE2 code: it shows an example of how to use write combining and how to decide when to use it.
Another goal would be to finish this code into a fully compatible memset library for avisynth.
Is that useful? I don't know. Please give us your opinion!
Just to mention, memset is used 95 times in avisynth, but I confess I don't know the real weight of those calls.
The Intel Compiler already has a fast memset, but I wanted open source code with similar performance
or better. Why not?

Summarizing my points:
Find bugs
Improve code
A discussion about write combining techniques and when they should be applied
How to make that decision
Make it compatible with the standard library

I don't want to limit the discussion to these subjects, but I would appreciate comments along
those lines.

Thanks ARDA

Version 1.2.1
updated:
Source and dll http://www.iespana.es/Ardaversions/GREY1_21.7z

Last edited by ARDA; 23rd June 2007 at 21:11. Reason: version update