Doom9's Forum - View Single Post - DDigit v1.06 Text Rendering Pack For Plugin Writers, 30 Mar 2015

IanB · 5th December 2010, 03:36

As always, it is in the CVS repository: avisynth > src > core > info.h.

No way does x86 have enough registers to do all 3 loops plus the decision logic, but it doesn't really matter that much; this is C++, not hand-optimised assembler. As I have restructured the code, the inner loop executes 10 times, bit-shifting the font mask held in a single int and sequentially accessing the pixel line data. The middle loop executes strlen(text) times; it picks up the next font mask and has the inner loop continue sequentially through the pixel line data. This maximises L1 cache performance for the video pixels. The outer loop executes 20 times and has the middle loop re-access the text string data each time, but that is 1 access per 10 pixel accesses, which the L2 cache can easily manage. The pixel data accesses then commence on the next line. But all this is pretty theoretical. Practically all the improvement came just from reversing the row/column order; moving the text scan into the middle loop gained only a very minor amount.
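For anyone who wants to see the loop nesting spelled out, here is a rough sketch of it. This is not the code from info.h: `glyph_row`, `draw_text`, the placeholder box-shaped glyph and the bit order (bit 9 as the leftmost pixel) are all assumptions made for illustration. Only the 10 x 20 glyph size and the row / character / bit ordering come from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr int kGlyphWidth  = 10;  // inner loop count, as described in the post
constexpr int kGlyphHeight = 20;  // outer loop count, as described in the post

// Hypothetical font lookup: one 10-bit mask per glyph row, bit 9 = leftmost
// pixel. A real font table would hold distinct masks per character; this
// placeholder draws a hollow box for every non-space character.
inline unsigned glyph_row(unsigned char ch, int row) {
    if (ch == ' ') return 0u;
    const bool top_or_bottom = (row == 0 || row == kGlyphHeight - 1);
    return top_or_bottom ? 0x3FFu : 0x201u;
}

// Draws text into an 8-bit plane starting at its top-left corner.
// The caller must supply at least kGlyphHeight rows of `pitch` bytes.
void draw_text(std::uint8_t* plane, int pitch, const char* text, std::uint8_t ink) {
    const std::size_t len = std::strlen(text);
    assert(static_cast<std::size_t>(pitch) >= len * kGlyphWidth);

    // Outer loop: one pass per pixel row.
    for (int row = 0; row < kGlyphHeight; ++row) {
        std::uint8_t* px = plane + static_cast<std::ptrdiff_t>(row) * pitch;

        // Middle loop: re-scan the text, one font mask per character.
        for (std::size_t c = 0; c < len; ++c) {
            unsigned mask = glyph_row(static_cast<unsigned char>(text[c]), row);

            // Inner loop: shift the mask and walk the pixel line sequentially.
            for (int bit = 0; bit < kGlyphWidth; ++bit) {
                if (mask & 0x200u) *px = ink;  // test the current leftmost bit
                mask <<= 1;
                ++px;
            }
        }
    }
}
```

The point of the ordering is that `px` only ever moves forward within a row, so the pixel writes are strictly sequential, while the text string is re-read once per row.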

Also, the level at which these functions are targeted does not necessarily have access to a VideoInfo structure. Selecting the appropriate routine would need this information, which could limit the applicability of the code. When coding Avisynth functions, best practice is to make pixel type and other VideoInfo decisions in the class constructor and then have the GetFrame routine blindly crank the handle.
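A minimal sketch of that "decide in the constructor, crank the handle per frame" pattern follows. It deliberately does not use the real Avisynth headers: `PixelLayout`, `ClipInfo`, `MarkerFilter` and `process_frame` are hypothetical stand-ins for VideoInfo, a filter class and GetFrame, and the "rendering" is just a marker write so that the dispatch is visible.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-ins, not the Avisynth API.
enum class PixelLayout { Planar8, PackedYUY2 };

struct ClipInfo {
    PixelLayout layout;
    int width;
    int height;
};

class MarkerFilter {
public:
    // All format decisions happen once, here.
    explicit MarkerFilter(const ClipInfo& info) {
        switch (info.layout) {
        case PixelLayout::Planar8:    render_ = &MarkerFilter::mark_planar; break;
        case PixelLayout::PackedYUY2: render_ = &MarkerFilter::mark_yuy2;   break;
        default: throw std::invalid_argument("unsupported pixel layout");
        }
    }

    // Per-frame entry point: no format checks, just call the chosen routine.
    void process_frame(std::uint8_t* data, int pitch) const {
        assert(data != nullptr);
        (this->*render_)(data, pitch);
    }

private:
    using RenderFn = void (MarkerFilter::*)(std::uint8_t*, int) const;

    // Planar luma: one byte per pixel, so mark the first byte.
    void mark_planar(std::uint8_t* data, int /*pitch*/) const { data[0] = 235; }

    // YUY2 packs Y0 U Y1 V, so the luma bytes are at offsets 0 and 2.
    void mark_yuy2(std::uint8_t* data, int /*pitch*/) const {
        data[0] = 235;
        data[2] = 235;
    }

    RenderFn render_;
};
```

The per-frame path contains no branching on pixel type, which is the whole idea; an unsupported format fails at construction time rather than on the first frame.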
