Old 25th August 2012, 17:53   #20095  |  Link
JanWillem32
Registered User
 
Join Date: Oct 2010
Location: The Netherlands
Posts: 1,083
Quote:
Originally Posted by Liisachan View Post
I don't think they are compatible for x=...-2.5, -1.5, 0.5, 1.5, 2.5, 3.5... (Cast vs. Round to nearest even). And I think x=1 means 1/8-pixel in VSFilter (it's already subpixel). Although it's a fact that VSFilter code is kind of insane in a good sense or a bad sense, it was not the original authors (Avery Lee/Gabest) who wrote this SSE2 function.
Subpixel rendering is very useful for anti-aliasing. (Something the subtitle renderer scores really badly at, and the degree of anti-aliasing isn't even configurable.) The best solution for rendering curvy shapes is to keep their vertices stored as floating-point values throughout the entire pipeline (like any normal image renderer). Instead, there are multiple casts back and forth between integer and floating point for various objects; this is just one of the functions that does that. Oh well, it could be a lot worse. The functions that take care of subtitle color rendering, for instance...

I took a peek at the latency and throughput table (Intel® 64 and IA-32 Architectures Optimization Reference Manual, June 2011 edition).
Latency/throughput in cycles, given for a Sandy Bridge model (06_2AH) first and an older Merom (06_0FH) second:
divps: 14/14, <21, <16
rcpps: 5/1, 3/1
mulps: 5/1, 4/1
addps: 3/1, 3/1
subps: 3/1, 3/1

I certainly can also look it up for AMD, but I think it won't matter much. Straight divisions are always expensive in both latency and throughput, whether integer or floating point.
The combination of rcpps and a Newton-Raphson iteration is usually a lot faster, provided the pipeline can reorder the instructions easily. If the µops are crammed together, the pipeline will stall for a little bit.
Of course, divps outputs the full 24 bits of precision, and the approximate routine only about 22. That's probably the main reason that there's no rcppd or rcpsd for doubles.
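
For reference, the rcpps plus Newton-Raphson route mentioned above could look roughly like this with SSE intrinsics. This is only a minimal sketch and not the VSFilter code: the function names rcp_nr_ps and reciprocal_nr4 are made up for illustration, and it assumes an x86 target with SSE and a single refinement step x1 = x0 * (2 - d * x0).

```c
#include <xmmintrin.h>  /* SSE intrinsics: _mm_rcp_ps, _mm_mul_ps, _mm_sub_ps */

/* Hypothetical helper (the name is an assumption, not from VSFilter):
   approximates 1/d for four packed floats.
   x0 = rcpps(d) is only good to roughly 12 bits; one Newton-Raphson
   step, x1 = x0 * (2 - d * x0), roughly doubles the number of good bits. */
static inline __m128 rcp_nr_ps(__m128 d)
{
    __m128 x0 = _mm_rcp_ps(d);               /* rcpps: fast estimate of 1/d */
    __m128 t  = _mm_mul_ps(d, x0);           /* mulps: d * x0, close to 1   */
    t = _mm_sub_ps(_mm_set1_ps(2.0f), t);    /* subps: 2 - d * x0           */
    return _mm_mul_ps(x0, t);                /* mulps: refined reciprocal   */
}

/* Hypothetical convenience wrapper for plain (unaligned) arrays of four floats. */
void reciprocal_nr4(const float in[4], float out[4])
{
    _mm_storeu_ps(out, rcp_nr_ps(_mm_loadu_ps(in)));
}
```

Going by the Sandy Bridge figures in the table above, the dependent chain rcpps, mulps, subps, mulps adds up to about 5+5+3+5 = 18 cycles of latency, which is more than the 14 of a single divps. The gain comes from the throughput of 1 for each step, so it only pays off when independent work can be interleaved with the chain.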
__________________
development folder, containing MPC-HC experimental tester builds, pixel shaders and more: http://www.mediafire.com/?xwsoo403c53hv