Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion.

Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules.

 

Go Back   Doom9's Forum > Hardware & Software > Software players

Closed Thread
 
Thread Tools Search this Thread Display Modes
Old 24th August 2012, 13:17   #20081  |  Link
Liisachan
李姗倩 Lǐ Shān Qiàn
 
Liisachan's Avatar
 
Join Date: Nov 2002
Posts: 1,340
Sample clip

This happened between 5555 and 5572; the revision 5569 by XhmikosR on July 19, 2012 should be the culprit.

This SSE2 optimization from VSFilterMod (i.e. Transform_SSE2) is obviously broken, where they're rotating font-x in the wrong (negative) direction. I just confirmed that VSFilterMod does really rotate font to the opposite direction. This "bug" should be something trivial, where they typed "+" when they should have typed "-" or something like that, when handling sin or cos.
The original rotation by Gabest agrees with MPC-HC 1.6.2 and before; with ffdshow, VLC, and probably with libass too (which I cannot test directly since I'm sadly on Windows). The latest version of MPC-HC is practically the only one that is not compatible.

See for yourself: font_rotation_x.mkv (~50 kB)

MPC rev. 114


ffdshow rev. 4483


VLC v2.0.3


MPC-HC rev. 5555


MPC-HC rev. 5572 (Broken!)


If you guys are not interested in \frx at all, will you at least revert to the non-SSE2 version before many typesetters start rotating fonts to the opposite direction, creating a lot of confusion? Like I said, VSFilterMod should be fixed too if someone is still working on it.

Thanks again for your wonderful jobs!
Liisachan is offline  
Old 24th August 2012, 14:37   #20082  |  Link
sneaker_ger
Registered User
 
Join Date: Dec 2002
Posts: 5,565
VLC is using libass...

xy-vsfilter seems to be unaffected.
sneaker_ger is offline  
Old 24th August 2012, 15:55   #20083  |  Link
Keiyakusha
契約者
 
Keiyakusha's Avatar
 
Join Date: Jun 2008
Posts: 1,576
Why would anyone import something from vsfiltermod anyway? Its not meant for playback! Better revert everything and just integrate xy-vsfilter if you need optimizations.
Keiyakusha is offline  
Old 24th August 2012, 16:12   #20084  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,342
Quote:
Originally Posted by Liisachan View Post
This SSE2 optimization from VSFilterMod (i.e. Transform_SSE2) is obviously broken, where they're rotating font-x in the wrong (negative) direction.
Should be fixed. It was a simple mistake in the SSE2 code, two operands flipped.

Quote:
Originally Posted by Keiyakusha View Post
Why would anyone import something from vsfiltermod anyway? Its not meant for playback! Better revert everything and just integrate xy-vsfilter if you need optimizations.
Not much was added from VSFilterMod, only stuff that seemed harmless at the time, like this SSE2 optimization.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders
nevcairiel is online now  
Old 25th August 2012, 02:32   #20085  |  Link
Liisachan
李姗倩 Lǐ Shān Qiàn
 
Liisachan's Avatar
 
Join Date: Nov 2002
Posts: 1,340
Still buggy though better

Quote:
Originally Posted by nevcairiel View Post
Should be fixed. It was a simple mistake in the SSE2 code, two operands flipped.
Thank you very much, nevcairiel, for trying to fix the problem, and thank you very much for working for MPC-HC in general too! Your efforts are really appreciated.

The obvious problem you mentioned has been surely fixed, but as it turned out, something (like the rotation origin, the nature of rotation...) is still wrong, suggesting the SSE2 optimizations in VSFilterMod are not well-tested, not compatible with the unoptimized code, and this problem is more subtle than I first thought.

The following images show the +70-degree x-rotation of "Èj" in the {\pos} (position) (0,0) with the {\an7} style (upper-left corner is the origin), that is the {\org} (origin) is (0,0) too. The original letters that are not rotated are shown in pale green. The results of the rotation are shown in red.

1) This is the good one, rendered by old, reliable 1.6.2, compatible with Gabest and with ffdshow, and almost compatible with VLC.


2) The next image shows what we have now, using 1.6.4.5887 (698e56d) taken from http://xhmikosr.1f0.de/mpc-hc/ (thanks to XhmikosR).

As you can see, the general direction of the rotation is more or less better now, but this is far from being compatible.

See for yourself:
font_rotation_x2.mkv (~50 kiB): hardsubbed in blue, softsubbed in red.

Also note that, although I said that this was borked in 5569 by XhmikosR, the code was not written by XhmikosR. It was already there, already imported, and conditionally commented out. He/she (XhmikosR) simply enabled it. It's not his/her fault that this SSE2 code is broken.
Liisachan is offline  
Old 25th August 2012, 03:23   #20086  |  Link
cyberbeing
Broadband Junkie
 
Join Date: Oct 2005
Posts: 1,859
I remember these \frx \fry \frz bugs. It's why we disabled the Transform_SSE2 code in xy-VSFilter a long time ago.

Quote:
Originally Posted by nevcairiel View Post
Should be fixed. It was a simple mistake in the SSE2 code, two operands flipped.
As Liisachan mentioned, it's unfortunately still not fixed. The problem affects more than just \frx.

Here is a better sample which was initially created to track down this bug in xy-VSFilter last year:
http://www.mediafire.com/?j9ota09016rt2yb

The video is hardsubbed with a reference VSFilter 2.39 \frx \fry \frz rendering.
The external script subtitles with opposite color should match up identically through all the rotations.
cyberbeing is offline  
Old 25th August 2012, 03:58   #20087  |  Link
Liisachan
李姗倩 Lǐ Shān Qiàn
 
Liisachan's Avatar
 
Join Date: Nov 2002
Posts: 1,340
Quote:
Originally Posted by cyberbeing View Post
The external script subtitles with opposite color should match up identically through all the rotations.
Confirmed here. \fry is wrong too. \frz seems okay (not buggy).

The bug officially came to MPC-HC in the v1.6.3 release like 10 days ago, which I first tried like 2 days ago, when I noticed the incompatibility. So this was a new problem for me. (Btw, xy-VSFilter seems really promising. Like, \fscx/fscy + float is what I've always wanted. I'll look into xy-VSFilter too when I have time.)
Liisachan is offline  
Old 25th August 2012, 07:00   #20088  |  Link
Liisachan
李姗倩 Lǐ Shān Qiàn
 
Liisachan's Avatar
 
Join Date: Nov 2002
Posts: 1,340
I looked into the inside of RTS.cpp, and this is my first impression. If you're watching __pointx in Transform_SSE2:
Code:
__pointx = _mm_add_ps(__xx, __yy); // xx = x * caz + y * saz; //#1
__xx = _mm_mul_ps(__pointx, __cay); // x * cay //#2
__pointx = _mm_add_ps(__xx, __zz); // xx = x * cay + z * say //#3
__xx = _mm_mul_ps(__pointx, __say); // x * say //#4
#1 Now __pointx is the vectorized version of "xx" from the C code.
#2 is wrong: you're using __pointx as "x" but it's "xx" since #1.
#3 makes __pointx "xx" again.
#4 is wrong: you're using __pointx as "x" but it's "xx" since #3.

Similarly, for __pointy:
Code:
__pointy = _mm_add_ps(__yy, __zz); // y = yy * cax + zz * sax //#5
__yy = _mm_mul_ps(__pointy, __sax); // yy * sax //#6
#5 __pointy is "y".
#6 is wrong. You're using __pointy as "yy" but it's "y" since #5.
Liisachan is offline  
Old 25th August 2012, 08:11   #20089  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,342
You're right, i really should've seen that when i documented the calls. I didn't actually change them except flipping the one operand there, just added the docs, but apparently didn't pay enough attention.
No matter, i gave it another try and refactored the whole function to use variable names matching the C ones very closely, no changes of meaning in the variables anymore.

The test videos you and cyberbeing posted are working fine now.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders

Last edited by nevcairiel; 25th August 2012 at 08:24.
nevcairiel is online now  
Old 25th August 2012, 14:38   #20090  |  Link
JanWillem32
Registered User
 
JanWillem32's Avatar
 
Join Date: Oct 2010
Location: The Netherlands
Posts: 1,083
Well, you chose quite a monster of a function to work on there. (I'll try to explain it mostly to the public that knows at least basic programming, but hasn't seen this function yet.)
I disabled Transform_SSE2() when I was fixing the issues with scaling to the video area on the window area and aspect ratio correction with the frx, fry and frz functions a while ago. I marked Transform_SSE2() with: "TODO: The methods here are very wrong (the C function does work as it should), fix this with decent code, or delete this function." As resolving the list of issues of Transform_SSE2() plus fixing the scaling and aspect ratio bugs seemed to be too much work at the time, I chose to only edit Transform_C().
As the variables are prefixed with "_", I was already wondering who would write such code. That prefix is generally reserved for the global compiler-specific functions, constants and variables. Renaming would indeed be a good start.

A quick review of the function shows that:
-None of the typecasts between single- and double-precision floating points and 32-bit signed integers used in this function is properly handled with SSE.
-"static_cast<LONG>(x + 0.5)" is used to do rounding casts to integer in both the C and the SSE functions. SSE has native rounding casts to do this, which also handles negative values properly. (Vertices stored as integers is one of the most retarded design choices of the subtitle renderer. Using floating-point vertices would eliminate these type casts and provide decent sub-pixel precision.)
-No intrinsic to generate a native load data to register is used. "_mm_set_~" is used everywhere, where "_mm_load_~" should have been used.
-The basic cos() and sin() functions are not handled with SSE. (I'll ignore the horribly truncated PI in these lines for now.)
-4 lines of code feature access to member variables of the __m128 union. That's illegal if you take it strictly, but the compiler will probably bend over backwards and do a lot of register shuffles to output the correct values.
-"__m128 __pointz = _mm_set_ps1(0);"; this functionality is provided properly by _mm_setzero_ps().
-_mm_rcp_ps() (reciprocal approximation) is used. That's reasonably okay, but the output precision with only this instruction is low (12 bits). It could use a Newton-Raphson iteration to improve that (to about 22 bits, out of the full 24 bits). Else, a regular floating-point division to get the reciprocal would work fine.

The design choice of using single-precision instead of double-precision for this function is okay. (Transform_C() uses doubles.) The lower precision isn't significant. Handling this function with SSE2 vectorized doubles would have been just as easy, though.
I don't think that the author of the Transform_SSE2() function had any experience working with SSE intrinsics or ever wrote assembly before. The function is marked with "// speed up ~1.5-1.7x", but I doubt that to be true. Pretty much all the code for the subtitle renderer's SSE2 functions have issues, although most are less severe than this one.
Unfortunately, this function illustrates the state the subtitle renderer is in rather well. Refactoring the functions with these kinds of issues isn't easy, but any effort on getting proper, efficient routines in the subtitle renderer is very welcome indeed.
__________________
development folder, containing MPC-HC experimental tester builds, pixel shaders and more: http://www.mediafire.com/?xwsoo403c53hv
JanWillem32 is offline  
Old 25th August 2012, 15:13   #20091  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,342
I just fixed the math in it to match the C math, nothing else.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders
nevcairiel is online now  
Old 25th August 2012, 15:27   #20092  |  Link
cyberbeing
Broadband Junkie
 
Join Date: Oct 2005
Posts: 1,859
On that note, xy-VSFilter rewrote the Transform_C code for better performance a couple weeks ago. Though from what I remember hearing, the transform code was never really a bottleneck for VSFilter in the first place.
cyberbeing is offline  
Old 25th August 2012, 16:20   #20093  |  Link
Liisachan
李姗倩 Lǐ Shān Qiàn
 
Liisachan's Avatar
 
Join Date: Nov 2002
Posts: 1,340
The fix itself is awesome. Thank you very much, Nevcairiel. (I've not yet got a binary, though.)

However...
Quote:
Originally Posted by JanWillem32 View Post
-_mm_rcp_ps() (reciprocal approximation) is used. That's reasonably okay, but the output precision with only this instruction is low (12 bits).
That's a very good point. And I'm not sure if it's okay. If x=1000, the possible error is roughly 1000/(2^12) = 0.25. But what if the true value is x=90.5 and we got x=90.4 because of this? Like this, a few points (maybe 1 out of 100) in the path will be off by 1 unit (1/8-pixel?) after rounding and slight distortion is possible, compared to the C version. Just divide, and the result is 10000 times more accurate:
Code:
__xx = _mm_mul_ps(__xx, __20000); // xx * 20000
__pointx = _mm_div_ps(__xx,__zz); // x = (xx * 20000) / (zz + 20000)
__yy = _mm_mul_ps(__yy, __20000); // yy * 20000
__pointy = _mm_div_ps(__yy, __zz); // y = (yy * 20000) / (zz + 20000);
Or use _mm_rcp_ps a la Newton.
Code:
//__zz = _mm_rcp_ps(__zz); // 1 / (zz + 20000)
__m128 z0 = _mm_rcp_ps(__zz);
__zz = _mm_sub_ps( _mm_add_ps(z0,z0), _mm_mul_ps(z0,_mm_mul_ps(z0,__zz)) );
Quote:
Originally Posted by JanWillem32 View Post
-"static_cast<LONG>(x + 0.5)" is used to do rounding casts to integer in both the C and the SSE functions. SSE has native rounding casts to do this, which also handles negative values properly.
I don't think they are compatible for x=...-2.5, -1.5, 0.5, 1.5, 2.5, 3.5... (Cast vs. Round to nearest even). And I think x=1 means 1/8-pixel in VSFilter (it's already subpixel). Although it's a fact that VSFilter code is kind of insane in a good sense or a bad sense, it was not the original authors (Avery Lee/Gabest) who wrote this SSE2 function.
Liisachan is offline  
Old 25th August 2012, 16:41   #20094  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,342
the easiest solution would obviously be to use _mm_div_ps, but no idea how speed comparison is there.
Considering the C version also uses a real division, its probably not going to be slower then C at least.

Edit:
I pushed a patch to use _mm_div_ps and another small cleanup.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders

Last edited by nevcairiel; 25th August 2012 at 16:47.
nevcairiel is online now  
Old 25th August 2012, 17:53   #20095  |  Link
JanWillem32
Registered User
 
JanWillem32's Avatar
 
Join Date: Oct 2010
Location: The Netherlands
Posts: 1,083
Quote:
Originally Posted by Liisachan View Post
I don't think they are compatible for x=...-2.5, -1.5, 0.5, 1.5, 2.5, 3.5... (Cast vs. Round to nearest even). And I think x=1 means 1/8-pixel in VSFilter (it's already subpixel). Although it's a fact that VSFilter code is kind of insane in a good sense or a bad sense, it was not the original authors (Avery Lee/Gabest) who wrote this SSE2 function.
Subpixel rendering is very useful for anti-aliasing. (Something the subtitle renderer scores really badly at, and it isn't even configurable for the degree of anti-aliasing.) The best solution to render curvy shapes is to have their vertices stored as floating-point throughout the entire pipeline (like any normal image renderer). There are multiple casts back and forth to integer and floating point for various objects, this is just one of the functions that does that. Oh well, it could be a lot worse. The functions that take care of subtitle color rendering for instance...

I took a peek at the latency and throughput table (Intel® 64 and IA-32 Architectures Optimization Reference Manual, edition june 2011):
Given for a Sandy Bridge model (06_2AH) and an older Merom (06_0FH):
divps: 14/14, <21, <16
rcpps: 5/1, 3/1
mulps: 5/1, 4/1
addps: 3/1, 3/1
subps: 3/1, 3/1

I certainly can also look it up for AMD, but I think it won't matter much. Straight divisions are always expensive in both latency and throughput, no matter for integer or floating point.
The rcpps and a Newton-Raphson iteration method is mostly a lot faster if the pipeline can be reordered easily. If the µops are crammed together, the pipeline will stall for a little bit.
Of course, divps outputs with full 24 bits of precision, and the approximate routine with about 22. That's probably the main reason that there's no rcppd or rcpsd for doubles.
__________________
development folder, containing MPC-HC experimental tester builds, pixel shaders and more: http://www.mediafire.com/?xwsoo403c53hv
JanWillem32 is offline  
Old 26th August 2012, 02:57   #20096  |  Link
cyberbeing
Broadband Junkie
 
Join Date: Oct 2005
Posts: 1,859
Quote:
Originally Posted by cyberbeing View Post
On that note, xy-VSFilter rewrote the Transform_C code for better performance a couple weeks ago.
I was just emailed the following from Yu Zhuohuang (xy-VSFilter dev):
Quote:
Nevcairiel may still want to look into our commit bac768e02081317ae17cf19e496ca3a3a1e5f7af, in which the Transform_C function was rewritten. This one is well documented, in comparison to other changes I've made. That commit reduces the number of multiplication to 1/3 of the original function. It is likely faster than a "1.7x - 2.x" speed up SSE code.
@Nevcairiel

He's basically suggesting that xy-VSFilter's Transform_C code be used as a base if any proper Transform_SSE2 function is ever going to be written.

If xy-VSFilter's rewritten Transform_C really is faster than the 'fixed' VSFilterMOD Transform_SSE2, that may be a good reason for MPC-HC to just scrap it and start over.
cyberbeing is offline  
Old 26th August 2012, 07:55   #20097  |  Link
JanWillem32
Registered User
 
JanWillem32's Avatar
 
Join Date: Oct 2010
Location: The Netherlands
Posts: 1,083
I took a peek at that commit.
PI is still truncated to 3.1415. DSUtil (linked in) has M_PI with enough digits to fill a double.
Code:
double scalex = style.fontScaleX/100;
Right-hand side is implicitly an integer, the intent is a double. Luckily, only it's only esthetic in this case, but the same is done over and over again in the subtitle renderer's code, while it doesn't have to be like that at all.
Code:
(long)(x + org.x + 0.5);
Is the org struct still made of integers? If so, two lines incur a useless cast to double and a cast back to long. Also, in many cases, org.x and .y are implicitly cast to double. Casting them to a temp double value before the loop would probably be better for those cases. Otherwise it looks like a good improvement overall, and a lot better to use than the current SSE2-ish function.
__________________
development folder, containing MPC-HC experimental tester builds, pixel shaders and more: http://www.mediafire.com/?xwsoo403c53hv
JanWillem32 is offline  
Old 26th August 2012, 08:51   #20098  |  Link
Liisachan
李姗倩 Lǐ Shān Qiàn
 
Liisachan's Avatar
 
Join Date: Nov 2002
Posts: 1,340
Tested 1.6.4.5895 (0235e5b), and the bug was finally fixed. Yay! Since I don't have a binary with the fix AND _mm_rcp_ps, I can't tell how _mm_rcp_ps is bad in a real-world sample.

The SSE2 version of CWord::Transform was originally committed on May 13, 2010:
https://code.google.com/p/vsfiltermod/source/detail?r=76
The log message says, "is this good and correct?" So the author was not sure themself if they were doing this right. But hey, everyone has the right to enjoy experimenting It's part of what free-software is all about. Let's stop blaming this function. It was unfortunate that this was just imported to MPC-HC without being tested properly.

Similarly, things should be tested properly before imported from xy-VSFilter, even though I'm willing to support xy-VSFilter too. I read somewhere that xy-VSFilter crashed just because you used a negative value between {\p1}...{\p0}, showing it's not very stable yet, though promising.

Quote:
Originally Posted by JanWillem32 View Post
Subpixel rendering is very useful for anti-aliasing. (Something the subtitle renderer scores really badly at, and it isn't even configurable for the degree of anti-aliasing.)
If you're implying VSFilter is not handling subpixels or it's not doing anti-aliasing, that is incorrect. The following images show, left to right: 1) Not anti-aliased; 2) Anti-aliased, pixel-accuracy; 3) Half-pixel accuracy; 4) Quarter-pixel accuracy; 5) VSFilter (I think this is 1/8-pixel accuracy).


Compare the 4th and 5th images, and you'll see the quality gain by going from qpel to 8th-pel is marginal. Probably the algorithm of anti-aliasing itself could be improved, but you wouldn't gain much here by just using (float) or (double).
A subtitle renderer has to render things quickly, like 24 times a second. When Subtitler (the parent of VSFilter) was created about 10 years ago by Avery Lee (the same author of VirtualDub), 0.125-pixel accuracy stored as integer was probably an optimal choice - considering the CPUs used back then. And no one is going to say that Avery Lee didn't use float because he didn't know the best solution. That's too disrespectful! Today we have better hardware, and probably someone will create something even better eventually, like libass. But even if the font rendering quality of VSFilter is not perfect, maybe the "limited subpixelness" (8x8 subpixels per pixel) is not the reason. Rendered text looks solid to me, if not perfect. Correct me if I'm wrong.

Quote:
Originally Posted by JanWillem32 View Post
There are multiple casts back and forth to integer and floating point for various objects.
I agree with you about this one. For one thing, I've been always unhappy about \fscx, where VSFilter does (double)wcstol, which doesn't make sense to me. If only it was wcstod... then you could use 99.5% or 100.5% font size easily. Technically, though, that would break the compatibility of a tag like {\fscx99.5}.

It seems better to me not to change 3.1415, even though it's weird. Changing it might break existing ASS scripts that depend on this weirdness. But then again, {\blur} broke the compatibility of {\b} too...
Liisachan is offline  
Old 26th August 2012, 08:54   #20099  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,342
All i did was to make the SSE2 function behave the same as the C function, i have no real interest in changing it any more.

If someone wants to still optimize the old and rather broken VSFilter, have fun.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders
nevcairiel is online now  
Old 26th August 2012, 09:28   #20100  |  Link
MasterNobody
Registered User
 
Join Date: Jul 2007
Posts: 552
While you fixing subtitles renderer may be somebody can look at crash with this sample:

To reproduce bug at least this conditions must be met:
1) Internal subtitles enabled and configured like this:

2) Desktop resolution is 1280x1024
3) Sample played in full screen from beginning to end

I checked with MPC-HC 1.6.4.5895 (0235e5b) x86 and it still happen. OS: Windows XP SP3.

P.S. Looks like final crash happen not inside subtitle renderer but caused by it (probably write outside of its memory).
MasterNobody is offline  
Closed Thread

Tags
dxva, h264, home cinema, media player classic, mpc-hc

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT +1. The time now is 11:15.


Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2024, vBulletin Solutions Inc.