View Full Version : MPC-HC GothSync tryouts


pirlouy
7th October 2009, 22:54
- I can assure you it's not in the video, since the judder is not always in the same place.
- I've tried several settings (even though I don't really understand them), but I haven't seen any difference.

In fact, I'm not sure:

http://img12.imageshack.us/img12/3001/gothsync2.jpg

With this graph, knowing I'm at 24 Hz, I should get several glitches, as if the video were jerky, shouldn't I? The graph goes haywire, yet I don't see any judder during that time. Could the graph be wrong? :/

ar-jar
8th October 2009, 07:10
- I can assure you it's not in the video, since the judder is not always in the same place.
- I've tried several settings (even though I don't really understand them), but I haven't seen any difference.

In fact, I'm not sure:

http://img12.imageshack.us/img12/3001/gothsync2.jpg

With this graph, knowing I'm at 24 Hz, I should get several glitches, as if the video were jerky, shouldn't I? The graph goes haywire, yet I don't see any judder during that time. Could the graph be wrong? :/

Hi, this looks like you don't have matching rates. The video is running slower than the display. My guess is that your display is at 24 Hz and the video is 23.976 fps. This is not a situation handled by the Sync Video sync option that you have selected. Cheers! -A

Edit: maybe I answered the wrong question :-) Yes, you should see judder, and you will if the ugliness in the graphs coincides with a pan scene in the movie. The graph is most likely correct wrt timings.

Jong
8th October 2009, 09:19
Yeah, the green line should not be sloping like that. Can we have a full screenshot?

pirlouy
8th October 2009, 11:59
Hi, this looks like you don't have matching rates. The video is running slower than the display. My guess is that your display is at 24 Hz and the video is 23.976 fps. This is not a situation handled by the Sync Video sync option that you have selected. Cheers! -A
Ok. Then I really have not understood what Goth Sync offers. I'll try to read your doc again. I thought that was the aim of Goth Sync: to sync video and refresh rate when they are really close (like 23.976 and 24).
Maybe a guru should write a topic "how (and why) to achieve smooth video playback" for people like me...

Edit: maybe I answered the wrong question :-) Yes, you should see judder, and you will if the ugliness in the graphs coincides with a pan scene in the movie. The graph is most likely correct wrt timings.
Then I confirm I don't see judder. But for information, I've tried with these settings:
http://img196.imageshack.us/img196/4112/gothsync3.png
If I set "Target sync offset" to 12ms, the graph is prettier but I don't see any difference when watching the video. I've tried different values, but the result is not the same each time (sometimes judder-free, sometimes not).

But as I say, I think it's just 24 Hz which is unusable and there's nothing we can do.


@Jong: http://img159.imageshack.us/img159/3397/gothsync4.jpg

STaRGaZeR
8th October 2009, 12:27
Don't use Sync video to display otherwise you'll get what you're seeing. Use Present at nearest VSync, you'll get a flat line and no judder except for one glitch each 10-15 seconds because of the 23,976-->24 mismatch. If you want a perfect line with no glitches whatsoever use Reclock, configured to Nearest integer speed. It will accelerate your video a little to 24FPS and you'll achieve perfect sync with no glitches.
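To put a number on "accelerate your video a little": the sketch below is plain arithmetic, not Reclock's actual code. The names `nearest_integer_speed` and `seconds_saved` are made up for illustration, and it assumes "Nearest integer speed" simply means rounding the source frame rate to the closest whole number.

```python
def nearest_integer_speed(fps):
    """Return (target_fps, speed_factor) when fps is rounded to a whole number.

    Assumption: this only mirrors what the option name suggests,
    it is not taken from Reclock itself.
    """
    target = round(fps)
    return target, target / fps


def seconds_saved(fps, runtime_s):
    """How much shorter a clip of runtime_s seconds gets at the rounded rate."""
    target, _ = nearest_integer_speed(fps)
    return runtime_s * (1 - fps / target)


target, factor = nearest_integer_speed(23.976)
print(f"{target} fps, speed factor {factor:.6f}")
print(f"2 h movie ends about {seconds_saved(23.976, 7200):.1f} s earlier")
```

So 23.976 fps material is sped up by about 0.1 %, which is why the mismatch glitch disappears on a 24 Hz display while the change in running time stays at a few seconds per movie.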

@ar-jar, now we get those glitches (I get the same, you can see it in my previous screenshots) when trying to do 23,976-->24. Is it possible to modify your algorithm so it can be done properly, or is it just impossible by design? Also there's a new audio renderer coming to MPC-HC with WASAPI support, maybe it can be useful for all the sync stuff.

noee
8th October 2009, 12:33
If you want a perfect line with no glitches whatsoever use Reclock, configured to Nearest integer speed. It will accelerate your video a little to 24FPS and you'll achieve perfect sync with no glitches.

This is exactly what I use with the MPC-Goth trial and the latest (9018) is giving me outstanding results. Pirlouy, if you are playing back 23.976fps on a 24Hz display, this is the best option.

STaRGaZeR
8th October 2009, 14:02
I also use that config but with 120Hz, perfect results too.

Jong
8th October 2009, 14:15
Don't use Sync video to display otherwise you'll get what you're seeing. Use Present at nearest VSync, you'll get a flat line and no judder except for one glitch each 10-15 seconds because of the 23,976-->24 mismatch. If you want a perfect line with no glitches whatsoever use Reclock, configured to Nearest integer speed.
Reclock configured to "AUTO" is fine too, and there is no need to mess around when you change configs/refresh rates.

And you shouldn't get a glitch every 10-15 secs surely. Maybe ar-jar can confirm, but I'd have thought it was only approximately every 1000 secs, minus a "safe margin" of maybe 10% or so?

starla_
8th October 2009, 14:56
And you shouldn't get a glitch every 10-15 secs surely. Maybe ar-jar can confirm, but I'd have thought it was only approximately every 1000 secs, minus a "safe margin" of maybe 10% or so?

Actually it is closer to 40 seconds. At 23,976 fps on a 24 Hz display the video falls one frame behind approximately every 1000th frame, and at 24 frames per second 1000 / 24 is approx 41.7 seconds.

Jong
8th October 2009, 15:04
Of course, don't know what I was thinking :stupid:

pirlouy
8th October 2009, 18:14
This is a printscreen with these settings:
- reclock active
- refresh rate: 50 Hz
- Present at nearest vsync; 10 ms (9018 build)

http://img203.imageshack.us/img203/4295/gothsync5.jpg


And I have judder.

It's just for information, I would totally understand if you have nothing to answer or if you were bored.

Jong
8th October 2009, 18:49
Are you saying you see judder when the OSD is smooth? And you are sure it is not encoded into the video?

If so, what is your display? Does it truly support 50Hz? Your display could be adapting the signal to an internal 60Hz rate.

ar-jar
8th October 2009, 19:33
This is a printscreen with these settings:
- reclock active
- refresh rate: 50 Hz
- Present at nearest vsync; 10 ms (9018 build)

And I have judder.

It's just for information, I would totally understand if you have nothing to answer or if you were bored.

I understand that you have reset the statistics right before the screenshot. When you see the judder, do you see either "sync glitches" or "frames dropped" increase? Those two numbers should really catch all types of judder due to rate mismatch (or so the theory goes at least).

A "sync glitch" means that two frames are presented with a distance (in time) that deviates significantly from the average distance between frames. In your case most frames are presented with 40 ms distance (2 display cycles) but because there is a mismatch in rates (41.7 ms vs 40 ms), every now and then frames are presented with a 60 ms distance which is interpreted as a "sync glitch". You may or may not see this depending on the contents of the video.

Frames are dropped only when the fps of the incoming video is larger than the display refresh rate (e.g. 60 fps @ 50 Hz or, perhaps more commonly, 50 fps @ 49.9-something Hz, i.e. a simple inaccuracy).
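The two counters can be illustrated with a toy model. This is only a sketch, not the renderer's code: it assumes every frame is simply snapped to its nearest vsync, treats 23.976 fps as exactly 24000/1001, and uses "differs from the median gap" as a stand-in for "deviates significantly from the average distance". The function names are invented for the example.

```python
from fractions import Fraction
from math import floor
from statistics import median


def simulate_presentation(fps, refresh_hz, n_frames):
    """Snap each frame to its nearest vsync.

    Returns (gaps, dropped): gaps are the distances between consecutively
    presented frames in display cycles; dropped counts frames that landed
    on a vsync already taken by the previous frame.
    """
    cycles_per_frame = Fraction(refresh_hz) / Fraction(fps)
    slots = []
    dropped = 0
    for i in range(n_frames):
        slot = floor(i * cycles_per_frame + Fraction(1, 2))  # nearest vsync
        if slots and slot == slots[-1]:
            dropped += 1  # two frames compete for the same vsync
        else:
            slots.append(slot)
    gaps = [b - a for a, b in zip(slots, slots[1:])]
    return gaps, dropped


def count_sync_glitches(gaps):
    """Count gaps that differ from the typical (median) gap."""
    typical = median(gaps)
    return sum(1 for g in gaps if g != typical)


# About 10 seconds of 23.976 fps video on a 50 Hz display.
gaps, dropped = simulate_presentation(Fraction(24000, 1001), 50, 240)
print(sorted(set(gaps)), count_sync_glitches(gaps), dropped)

# One second of 60 fps video on a 50 Hz display.
gaps, dropped = simulate_presentation(60, 50, 60)
print(sorted(set(gaps)), count_sync_glitches(gaps), dropped)
```

In the first case the gaps are 2 or 3 display cycles (40 ms or 60 ms), with the 3-cycle gaps counted as sync glitches and nothing dropped. In the second case every gap is one cycle, so there are no sync glitches, but frames are dropped because the source is faster than the display.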

-A

ar-jar
8th October 2009, 19:48
Actually it is more closely to the 40 seconds. 23,976 / 24 approx every 1000th frame and 24 fps per seconds 1000 / 24 is approx 41 seconds.

If you are running 23,976 @ 48 Hz, which is perhaps more commonly attainable than 24 Hz, you would get a "half-glitch" every ~20 seconds, as the frame presentation time is only shifted by one display cycle (~0.5 frame cycles) each time. -A

STaRGaZeR
8th October 2009, 21:38
Reclock configured to "AUTO" is fine too, and no need to mess around when you change configs/refresh rates..

And you shouldn't get a glitch every 10-15 secs surely. Maybe ar-jar can confirm, but I'd have thought it was only approximately every 1000 secs, minus a "safe margin" of maybe 10% or so?

AUTO plays 25fps content as 25,714fps, so it's a no go. Also, I don't change refresh rates or anything, and nearest integer works just fine with every frame rate I use.

I've measured it. Each 23,976-->24 glitch happens every ~8,5s at 120Hz.

ar-jar
8th October 2009, 22:08
I've measured it. Each 23,976-->24 glitch happens every ~8,5s at 120Hz.

And that's a "1/5:th glitch" instead of a "full glitch" which you get at 24 Hz display refresh. The present timing is shifted 8.3 ms each time instead of 41.7 ms (but it is shifted 5 times as often). I'm not really sure which is better. What do you guys think? -A
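The numbers quoted so far for 24, 48 and 120 Hz all follow from the same arithmetic. A minimal sketch, assuming the display shows each frame for a whole number of cycles and slips by exactly one display cycle per glitch (`glitch_profile` is a hypothetical helper, and it only makes sense when the refresh rate is at least the frame rate):

```python
def glitch_profile(fps, refresh_hz):
    """Return (seconds between glitches, shift per glitch in ms, cycles per frame)."""
    cycles_per_frame = round(refresh_hz / fps)
    # Display cycles per second left over after showing every frame
    # for cycles_per_frame cycles.
    drift_hz = abs(refresh_hz - cycles_per_frame * fps)
    interval_s = float("inf") if drift_hz == 0 else 1.0 / drift_hz
    shift_ms = 1000.0 / refresh_hz  # one display cycle
    return interval_s, shift_ms, cycles_per_frame


for hz in (24, 48, 120):
    interval_s, shift_ms, n = glitch_profile(23.976, hz)
    print(f"{hz} Hz: 1/{n} glitch of {shift_ms:.1f} ms about every {interval_s:.1f} s")
```

Under these assumptions the accumulated slip is the same in all three cases, about 1 ms per second of playback. A higher refresh rate only splits it into smaller, more frequent steps, which is exactly the trade-off being asked about.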

Casshern
8th October 2009, 22:29
Well, i think you mean the same thing. Let me explain a little more:
1) The vsync code is executed during the vblanking interval at display refresh frequency
2) From my observations it draws the osd, the stats screen, flips the next buffer, calculates the next vsync flip time and invokes the shaders
3) I think some of the stuff should not be done in the vblank at all, some stuff should probably be done in a different order.

a) the shaders do not have to be done at screen refresh rate. Obviously they should be done on the frames coming from the decoder. So when your display refresh rate is twice the movie frame rate, at the moment, the shaders are applied twice for every movie frame. Unnecessary because the second frame displayed is identical.
b) the osd should be applied after all vsync code is finished (buffers are flipped), maybe even outside the vsync code (as a tear in the stats is not important, but of course the tearing bar must be done in the vsync code) and should be optimized to be faster. It's eating 10-20 scanlines at 1920x1080 with a 118 MHz pixel clock.
c) There is a bug in the vsync code: when the rendered image is completely black and the code actually finishes too fast, the tearing bar begins to judder. This is normally not important, as after all one is not watching the tearing bar - but with the gothcode I had some audio dropouts. This might also have to do with the order in which the code does its stuff.
d) In principle it shouldn't matter if the dxva engine, the shaders etc use up less than 1/23.976023976s (movie frame duration) even on a 47.952 refresh rate display. At the moment this is not the case with the stats - for obvious reasons (as the stats are output every refresh cycle). This I can understand, but without any of that stuff the code should be robust enough to handle these cases. In that respect the beliyaal code still has the edge.

e) the resizer is also applied every refresh cycle - unnecessarily. But fortunately, due to my findings in the other thread, at least it now only resizes if the screen resolution is different from the movie res. Those experiencing tearing on slower cards could try a faster algo (bilinear - ugly but maybe the tearing is gone)

In summary - the gothcode is getting better, for me the beliyaal code still has some tiny edge. But if ar-jar continues at his pace, I have no doubt the gothcode will soon be better in all respects.

regards,

Casshern

Interesting, I think I understand the buffering more from these observations.

So it is buffering the frames before it hits the shaders, rather than the final processed output (incl OSD stats)?
If that was buffered then changing shader options you'd think shouldn't affect the vsync offset interaction as the final frame would always be ready to go?

I wonder if changing this approach would help?
Might introduce some tiny lag for the OSD (after pressing ctrl+j)... maybe some other side effects as well...

Jong
8th October 2009, 23:04
AUTO plays 25fps content as 25,714fps, so it's a no go. Also, I don't change refresh rates or anything, and nearest integer works just fine with every frame rate I use.
If it works for you, fine. But there is something wrong if AUTO is playing @25.714 with 50Hz refresh. Screenshot?

Jong
8th October 2009, 23:05
I've measured it. Each 23,976-->24 glitch happens every ~8,5s at 120Hz.
Yeah, I had a brainstorm there. ar-jar has explained.

pirlouy
8th October 2009, 23:53
Are you saying you see judder when the OSD is smooth? And you are sure it is not encoded into the video?
I understand that you have reset the statistics right before the screenshot. When you see the judder, do you see either "sync glitches" or "frames dropped" increase? Those two numbers should really catch all types of judder due to rate mismatch (or so the theory goes at least).
Yes; even though the OSD shows good numbers, the video is not smooth. When I watch the graph, I can easily see that it does not scroll smoothly.
Whereas the green line looks smooth at 60 Hz!

I have reset stats just after the beginning. After that, OSD does not show glitches or dropped frames.


If so, what is your display? Does it truly support 50Hz? Your display could be adapting the signal to an internal 60Hz rate.
It's a Samsung 40B530. It can display 24, 50, 60 Hz. I use a dvi/hdmi adapter for graphic card. I use Windows 7 RC1.

But, is it normal not to find graph smooth ?

webs0r
8th October 2009, 23:59
Are you saying you see judder when the OSD is smooth? And you are sure it is not encoded into the video?

If so, what is your display? Does it truly support 50Hz? Your display could be adapting the signal to an internal 60Hz rate.

One way to help see judder that is encoded into the video is to use the freshly fixed framestep on a slow panning scene.

For example you'll see a signpost or island or whatever reference point move a similar distance each frame, and then in one framestep it will jump a larger (or shorter) distance. That is judder in the video itself. And then you cry... :(

STaRGaZeR
9th October 2009, 00:19
And that's a "1/5:th glitch" instead of a "full glitch" which you get at 24 Hz display refresh. The present timing is shifted 8.3 ms each time instead of 41.7 ms (but it is shifted 5 times as often). I'm not really sure which is better. What do you guys think? -A

If the glitches are less noticeable, that's better even if there are more of them.

If it works for you, fine. But there is something wrong if AUTO is playing @25.714 with 50Hz refresh. Screenshot?

With 120Hz, dunno about 50Hz :)

http://thumbnails3.imagebam.com/5163/9b863551621288.gif (http://www.imagebam.com/image/9b863551621288)

webs0r
9th October 2009, 02:19
Looks like a reclock bug? Maybe someone should get James to have a look at it?

25 fps @ 50 Hz auto works fine (stays at 25).

25 fps @ 120 Hz hmm what should happen.. slow down 4% to 24 fps?

STaRGaZeR
9th October 2009, 02:31
Looks like a reclock bug? Maybe someone should get James to have a look at it?

25 fps @ 50 Hz auto works fine (stays at 25).

25 fps @ 120 Hz hmm what should happen.. slow down 4% to 24 fps?

Since it can't be synced to anything, it should play at original speed, 25 fps. 24 fps only if PAL SpeedDown is selected.

Jong
9th October 2009, 10:31
I have heard of this type of problem before. Try resetting the Reclock timing database (it should be reset whenever drivers are updated). It will probably sort it. If not, post on the Slysoft forum (http://forum.slysoft.com/forumdisplay.php?f=85). It is the kind of thing James will normally look at reasonably quickly and it may be causing other less obvious oddities!

Jong
9th October 2009, 10:33
Yes; even if OSD shows great things, video is not smooth. When I watch graph, I can easily see this graph is not smooth.
Whereas the green line looks smooth on 60Hz !
60Hz or 50Hz? If one works smoothly and not the other: TVs often have only one true internal refresh rate, and converting internally often introduces judder.

pirlouy
9th October 2009, 12:02
When the TV is in 50 Hz, graph is not smooth.
When the TV is in 60 Hz, graph is smooth.
When the TV is in 24 Hz, graph looks smooth.

By "smooth graph", I mean the green line moves smoothly from right to left. I don't talk about form of this line, but smooth move...

Thank you Jong for this information. I didn't know about this behavior. So, in my case, this 50Hz mode is unusable. I'm a bit disappointed, but I prefer to know rather than believe everything works when it doesn't.

And 24Hz seems to have problem too (audio lag); not really joyful news, but at least, I won't bother to have sync in these refresh rates. But I'll try to do some tests in 24Hz nevertheless. :)

Jong
9th October 2009, 12:09
Out of interest, what is the full model number of the TV? This should clear up the issue.

pirlouy
9th October 2009, 12:46
It's a Samsung LE40B530 (I think it is also called LN40B530). It's not a "down-market" product. I think a lot of TV should be affected then.

Jong
9th October 2009, 13:03
The LE and LN bit is key. Where did it come from? I admit to not being an expert on that TV, but the US version should be "LN" and have a native 60Hz rate, the LE should be the European Variant and have 50Hz native rate. It looks like you have a US model?! :confused:

ar-jar
9th October 2009, 13:51
Well, i think you mean the same thing. Let me explain a little more:
1) The vsync code is executed during the vblanking interval at display refresh frequency
2) From my observations it draws the osd, the stats screen, flips the next buffer, calculates the next vsync flip time and invokes the shaders
3) I think some of the stuff should not be done in the vblank at all, some stuff should probably be done in a different order.


Hello Casshern and thanks for the input. A few comments:

Not sure what you mean by the "vsync code". What happens is that at vsync offset milliseconds before the vsync, the "Paint" method is called with a new frame. After that the Paint method executes the shaders, adds subtitles, draws the OSD (and perhaps a couple of other things that I now forget sitting on a train). Having done that, it calls IDirect3DDevice9::Present to tell the gfx board that there is a new frame rendered. The gfx driver then replaces the contents of the front buffer with the contents of the next back buffer in line during the next vblank. All this should happen in good time before the vsync to avoid tearing caused by too late buffer flips. The time available to execute the above is vsync offset.
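The timing budget described here can be written down as a tiny model. This is a simplification for illustration only, not the renderer's code: `paint_plan` and its parameters are hypothetical names, and the render cost is treated as a single known number.

```python
def paint_plan(next_vsync_ms, vsync_offset_ms, render_cost_ms):
    """Return (paint_start_ms, slack_ms) for one frame.

    paint_start_ms: when Paint would be called, i.e. vsync offset
                    milliseconds before the vsync.
    slack_ms:       time left between the Present call and the vsync;
                    a negative value means Present comes too late in
                    this simplified model.
    """
    paint_start_ms = next_vsync_ms - vsync_offset_ms
    slack_ms = vsync_offset_ms - render_cost_ms
    return paint_start_ms, slack_ms


# 50 Hz display: the next vsync is 20 ms away, vsync offset is 15 ms,
# and shaders + subtitles + OSD are assumed to take 6 ms.
start, slack = paint_plan(20.0, 15.0, 6.0)
print(f"Paint starts at {start:.1f} ms, {slack:.1f} ms to spare")
```

A larger vsync offset buys more time for the work inside Paint, at the price of starting earlier in the display cycle.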

Having added the shortcut <ctrl><alt><up- or down-arrow> to modify vsync offset it is now easy to see how tearing is affected by early and late calls of Paint. At least in my set-up the position of the tearing is pretty much proportional to vsync offset and it vanishes entirely at 15 ms vsync offset @ 50 Hz. This obviously varies with boards and drivers.


a) the shaders do not have to be done at screen refresh rate. Obviously they should be done on the frames coming from the decoder. So when your display refresh rate is twice the movie frame rate, at the moment, the shaders are applied twice for every movie frame. Unnecessary because the second frame displayed is identical.


This is unnecessary but then again, some decoders like both the CyberLink and the NVidia MPEG2 decoders deliver 50/60 fps in DXVA mode. I think the code should be able to handle that at all times.


b) the osd should be applied after all vsync code is finished (buffers are flipped), maybe even outside the vsync code (as a tear in the stats is not important, but of course the tearing bar must be done in the vsync code) and should be optimized to be faster. It's eating 10-20 scanlines at 1920x1080 with a 118 MHz pixel clock.


See above, the OSD must be rendered onto the surface that is to be rendered. But there should be plenty of time to do that with a large enough vsync offset.


c) There is a bug in the vsync code: when the rendered image is completely black and the code actually finishes too fast, the tearing bar begins to judder. This is normally not important, as after all one is not watching the tearing bar - but with the gothcode I had some audio dropouts. This might also have to do with the order in which the code does its stuff.


Since the rendering is a simple blt of a texture or a surface, that part of the algorithm should be totally "color blind". I have one blu-ray file that has similar issues. I believe I'm getting bad samples into the renderer from the decoder but I haven't single-stepped through them to analyze them more. Erroneous time stamps could upset the timing of the video renderer and perhaps upset the audio when it is matching rate (when using the Sync Video option). I haven't yet seen or heard this myself. Do you have a file that causes this to happen that I could test?


d) In principle it shouldn't matter if the dxva engine, the shaders etc use up less than 1/23.976023976s (movie frame duration) even on a 47.952 refresh rate display. At the moment this is not the case with the stats - for obvious reasons (as the stats are output every refresh cycle). This I can understand, but without any of that stuff the code should be robust enough to handle these cases. In that respect the beliyaal code still has the edge.


As I hinted above, I don't think it is a robust solution if it can't handle 50/60 fps. To the extent it can't, I think I should address core problems and not try to get something to work for 24/25 that doesn't work for the higher fps:es. In terms of prioritizing my time that is.


e) the resizer is also applied every refresh cycle - unnecessarily. But fortunately, due to my findings in the other thread, at least it now only resizes if the screen resolution is different from the movie res. Those experiencing tearing on slower cards could try a faster algo (bilinear - ugly but maybe the tearing is gone)


Again, I aim for full 50/60 fps functionality and am not eager to implement special solutions for 24 Hz. I may re-evaluate this later but that's how I feel now.


In summary - the gothcode is getting better, for me the beliyaal code still has some tiny edge. But if ar-jar continues at his pace, I have no doubt the gothcode will soon be better in all respects.


Thanks. My pace will vary but I'll do my best to arrive at a decent and stable renderer. -A

ar-jar
9th October 2009, 17:30
60Hz or 50Hz? If one works smoothly and not the other: TVs often have only one true internal refresh rate, and converting internally often introduces judder.

This seems to be true of my Toshiba TV too. It accepts a 47.952Hz input signal but produces nasty judder when inputting 23.976 fps. I have indications that it internally still uses 50 Hz. This has prevented me from watching any 23.976 material with this TV (this was before I realized Reclock actually works quite well with my renderer).

I do have a small Philips 720p TV that I use for testing and that actually syncs to pretty much anything you throw at it including 47.952, 48, 50, 60 and does it in a "native" way. It's a TV designed by real TV engineers! -A

pirlouy
9th October 2009, 17:55
The LE and LN bit is key. Where did it come from? I admit to not being an expert on that TV, but the US version should be "LN" and have a native 60Hz rate, the LE should be the European Variant and have 50Hz native rate. It looks like you have a US model?! :confused:

I think LN and LE have just some minor differences like 110V for LN and 220V for LE and things like that.

But both models are using 1080p at 60 Hz as native resolution for PC.

Jong
9th October 2009, 18:00
Not according to the spec sheets I have seen on European and US web sites; They show different refresh rates. And indeed it would be mad to sell a TV in PAL countries that cannot display 50Hz smoothly!

Edit: Or are you saying you are using a PC-input, not a normal HDMI input. That might make sense!

pirlouy
9th October 2009, 20:00
No, I use hdmi port.
But more important, good news !

I've understood what was going on. Thank you for having pushed me. I've done tests and I've found where these judders come from ! It was due to... Aero (I'm using W7 RC1) ! :scared:

Indeed, after having tried Overlay mixer (which disables Aero), 50Hz and reclock, there was no judder. So I've tried EVR custom + 50 Hz + reclock + "Disable desktop composition" and... Yes ! No judder anymore.

Beliyaal and some other people had surely seen this before, which is why he implemented this option. Unfortunately, I didn't know what this option really did, so I did not use it.


I've tested 24Hz + reclock + EVR custom + "disable desktop composition" and I don't have audio lags anymore !


So Aero caused judders and audio lags. :/
It's a shame because I like Aero, and each time I'll launch MPC, it will lose some time to disable Aero... But if it's better...

Jong
9th October 2009, 20:23
Great news.

Yes, IMO Aero is a bad idea for really high quality video playback, just another software layer adding latency and inhibiting stand-alone type performance for quality players. Useful for things that don't care for vsync though, like WPF and Flash.

But what you are seeing sounds like a bug/severe oversight - that Aero is re-timing to 60Hz even for a 50Hz display :mad:.

Casshern
9th October 2009, 23:19
I think that the problem is not related to 23.976hz (or 47.952) playback at all. In fact the problem (vsync code taking too long) gets much worse on 50hz or 60hz displays, which is what the majority of people use. Think about it like this: The vsync code takes time x to do its magic, but the window in which it is possible to do that without tearing gets smaller and smaller with increasing display refresh rate. In a perfect world this should only scale with the input refresh rate (actual movie frame rate) and not the display refresh rate. After all the shaders should handle every decoded frame just once and not do identical work on an identical frame again just because the display is using a higher refresh rate.
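A rough way to see how much work is at stake is to count shader passes in a toy loop. This is not how either renderer is written; `count_shader_passes` is a hypothetical helper that assumes one possible shader pass per display cycle and identifies the frame on screen by its index.

```python
def count_shader_passes(fps, refresh_hz, n_cycles, once_per_frame):
    """Count shader passes over n_cycles display cycles.

    once_per_frame=False: run the shaders on every display cycle.
    once_per_frame=True:  run them only when a new decoded frame arrives
                          and reuse the processed result otherwise.
    """
    passes = 0
    last_frame = None
    for cycle in range(n_cycles):
        frame = int(cycle * fps / refresh_hz)  # frame on screen this cycle
        if once_per_frame and frame == last_frame:
            continue  # same frame as in the previous cycle, nothing to redo
        passes += 1
        last_frame = frame
    return passes


# 10 seconds of 24 fps video on a 48 Hz display (480 display cycles).
print(count_shader_passes(24, 48, 480, once_per_frame=False))
print(count_shader_passes(24, 48, 480, once_per_frame=True))
```

When the source already delivers one frame per display cycle (for example 50 fps @ 50 Hz) both counts are equal, so the saving only exists when the display runs faster than the source.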

I reckon that this is behind the tearing problems people have, especially when not using the most up to date gfx boards (which are fast enough). You are right that changing vsync offset can reduce the problem, but unfortunately (and I do not really know why) it is not one size fits all - for different material (translating to different decoder & filter combinations) I had to adjust vsync offset to get a satisfactory result. This is clearly undesirable. Also, when using hardware deinterlacing, shaders, resizing and dxva decoding, even adjusting vsync offset doesn't help anymore; then one can only choose whether it tears at the top or bottom.

As somebody already suggested, triple buffering might be a way to ease the problem - just make the flip independent of the stats paint and the shaders. The tearing bar naturally has to be painted by the vsync code. But to me it seems like all the other stuff (subtitles etc.) should be performed on the decoded frames before being passed to the renderer and not by the vsync code.

I can imagine that might be difficult to implement due to the current architecture and the way evr works - so just take this as a sort of philosophical discussion. If something good comes out of it, perfect; if not, it helps to understand the renderer better - at least for me.

Black frame problem: Take the BD of "Any given sunday" (there were others which I do not remember offhand) and use the mpc hc dxva decoder - during the title sequence there are a couple of fades to black which exhibit the problem. First I thought it was only related to the dxva decoder, but recently I had the same issue with an episode of the "big bang theory" which was encoded as an xvid and decoded by the internal software decoder. It might be an optimization by the encoders used, which just encode a series of black frames as one black frame with a longer duration. But I was very surprised to see it with an xvid....
I wouldn't prioritize this thing, as it doesn't impair normal viewing - only once did it lead to an audio dropout.

Basically the only worrisome thing is that vsync offset has to be adjusted for different decoder filter combinations. This is as much hassle as adjusting Reclock's vsync correction, which gothsync is supposed to make obsolete. So every optimization that speeds up the vsync code would make the window larger and more robust to tearing, leading to less or no need to adjust vsync offset. I wonder why beliyaal's code handles this more robustly, as it is essentially doing the same stuff.... your code can't really be substantially slower as there shouldn't be any heavy computations.... maybe it's the sequence.... any ideas....






nijiko
9th October 2009, 23:19
>ar-jar

Hi.
I found a small problem with your GothSync builds while playing videos.
I used 9018 release from your website.
When I push the PLAY button or the STOP button, the video lags for a second.
The symptom is that the sound (audio) is repeated for an instant.
Tomason's Svplayer uses your Goth code,
but has no such problem.
He said he used threads for it.
Can you check it?
Thanks for your work.

Jong
9th October 2009, 23:35
Basically the only worrisome thing is that vsync offset has to be adjusted for different decoder filter combinations. This is as much hassle as adjusting Reclock's vsync correction, which gothsync is supposed to make obsolete. ...
I can assure you that in D3D mode (so triple buffered) there is no need to adjust vsync for different combinations, certainly in Reclock. NB: You must not use "D3D fullscreen GUI support" though, that disables the extra buffering.

I know some just do not like D3D; I'm not sure why. People put up with it for games and eliminating judder is even more important for video IMO. Yes, there are some limitations - no right click menu for example, but when watching a movie I never miss that. Disabling D3D is something I do for testing now and again.

Maybe if ar-jar gets a triple buffered non-D3D renderer going that will offer the best of both worlds (PDVD can do it), but for now a huge number of problems go away if you could live with the small restrictions imposed by D3D.

ar-jar
10th October 2009, 00:07
Maybe if ar-jar gets a triple buffered non-D3D renderer going that will offer the best of both worlds (PDVD can do it), but for now a huge number of problems go away if you could live with the small restrictions imposed by D3D.

I just did a test with triple buffering in non-exclusive mode. The tearing is the same, unfortunately. And one would have to modify the resizing code for windowed mode when using triple buffering (it requires another flip method which doesn't support resizing in the same way; I did the tests with faulty resizing in windowed mode).

I agree that the only way that reliably seems to eliminate tearing is exclusive-mode full screen without menu support. Or large sync offsets in my case (> 15 ms @ 50 Hz). It would of course be fairly easy to implement a variable default sync offset of, say, 3/4 of a display cycle. I have to check if it produces the desired results... -A

Jong
10th October 2009, 01:16
Might need to get a bit more complex than 3/4s of a cycle.

I think that will work for 50 Hz and 60 Hz (does here), but at 120 Hz I don't think it would (and 96 Hz is tight). You might need to ensure a minimum of say 4 ms upper-end margin (unless that would make the offset < 4 ms from the bottom of course!).

What is also needed IMO is some more granularity, so don't target whole numbers of ms.
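
As a rough illustration of the rule discussed above, here is a minimal Python sketch (not code from MPC-HC or GothSync). It assumes the default offset is 3/4 of a display cycle, that the "upper-end margin" means the time left between the offset and the end of the cycle, and that when the cycle is too short to keep 4 ms clear at both ends the plain 3/4 value is used. The function name, its parameters, and that fallback are all hypothetical.

```python
def default_sync_offset_ms(refresh_hz, fraction=0.75, margin_ms=4.0):
    """Return a default sync offset in milliseconds (fractional values allowed).

    Assumptions (not taken from the renderer's source):
    - the starting point is `fraction` of one display cycle;
    - at least `margin_ms` should remain before the end of the cycle,
      unless that would push the offset closer than `margin_ms` to the start.
    """
    cycle_ms = 1000.0 / refresh_hz
    target = fraction * cycle_ms

    # Cycle too short to keep the margin at both ends: fall back to the
    # plain fraction (an assumption, the thread does not settle this case).
    if cycle_ms < 2 * margin_ms:
        return target

    # Pull the offset back so the upper-end margin is respected,
    # but never let it drop below the lower-end margin.
    return max(margin_ms, min(target, cycle_ms - margin_ms))


if __name__ == "__main__":
    for hz in (50, 60, 96, 120):
        print(f"{hz} Hz -> {default_sync_offset_ms(hz):.3f} ms")
```

Under these assumptions 50 Hz and 60 Hz keep the plain 3/4 offset (15 ms and about 12.5 ms), while 96 Hz and 120 Hz are pulled back so that 4 ms remain before the end of the cycle. The result is a float, so it is not limited to whole milliseconds.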

Casshern
10th October 2009, 10:22
Did not know that about the "D3D fullscreen GUI support". But I do remember that without that option it was a real pain to use MPC-HC in fullscreen mode. But will try later anyway! Thanks for the tip....

I can assure you that in D3D mode (so triple buffered) there is no need to adjust vsync for different combinations, certainly in Reclock. NB: You must not use "D3D fullscreen GUI support" though, as that disables the extra buffering.

I know some just do not like D3D; I'm not sure why. People put up with it for games and eliminating judder is even more important for video IMO. Yes, there are some limitations - no right click menu for example, but when watching a movie I never miss that. Disabling D3D is something I do for testing now and again.

Maybe if ar-jar gets a triple buffered non-D3D renderer going that will offer the best of both worlds (PDVD can do it), but for now a huge number of problems go away if you could live with the small restrictions imposed by D3D.

ar-jar
10th October 2009, 22:14
I ran some tests on how much sync offset is required to get tearing-free rendering with the different resizers available in MPC-HC. The more complex the resizer, the more sync offset is required to avoid tearing. There seems to be a significant difference between the simple resizers and the more complex ones. See the latest post on my blog (http://www.ostrogothia.com/video/) for the results.

Keiyakusha
10th October 2009, 22:31
I ran some tests on how much sync offset is required to get tearing-free rendering with the different resizers available in MPC-HC. The more complex the resizer, the more sync offset is required to avoid tearing. There seems to be a significant difference between the simple resizers and the more complex ones. See the latest post on my blog (http://www.ostrogothia.com/video/) for the results.

Do these PS resizers work only with shaders? If so, a software equivalent would probably be better... Maybe it's worth adding them?

Edit: Personally I prefer to use lanczos or spline in ffdshow instead of resizing in MPC... By the way, as I remember from the MPC-HC thread, the Bicubic resizer was somewhat broken. Don't know if it has been fixed already; probably not.

ar-jar
10th October 2009, 22:58
Do these PS resizers work only with shaders?

Yes, the resizers labeled "PS 2.0" use shader code. But then again, there is no significant difference between the non-shader and the shader version of the bilinear resizer.

webs0r
11th October 2009, 00:44
Checked your blog update, wow that bicubic PS 2.0 resizer is a time hog... It does depend on the graphics card though. I get vastly different results on an ATI 4550 vs. an Nv 8800GT.

This was why I was so keen to get the MPC trunk merged that had the "don't resize if source res = output res" enhancement. Has it been now? I think it has. At least this applies when you are watching at 100% res.

I also use spline resizing in ffdshow, and suggest this resize (or a diff CPU resize) for people that have the CPU% to spare. It takes the variability out of your sync offset as Casshern was describing (and to me, gives a sharper upsampling of the image vs bicubic without too many artifacts). Resizing a 720->1080 on the gpu is going to take longer than 288->1080. (Unless you can triple buffer the resize & other shader work, which you would expect would allow you to keep a constant offset, regardless of different source content by having the final frame always ready to just flip to). In fact a strategy now could be to move everything as much as possible to the CPU. Incidentally, I have ffdshow doing the RGB32 conversion as well.

You would think a good implementation of a complex resizer on GPU would also be faster than CPU... but I don't know... Someone would have to write it, or perhaps re-use work from some of those GPU avisynth plugins. But I really think a buffering system for the final frame needs to be in place to negate the time variation of selecting different shader combinations.

Also can the OSD drawing be made any faster?
This explains the difference in behaviour I got with 1x ctrl-J vs 2x ctrl-j with 60 fps content.
It needs to have as little impact as possible so that it doesn't skew the results of what you are seeing. If it causes glitches because it takes too long to draw, while everything is fine when you watch video without the OSD, then the OSD isn't representing what is happening well :)
Maybe an idea is to review every bit of info on that screen and just isolate the most key ones and use that as the 1x ctrl-J option.
Could have a more verbose option later in the ctrl-J selection.
Unfortunately I'm guessing making the graph (which is so useful) takes time as well...

Or you could put the stats elsewhere (e.g. a window) running on a diff thread?

ar-jar
11th October 2009, 07:38
This was why I was so keen to get the MPC trunk merged that had the "don't resize if source res = output res" enhancement. Has it been now? I think it has. At least this applies when you are watching at 100% res.


When the resolutions are identical, then a nearest neighbor resizer is always used. This is the simplest resizing "filter" (see http://msdn.microsoft.com/en-us/library/ee416649(VS.85).aspx) and as fast as it gets I guess while still working with textures (and since gaming drives the gfx board architectures, I would guess that texturing is a very efficient operation). Identical code is run if you manually choose this filter type in the Options.


You would think a good implementation of a complex resizer on GPU would also be faster than CPU... but I don't know... Someone would have to write it, or perhaps re-use work from some of those GPU avisynth plugins. But I really think a buffering system for the final frame needs to be in place to negate the time variation of selecting different shader combinations.


I briefly tried triple buffering but it didn't make much difference. See an earlier post on this thread. What makes a huge difference is exclusive mode full-screen ("D3D") w/o GUI support.

As you can see from the MSDN page there are nowadays more complex "ready-made" resizing filters available that hopefully have been optimized as they are part of DirectX. They may not work on older boards but might be nice for those of us with somewhat newer boards. I'll see if I can throw some of those in later.


Also can the OSD drawing be made any faster?


I ran some more tests. The full OSD requires an additional 2 ms or so of sync offset (drawing time) on my low-end ATI board. Not an eternity but could make or break a tight schedule. The tearing test bars are drawn with CPU code. Conceivably you could also draw the OSD with the CPU but that's not my priority right now. There would also be some ugliness as the samples haven't been resized while accessible for the CPU.

Casshern
12th October 2009, 12:16
Yeah, the more complex shaders take more time. This also explains one reason for having to adjust the sync offset for different material/decoder/filter chains: if the movie has to be resized (in my case, when it's not 1920x1080), a shader other than bilinear is used. The time that scaler takes is probably roughly proportional to the source resolution, ergo you have to adjust vsync offsets for different files.

If there is any way to decouple resizing from screen refresh (which adds additional computational burden if screen refresh > movie frame rate), it would probably eliminate the tearing (or fiddling with vsync) on most systems.

Also a triple buffer might help here too (still have to try D3D mode without GUI support, I just can't bring myself to endure that again). I wonder if an algo with good recovery from an overlong vsync might be another solution, if it has a way to know about repeated frames. Imagine this:
1) The resizer plus all other vsync code takes too long when screen refresh > movie frame rate.
2) Now the code checks which scanline the display is at. If it's too late (scanline outside the vsync area), we do NOT flip the buffers, on the condition that the frame is a repeat of the last movie frame.
3) We abort the current operation and begin immediately on the vsync code for the next frame (even though we are not in the vsync area).
4) This makes the complete scanline time = total scanlines - abort scanline + vsync area available to the vsync code for the next real movie frame. That should be plenty, so it should sync again to tearing-free playback.
5) Now just flip when reaching the normal flip position.
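
The steps above could be sketched roughly like this in Python. It is only a toy model of the decision, not renderer code: the function names, the half-open flip window, and the "flip_late" outcome for a late frame that is not a repeat (a case the steps do not cover) are assumptions made for illustration.

```python
def in_vsync_area(scanline, area_start, area_end):
    """True if the scanline lies in the flip window [area_start, area_end)."""
    return area_start <= scanline < area_end


def present_decision(scanline, area_start, area_end, is_repeat):
    """Decide what to do with a finished frame (steps 2 and 3).

    Returns "flip" when we are still inside the vsync area, "skip" when we
    are late and the frame merely repeats the previous movie frame, and
    "flip_late" otherwise (assumption: a late, new frame is shown anyway).
    """
    if in_vsync_area(scanline, area_start, area_end):
        return "flip"
    if is_repeat:
        return "skip"  # abort and start on the next frame right away
    return "flip_late"


def next_frame_budget(total_scanlines, abort_scanline, vsync_area_lines):
    """Scanlines available to the next real frame after a skip (step 4)."""
    return total_scanlines - abort_scanline + vsync_area_lines


if __name__ == "__main__":
    # Hypothetical display: 1125 total scanlines, flip window = last 45 lines.
    print(present_decision(300, 1080, 1125, is_repeat=True))
    print(next_frame_budget(1125, 300, 45))
```

With those made-up numbers, a repeated frame that finishes late at scanline 300 is skipped, and the next real frame gets 1125 - 300 + 45 = 870 scanlines of time instead of only the 45-line window.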

Of course best would be not to work on the duplicate frames at all....
I ran some tests on how much sync offset is required to get tearing-free rendering with the different resizers available in MPC-HC. The more complex the resizer, the more sync offset is required to avoid tearing. There seems to be a significant difference between the simple resizers and the more complex ones. See the latest post on my blog (http://www.ostrogothia.com/video/) for the results.

Casshern
12th October 2009, 12:43
One other thing to improve the renderer.

One of the main problems of EVR is that one has to use the YV12 chroma upsampling shader for proper chroma interpolation. I can't really imagine that Microsoft still hasn't addressed this problem somehow in EVR, especially since they did a complete makeover for their new Media Foundation in Win7. Maybe there is a new (or even old) switch to get proper chroma upsampling without having to use the shader.

regards,

Casshern

When the resolutions are identical, then a nearest neighbor resizer is always used. This is the simplest resizing "filter" (see http://msdn.microsoft.com/en-us/library/ee416649(VS.85).aspx) and as fast as it gets I guess while still working with textures (and since gaming drives the gfx board architectures, I would guess that texturing is a very efficient operation). Identical code is run if you manually choose this filter type in the Options.



I briefly tried triple buffering but it didn't make much difference. See an earlier post on this thread. What makes a huge difference is exclusive mode full-screen ("D3D") w/o GUI support.

As you can see from the MSDN page there are nowadays more complex "ready-made" resizing filters available that hopefully have been optimized as they are part of DirectX. They may not work on older boards but might be nice for those of us with somewhat newer boards. I'll see if I can throw some of those in later.



I ran some more tests. The full OSD requires an additional 2 ms or so of sync offset (drawing time) on my low-end ATI board. Not an eternity but could make or break a tight schedule. The tearing test bars are drawn with CPU code. Conceivably you could also draw the OSD with the CPU but that's not my priority right now. There would also be some ugliness as the samples haven't been resized while accessible for the CPU.

starla_
12th October 2009, 19:37
One other thing to improve the renderer.

One of the main problems of EVR is that one has to use the YV12 chroma upsampling shader for proper chroma interpolation.

http://msdn.microsoft.com/en-us/library/ms698989(VS.85).aspx Might be helpful.