
View Full Version : MPC-HC tester builds for internal renderer fixes



Jtacdf
24th August 2011, 13:39
8:5 is also called 16:10 or vice versa.

JanWillem32
25th August 2011, 08:37
I've edited the renderer settings menus. It was very long overdue this time, and I actually wanted to edit some more things. Because I'm pretty bad at editing these menus, please carefully check if things work. There weren't any big changes in the renderer this time.
New features in the renderer settings menus:
-Better descriptions of renderer features in the menus
-Using 8-bit RGB surfaces disables some of the more advanced renderer features.
-Using any of the 3 other surface types enables internal conversion to linear RGB.
-Added a basic chroma up-sampling fix for ATI hardware. This menu should gray out during playback when using video cards of other brands. Other than that, it doesn't have any automatic detection for interlaced, 4:4:4, 4:2:2 or 4:2:0 inputs. This item is intended as something temporary, no OSD messages are provided.
-Linear conversion on the video input can be disabled, to allow the user to insert it manually. (For example when combining color controls.) The chroma up-sampling fix can not be enabled together with this option (the menu should gray out properly).
-Added several different sizes as options for the color management's lookup table. As a result, existing files have to be renamed to work with this revision. Replace "low" by "64", "med" by "128" and "high" by "256" to make previous files compatible. Select the correct size of the lookup table in the menu to prevent defaulting to 64³.
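
For anyone with many lookup table files to rename, the substitution described above could be scripted roughly like this. It is only a sketch: the example file names and the assumption that the old label appears as a separate word in the name are hypothetical, since the actual naming scheme isn't given here.

```python
import re
from pathlib import Path

# Mapping taken from the post: old quality label -> new lookup table size.
SIZE_BY_LABEL = {"low": "64", "med": "128", "high": "256"}

# Assumption: the label is not embedded in a longer word ("highway" is left alone).
LABEL = re.compile(r"(?<![A-Za-z])(low|med|high)(?![A-Za-z])")


def renamed(filename: str) -> str:
    """Return the file name with the old label replaced by the new size."""
    return LABEL.sub(lambda m: SIZE_BY_LABEL[m.group(1)], filename)


def rename_all(folder: Path) -> None:
    """Rename every matching file in a folder (hypothetical helper)."""
    for path in folder.iterdir():
        new_name = renamed(path.name)
        if new_name != path.name:
            path.rename(path.with_name(new_name))


def lut_entries(size: int) -> int:
    """Number of entries in a cubic lookup table, e.g. 64 for 64³."""
    return size ** 3
```

Calling renamed("profile_low.3dlut") gives "profile_64.3dlut"; the file name itself is made up for illustration.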

Superb
29th August 2011, 08:27
I wonder... What are the known issues of your build vs. the current trunk of MPC-HC? (i.e. did you introduce new bugs?)
I've read the first post, and the only thing I see is:
"Solve multi-threading issues." - originally created by your build? or the vanilla one?

CruNcher
29th August 2011, 10:28
@Jan i moved to Win7 now

Can all this be improved http://forum.doom9.org/showpost.php?p=1522467&postcount=18094 ?

the Vsync stuff seems to be a big problem with Aero (DWM) :(

Your build doesn't show this with EVR Custom, as VSync is off by default, but it has the same problem with this 29.xxx sample in terms of flickering on EVR Sync and a slideshow on EVR Custom (and when moving the window, it updates slowly without Aero and much faster with Aero) :(

This is the problem that started me off to test all this http://forum.doom9.org/showpost.php?p=1522460&postcount=5126 the sample is also included

Also, none of the floating point, half point or RGB rendering stuff would work on SB; all these options end up in a black screen.

Hera
29th August 2011, 20:05
There is something wrong with your GPU dude. Also V-Sync and Aero seems redundant.

Somewhat on-topic, does ATi/AMD GPU DXVA performance depend on core/shader/ram speeds or is it separate like PureVideo HD?

Jtacdf
29th August 2011, 21:08
@CruNcher
Most of the rendering settings work for me on Sandy Bridge. Are you using an Nvidia GPU or the Intel IGP?
Anyway, your sample seems to be a PAL interlaced 25 fps video. Might the flickering you see be the interlace lines? It looks pretty smooth to me when hardware deinterlacing on my ATI GPU is enabled.

@Hera
If you are just decoding video without any PP, then DXVA performance should not be affected by the core/shader/RAM speeds of your GPU, as it has dedicated silicon.
However, if options in CCC video settings like advanced quality, deinterlacing, etc, are enabled, they will use shaders on your gpu.
The more shaders you have on your gpu, the more options you can use without affecting smooth playback.

@JanWillem32
With you adding the ATI video chroma up-sampling fix as a rendering option, setting it to any 4:2:0 option seems to be similar to the "special 4÷2÷0 to 4÷2÷2 intermediate cubic B-spline5 chroma up-sampling" shader?
With any of the 3 surface types enabling internal linear RGB output, does that make "#define LinearRGBOutput 1" redundant?
IMHO, you should have some documentation for your tester build so that users (like me) do not get confused by all the rendering options.

JanWillem32
29th August 2011, 22:35
@Superb: I did indeed create a few bugs. While working on the VSync items, I broke several items/modes several times because of various reasons. The new resizers don't support rotation (flipping does work). The bars in the fullscreen windowed mode shrink the window size, instead of overlapping the window (not an issue in the exclusive mode). And of course, big renderer changes also imply new issues. This thread has grown to this size mostly because of all the bug reports and fixes.
The trunk build has its share of errors. It's mostly that the main Paint cycle of the renderer has poor performance and lacks a decent rendering queue. I've written a lot of new and borrowed code to fix those things.
The multi-threading issues were already there, and are very hard to solve.

@CruNcher: The internal VSync code is hard to correct. I've been editing it today again, and it's still problematic.
The default setting of all VSync options unticked simply enables the 4-frame queue of the presenter, with a device VSync setting of presenting a frame for at least one refresh cycle. It works perfectly well for both windowed mode with desktop composition enabled and the exclusive mode, with a little bit of help in timing frames to refresh cycles in an ordered manner. The only problem is the windowed mode without desktop composition, as the device can't deal with the large distortion in the timing in this mode on its own.
I'll test your samples a bit more, but the files are a bit demanding indeed. The 50 Hz interlaced PAL TV sample has broken time stamps on every frame and the opening and ending data slices are incomplete (I demuxed the file).
Even on my setup I can't get it to render at 50 Hz with the VSync functions enabled.
I wonder why the options for 10-, 16- and 32-bit surfaces won't gray out in your case if the GPU doesn't support it.
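
To illustrate what timing frames to refresh cycles "in an ordered manner" amounts to, here is a small model of how many refresh cycles each frame stays on screen when every frame is held for at least one cycle. This is only a sketch, not the renderer's actual code: the function name is made up, and frame dropping, the 4-frame queue and timing jitter are not modeled.

```python
import math
from fractions import Fraction


def refresh_counts(frame_rate, refresh_rate, frames):
    """Return, per frame, the number of refresh cycles it is displayed for.

    Each frame is held until the refresh cycle at which the next frame is
    due, but always for at least one cycle.
    """
    counts = []
    shown = 0  # refresh cycles consumed so far
    for n in range(1, frames + 1):
        # Index of the refresh cycle at which frame n+1 becomes due.
        due = math.floor(Fraction(n) * refresh_rate / frame_rate)
        cycles = max(1, due - shown)
        counts.append(cycles)
        shown += cycles
    return counts


# 25 fps on a 50 Hz display: every frame gets exactly two cycles.
print(refresh_counts(25, 50, 4))
# 24 fps on a 60 Hz display: alternating two and three cycles.
print(refresh_counts(24, 60, 4))
```

With a frame rate that doesn't divide the refresh rate evenly, the per-frame counts alternate, which is why the presenter needs some help to keep the pattern regular.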

@Hera: The decoder is in a separate ASIC. Only the memory and base clocks are shared with the rest of the core, but that's not much of a problem. The bottleneck is always when the shadercore is activated for heavy tasks, such as deinterlacing.

@Jtacdf: I've simply integrated the 1-2-3 shaders with some optimizations and set the output to linearize RGB. The gamma adjustments for the subtitles and on output are automatic.
For a manual setup, disable the linearization step for the video input and set the custom pixel shaders. Enable linear RGB output in the shader that performs the Y'CbCr to RGB color conversion.
Documentation of the functions is indeed a bit hard to do with the renderer settings menu. It doesn't allow extra text to explain anything.

CruNcher
30th August 2011, 11:58
There is something wrong with your GPU dude. Also V-Sync and Aero seems redundant.

Somewhat on-topic, does ATi/AMD GPU DXVA performance depend on core/shader/ram speeds or is it separate like PureVideo HD?

Hmm, maybe it's the driver, who knows. But I certainly see this behavior with MPC-HC in windowed mode with Aero and VSync on, depending on the FPS of the input video, at a 60 Hz refresh rate: moving the MPC-HC window during playback causes the whole move animation and the refresh of the MPC-HC window to stutter on Aero (mouse actions are rendered too late, and the latency becomes obvious). It doesn't happen with Jan's builds, as VSync is off by default.
As for performance, it depends on how the resources are being used. Shader dependency first comes into play when PP is applied, whether in Direct3D/Direct2D/DirectWrite/WARP/OpenGL/OpenCL/CUDA or pixel shader (DirectCompute) form. Memory then also becomes important, as the additionally created data has to be transmitted (bandwidth), and it then depends on whether the transmit path is short (IGP) or longer (discrete). So in general, many factors play a role in overall performance.

Williamete, Sandy Bridge, AMD APU, ION1/2 (also IGP solutions on mainboards, but it depends on the chipset whether it's directly connected or not) have the shortest transfer paths, so that overhead is very small, and they should be especially preferred for tasks such as rendering Aero. The performance of the decoder should also be better and create less CPU overhead than a discrete path does (see nevcairiel's DXVA benchmark of Intel's SB decoder compared with Nvidia on a discrete card, and even with the improved VP5) :)



@CruNcher
Most of the rendering settings work for me on Sandy Bridge. Are you using an Nvidia GPU or the Intel IGP?
Anyway, your sample seems to be a PAL interlaced 25 fps video. Might the flickering you see be the interlace lines? It looks pretty smooth to me when hardware deinterlacing on my ATI GPU is enabled.

Intel IGP: Core I5-2400 Intel HD2000




@CruNcher: The internal VSync code is hard to correct. I've been editing it today again, and it's still problematic.
The default setting of all VSync options unticked simply enables the 4-frame queue of the presenter, with a device VSync setting of presenting a frame for at least one refresh cycle. It works perfectly well for both windowed mode with desktop composition enabled and the exclusive mode, with a little bit of help in timing frames to refresh cycles in an ordered manner. The only problem is the windowed mode without desktop composition, as the device can't deal with the large distortion in the timing in this mode on its own.
I'll test your samples a bit more, but the files are a bit demanding indeed. The 50 Hz interlaced PAL TV sample has broken time stamps on every frame and the opening and ending data slices are incomplete (I demuxed the file).
Even on my setup I can't get it to render at 50 Hz with the VSync functions enabled.
I wonder why the options for 10-, 16- and 32-bit surfaces won't gray out in your case if the GPU doesn't support it.

Yes, but the 50 Hz sample was actually not really meant to be problematic here (that one was to show a problem with LAV Splitter and Microsoft's DXVA2 decoder that the internal MPC-HC splitter doesn't show: macroblocking).
The film sample, sample.ts, is what creates major problems here with VSync, and other samples have Aero rendering problems with VSync on. I guess it's better if I try to visualize the issue (EVR Sync = flickering, EVR Custom = slideshow, System Renderer (EVR) = OK) in a real-time video. I hope that's easier and not as demanding as it was on XP with DXVA1 (my first try at high-performance desktop recording of DXVA at 60 fps directly into H.264 on XP, where the 0.5 ms overhead from Process Explorer was also problematic and cost some frames: http://mirror05.x264.nl/CruNcher/pperf/).
I have to recheck, but with your build it was actually grayed out. I meant that with the default trunk it was shown, and when I tried to use it, it ended up in a black screen.

I will try to encode the issue directly with Intel's encoder from the Aero desktop into H.264 :)

Hera
31st August 2011, 05:44
Ah, weird,
Radeon 4250 + anything but 8-bit output = blackscreen now - except for the graph - the graph works showing that Radeon 4250M is lagging with 16-bit ??

Will report more tomorrow.

ForceX
31st August 2011, 06:05
Ah, weird,
Radeon 4250 + anything but 8-bit output = blackscreen now - except for the graph - the graph works showing that Radeon 4250M is lagging with 16-bit ??

Will report more tomorrow.
You seem to need to have a pre-resize shader on for it to work. Use the default pass-through shader as a pre-resize shader.

JanWillem32
31st August 2011, 08:02
I noticed that a few days ago. The bilinear chroma up-samplers were set in the wrong spot in the renderer chain (those are the only ones that are single-pass).

CruNcher
1st September 2011, 19:53
@JanWillem32
I made a Quicksync & Aero powered Video of this EVR Sync & Custom issue with the sample.ts http://mirror05.x264.nl/CruNcher/mpc-hc/

JanWillem32
1st September 2011, 21:30
That's not even close to how badly the internal timing/VSync functions performed in some of my alpha builds.:D
There are a few factors that can cause problems in this case.
One factor is the color format. EVR CP is in your case accepting RGB32, EVR Sync is accepting YUY2.
Both color formats are pretty rare to find in actual video. Most videos are Y'CbCr 4:2:0, these two formats are not.
Common formats for progressive Y'CbCr 4:2:0 video are: YV12, I420/IYUV and NV12. For interlaced video only NV12 is commonly used.
To get surfaces from Y'CbCr 4:2:0 to Y'CbCr 4:2:2 (YUY2) or RGB X:4:4:4 (RGB32), a color conversion step before the mixer is required. That's bound to cost performance and quality compared to the direct input of a compatible format to the mixer. It's even worse if the source is interlaced.
I wonder what is forcing this conversion.
Other factors that count are the integrity of the file container or file streams (your other sample was pretty broken in this aspect), and the load on the renderer itself. The default settings with the trunk build are bad, and the renderer was in a bad state when I first started to work on it.
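
To put rough numbers on the formats named above, the per-frame memory footprint can be compared with a few lines of Python. This is only a sketch: it counts bytes per frame using the nominal bits per pixel of each format and says nothing about the actual processing cost of the conversion step.

```python
# Nominal bits per pixel: the 4:2:0 formats store chroma at quarter resolution,
# YUY2 (4:2:2) at half resolution, RGB32 stores 4 bytes for every pixel.
BITS_PER_PIXEL = {
    "YV12": 12,
    "I420": 12,
    "NV12": 12,
    "YUY2": 16,
    "RGB32": 32,
}


def frame_bytes(fmt: str, width: int, height: int) -> int:
    """Bytes needed for one frame, ignoring row padding and alignment."""
    return width * height * BITS_PER_PIXEL[fmt] // 8


for fmt in ("NV12", "YUY2", "RGB32"):
    print(fmt, frame_bytes(fmt, 1920, 1080))
```

For a 1920×1080 frame, YUY2 takes a third more memory than NV12, and RGB32 takes more than two and a half times as much, before any conversion work is counted.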

JanWillem32
3rd September 2011, 05:33
I've added motion adaptive modes to the frame interpolation. The 3 filters are equally (very) heavy, just with different parameters, and should have better quality than the basic form.
I've also been tracing the exit sequences of various parts, and found multiple problems (mostly outside the renderer parts). It will take some time before those are fixed.
Lastly, I added some minor optimizations.

Hera
3rd September 2011, 07:08
EDIT: On Radeon 4250M, performance is better than ION. Frame Interp is way too much processing it seems though.
EDIT: 8-bit locked as well

Can't switch from 8-bit integer surfaces on ION? Expected behavior?

Previous version, with the new menus, allows me to do this.

Found a way to force it - lags like hell for some reason, seriously like hell. I would have probably had to hard reboot my netbook if I went with 32-bit ...

Last version that appears to work just fine with Output A32B32G32R32F and A16B16G16R16F: 1.5.3.3682
Next version gives a black screen with these and makes system almost completely stuck.
Current version gives picture and makes system almost completely stuck.

I am going to restart and see if that fixes the performance issues.

EDIT: Yeah, performance did go DERP DERP
EDIT: Why am I seeing V-Sync in the Statistics when I have V-Sync disabled?

JanWillem32
3rd September 2011, 10:51
Too many immediate bugs in this build... I've scrapped it.

CruNcher
3rd September 2011, 12:18
@jan this is too funny

http://forum.doom9.org/showthread.php?p=1523337#post1523337

see how I fixed the telecine issues on MPC-HC EVR Custom (with HD2000), below that :P

the sample is available here http://home.halden.net/mordor/evil_trees.7z

No go with hardware playback though; I tried virtually everything, and DirectVobSub causes DXVA2 to fail :(

KoD
3rd September 2011, 12:34
Hi, I've just tried the r3698 x64 build, and it has a very odd behavior:

- EVR Sync mode does not show the video image, but only a white/gray surface, while the audio is playing
- EVR CP mode displays the video image properly only when using 8 bit surfaces, or any of the higher bit surfaces but with "Disable RGB gamma linearization" on, otherwise a gray video is shown instead of the video image, with audio playing normally, and subtitles being rendered on top of the gray video also correctly and on time; unfortunately, disabling the RGB gamma linearization for higher than 8 bit surfaces completely destroys the gray balance of the image, the image is washed out, so the high bit surface modes are actually unusable
- when using EVR CP, it's best to disable "VSync", otherwise playback is not smooth at all; "Alternative Sync" causes awful tearing; "Accurate sync" is the only option that gives smooth playback, but playback is smooth even when no VSync option is enabled at all (VSync is not forced in the Nvidia Control Panel either)
- not using D3D Full-screen mode causes small jerky spikes of the playback, but that's the case with the official MPC build as well

This build can only show the image correctly when using EVR CP with 8 bit surfaces, in D3D full-screen, and with VSync disabled (or AccurateVsync enabled). Any other mode is useless. :(

The resizing algo I'm using is the old bicubic one, so nothing special.

The official MPC build is less picky, and I'm actually running that one in EVR Sync mode (Sync video to display), in full-screen mode for best playback.

I'm using Win7 x64 OS running on a dual-core laptop (so, the CPU is not that powerful to hide any issue like it's the case with more powerful ones), and a 9600M GT GPU.

These builds really need more people to test them. ;)

JanWillem32
3rd September 2011, 21:33
I revised many menu items to fix the latest set of bugs, and in the process I got rid of some useless ones. I rearranged the VSync, jitter and other functions around the standard present method for better performance. The present method for the frame interpolator is still incompatible with "Alternative VSync" and "Flush GPU before VSync". In my case it also requires D3D fullscreen exclusive mode, else frames are dropped (can be seen using the tearing test).

Hera
4th September 2011, 03:07
Well... on the positive side, you fixed the renderer settings.
It still fully kills ION performance with anything but 8-bit surfaces.

CruNcher
4th September 2011, 03:26
@jan

here are some hardware performance data :)



HD2000:

Windows 7 x64 Service Pack 1
Intel(R) HD Graphics Family

D3D9 Surface StretchRect:
D3DFMT_R8G8B8: creating GPU texture failed
D3DFMT_A8R8G8B8: lossy (1-255)
D3DFMT_X8R8G8B8: lossy (1-255)
D3DFMT_A8B8G8R8: creating GPU texture failed
D3DFMT_X8B8G8R8: creating GPU texture failed
D3DFMT_A2R10G10B10: lossy (1-255)
D3DFMT_A2B10G10R10: lossy (1-255)
D3DFMT_A16B16G16R16: lossy (1-255)
D3DFMT_A16B16G16R16F: lossy (1-255)
D3DFMT_A32B32G32R32F: StretchRect failed

D3D9 Surface VideoProcessor:
D3DFMT_R8G8B8: creating GPU texture failed
D3DFMT_A8R8G8B8: lossy (1-255)
D3DFMT_X8R8G8B8: lossy (1-255)
D3DFMT_A8B8G8R8: creating GPU texture failed
D3DFMT_X8B8G8R8: creating GPU texture failed
D3DFMT_A2R10G10B10: lossy (1-255)
D3DFMT_A2B10G10R10: lossy (1-255)
D3DFMT_A16B16G16R16: lossy (1-255)
D3DFMT_A16B16G16R16F: lossy (1-255)
D3DFMT_A32B32G32R32F: lossy (0-0)

DXVA Surface StretchRect:
D3DFMT_R8G8B8: creating GPU texture failed
D3DFMT_A8R8G8B8: lossy (1-255)
D3DFMT_X8R8G8B8: lossy (1-255)
D3DFMT_A8B8G8R8: creating GPU texture failed
D3DFMT_X8B8G8R8: creating GPU texture failed
D3DFMT_A2R10G10B10: lossy (1-255)
D3DFMT_A2B10G10R10: lossy (1-255)
D3DFMT_A16B16G16R16: lossy (1-255)
D3DFMT_A16B16G16R16F: lossy (1-255)
D3DFMT_A32B32G32R32F: StretchRect failed

DXVA Surface VideoProcessor:
D3DFMT_R8G8B8: creating GPU texture failed
D3DFMT_A8R8G8B8: lossy (1-255)
D3DFMT_X8R8G8B8: lossy (1-255)
D3DFMT_A8B8G8R8: creating GPU texture failed
D3DFMT_X8B8G8R8: creating GPU texture failed
D3DFMT_A2R10G10B10: lossy (1-255)
D3DFMT_A2B10G10R10: lossy (1-255)
D3DFMT_A16B16G16R16: lossy (1-255)
D3DFMT_A16B16G16R16F: lossy (1-255)
D3DFMT_A32B32G32R32F: lossy (0-0)

D3D9 Surface speed test:
NV12: upload 277 fps, download 15 fps, trick download failed
YV12: upload 150 fps, download 19 fps, trick download failed
A8R8G8B8: upload 148 fps, download 7 fps, trick download failed

DXVA Surface speed test:
NV12: upload 310 fps, download 15 fps, trick download failed
YV12: CreateSurface failed
A8R8G8B8: CreateSurface failed

A8R8G8B8 Texture speed test:
default: upload 263 fps, download 147 fps
dynamic: upload 261 fps, download 10 fps, trick download 152 fps



9800 GT:

Windows XP Service Pack 3
NVIDIA GeForce 9800 GT

D3D9 Surface StretchRect:
D3DFMT_R8G8B8: creating GPU texture failed
D3DFMT_A8R8G8B8: lossy (16-255)
D3DFMT_X8R8G8B8: lossy (16-255)
D3DFMT_A8B8G8R8: creating GPU texture failed
D3DFMT_X8B8G8R8: creating GPU texture failed
D3DFMT_A2R10G10B10: lossy (16-255)
D3DFMT_A2B10G10R10: lossy (16-255)
D3DFMT_A16B16G16R16: StretchRect failed
D3DFMT_A16B16G16R16F: lossy (16-191)
D3DFMT_A32B32G32R32F: StretchRect failed

D3D9 Surface VideoProcessor:
D3DFMT_R8G8B8: creating GPU texture failed
D3DFMT_A8R8G8B8: lossy (16-255)
D3DFMT_X8R8G8B8: lossy (16-255)
D3DFMT_A8B8G8R8: creating GPU texture failed
D3DFMT_X8B8G8R8: creating GPU texture failed
D3DFMT_A2R10G10B10: lossy (16-255)
D3DFMT_A2B10G10R10: lossy (16-255)
D3DFMT_A16B16G16R16: lossy (16-255)
D3DFMT_A16B16G16R16F: lossy (16-191)
D3DFMT_A32B32G32R32F: lossy (16-192)

DXVA Surface StretchRect:
D3DFMT_R8G8B8: creating GPU texture failed
D3DFMT_A8R8G8B8: lossy (16-190)
D3DFMT_X8R8G8B8: lossy (16-190)
D3DFMT_A8B8G8R8: creating GPU texture failed
D3DFMT_X8B8G8R8: creating GPU texture failed
D3DFMT_A2R10G10B10: lossy (16-191)
D3DFMT_A2B10G10R10: lossy (16-191)
D3DFMT_A16B16G16R16: StretchRect failed
D3DFMT_A16B16G16R16F: lossy (16-178)
D3DFMT_A32B32G32R32F: StretchRect failed

DXVA Surface VideoProcessor:
D3DFMT_R8G8B8: creating GPU texture failed
D3DFMT_A8R8G8B8: lossy (16-190)
D3DFMT_X8R8G8B8: lossy (16-190)
D3DFMT_A8B8G8R8: creating GPU texture failed
D3DFMT_X8B8G8R8: creating GPU texture failed
D3DFMT_A2R10G10B10: lossy (16-191)
D3DFMT_A2B10G10R10: lossy (16-191)
D3DFMT_A16B16G16R16: lossy (16-190)
D3DFMT_A16B16G16R16F: lossy (16-178)
D3DFMT_A32B32G32R32F: lossy (16-178)

D3D9 Surface speed test:
NV12: upload 440 fps, download 554 fps, trick download failed
YV12: upload 76 fps, download 17 fps, trick download failed
A8R8G8B8: upload 431 fps, download 262 fps, trick download failed

DXVA Surface speed test:
NV12: upload 442 fps, download 555 fps, trick download failed
YV12: upload 76 fps, download 17 fps, trick download failed
A8R8G8B8: upload 429 fps, download 261 fps, trick download failed

A8R8G8B8 Texture speed test:
default: upload 462 fps, download 414 fps
dynamic: upload 535 fps, download 9 fps, trick download 258 fps

460 GTX:

Windows XP Service Pack 3
NVIDIA GeForce GTX 460

D3D9 Surface StretchRect:
D3DFMT_R8G8B8: creating GPU texture failed
D3DFMT_A8R8G8B8: lossy (16-210)
D3DFMT_X8R8G8B8: lossy (16-210)
D3DFMT_A8B8G8R8: creating GPU texture failed
D3DFMT_X8B8G8R8: creating GPU texture failed
D3DFMT_A2R10G10B10: lossy (16-210)
D3DFMT_A2B10G10R10: lossy (16-210)
D3DFMT_A16B16G16R16: StretchRect failed
D3DFMT_A16B16G16R16F: lossy (16-255)
D3DFMT_A32B32G32R32F: lossy (16-255)

D3D9 Surface VideoProcessor:
D3DFMT_R8G8B8: creating GPU texture failed
D3DFMT_A8R8G8B8: lossy (16-210)
D3DFMT_X8R8G8B8: lossy (16-210)
D3DFMT_A8B8G8R8: creating GPU texture failed
D3DFMT_X8B8G8R8: creating GPU texture failed
D3DFMT_A2R10G10B10: lossy (16-210)
D3DFMT_A2B10G10R10: lossy (16-210)
D3DFMT_A16B16G16R16: lossy (16-210)
D3DFMT_A16B16G16R16F: lossy (16-255)
D3DFMT_A32B32G32R32F: lossy (16-255)

DXVA Surface StretchRect:
D3DFMT_R8G8B8: creating GPU texture failed
D3DFMT_A8R8G8B8: lossy (16-190)
D3DFMT_X8R8G8B8: lossy (16-190)
D3DFMT_A8B8G8R8: creating GPU texture failed
D3DFMT_X8B8G8R8: creating GPU texture failed
D3DFMT_A2R10G10B10: lossy (16-191)
D3DFMT_A2B10G10R10: lossy (16-191)
D3DFMT_A16B16G16R16: StretchRect failed
D3DFMT_A16B16G16R16F: lossy (16-178)
D3DFMT_A32B32G32R32F: lossy (16-178)

DXVA Surface VideoProcessor:
D3DFMT_R8G8B8: creating GPU texture failed
D3DFMT_A8R8G8B8: lossy (16-190)
D3DFMT_X8R8G8B8: lossy (16-190)
D3DFMT_A8B8G8R8: creating GPU texture failed
D3DFMT_X8B8G8R8: creating GPU texture failed
D3DFMT_A2R10G10B10: lossy (16-191)
D3DFMT_A2B10G10R10: lossy (16-191)
D3DFMT_A16B16G16R16: lossy (16-190)
D3DFMT_A16B16G16R16F: lossy (16-178)
D3DFMT_A32B32G32R32F: lossy (16-178)

D3D9 Surface speed test:
NV12: upload 687 fps, download 917 fps, trick download failed
YV12: upload 867 fps, download 21 fps, trick download failed
A8R8G8B8: upload 459 fps, download 366 fps, trick download failed

DXVA Surface speed test:
NV12: upload 688 fps, download 912 fps, trick download failed
YV12: upload 873 fps, download 21 fps, trick download failed
A8R8G8B8: upload 456 fps, download 370 fps, trick download failed

A8R8G8B8 Texture speed test:
default: upload 453 fps, download 526 fps
dynamic: upload 870 fps, download 9 fps, trick download 318 fps


also trying your build currently

Hera
5th September 2011, 00:28
Interesting is the difference between NV ION and Radeon 4250M:

Before the menu changes:
4250M: 16-bit OK, 32-bit NO GO
ION: 16-bit OK, 32-bit OK

After the menu changes:
4250M: 16-bit OK, 32-bit NO GO
ION: 16-bit NO GO, 32-bit KILL ME NOW

Also frame interpolator kills 4250M performance, causes severe ghosting (even of the Info Graph!), and makes things a bit too bright. I cannot compare on ION as only 8-bit and 10-bit seem to work ATM.

The ATI Up-Sampling Fix Options are editable for NV ION.
I think your NVIDIA detection system is broken.

CruNcher
5th September 2011, 13:19
@ Jan
This FFDshow Quicksync decoder http://forum.doom9.org/showthread.php?t=162442 (done by an Intel dev) fixes the DXVA2 EVR Custom issue http://mirror05.x264.nl/CruNcher/mpc-hc/ with my sample.ts on Quicksync :)

No VSync, no exclusive mode, just Aero and Quicksync :)

http://img706.imageshack.us/img706/8000/novsyncjustaeroquicksyn.png

ikarad
5th September 2011, 13:42
Where can I post bug for your test build?

JanWillem32
5th September 2011, 14:52
@Hera: Can you test the previous version on the ION with "gamma conversion of video RGB to linear RGB for floating point surfaces" set pre-resize, "gamma conversion of linear RGB to video RGB for floating point surfaces" set post-resize, and subtitles disabled?
The only change in the renderer that could cause a slowdown is that I've forced these two shader passes (plus one extra on the subtitle pass) for 10, 16 and 32-bit surfaces.
The frame interpolator is much too heavy for low-end video cards. Once it starts dropping frames, it only creates artifacts.
I just tested the detection system for the menu option, it correctly grays out during playback in my case. I can't gray the items out before the renderer is active.
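
As background for the two gamma conversion passes discussed above, this is roughly what such a pair of conversions does, and why blending or resizing is done in linear light. It is only a sketch: the pure power-law curve with an exponent of 2.2 is an assumption for illustration, and the transfer function used by the actual shaders may differ.

```python
GAMMA = 2.2  # assumed exponent, for illustration only


def to_linear(video: float) -> float:
    """Video RGB (gamma-encoded, 0..1) to linear RGB."""
    return video ** GAMMA


def to_video(linear: float) -> float:
    """Linear RGB (0..1) back to video RGB."""
    return linear ** (1.0 / GAMMA)


def blend_in_video_space(a: float, b: float) -> float:
    """Average two gamma-encoded values directly (no linearization)."""
    return (a + b) / 2.0


def blend_in_linear_light(a: float, b: float) -> float:
    """Linearize, average, then convert back to video RGB."""
    return to_video((to_linear(a) + to_linear(b)) / 2.0)


# Averaging black and white: 0.5 without linearization, about 0.73 with it.
print(blend_in_video_space(0.0, 1.0), blend_in_linear_light(0.0, 1.0))
```

The round trip through both conversions returns the original value, so the extra passes only cost performance, which is the suspected bottleneck on the ION.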

@CruNcher: Does everything seem to be fixed by that decoder (such as the color format and timing)?

@ikarad: Here will be fine, as long as it's specific to what I've implemented. The usual bug reports should be posted on trac.

ikarad
5th September 2011, 16:08
@ikarad: Here will be fine, as long as it's specific to what I've implemented. The usual bug reports should be posted on trac.

Thanks
Is a bug with the subtitle renderer in MPC-HC specific to your version or not?

Hera
5th September 2011, 16:39
I only see an option to "Disable RGB Gamma Linearization" -
Checking that does allow me to use "16-bit Floating Point Surfaces" without any obvious frame dropping.

Disabling subtitles does nothing.

CruNcher
5th September 2011, 17:19
@Hera: Can you test the previous version on the ION with "gamma conversion of video RGB to linear RGB for floating point surfaces" set pre-resize, "gamma conversion of linear RGB to video RGB for floating point surfaces" set post-resize, and subtitles disabled?
The only change in the renderer that could cause a slowdown is that I've forced these two shader passes (plus one extra on the subtitle pass) for 10, 16 and 32-bit surfaces.
The frame interpolator is much too heavy for low-end video cards. Once it starts dropping frames, it only creates artifacts.
I just tested the detection system for the menu option, it correctly grays out during playback in my case. I can't gray the items out before the renderer is active.

@CruNcher: Does everything seem to be fixed by that decoder (such as the color format and timing)?

@ikarad: Here will be fine, as long as it's specific to what I've implemented. The usual bug reports should be posted on trac.

Nope, the timing for that evil_tree sample is still wrong compared to software playback (the timing for my sample.ts is a perfect 23.976, like in the screenshot above).
The mixer still shows RGB32 as output, and the input is NV12 (default) from that FFDshow Quicksync decoder.

PS: After looking at the bugs, I looked more into efficiency, and I almost fell off my chair: http://forum.doom9.org/showthread.php?p=1523906#post1523906 ;) (the current overhead is huge)

JanWillem32
5th September 2011, 18:41
@ikarad: You're referring to the failure in the subtitle renderer to combine multiple bitmapped subtitles? I know what's causing it, but it's a bit hard to edit.

The host for the video and subtitle renderers is ISubPic (called by the graph builder). It has subtitle delegates for DirectX 7, DirectX 9 and VSfilter. There's no way I'm going to work on all three of those. I already have permission to remove all internal DirectX 7 items. I'll remove VSfilter from my branch completely once I start working on the subtitle renderer; it should never have had a share in ISubPic in the first place. That part of VSfilter is always compiled into MPC-HC because of that, but nothing in the internal code uses it. VSfilter can be preserved by a full split in projects.
Once I only have to edit 2 parts of one subtitle renderer (vector and bitmapped graphics), I'll also break compatibility with all video renderers (again), except the shared renderer and maybe EVR Sync (once I've merged code to it). I'm not going to just edit the subtitle renderer without changing its interfaces, the fundamental flaws are just too big for that.
For reference on what's going wrong, see EVR sync with vector subtitles in my builds. If a subtitle of one line is to be rendered, the subtitle renderer exports a texture with that one line, and hardly any translucent parts around it. If a subtitle of one line in the top and one line in the bottom of the screen is to be displayed, it exports one picture with both lines and a lot of translucent filler in between both lines. The subtitle renderer can't export multiple textures to the video renderer.
If I compare that to the stats screen, it's a different case. When the video window is initialized, the font for the stats screen is rasterized with the correct size, and a texture+geometry set for each glyph is exported to the video card. The stats screen itself generates a string of text, which is converted to vertex data linked to the set of textures. Only the vertex data is sent to the video card, to order it to blend up to a few hundred textures to an output surface during the rendering of the video. The vertex data is usually under a KB; no additional surfaces or textures are generated. The subtitle renderer regenerates one A8R8G8B8 texture with every frame, so in the case of a 1920×1080 output, it can send a 1920×1080×32÷8÷1024÷1024 = 7.91015625 MB texture with every input frame (it skips rendering if there's no subtitle to display). On top of that, the subtitle renderer also only uses a single-threaded approach to rendering glyphs and only uses the CPU in a pretty dumb way to paint the glyphs onto the subtitle textures.
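
The texture size quoted above, and what it adds up to per second, can be checked with a few lines. This is a sketch only: the frame rate of 24/1.001 fps is an example value, and the figure is an upper bound, since nothing is sent when there is no subtitle to display.

```python
def texture_mib(width: int, height: int, bits_per_pixel: int = 32) -> float:
    """Size of one full-frame texture, using 1024-based units as in the post."""
    return width * height * bits_per_pixel / 8 / 1024 / 1024


def upload_mib_per_second(width: int, height: int, frame_rate: float) -> float:
    """Worst case: one full texture is regenerated and sent for every frame."""
    return texture_mib(width, height) * frame_rate


size = texture_mib(1920, 1080)  # A8R8G8B8 is 32 bits per pixel
print(size)  # 7.91015625
print(upload_mib_per_second(1920, 1080, 24000 / 1001))
```

Compared with vertex data of under a KB per frame for the stats screen, that is a difference of several orders of magnitude.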

I know what code I have to edit to get things working properly. However, that will be a major change to the code, and needs a lot of testing. I'm okay with editing superficial parts of the subtitle rendering system, like I've done for a while already. I won't be editing more fundamental parts while "internal renderer fixes" isn't in the trunk.

@Hera: I meant setting two pixel shaders. I suspect the ION can't handle the two or three extra rendering stages for linearizing the gamma internally.

@CruNcher: I've demuxed that evil_tree sample, and the video track is reported as completely broken when muxing it again to MKV (nothing reported during demuxing this time). A mixer input of NV12 is perfectly all right with interlaced Y'CbCr 4:2:0 video like this. I assume the stats screen reports "Mixer format: Input NV12, Output X8R8G8B8"? I thought I removed the setting to display alternative names for formats like "RGB32" some time ago.

nevcairiel
5th September 2011, 18:53
@CruNcher: I've demuxed that evil_tree sample, and the video track is reported as completely broken when muxing it again to MKV (nothing reported during demuxing this time).

mkvmerge has some issues muxing telecined MPEG content, which may account for the breakage.

JanWillem32
5th September 2011, 18:59
Ah, I'll try to detect and remove the pulldown to test. Thanks for the hint. If it's really a problem with pulldown, it's a mixer issue.

ikarad
5th September 2011, 19:07
@ikarad: You're referring to the failure in the subtitle renderer to combine multiple bitmapped subtitles? I know what's causing it, but it's a bit hard to edit.
.
I don't know if it's the same problem, but I'm talking about this one:
https://sourceforge.net/apps/trac/mpc-hc/ticket/48#comment:20

JanWillem32
5th September 2011, 21:29
That's indeed the one I'm referring to.

In regard to the evil trees sample, I've removed the pulldown, the flags were set rather wrong. The damaged parts of the stream are now clearly visible when enforcing a strict mode. That explains why I kept getting pauses during playback with all renderers I tried. The original stream was 24/1.001 fps progressive as 48/1.001 fps weave interlaced.

Xaurus
6th September 2011, 18:05
In regard to the evil trees sample, I've removed the pulldown, the flags were set rather wrong. The damaged parts of the stream are now clearly visible when enforcing a strict mode: http://www.mediafire.com/?5y2wtp5j0pxehu9 . That explains why I kept getting pauses during playback with all renderers I tried. The original stream was 24/1.001 fps progressive as 48/1.001 fps weave interlaced.

JanWillem,

What program did you use to remove the pulldown of the evil_tree sample?
I would like to convert it to progressive and be able to watch it without problems but I've yet to find (free) suitable software.

JanWillem32
6th September 2011, 19:08
That's easy: pulldown.exe.
The command line syntax to re-order the internal fields to 48/1.001 fps weave interlaced is:
pulldown.exe input.m2v output.m2v -norff -nopulldown -aspect_ratio 16:9 -drop_frame false -prog_frames p -prog_seq i -tff odd -framerate 23.976
This will use a non-strict mode, so the broken parts visible in the version I uploaded will not be so visible; it will simply skip fields instead. Using pulldown.exe doesn't convert the video data itself, so the conversion is lossless (and really fast). Re-encoding the video to true progressive will be slow and lossy. I don't recommend it, as the damage to the split chroma planes in 4:2:0 video will be quite severe and the damaged parts may pop up as artifacts during the encoding process.
After using pulldown.exe, I used mkvmerge to combine the tracks again. For the video track, a manual override to 24000/1001 FPS is recommended (mkv has better internal timing than a raw .m2v). For the audio track, I correctly set the delay reported during demuxing. For the small sample it was +5 ms, I believe. The new mkv video played very well at 24/1.001 fps (weave deinterlacing was automatically used by the mixer).
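
For reference, the frame rates mentioned above relate to each other as in the sketch below. It is plain arithmetic with Python's `fractions` module, not something pulldown.exe or mkvmerge runs; the names `NTSC_FILM`, `ROUNDED`, `frame_time` and `drift_seconds` are made up for illustration.

```python
from fractions import Fraction

NTSC_FILM = Fraction(24000, 1001)  # exact "24/1.001 fps" frame rate
FIELD_RATE = NTSC_FILM * 2         # "48/1.001 fps" when counted in fields
ROUNDED = Fraction(23976, 1000)    # the rounded value 23.976


def frame_time(n: int, fps: Fraction) -> Fraction:
    """Presentation time in seconds of frame n at a constant rate."""
    return n / fps


def drift_seconds(n: int) -> Fraction:
    """Timestamp difference at frame n between the rounded and exact rates."""
    return frame_time(n, ROUNDED) - frame_time(n, NTSC_FILM)


frames = 2 * 60 * 60 * 24  # roughly two hours of film
print(float(NTSC_FILM), float(FIELD_RATE), float(drift_seconds(frames)))
```

The difference between 23.976 and 24000/1001 is tiny but not zero, which is one reason to give a muxer the exact fraction when it accepts one.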
If you need any help with the command line, demuxing or muxing, I'll gladly help.

Xaurus
6th September 2011, 21:24
JanWillem,

Thanks a lot for your information. What tools do you use to demux an m2ts or ts file? Do you know if tsMuxer leaves the quality untouched?
The reason I ask is because the original files are in .ts

I have downloaded mkvtoolnix and it seems very nice, but alas it doesn't support ts files so this will be used when packing it together again.

As for pulldown.exe, I found 0.99d, do you know if this is the latest version?

Thanks!

JanWillem32
6th September 2011, 21:44
There are several tools you can use for a standard demux, tsMuxer or eac3to will work fine. A demux doesn't alter any video data, so you don't really have to worry about that. I also used pulldown.exe v0.99d.

janos666
6th September 2011, 22:16
I have a problem with 3709: I can't use your shader to convert the full range RGB to limited range RGB for display output.
I guess it has something to do with the new built-in linearization and gamma weighting steps.

Everything looks fine with zero custom shaders and the optimal renderer settings preset.
1: If I tick the "Disable RGB Gamma Linearization" then it actually looks linearized
2: If I add the level conversion shader the grayscale gets horrible: obviously wrong gradation and obviously wrong levels
- If I do both then it's just worse.

I tried different combinations (ticked and unticked Disable RGB Gamma Linearization combined with custom linearization and/or gamma weighting shaders) but I couldn't find any working set, nor could I fully figure out the nature of the problem.
But I guess you forgot to disable a gamma processing shader with the "Disable RGB Gamma Linearization" and you didn't consider level conversions when you placed these shaders in the chain.


I think you should include a level conversion shader in the menu and place the conversion at an appropriate place to make it work until it's sorted out.

janos666
7th September 2011, 14:19
Ok, problem isolated: It's in your latest pixel shader pack. The limited->full and full->limited shaders are both limited->full.

However, it still needs some attention. If I want level conversion then I have to disable the built-in linearization step, apply the chroma resizer (ATI fix) as pre-resize and the level conversion as post-resize shaders, and add a gamma linearization shader at the end because the gamma weighting step is hard-locked at the very end of the chain.

One solution would be to write a new level conversion shader which works on the linearized RGB data and the other is to place the current level conversion shader at the very end of the chain and place a trigger in the options to activate it.

JanWillem32
7th September 2011, 16:20
Don't worry, I've been planning to add the level conversion shader code to the final pass for a while already. When compressing levels with a post-resize shader, the subtitles, OSD and stats screen won't be converted. On top of that, I've adjusted the random dither to always expect linear RGB input for its noise/edge detection, so any level changes should not be included with that.
The basic layout for the renderer:
8-bit mode: mixer output (mandatory), pre-resize shaders, resizing/positioning (mandatory), post-resize pixel shaders, subtitle blending, stats screen blending, OSD blending, present (mandatory).
10-, 16-, or 32-bit mode: mixer output (mandatory), chroma fix + gamma linearization, pre-resize shaders, resizing/positioning (mandatory), post-resize pixel shaders, linear gamma subtitle blending, stats screen blending, OSD blending, gamma de-linearization (mandatory) + color management + dithering, present (mandatory).
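
The two layouts above can be written down as an ordered list of stages. The following is only a sketch of that ordering in Python, not the renderer's actual code: the function name `renderer_stages` is hypothetical, and the stage names are simply the labels used in the list above.

```python
def renderer_stages(surface_bits: int) -> list[str]:
    """Return the rendering stages in order for a given surface precision.

    Hypothetical helper: it mirrors the two layouts described in the post
    and does nothing beyond listing the stage names.
    """
    if surface_bits not in (8, 10, 16, 32):
        raise ValueError("surface_bits must be 8, 10, 16 or 32")
    high_precision = surface_bits != 8

    stages = ["mixer output"]
    if high_precision:
        stages.append("chroma fix + gamma linearization")
    stages += [
        "pre-resize shaders",
        "resizing/positioning",
        "post-resize pixel shaders",
    ]
    stages.append(
        "linear gamma subtitle blending" if high_precision else "subtitle blending"
    )
    stages += ["stats screen blending", "OSD blending"]
    if high_precision:
        stages.append("gamma de-linearization + color management + dithering")
    stages.append("present")
    return stages


for bits in (8, 16):
    print(bits, " -> ".join(renderer_stages(bits)))
```

Seen this way, the higher precision modes add two stages around the same core chain: linearization right after the mixer and de-linearization right before the present.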

I'm also trying to implement a proper way to eliminate resizing/positioning if it's not required. It's a bit of a waste if you're watching a 1080p video with 1:1 pixel mapping to the screen. The resizing/positioning step does properly disable the resizer pixel shader to only use nearest neighbor in that case. There are also special cases for skipping one of two passes when a two-pass resizer is used with only horizontal or vertical resizing.
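
The special cases described above amount to a small decision table. A rough Python sketch of it follows, assuming sizes are compared as exact pixel dimensions; `plan_resize` and the pass labels are made up for illustration, and the horizontal-before-vertical order is an assumption, not something taken from the renderer.

```python
def plan_resize(src: tuple[int, int], dst: tuple[int, int], two_pass: bool) -> list[str]:
    """Decide which resizer passes are needed (hypothetical helper).

    src and dst are (width, height) in pixels.
    """
    src_w, src_h = src
    dst_w, dst_h = dst
    needs_horizontal = src_w != dst_w
    needs_vertical = src_h != dst_h

    if not needs_horizontal and not needs_vertical:
        # 1:1 pixel mapping: no resizer shader, plain nearest neighbor.
        return ["nearest neighbor copy"]
    if not two_pass:
        return ["single-pass resize"]

    passes = []
    if needs_horizontal:
        passes.append("horizontal pass")
    if needs_vertical:
        passes.append("vertical pass")
    return passes


print(plan_resize((1920, 1080), (1920, 1080), two_pass=True))
print(plan_resize((1440, 1080), (1920, 1080), two_pass=True))
```

Eliminating the step entirely for the 1:1 case would mean returning an empty list there instead of the nearest neighbor copy.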

G_M_C
7th September 2011, 16:56
JanWillem, sorry if this question has been asked before:

You've adapted the renderer to be able to output 10-bit. Do your builds also support 10-bit input, straight to output?

Qaq
7th September 2011, 17:18
Jan, did you test your builds under XP?
I've been stuck on XP for the last few weeks and haven't been able to get normal playback since 3329. I just tried 3709 SSE. No matter whether I reset the renderer settings to default or optimal, it only shows a black screen. That's too bad, because I use XP often and want to follow the development. I'm an ATI (5450) user.

JanWillem32
7th September 2011, 18:57
@G_M_C: Within D3D there are color formats for Y'CbCr and RGB, with various precision and ordering. In reality these all resolve to some device buffer format with parameters set for the device so it can convert the buffer to 32-bit floating point values when sampling from that buffer. The RGB types are usually pretty much the same as the buffer form, and support for over a hundred different configurations for precision and ordering is very common.
The Y'CbCr types are more difficult. Because of chroma down-sampling, the resolution of the Y' channel isn't the same as those for Cb and Cr. The driver simply can't assign a buffer in the same way as for RGB. There's also the issue of color conversion. For each Y'CbCr texture, the video card's driver has to provide methods to convert it to RGB.
Therefore the support for Y'CbCr formats is very limited, there's only NV12 for 4:2:0, YUY2 for 4:2:2 and UYVY for 4:2:2.
Nvidia video cards additionally support YV12 and I420/IYUV for 4:2:0 progressive video. Older Intel video cards are known to not support any 4:2:0 formats. This can all be seen using DXVA Checker.
These formats are all 8-bit, and I don't see driver support coming for any other formats anytime soon.
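
To make the chroma down-sampling point concrete, here is a small sketch of how many bytes one frame takes in each of the 8-bit formats named above. It assumes even frame dimensions and no row padding (real driver surfaces usually have a pitch); the function name `frame_bytes` is made up for illustration.

```python
def frame_bytes(fmt: str, width: int, height: int) -> int:
    """Bytes per frame for some 8-bit Y'CbCr formats.

    Assumes even dimensions and tightly packed rows.
    """
    luma = width * height  # one Y' sample per pixel
    if fmt in ("NV12", "YV12", "I420"):
        # 4:2:0: Cb and Cr are halved in both directions.
        chroma = 2 * (width // 2) * (height // 2)
    elif fmt in ("YUY2", "UYVY"):
        # 4:2:2: Cb and Cr are halved horizontally only.
        chroma = 2 * (width // 2) * height
    else:
        raise ValueError(f"unknown format: {fmt}")
    return luma + chroma


for name in ("NV12", "YUY2"):
    print(name, frame_bytes(name, 1920, 1080))
```

That works out to 12 bits per pixel for the 4:2:0 formats and 16 bits per pixel for the 4:2:2 formats, against 32 for X8R8G8B8.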

For VMR-9, the mixer has a few emulation options for supporting some 8-bit Y'CbCr types not supported by the device driver, this is not the case for EVR.
Both mixers will happily skip some internal conversions if the input format is the same as the output format (X8R8G8B8, A2R10G10B10, A16B16G16R16F or A32B32G32R32F). Conversion from one RGB format to another by the mixer alone will often fail, though.
10-bit or better input from RGB surface types isn't much of a problem.
10-bit or better input from Y'CbCr surface types isn't happening with these mixers with the current video card drivers.

It's possible to create a custom mixer to allocate RGB surfaces and write raw Y'CbCr data on it. A custom mixer has to handle all Y'CbCr surface conversions by itself, instead of letting the video card driver do it. That does allow other formats to be used. Examples of custom mixers can be found in Haali Renderer (YUY2 and RGB32 input) and MadVR (several modes).

@Qaq: I don't have a license for Windows XP for either of my computers. Are you sure it's since 3329? I can't even remember exactly what I changed after that one, it's so long ago.
Do you get any warnings or errors (also look at the system logs)? Does it happen in both windowed and exclusive mode? What can you see in the filters section? Do you get any audio or subtitles? Can you force a screen to render by activating the stats screen and pause/unpause? Does deleting all settings in the registry and external .INI help?
Do others have this specific problem?
We might need a debug build or session to solve this problem.

janos666
7th September 2011, 22:16
10-bit or better input from Y'CbCr surface types isn't happening with these mixers with the current video card drivers

I remember that mixer output is strictly RGB now because Cb and Cr levels were incorrect with YCC type mixer outputs.
Was the renderer in those particular test builds set to request more than 8-bit YCC input from the mixer?

If it could be a VGA driver bug rather than an MS Windows EVR bug, and bit depths also played a role, then maybe we should check again whether proper 8-bit YCC output from the mixer is possible now. (It was a relatively long time ago...)

JanWillem32
8th September 2011, 00:14
I've tested some of the Y'CbCr types as output quite some time ago; as expected, the mixers won't work with those as output. It would be rather wrong to make the mixer use those anyway. The bit depth and reading/writing speeds of those types are pathetic.
The EVR "16-235" item was removed for two reasons: it compressed RGB levels to [16, 235] after the mixer color conversion stage, instead of trying to create something similar to Y'CbCr [16, 235], [16, 240], [16, 240], and the internal rendering stages are for full range RGB or R'G'B' (gamma requirements vary from type to type). The internal workings of the mixers are a bit silly, the only way to access the values below 16 and above 235 or 240 (for xvYCC for example) is by using floating-point surfaces. The values are then written beyond the regular [0, 1] floating point intervals.
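
As an illustration of the level arithmetic, here is a minimal sketch of the usual 8-bit limited/full range scaling for Y' or R'G'B' values (16 maps to 0.0, 235 maps to 1.0). It is not the mixer's or the shader pack's code; the function names are made up, and the Cb/Cr range of [16, 240] would need a different scale factor.

```python
def limited_to_full(code: float) -> float:
    """Map an 8-bit limited range code value to a normalized float.

    16 -> 0.0 and 235 -> 1.0; codes outside [16, 235] land outside [0, 1],
    which only a floating-point surface can store without clipping.
    """
    return (code - 16) / 219


def full_to_limited(value: float) -> float:
    """Inverse mapping: normalized float back to an 8-bit limited range code."""
    return 16 + 219 * value


print(limited_to_full(16), limited_to_full(235))  # 0.0 1.0
print(limited_to_full(240) > 1.0, limited_to_full(1) < 0.0)  # True True
```

The two functions are exact inverses of each other, so a round trip returns the original code value; applying the same direction twice does not.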

Hera
8th September 2011, 03:43
With bad-performing builds, using 8-bit surfaces,
1. Pre-Resize Denoise = Bad Performance just like 16-bit floating point surfaces
2. Post-Resize Sharpen Complex 2 = Bad performance when window is maximized - but not when just resized by dragging corner.

Shaders also lag like hell on 1.5.3.3682 (build which has great HFP and FFP performance)

G_M_C
8th September 2011, 08:17
@G_M_C: Within D3D there are color formats for Y'CbCr and RGB, with various precision and ordering. In reality these all resolve to some device buffer format with parameters set for the device so it can convert the buffer to 32-bit floating point values when sampling from that buffer. The RGB types are usually pretty much the same as the buffer form, and support for over a hundred different configurations for precision and ordering is very common.
The Y'CbCr types are more difficult. Because of chroma down-sampling, the resolution of the Y' channel isn't the same as those for Cb and Cr. The driver simply can't assign a buffer in the same way as for RGB. There's also the issue of color conversion. For each Y'CbCr texture, the video card's driver has to provide methods to convert it to RGB.
Therefore the support for Y'CbCr formats is very limited, there's only NV12 for 4:2:0, YUY2 for 4:2:2 and UYVY for 4:2:2.
Nvidia video cards additionally support YV12 and I420/IYUV for 4:2:0 progressive video. Older Intel video cards are known to not support any 4:2:0 formats. This can all be seen using DXVA Checker.
These formats are all 8-bit, and I don't see driver support coming for any other formats anytime soon.

For VMR-9, the mixer has a few emulation options for supporting some 8-bit Y'CbCr types not supported by the device driver, this is not the case for EVR.
Both mixers will happily skip some internal conversions if the input format is the same as the output format (X8R8G8B8, A2R10G10B10, A16B16G16R16F or A32B32G32R32F). Conversion from one RGB format to another by the mixer alone will often fail, though.
10-bit or better input from RGB surface types isn't much of a problem.
10-bit or better input from Y'CbCr surface types isn't happening with these mixers with the current video card drivers.

It's possible to create a custom mixer to allocate RGB surfaces and write raw Y'CbCr data on it. A custom mixer has to handle all Y'CbCr surface conversions by itself, instead of letting the video card driver do it. That does allow other formats to be used. Examples of custom mixers can be found in Haali Renderer (YUY2 and RGB32 input) and MadVR (several modes).
[...]

Thanx for the in-depth answer. It clarifies things for me. To make sure though we understand each other: My reason for asking this question was much simpler.

Atm many decoders offer higher than 8-bit output. CoreAVC has released its new version that does it too, LAV and ffdshow being others. Also, x264 is available in higher than 8-bit versions, preparing to extend this to 4:4:4 in the future. And because of this, 10-bit encoded video is starting to appear (Anime-type movies are named).

So an ability to recognize and accept higher than 8-bit input, and to render that input without loss of quality (or with as little loss of quality as possible), to me personally, seems something important for the near future.

And that is why / where my question came from :)

pirlouy
8th September 2011, 17:50
JFYI, I tested one of your builds today, at work: deviated from revision 3709; x86 SSE2.

EVR Custom** and VMR9** cause a crash at startup, when launching a file.
I don't know if it's linked, but I have an Intel GPU on the motherboard (I don't remember the version).

Sorry, maybe it's useless for you, but it's just in case you try to improve the stability...

JanWillem32
8th September 2011, 20:13
@Hera: If a single extra shader pass produces that kind of a problem, there's simply not much processing headroom left. Those two shaders are pretty heavy though. The three passes I added are more like the "BT.601 -> BT.709" shader in terms of instruction and sampling count.

@G_M_C: I've been wanting to replace the mixers with a custom one for a while already, to overcome these kinds of problems, but that's pretty hard if you know very little about DirectShow programming.

@pirlouy: I actually have very little information about the processing capabilities of older video cards and previous generations of IGP types. I think it's hardware vertex processing that's the culprit here, but I can't simply assume that. I'll try to look for a tool that can check a video card's DirectX 9 caps so it can be tested, else I'll write something for it myself.

pirlouy
8th September 2011, 21:59
No problem.
But it's just in case it's meant to replace the current EVR Custom renderer. It will have to work with these old IGPs (not that old, since there's a Core 2 processor), so there should be a (default?) option for these cards. Of course, it's just for debugging, I won't watch a movie on this computer...