New ffdshow build (?)
regeszter
18th July 2006, 18:48
No, H.264 decoding is not multithreaded at all yet (and it's a huge bottleneck). What's multithreaded is the filter chain itself, and resizing gets its own extra threads. With AVC it usually ends up a rather lopsided 80%/30% split on my machine, since decoding all happens in the main thread.
Try taking your favorite xvid video and resizing to 1280x1024, plus your other filters. See if that brings it up to using both cores.
Also make sure that in misc options "queue output samples" is on.
The source file is a DivX 6 AVI.
"Queue output samples" is ON.
These options were used:
- Levels
- Blur & NL (denoise3d) <- this uses about 30% of the total (48%)
- Subtitles
- Resize & aspect 720*576 -> 1024*768
- Sharpen
And here is the result:
http://kepfeltoltes.hu/060718/kep1_www.kepfeltoltes.hu_.jpg
Without Blur & NL:
http://kepfeltoltes.hu/060718/kep2_www.kepfeltoltes.hu_.jpg
I think denoise3d does not use 2 cores.
haruhiko_yamagata
18th July 2006, 23:36
regeszter, read this (http://sourceforge.net/tracker/index.php?func=detail&aid=1472926&group_id=53761&atid=471491) and you'll know what's multithreaded.
CPU usage is not important at all.
The frame rate is what matters most.
Compare an ffdshow build older than 20060418 with the newest one. With files that play at more than 20 fps on the old build, you will see an improvement. Enable the OSD item "Queued samples" and confirm that your video renderer supports multithreading. WMP, the old renderer and some VMR9 renderers (Intel 82865G etc.) do not support the queue. In that case, test MPC + VMR7 or VMR9 renderless.
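The "Queue output samples" idea can be pictured as a producer/consumer pair: decoding and filtering keep running on one thread while a second thread hands finished samples to the renderer, and "Queued samples" reports how many are waiting. Below is a minimal Python sketch of that pattern, not ffdshow code; `QUEUE_DEPTH` and `run_pipeline` are hypothetical names and the depth is an assumed value.

```python
import queue
import threading

QUEUE_DEPTH = 4    # assumed depth for illustration; not ffdshow's real setting
_STOP = object()   # sentinel that tells the delivery thread to finish


def run_pipeline(frames, render):
    """Hypothetical model: the calling thread "decodes" and enqueues frames,
    while a second thread delivers them to render() in order."""
    q = queue.Queue(maxsize=QUEUE_DEPTH)

    def deliver():
        while True:
            sample = q.get()
            if sample is _STOP:
                break
            render(sample)  # the renderer call happens off the decoding thread

    worker = threading.Thread(target=deliver)
    worker.start()
    peak = 0  # most samples seen waiting; compare the OSD "Queued samples"
    for frame in frames:
        q.put(frame)  # blocks only while QUEUE_DEPTH samples are already waiting
        peak = max(peak, q.qsize())
    q.put(_STOP)
    worker.join()
    return peak
```

If the renderer is fast, the queue drains immediately and the peak stays near 0, which is why a low count on its own does not prove the queue is broken.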
foxyshadis
19th July 2006, 12:35
I think he's right. I'm going to have to find some older versions of ffdshow to test against, but by checking Process Explorer it's obvious that one single thread (splitter.ax, oddly, not ffdshow.ax) uses 50% CPU and desync or stuttering starts occurring. Queued samples is 0 almost all the time, even with all filters off; occasionally it jumps to 1 for a single frame. I know for sure that it used to queue better. (And I have an even stronger CPU now!) I have to test whether it's only vm9's builds or not.
MPC (latest), ffdshow build 20060714, sample queuing is on. It occurs at least with all the july builds.
videomixer9
19th July 2006, 12:57
Can mplayer and ffmpeg use multithreading on anything other than MPEG-1/2 yet? As long as ffmpeg doesn't implement decoder multithreading, you won't see the load spread much in H.264 decoding.
haruhiko_yamagata
19th July 2006, 12:59
What is the video renderer? Some VMR9 renderers do not hand us a buffer promptly. In such cases, the queue is not effective. Please test another video renderer (VMR9 renderless or VMR7).
Calling the video renderer's "Receive" from a thread other than the one that called "GetBuffer" is not a documented feature. Some renderers may fail to give us a buffer, and then ffdshow has to wait and retry. As far as I know, WMP (with the DMO wrapper connected to ffdshow) and VMR9 (i82865G) are such cases.
As for the i82865G, MPC + VMR7 or VMR9 renderless works.
The G400 was OK with all renderers except the old one.
//EDIT
Of course, when the video is too heavy, ffdshow can't queue. The frame rate with the single-threaded version should be better than 70%.
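The wait-and-retry behaviour described above, for renderers that refuse a buffer when GetBuffer and Receive come from different threads, can be sketched as a small loop. This is an illustrative Python model, not ffdshow's implementation: `get_buffer` is a hypothetical callable returning `(hresult, buffer)`, and the retry count and wait time are assumptions. The error value is the one quoted later in this thread for renderers lacking the feature.

```python
import time

S_OK = 0
# Error reported in this thread for renderers that do not support the
# cross-thread GetBuffer/Receive pattern; used here only as an example value.
E_BUFFER_UNAVAILABLE = 0x80040223


def get_buffer_with_retry(get_buffer, retries=5, wait_s=0.005):
    """Call a hypothetical get_buffer() until it succeeds or retries run out.

    get_buffer returns (hresult, buffer). Returns the buffer, or None if the
    renderer never handed one over (a caller could then stop queuing).
    """
    for _ in range(retries):
        hr, buf = get_buffer()
        if hr == S_OK:
            return buf
        time.sleep(wait_s)  # back off briefly before asking the renderer again
    return None
```

Every retry costs time on the decoding thread, so a renderer that keeps refusing effectively cancels out the benefit of the queue.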
foxyshadis
19th July 2006, 13:19
Well, okay, I'll give you that, but "Queued samples" stays at 0 no matter how many or how few filters I use, and no matter whether the CPU usage is 5% or 50% or anywhere in between.
haruhiko, normally it's Haali's renderer, but it's exactly the same with the VMR9 modes. (VMR7 simply doesn't work now, wtf?) It's an X1300 card. Maybe I'll try a different graphics driver; it's possible the latest Omega drivers broke something. Weird.
haruhiko_yamagata
19th July 2006, 13:25
Multithreading of decoder is very important, of course.
I played the 1080p Superman trailer with mplayer. It seems to use multithreading. Updating libavcodec is really worthwhile (but I don't have time now...).
videomixer9
19th July 2006, 13:28
Well, I get no queued samples either, except with Haali's renderer, where it occasionally jumps to 1.
thuan
19th July 2006, 13:29
Mine: Sempron 2200+, XP, with a G4MX4000 (driver 91.28). No queue in Overlay Mixer with any file; with VMR9 renderless, depending on how heavy the file is and how many filters I use, it goes from 0 to 9 queued samples.
I found one weird thing in vm9's latest no-SSE build: in VMR9 renderless mode, and only in that mode, AAC decoding seems to be broken for every type, whether SBR or not. clsid's build works fine; maybe it's ICL's fault again.
haruhiko_yamagata
19th July 2006, 13:45
clsid's build doesn't include the queue, AFAIK. It may be a single-CPU/multithreading (thread priority) problem. Does disabling "Queue output samples" improve anything?
thuan
19th July 2006, 14:02
No, it doesn't matter whether ffdshow is used as the video decoder or not, and the same goes for "Queue output samples": as long as there's AAC with VMR9 renderless, bang. MPC's internal AAC decoder works fine whether the ffdshow video decoder is used or not. I think ICL must be the culprit.
regeszter
19th July 2006, 18:08
Here is another test of mine.
An H.264 file, MPC with VMR9 renderless, "Queue output samples" on.
Resize on, NR off: the movie is smooth.
http://kepfeltoltes.hu/060719/623361222k_p1_www.kepfeltoltes.hu_.jpg
Resize on, NR on: the movie stutters.
http://kepfeltoltes.hu/060719/k_p2_www.kepfeltoltes.hu_.jpg
So there is spare CPU power, but the movie stutters; the multithreading does not work well. :(
_xxl
19th July 2006, 21:25
InnoSetup with CPU detection:
http://rapidshare.de/files/26330053/FFdshow-20060716-rev2546.exe.html
libavcodec.dll & libmplayer.dll are GCC 3.4.2
Kostarum Rex Persia
20th July 2006, 03:40
Guys, what's going on with the 64-bit build of ffdshow? I am very nervous about the lack of 64-bit support.
Can anyone send me a private message with an idea of how to contact Milan, the author of ffdshow?
celtic_druid
20th July 2006, 04:05
Same thing that is going on with 32-bit ffdshow... Not much. The 64-bit build does work, though.
Doesn't look like Milan is around or at least isn't spending any time on ffdshow. Which is fine. Once again, all the source is there for you to work on.
Liisachan
20th July 2006, 04:26
In case anyone needs the link to celtic_druid's build (64-bit):
http://ffdshow.faireal.net/mirror/ffdshow/ffdshow64-rev2546.exe
celtic_druid
20th July 2006, 04:51
How many 64-bit media players are there anyway?
Windows Vista x64 Edition features a native 64-bit WMP11. The new WMP11 for Windows XP Pro x64 Edition and Windows Server 2003 x64 Editions is supposed to be 64-bit. I hope all codecs will continue to work in a 64-bit DirectShow player. I will have to test the newest Vista build and see.
celtic_druid
20th July 2006, 06:58
Last time I had a look at the WMP11 download, it was 32-bit only. So the current Vista beta includes a 64-bit WMP11?
haruhiko_yamagata
22nd July 2006, 04:45
OSD item "Video delay"
ffdshow has had IQualityControl since rev2411 (just after the release of ffdshow-20051115.exe). When the video is delayed by more than 1500 ms, ffdshow drops a frame, which improves audio-video sync. H.264 (and perhaps some other codecs) cannot(?) show the next frame until it reaches a "sync point". The next sync point may be very far away, so sometimes ffdshow cannot show a new frame for seconds.
Resize setting - automatic vertical size setting
Improved? "non-square pixel" support (Bug fix).
As for MPC + MKV, use a proper external MKV splitter and disable MPC's internal Matroska source filter to get non-square pixel support.
Avoid multithreading of resize (swscaler) if the CPU is a Pentium 4 with HT. It is not faster at all and uses more CPU. (swscaler depends heavily on MMX, and a P4 HT has only one MMX unit.)
Use more multithreading of resize (swscaler) if the CPU is dual-core or multi-CPU.
Imported changes of swscaler from mplayer/libswscale.
Rewrote the multithreading of swscaler to stay in sync with mplayer/libswscale (this doesn't mean it fixes the horizontal line problem).
[Patch] (http://sourceforge.net/tracker/index.php?func=detail&aid=1472926&group_id=53761&atid=471491) ffdshow_multithread_060721.patch is a patch on top of the earlier patches.
Apply it to rev2546 + ...060517 + ...060601 + ...060604 + ...060625 + ...060709.patch
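The two resize rules in the changelog (no extra swscaler threads on a P4 with HT, more threads on dual-core or multi-CPU systems) amount to a thread-count policy plus a split of the picture into horizontal bands. A rough Python sketch under stated assumptions: "one thread per core" is a guess, since the changelog does not give ffdshow's actual counts, and `resize_threads` and `slice_rows` are hypothetical helpers.

```python
def resize_threads(cores, is_p4_ht):
    """Hypothetical policy following the changelog: a single thread on a P4 with
    HT (both logical CPUs share one MMX unit), otherwise one thread per core."""
    if is_p4_ht:
        return 1
    return max(1, cores)


def slice_rows(height, threads):
    """Split the output rows into contiguous horizontal bands, one per thread.
    Returns (first_row, end_row) pairs; earlier bands take any leftover rows."""
    base, extra = divmod(height, threads)
    bands, start = [], 0
    for i in range(threads):
        rows = base + (1 if i < extra else 0)
        bands.append((start, start + rows))
        start += rows
    return bands
```

For example, a 1024x768 output on a dual-core CPU would be split into rows 0 to 383 and 384 to 767, each scaled by its own thread.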
haruhiko_yamagata
22nd July 2006, 04:52
This patch is trivial for most users.
Btw, the newest upstream libavcodec H.264 decoder is not only faster but also supports quality control ("skiploopfilter/skipidct/skipframe decoder options for very fast H.264 decoding"), so I would like to use it. In any case, I would like to update libavcodec.dll, but it looks very difficult.
The OSD item "Video delay" explains how ffdshow controls audio-video sync. There's room for improvement.
Could you guide me on "How to import changes from ffmpeg"?
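The "Video delay" behaviour from the changelog (drop a frame once the video lags by more than 1500 ms, then show nothing new until the next sync point) can be modelled in a few lines. This is a simplified Python sketch, not ffdshow's quality-control code: the per-frame delay list and the keyframe flags are hypothetical inputs.

```python
DROP_THRESHOLD_MS = 1500  # from the changelog: drop when video lags > 1500 ms


def plan_frames(is_sync_point, delay_ms):
    """Decide which frames are shown, given hypothetical per-frame lag values.

    is_sync_point: list of booleans, True where decoding can restart cleanly.
    delay_ms: how late the video is (ms) when each frame is reached.
    After a drop, later frames depend on the dropped one, so nothing can be
    shown until the next sync point arrives.
    """
    shown = []
    waiting_for_sync = False
    for sync, late in zip(is_sync_point, delay_ms):
        if waiting_for_sync and not sync:
            shown.append(False)  # still skipping frames that need the dropped one
            continue
        waiting_for_sync = False
        if late > DROP_THRESHOLD_MS:
            shown.append(False)
            waiting_for_sync = True
        else:
            shown.append(True)
    return shown
```

With sync points far apart, a single drop blanks out every frame up to the next one, which is the multi-second freeze described for H.264.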
haruhiko_yamagata
22nd July 2006, 10:35
ffdshow-20060722-Q.exe
Needs SSE
http://www.mytempdir.com/818768
libmplayer.dll, libavcodec.dll and libmpeg2.dll are by GCC 4.0.3.
The rest are by MSVC8(VS2005).
applied patches
ffdshow_vorbis6ch.patch
inttypes.diff
ffdshow_accuracy.diff
dts.patch
ffdshow_multithread_060721.patch
TsampleFormat.patch
_xxl
22nd July 2006, 10:45
InnoSetup with CPU detection:
MMX, SSE & SSE2.
http://rapidshare.de/files/26603725/FFdshow-20060722-rev2546.exe.html
libavcodec.dll & libmplayer.dll are GCC 4.0.3
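An installer "with CPU detection" essentially chooses the most capable build the processor can run. The installer script itself is not shown in this thread, so the following is only a Python sketch of that selection logic; the variant names and `pick_variant` are hypothetical.

```python
# Hypothetical variant table, most capable first; the installer only states
# that MMX, SSE and SSE2 builds are included.
VARIANTS = [("sse2", {"mmx", "sse", "sse2"}),
            ("sse", {"mmx", "sse"}),
            ("mmx", {"mmx"})]


def pick_variant(cpu_flags):
    """Return the most capable build whose required instruction sets are all present."""
    flags = {f.lower() for f in cpu_flags}
    for name, required in VARIANTS:
        if required <= flags:  # subset test: every required feature is supported
            return name
    return None  # CPU lacks even MMX: none of these builds can run
```

For example, an Athlon XP reporting MMX and SSE, but not SSE2, would get the SSE build.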
NULUSIOS
22nd July 2006, 10:50
Now waiting for videomix? :D
LoRd_MuldeR
22nd July 2006, 11:29
When will all those patches be in SVN???
We have been at r2546 for over 2 months now...
haruhiko_yamagata
22nd July 2006, 11:44
When will all those patches be in the SVN ???
We are at r2546 for over 2 month now...
I asked Milan 2 weeks ago and am still waiting for a reply...
Liisachan
22nd July 2006, 11:53
mirrored
ffdshow-20060722-Q.exe (http://ffdshow.faireal.net/mirror/Misc%20(not%20by%20celtic_druid)/ffdshow-20060722-Q.exe)
12c0100c8cdc1a0698fb4cc709cda7a4
FFdshow-20060722-rev2546.exe (http://ffdshow.faireal.net/mirror/Misc%20(not%20by%20celtic_druid)/FFdshow-20060722-rev2546.exe)
bc0d6679017209bce1054e6d0cba480d
haruhiko_yamagata
22nd July 2006, 11:59
Liisachan, thank you very much.
videomixer9
22nd July 2006, 12:11
Btw, oddly, as a non-admin user, resizing stopped working for me entirely, for whatever reason. However, I updated too; find it here (http://ffdshow.pytalhost.eu/current.php).
Triple play: MSVC 2005 for ffdshow.ax, GCC 4.1.1 for libmplayer and libavcodec, and ICL 9.1 for the rest.
haruhiko_yamagata
22nd July 2006, 12:32
You wrote that resizing stopped working for you as non-admin. What does this mean? Please explain.
Thank you for your build.
videomixer9
22nd July 2006, 12:37
I fiddled some more, and it seems it only works with the output queue enabled; if the output queue is disabled, it doesn't work. It's just that my admin account had the output queue enabled while my user account had it disabled. Though I just noticed it often doesn't work with it enabled either.
In short: as admin, resizing always works, but as a regular user it often doesn't, no matter which settings. Users who permanently work as admin are IMO big-time newbs, so this issue is kinda annoying to me. Are any functions used that need admin rights? It usually worked without problems before.
Tested some more vids and it just randomly occurs.
BTW, the swscaler dev asked for additional info about the horizontal line problem. Remind me of the link to that QT file which clearly shows the problem, and also the mplayer CLI settings that show both cases (i.e. with and without the bug).
haruhiko_yamagata
22nd July 2006, 13:13
Thank you for reporting the problem.
The snapshot was posted by LoRd_MuldeR.
http://forum.doom9.org/showthread.php?p=852128#post852128
Direct link
http://img65.imageshack.us/img65/8697/snapshot0so.png
Command line to show the bug:
mplayer.exe -vf scale=1024:480 filename.mov
I don't have a sample command line that doesn't show the bug.
haruhiko_yamagata
22nd July 2006, 13:18
Thank you.
I can't reproduce it so far.
It's you, so it's very unlikely, but please check the access rights to the file (libmplayer.dll) and the related registry keys. As for registry keys, I added "resizeIsDy0" to \GNU\ffdshow\default.
videomixer9
22nd July 2006, 13:29
Access rights are okay, also I had the registry values wiped. Maybe it's something else. Not that I really care that much. If it's not reproducible it must be sth. with the settings here. Still kind of odd the problem only occurs as user and not as admin. The settings are correctly stored under HKCU and access rights are okay ...
haruhiko_yamagata
22nd July 2006, 13:32
Egh,
If he can't download the QT file, the link below is available (posted by LoRd_MuldeR).
http://jfl1974.free.fr/upload/superman.mp4
Resizing to a large size is necessary to see it clearly. A horizontal size of 1280 or more is recommended.
foxyshadis
22nd July 2006, 20:55
Hmm, I thought I had a cause for the lack of queuing but it didn't pan out. Is there any way that a diagnostic build could log information about what succeeds and what fails when attempting to start and use the queue? I guess I could look through it and try to find it myself.
btw, as to ffmpeg (lavc), I tried it the old fashioned way: backup, delete, resync. Only got tons of errors for my trouble. ^^; It's possible gcc would do better if you also used the updated makefiles.
haruhiko_yamagata
22nd July 2006, 22:18
The queue is tried unless the video renderer is the old one, the application is WMP, or the option is unchecked in the dialog. If the OSD item "Queued samples" sometimes shows 1, the queue is being tried. One main reason the queue fails is an error from GetBuffer. See TffdshowDecVideo::initializeOutputSample in Tffdecoder.cpp. It has DPRINTF(_l("GetDeliveryBuffer returned %x"),hr); after GetDeliveryBuffer. If you can use MSVC's debugger, you'll see the error code: typically 0x80040223 if your video renderer does not support this special multithreading feature.
GetBuffer and Receive may be meant to be called synchronously from the same thread. The documentation doesn't say either way, so we had better consider it a bonus. If it has been working on your system with the same video card, though, there's room for a workaround.
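The value printed by that DPRINTF is a standard 32-bit HRESULT, so it can be split into its fields to see what kind of error it is. A small Python helper (hypothetical name `decode_hresult`) using the documented HRESULT layout: bit 31 is the severity, bits 16 to 26 the facility, bits 0 to 15 the code. For 0x80040223 this gives severity 1 (failure) and facility 4 (FACILITY_ITF, the facility used by DirectShow's VFW_E_* errors).

```python
def decode_hresult(hr):
    """Split a 32-bit HRESULT into its severity, facility and code fields.
    Accepts either the unsigned value or the signed value a debugger may show."""
    hr &= 0xFFFFFFFF               # normalise signed values to 32-bit unsigned
    severity = hr >> 31            # 1 = failure, 0 = success
    facility = (hr >> 16) & 0x7FF  # 11-bit facility field
    code = hr & 0xFFFF             # facility-specific error code
    return severity, facility, code
```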
NULUSIOS
22nd July 2006, 22:24
...
Are you sure your links work?
I'm having a hard time connecting.
videomixer9
22nd July 2006, 22:33
duh lame hosters always dying:
http://www.mooload.com/new/file.php?file=files/220706/1153566369/ffdshow-20060722-rev2546.exe
http://www.mytempdir.com/819217
http://s16.simpleupload.de/febe8be78/ffdshow-20060722-rev2546.exe.html
http://ultrashare.net/hosting/fl/73229f29f3/
http://rapidshare.de/files/26609917/ffdshow-20060722-rev2546.exe.html
http://files.to/get/142940/16908/ffdshow-20060722-rev2546.exe
oh well, not much you can expect from drunken linux kiddies trying to run a hosting service and also for free. Can be only hours till some of those drunkards notice.
NULUSIOS
22nd July 2006, 22:45
np dude :)
videomixer9
22nd July 2006, 23:01
Okay, updated to another host again, but this time with direct downloads too.
foxyshadis
23rd July 2006, 01:35
[5552] Receive Returned 0
[5552] GetDeliveryBuffer returned 0
[5552] COutputQueue queued a sample
[5552] CTransformInputPin::Receive 5688
[5552] CTransformInputPin::Receive 5780
[5552] CTransformInputPin::Receive 5780
Odd, it seems to be working fine. Resize multithreading works fine, loading both cores when I turn it way up, but no matter what I do everything else seems to be processed within a single thread. The first core is always around 5% while the second is near 100%. This is going to drive me mad until I figure it out. Oh well, I'll see whether I can find out why on my own.
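When checking whether the queue really works, debug output like the log above can be tallied instead of read line by line. A Python sketch, assuming each line has the form `[number] message` shown in the log and that GetDeliveryBuffer results are printed in hex (as `%x` in the DPRINTF format does); `summarize` is a hypothetical helper.

```python
import re
from collections import Counter

# Lines look like "[5552] message" in the log above.
LINE_RE = re.compile(r"^\[(\d+)\]\s+(.*)$")


def summarize(log_text):
    """Count queued samples and GetDeliveryBuffer successes/failures in a log."""
    stats = Counter()
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        msg = m.group(2)
        if msg.startswith("COutputQueue queued a sample"):
            stats["queued"] += 1
        elif msg.startswith("GetDeliveryBuffer returned"):
            hr = int(msg.split()[-1], 16)  # DPRINTF prints hr with %x
            stats["buffer_ok" if hr == 0 else "buffer_failed"] += 1
    return stats
```

A log with many successful GetDeliveryBuffer calls but almost no queued samples would point away from the renderer and toward the decoder simply keeping up.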
haruhiko_yamagata
23rd July 2006, 01:59
If the first core is nearly 100%, the movie may just be too heavy. But probably you mean that smaller pictures don't queue either...
The second core's CPU usage may be low just because your video renderer is so fast.
haruhiko_yamagata
23rd July 2006, 02:55
The attached picture shows a case where the queue is effective.
In this case, the CPU usage of the second core is very low, but no frames are dropped. When the queue is off, there are some frame drops.
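Why can a queue prevent drops even when the second core is mostly idle? A toy timing model helps. It assumes, without a queue, that the decoder hands each frame to a renderer that blocks until the frame is due. With a queue of depth k, the decoder only waits once k frames are pending, so an occasional slow frame is absorbed by the frames already decoded. This is a hypothetical Python model (`count_late_frames` is an illustrative name), not a measurement of ffdshow.

```python
def count_late_frames(decode_ms, interval_ms, queue_depth):
    """Toy model of frame delivery with an output queue of a given depth.

    Frame i is due at (i + 1) * interval_ms. Without a queue (depth 0), the
    decoder waits until frame i-1 is due before decoding frame i. With depth
    k, it only waits until frame i-1-k is due, i.e. while k frames are pending.
    A frame is late if its decoding finishes after its due time.
    """
    def due(i):
        return (i + 1) * interval_ms if i >= 0 else 0

    finish = 0
    late = 0
    for i, cost in enumerate(decode_ms):
        start = max(finish, due(i - 1 - queue_depth))
        finish = start + cost
        if finish > due(i):
            late += 1
    return late
```

With 40 ms per frame (25 fps) and one 60 ms frame among 30 ms frames, `[30, 30, 60, 30, 30, 30]` gives 2 late frames without a queue and none with a depth of 2, even though the average decoding cost is well under 40 ms.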
foxyshadis
23rd July 2006, 06:50
Ah, I think it came down to using your builds instead of vm9's; with his I get full utilization of both cores and can watch much larger video (or add more filters) before it starts to lag. I'm glad it was something like that, I take it gcc's multithreading just isn't that hot.
_xxl
23rd July 2006, 09:49
haruhiko_yamagata:
Minimum CPU requirement: SSE
libmplayer.dll, libavcodec.dll and libmpeg2.dll are by GCC 4.0.3.
The rest are by MSVC8(VS2005).
http://www.mytempdir.com/818768
http://ffdshow.faireal.net/mirror/Misc%20(not%20by%20celtic_druid)/ffdshow-20060722-Q.exe
videomixer9:
Minimum CPU requirement: SSE
http://ffdshow.da.cx/
http://www.mooload.com/new/file.php?file=files/220706/1153566369/ffdshow-20060722-rev2546.exe
http://www.mytempdir.com/819217
http://s16.simpleupload.de/febe8be78/ffdshow-20060722-rev2546.exe.html
http://ultrashare.net/hosting/fl/73229f29f3/
http://rapidshare.de/files/26609917/ffdshow-20060722-rev2546.exe.html
http://files.to/get/142940/16908/ffdshow-20060722-rev2546.exe
clsid:
Minimum CPU requirement: MMX
Compilers used:
- MSVC71 (ffdshow.ax, etc)
- GCC 3.4.5 (libavcodec.dll, kerneldeint.dll, TomsMoComp_ff.dll)
- GCC 4.0.3 (mplayer.dll)
rev2543:
http://rapidshare.de/files/26625385/ffdshow_rev2543_20060722.exe.html
rev2546:
http://rapidshare.de/files/26625534/ffdshow_rev2546_20060722.exe.html
XXL:
Minimum CPU requirement: MMX
InnoSetup with CPU detection:
MMX, SSE & SSE2.
libavcodec.dll & libmplayer.dll are GCC 4.0.3 (mmx, sse & sse2)
http://rapidshare.de/files/26603725/FFdshow-20060722-rev2546.exe.html
http://ffdshow.faireal.net/mirror/Misc%20(not%20by%20celtic_druid)/FFdshow-20060722-rev2546.exe
videomixer9
23rd July 2006, 10:28
Use services like xs.to, ImageShack or others for pictures; attachments take ages to get approved here. As for multithreading, it is sad that ICL's /Qparallel switch makes things crash on several systems.
clsid
23rd July 2006, 15:31
I have made a special test build of ffdshow containing 13 different versions of libavcodec.dll. One of these can (if supported by your cpu) be selected during installation. The rest of the components are taken from my latest rev2546 build.
This build is for testing the pure decoding speed of libavcodec when using certain optimization switches and GCC versions. Testing other stuff, like filters, is pointless since that isn't done by libavcodec.
The following libavcodec.dll builds are included:
* GCC 3.4.5 [default makefile]
* GCC 4.0.3 [default makefile]
* GCC 4.1.1 [default makefile]
* GCC 3.4.5 [-march=i686 -mtune=i686 -mmmx] (MMX)
* GCC 3.4.5 [-march=i686 -mtune=i686 -mmmx -msse -mfpmath=sse] (SSE)
* GCC 3.4.5 [-march=i686 -mtune=i686 -mmmx -msse -msse2 -mfpmath=sse] (SSE2)
* GCC 3.4.5 [-march=athlon -mtune=athlon]
* GCC 3.4.5 [-march=athlon-xp -mtune=athlon-xp -mfpmath=sse]
* GCC 3.4.5 [-march=k8 -mtune=k8 -mfpmath=sse] (AMD Athlon 64)
* GCC 3.4.5 [-march=pentium3 -mtune=pentium3 -mfpmath=sse]
* GCC 3.4.5 [-march=pentium4 -mtune=pentium4 -mfpmath=sse]
* GCC 3.4.5 [-march=pentium-m -mtune=pentium-m -mfpmath=sse]
* GCC 3.4.5 [-march=prescott -mtune=prescott -mfpmath=sse] (New P4 models)
download (http://www.mytempdir.com/821659)
Some results I got when testing an H.264 file on an AMD Athlon:
* GCC 3.4.5 is the fastest for a plain build. Closely followed by GCC 4.0.3. GCC 4.1.1 is the slowest.
* MMX build is fastest, Athlon build comes in second.
videomixer9
23rd July 2006, 20:12
Try changing -O3 to -O2 and adding the flags that -O3 enables, except loop unswitching; note that those differ between GCC 3.x and 4.x :P Other than that, it seems to confirm what I have always said: ffdshow's libavcodec has runtime CPU detection enabled by default, so almost all other optimizations don't make much of a difference.
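Runtime CPU detection means the hot routines exist in several hand-written versions, and the best one is chosen once at start-up from the detected CPU flags, so compiler flags such as -march mostly affect the surrounding plain C code. A Python sketch of that dispatch pattern; the routine names and the table are hypothetical stand-ins, not libavcodec's actual functions.

```python
def idct_c(block):
    return list(block)   # stand-in for the plain C routine


def idct_mmx(block):
    return list(block)   # stand-in for a hand-written MMX routine


def idct_sse2(block):
    return list(block)   # stand-in for a hand-written SSE2 routine


def init_dsp(cpu_flags):
    """Pick implementations once at start-up from the detected CPU flags;
    later assignments override earlier ones, so the best match wins."""
    table = {"idct": idct_c}
    if "mmx" in cpu_flags:
        table["idct"] = idct_mmx
    if "sse2" in cpu_flags:
        table["idct"] = idct_sse2
    return table
```

Because the SIMD versions are written by hand, building the whole library with -march=pentium4 instead of a plain build changes little in those routines, which fits the small differences in clsid's results.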