4th January 2010, 15:55 | #442 | Link |
Registered User
Join Date: Apr 2002
Location: Germany
Posts: 4,926
|
I hope it doesn't get compiled with Intel's compiler, as that still causes performance issues on AMD CPUs (they get sent down less efficient code paths)? The issue is very well known.
But I guess it doesn't make much of a difference for the parts that give a video decoder its speed (the hand-optimized ASM), only for the non-hand-optimized communication level between Windows and the decoder?
__________________
all my compares are riddles so please try to decipher them yourselves :) It is about Time Join the Revolution NOW before it is to Late ! http://forum.doom9.org/showthread.php?t=168004 Last edited by CruNcher; 4th January 2010 at 15:59. |
4th January 2010, 16:27 | #443 | Link | |
Registered User
Join Date: Nov 2005
Posts: 497
|
Quote:
But I plan to use it for speed on Intel CPUs. Performance-critical parts such as motion compensation, IDCT, deblocking and CABAC are optimized in asm, while the pure communication level between Windows and the decoder uses run-time library functions, for example _beginthreadex.
__________________
The Next Generation Internet Video Codec project. |
|
4th January 2010, 16:36 | #444 | Link | |
Compiling Encoder
Join Date: Jan 2007
Posts: 1,348
|
Quote:
Intel's compiler produces faster code than MSVC, which in turn is faster than gcc when targeting x86 or x86_64, even on my AMD. ICL defaults to general SSE2 code generation; you have to explicitly specify an option to optimize for Intel CPUs, and that would only cause it to crash on AMD CPUs which don't have the corresponding instruction set. Doing something like /arch:sse /QaxSSSE3 would have it use SSE and then add a special SSSE3 CPU path. |
|
4th January 2010, 16:43 | #445 | Link |
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
|
It should be noted that builds compiled with "/arch:sse" or "/arch:sse2" (the default) won't run on CPUs that lack SSE or SSE2, respectively. So people will complain.
As "/Qax" can be used more than once, the combination "/arch:ia32 /QaxSSE /QaxSSE2 /QaxSSSE3" should produce a build that runs on any CPU and still uses optimized SSE/SSE2/SSSE3 code, if supported. Also, it seems the fastest builds for Intel CPUs can only be achieved with "/QxSSE2" or "/QxSSSE3", but that is only feasible if you want to offer special "Intel only" builds.
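To illustrate the "baseline plus optional paths" idea, here is a rough Python sketch of the selection logic only. It is not what ICL actually generates; the path names and the `pick_code_path` helper are hypothetical and exist just for this example.

```python
# Hypothetical sketch: pick the most advanced code path the CPU supports,
# falling back to a plain IA-32 baseline that needs no extensions.

# Ordered from most to least demanding.
CODE_PATHS = ["ssse3", "sse2", "sse", "ia32"]


def pick_code_path(cpu_features):
    """Return the first entry of CODE_PATHS that the CPU can execute.

    cpu_features is a set of lowercase feature names, e.g. {"mmx", "sse"}.
    """
    for path in CODE_PATHS:
        # The baseline is always allowed; every other path needs its feature.
        if path == "ia32" or path in cpu_features:
            return path
    return "ia32"
```

With this layout a CPU that reports only MMX ends up on the "ia32" path instead of hitting an illegal instruction, while an SSSE3-capable CPU gets the fastest path.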
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊ Last edited by LoRd_MuldeR; 4th January 2010 at 16:47. |
4th January 2010, 16:49 | #446 | Link | |
Registered User
Join Date: Nov 2005
Posts: 497
|
Quote:
The current DiAVC requires SSE2.
|
4th January 2010, 16:53 | #447 | Link |
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
|
There is an explicit warning to NOT link code with Intel-specific optimizations ("/Qx") against code that doesn't use those optimizations.
I think that's because it would bypass the CPU check that Intel adds to the main() function. So the binary would run on non-Intel CPUs although it contains Intel-only code. This can lead to "undefined" behavior. Normally a binary with Intel-specific optimizations will exit with an error message on non-Intel CPUs.
4th January 2010, 18:04 | #448 | Link |
Registered User
Join Date: Apr 2002
Location: Germany
Posts: 4,926
|
http://www.brightsideofnews.com/news...substance.aspx <- I'm referring to this lawsuit
4th January 2010, 18:51 | #449 | Link | |
Registered User
Join Date: Apr 2008
Posts: 1,181
|
Quote:
Intel's "cripple AMD" function |
|
4th January 2010, 19:11 | #450 | Link | ||
Software Developer
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
|
Quote:
Quote:
However, the latter is the default in the current ICC version, so I assume such a claim would date back to an old, outdated version... See also: http://software.intel.com/sites/prod...tm#option_arch
Last edited by LoRd_MuldeR; 4th January 2010 at 19:21. |
||
4th January 2010, 19:18 | #451 | Link |
*****
Join Date: Feb 2005
Posts: 5,647
|
ICL can be patched to skip the GenuineIntel checks so that it also uses SSE/SSE2/etc on AMD CPUs.
I suggest not worrying about compilers. Just make one generic build with your own assembly optimizations for all performance-critical parts. This also allows making your own choices about which function implementations get used on which processors.
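A small Python sketch of the difference between a vendor-gated and a feature-gated dispatcher. Everything here is made up for illustration (the CPU description format, the function names, the returned labels); it only shows why doing your own feature check sidesteps the GenuineIntel problem.

```python
# Hypothetical sketch: two ways a dispatcher could decide whether to use
# an SSE2 routine. The dict layout and names are illustrative only.

def vendor_gated(cpu):
    """Use SSE2 only if the vendor string matches AND the feature exists."""
    if cpu["vendor"] == "GenuineIntel" and "sse2" in cpu["features"]:
        return "sse2"
    return "generic"


def feature_gated(cpu):
    """Use SSE2 whenever the CPU reports the feature, whoever made it."""
    return "sse2" if "sse2" in cpu["features"] else "generic"


# Example CPU descriptions (feature lists are simplified).
amd_with_sse2 = {"vendor": "AuthenticAMD", "features": {"mmx", "sse", "sse2"}}
amd_without_sse = {"vendor": "AuthenticAMD", "features": {"mmx", "mmxext"}}
```

Here `vendor_gated(amd_with_sse2)` falls back to the generic routine even though the CPU could run the SSE2 one, while `feature_gated` picks SSE2 for it and still keeps the SSE-less CPU on the generic routine.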
__________________
MPC-HC 2.2.1 Last edited by clsid; 4th January 2010 at 19:21. |
8th January 2010, 13:41 | #452 | Link |
Registered User
Join Date: Nov 2005
Posts: 497
|
Could somebody help me test on an old single-core CPU such as a Pentium M or Athlon 64?
http://di-avc.com/testd.7z. Download the file and unpack it, run run.bat and report the results. Recently I finished all the YUV scaling and yuv2rgb functions, and I am now working on the seeking and timestamp bugs.
8th January 2010, 14:18 | #453 | Link |
Registered User
Join Date: Mar 2006
Posts: 55
|
Code:
nothrd.exe basketball720x576.264
99 frames decoded totally.
3928067 counters used by decoder and 13161 counters used by others.
Decoding speed: 90 fps
sglthrd.exe basketball720x576.264
99 frames decoded totally.
4263451 counters used by decoder and 14818 counters used by others.
Decoding speed: 83 fps
nothrd.exe basketball720x576.264
99 frames decoded totally.
3918364 counters used by decoder and 13008 counters used by others.
Decoding speed: 90 fps
sglthrd.exe basketball720x576.264
99 frames decoded totally.
4124521 counters used by decoder and 12798 counters used by others.
Decoding speed: 85 fps
pause |
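For anyone who wants to compare runs, here is a rough Python sketch that parses this kind of output and recomputes the frame rate from the decoder counters. The tick rate is a guess: 3,579,545 Hz is a common performance counter frequency on Windows machines of that era and it happens to reproduce the reported numbers, but the thread never says what the "counters" are. The `parse_runs` helper and the constant name are hypothetical.

```python
import re

# Assumption (not stated in the thread): "counters" are ticks of a
# 3,579,545 Hz performance counter.
ASSUMED_TICKS_PER_SECOND = 3_579_545

# First two runs from the post above.
LOG = """
nothrd.exe basketball720x576.264
99 frames decoded totally.
3928067 counters used by decoder and 13161 counters used by others.
Decoding speed: 90 fps
sglthrd.exe basketball720x576.264
99 frames decoded totally.
4263451 counters used by decoder and 14818 counters used by others.
Decoding speed: 83 fps
"""

# \s+ matches spaces or newlines, so single-line pastes parse as well.
PATTERN = re.compile(
    r"(\w+\.exe)\s+\S+\s+(\d+) frames decoded totally\.\s+"
    r"(\d+) counters used by decoder and (\d+) counters used by others\.\s+"
    r"Decoding speed: (\d+) fps"
)


def parse_runs(log):
    """Return one dict per run with the reported and the recomputed fps."""
    runs = []
    for exe, frames, decoder_ticks, _other_ticks, fps in PATTERN.findall(log):
        seconds = int(decoder_ticks) / ASSUMED_TICKS_PER_SECOND
        runs.append({
            "exe": exe,
            "reported_fps": int(fps),
            "estimated_fps": int(frames) / seconds,
        })
    return runs
```

Under that assumption the estimates for these two runs truncate to the reported 90 and 83 fps, which suggests the reported speed is based on the decoder counters alone.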
8th January 2010, 16:43 | #454 | Link |
Registered User
Join Date: Oct 2006
Posts: 150
|
Single thread Athlon64
Code:
nothrd.exe basketball720x576.264
99 frames decoded totally.
3684944 counters used by decoder and 12321 counters used by others.
Decoding speed: 96 fps
sglthrd.exe basketball720x576.264
99 frames decoded totally.
3545769 counters used by decoder and 12152 counters used by others.
Decoding speed: 99 fps
nothrd.exe basketball720x576.264
99 frames decoded totally.
3406643 counters used by decoder and 12143 counters used by others.
Decoding speed: 104 fps
sglthrd.exe basketball720x576.264
99 frames decoded totally.
3523578 counters used by decoder and 12225 counters used by others.
Decoding speed: 100 fps
pause |
8th January 2010, 17:50 | #455 | Link |
*****
Join Date: Feb 2005
Posts: 5,647
|
The test fails (with an Illegal Instruction error) on an old AMD Athlon Thunderbird (1.33 GHz). That CPU only has MMX and MMXext, but no SSE1.
8th January 2010, 18:38 | #456 | Link |
Registered User
Join Date: Nov 2005
Posts: 497
|
The DiAVC requires SSE2.
8th January 2010, 19:04 | #457 | Link | |
Registered User
Join Date: Dec 2009
Posts: 25
|
Quote:
|
|
9th January 2010, 03:07 | #458 | Link | |
Registered User
Join Date: Apr 2002
Location: Germany
Posts: 4,926
|
Athlon 64 X2 (Toledo) SSE (1,2,3) 3DNOW(+) MMX(+)
Quote:
|
9th January 2010, 10:23 | #459 | Link |
Registered User
Join Date: Oct 2001
Posts: 33
|
Code:
nothrd.exe basketball720x576.264
99 frames decoded totally.
3754152 counters used by decoder and 17488 counters used by others.
Decoding speed: 94 fps
sglthrd.exe basketball720x576.264
99 frames decoded totally.
3986075 counters used by decoder and 17523 counters used by others.
Decoding speed: 88 fps
nothrd.exe basketball720x576.264
99 frames decoded totally.
3927613 counters used by decoder and 17756 counters used by others.
Decoding speed: 90 fps
sglthrd.exe basketball720x576.264
99 frames decoded totally.
3995219 counters used by decoder and 17091 counters used by others.
Decoding speed: 88 fps |
12th January 2010, 19:26 | #460 | Link |
Registered User
Join Date: Nov 2005
Posts: 497
|
A new beta version is available.
Changes: 20100112 fix two bugs on timestampt. Only using the main thread for old intel cpu. The functions about scaling and yuv2rgb are finished for some days, but I must firstly add a propertypage before I integrate them. I heard that using the same thread in frame parallel instead of creating new thread for every frame can improve performance, I will try it. As regarding to the compatibility with the elecard splitter, I found that the elecard splitter sends half-baked nalu to the decoder. Fixing it is easy, but a buffer is needed to buffer the nalues. As the bug influences raw 264 bitstreams only, I don't fix it in this release.