Old 4th January 2010, 10:34   #441  |  Link
LigH
German doom9/Gleitz SuMo
 
Join Date: Oct 2001
Location: Germany, rural Altmark
Posts: 6,779
Then I can even check it on an AMD Duron-800...
__________________

New German Gleitz board
MediaFire: x264 | x265 | VPx | AOM | Xvid
Old 4th January 2010, 15:55   #442  |  Link
CruNcher
Registered User
 
Join Date: Apr 2002
Location: Germany
Posts: 4,926
I hope it doesn't get compiled with Intel's compiler, as that still causes performance issues on AMD CPUs (the code takes less efficient paths there). The issue is very well known.
But I guess the parts that determine a video decoder's speed (the hand-optimized ASM) are not affected, and for the non-hand-optimized parts, such as the pure communication level between Windows and the decoder, it doesn't make such a big difference?
__________________
all my compares are riddles so please try to decipher them yourselves :)

It is about Time

Join the Revolution NOW before it is to Late !

http://forum.doom9.org/showthread.php?t=168004

Last edited by CruNcher; 4th January 2010 at 15:59.
Old 4th January 2010, 16:27   #443  |  Link
schweinsz
Registered User
 
Join Date: Nov 2005
Posts: 497
Quote:
Originally Posted by CruNcher View Post
I hope it doesn't get compiled with Intel's compiler, as that still causes performance issues on AMD CPUs (the code takes less efficient paths there). The issue is very well known.
But I guess the parts that determine a video decoder's speed (the hand-optimized ASM) are not affected, and for the non-hand-optimized parts, such as the pure communication level between Windows and the decoder, it doesn't make such a big difference?
DiAVC is compiled with VC2008; ICL has never been used.
But I plan to use it for speed on Intel CPUs.

Some performance-critical parts, such as motion compensation, IDCT, deblocking and CABAC, are optimized in asm, while the pure communication level between Windows and the decoder uses functions from the run-time library, for example _beginthreadex.
Old 4th January 2010, 16:36   #444  |  Link
kemuri-_9
Compiling Encoder
 
Join Date: Jan 2007
Posts: 1,348
Quote:
Originally Posted by CruNcher View Post
I hope it doesn't get compiled with Intel's compiler, as that still causes performance issues on AMD CPUs (the code takes less efficient paths there). The issue is very well known.
But I guess the parts that determine a video decoder's speed (the hand-optimized ASM) are not affected, and for the non-hand-optimized parts, such as the pure communication level between Windows and the decoder, it doesn't make such a big difference?
What?
Intel's compiler produces faster code than MSVC, which in turn produces faster code than gcc, when targeting x86 or x86_64, even on my AMD.

ICL defaults to generic SSE2 code generation.
You have to explicitly specify an option to optimize for Intel CPUs, and that would only cause it to crash on AMD CPUs that lack the corresponding instruction set.
Doing something like
/arch:sse /QaxSSSE3 would have it use SSE as the baseline and then add a special SSSE3 CPU path.
__________________
custom x264 builds & patches | F@H | My Specs
Old 4th January 2010, 16:43   #445  |  Link
LoRd_MuldeR
Software Developer
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
It should be noted that builds compiled with "/arch:SSE" or "/arch:SSE2" (the default) won't run on CPUs that lack SSE or SSE2, respectively. So people will complain.

As "/Qax" can be used more than once, the combination "/arch:ia32 /QaxSSE /QaxSSE2 /QaxSSSE3" should produce a build that runs on any CPU and still uses optimized SSE/SSE2/SSSE3 code, if supported.

Also, it seems the fastest builds for Intel CPUs can only be achieved with "/QxSSE2" or "/QxSSSE3", but that is only feasible if you want to offer special "Intel only" builds.
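
To make the multi-path idea concrete, here is a minimal Python sketch of run-time dispatch: a generic baseline plus optional optimized paths, with the most demanding supported one picked at startup. It only models the concept and says nothing about how ICC's dispatcher is implemented internally; the names PATHS and select_path and the feature strings are made up for illustration.

```python
# Code paths ordered from most to least demanding. Each entry lists the
# CPU features that path requires (hypothetical feature names).
PATHS = [
    ("ssse3", {"sse", "sse2", "sse3", "ssse3"}),
    ("sse2", {"sse", "sse2"}),
    ("sse", {"sse"}),
    ("ia32", set()),  # generic baseline, runs on any x86 CPU
]


def select_path(cpu_features):
    """Return the name of the best code path the given CPU can run."""
    for name, required in PATHS:
        if required <= cpu_features:  # all required features are present
            return name
    raise RuntimeError("unreachable: the baseline path requires nothing")


# An SSE2-capable CPU without SSSE3 gets the SSE2 path,
# a CPU with only MMX falls back to the generic path.
print(select_path({"mmx", "sse", "sse2", "sse3"}))
print(select_path({"mmx"}))
```

Note that the selection here depends only on the reported features, not on the CPU vendor.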
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊

Last edited by LoRd_MuldeR; 4th January 2010 at 16:47.
Old 4th January 2010, 16:49   #446  |  Link
schweinsz
Registered User
 
Join Date: Nov 2005
Posts: 497
Quote:
Originally Posted by kemuri-_9 View Post
What?
Intel's compiler produces faster code than MSVC, which in turn produces faster code than gcc, when targeting x86 or x86_64, even on my AMD.

ICL defaults to generic SSE2 code generation.
You have to explicitly specify an option to optimize for Intel CPUs, and that would only cause it to crash on AMD CPUs that lack the corresponding instruction set.
Doing something like
/arch:sse /QaxSSSE3 would have it use SSE as the baseline and then add a special SSSE3 CPU path.
Good news for me! DiAVC won't need two separate versions for AMD and Intel. I have recently been planning to use ICL, but is it feasible to compile some files with MSVC and others with ICL? Is link-time code generation consistent between MSVC and ICL?
The current DiAVC requires SSE2.
Old 4th January 2010, 16:53   #447  |  Link
LoRd_MuldeR
Software Developer
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
There is an explicit warning NOT to link code with Intel-specific optimizations ("/Qx") against code that doesn't use those optimizations.

I think that's because it would bypass the CPU check Intel adds to the main() function. So the binary would run on non-Intel CPUs although it contains Intel-only code.

This can lead to "undefined" behavior. Normally, a binary with Intel-specific optimizations will exit with an error message on non-Intel CPUs.
Old 4th January 2010, 18:04   #448  |  Link
CruNcher
Registered User
 
Join Date: Apr 2002
Location: Germany
Posts: 4,926
http://www.brightsideofnews.com/news...substance.aspx <- I'm referring to this lawsuit
Old 4th January 2010, 18:51   #449  |  Link
roozhou
Registered User
 
Join Date: Apr 2008
Posts: 1,181
Quote:
Originally Posted by CruNcher View Post
And a good article about this:

Intel's "cripple AMD" function
Old 4th January 2010, 19:11   #450  |  Link
LoRd_MuldeR
Software Developer
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
Quote:
Originally Posted by Agner`s CPU blog
Sounds nice, but the truth is that the CPU dispatcher didn't support SSE or SSE2 or any higher SSE in AMD processors
Which probably was intentional and perfectly legitimate, because of:

Quote:
Originally Posted by Dark Shikari
The Athlon 64's SSE unit is so slow that it's generally worse than MMX. Most operations are done by splitting the instruction in half and sending them off to the MMX unit, making the whole thing a complete waste of time.
And whoever still claims ICC doesn't support SSE2 on non-Intel CPUs probably used "/QxSSE2", which indeed won't work on AMD CPUs, instead of "/arch:SSE2".

However, the latter is the default in the current ICC version, so I assume such claims date back to an outdated version...

See also:
http://software.intel.com/sites/prod...tm#option_arch

Last edited by LoRd_MuldeR; 4th January 2010 at 19:21.
Old 4th January 2010, 19:18   #451  |  Link
clsid
*****
 
Join Date: Feb 2005
Posts: 5,647
ICL can be patched to skip the GenuineIntel checks so that it also uses SSE/SSE2/etc on AMD CPUs.

I suggest not worrying about compilers. Just make one generic build with your own assembly optimizations for all performance-critical parts. This also lets you make your own choices about which function implementations get used on which processors.
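
As a rough illustration of that last point, a hand-written dispatcher could keep a per-function list of implementations and skip ones known to be slow on a given CPU family. This is only a Python sketch under assumed names: TABLE, SLOW_ON, pick_impl and the idct_*/deblock_* entries are hypothetical, and the Athlon 64 entry merely encodes the claim quoted earlier in this thread that its SSE unit can be slower than MMX.

```python
# Per-function candidate implementations, best first.
# Each entry is (implementation name, required CPU feature or None).
TABLE = {
    "idct": [("idct_sse2", "sse2"), ("idct_mmx", "mmx"), ("idct_c", None)],
    "deblock": [("deblock_sse2", "sse2"), ("deblock_mmx", "mmx"), ("deblock_c", None)],
}

# Hypothetical tuning data: implementations to avoid on specific CPU
# families even though the CPU supports the required instructions.
SLOW_ON = {("athlon64", "idct_sse2")}


def pick_impl(function, features, cpu_family):
    """Pick the first usable implementation of one function for this CPU."""
    for impl, needed in TABLE[function]:
        if needed is not None and needed not in features:
            continue  # CPU lacks the required instruction set
        if (cpu_family, impl) in SLOW_ON:
            continue  # supported, but known to be slow on this family
        return impl
    raise LookupError("no implementation available for " + function)


feats = {"mmx", "sse", "sse2"}
print(pick_impl("idct", feats, "athlon64"))     # MMX version is preferred here
print(pick_impl("deblock", feats, "athlon64"))  # SSE2 version is still used
```

The choice is made per function, so one slow routine does not force the whole build down to a lower instruction set.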
__________________
MPC-HC 2.2.1

Last edited by clsid; 4th January 2010 at 19:21.
Old 8th January 2010, 13:41   #452  |  Link
schweinsz
Registered User
 
Join Date: Nov 2005
Posts: 497
Could somebody help me test on an old single-core CPU such as a Pentium M or Athlon 64?
http://di-avc.com/testd.7z
Download the file, unpack it, run run.bat and report the results.

I have recently finished all the YUV scaling and YUV-to-RGB functions and am now working on the seeking and timestamp bugs.
Old 8th January 2010, 14:18   #453  |  Link
the_corona
Registered User
 
Join Date: Mar 2006
Posts: 55
Code:
nothrd.exe   basketball720x576.264
99 frames decoded totally.
3928067 counters used by decoder and 13161 counters used by others.

Decoding speed: 90 fps

sglthrd.exe   basketball720x576.264
99 frames decoded totally.
4263451 counters used by decoder and 14818 counters used by others.

Decoding speed: 83 fps

nothrd.exe   basketball720x576.264
99 frames decoded totally.
3918364 counters used by decoder and 13008 counters used by others.

Decoding speed: 90 fps

sglthrd.exe   basketball720x576.264
99 frames decoded totally.
4124521 counters used by decoder and 12798 counters used by others.

Decoding speed: 85 fps

pause
AMD Sempron 3000+ (Single Core 1800mhz)
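
For anyone comparing these logs: the fps figures appear to follow directly from the decoder counter values. The sketch below assumes (this is not stated anywhere in the thread) that the "counters" are QueryPerformanceCounter ticks at 3,579,545 Hz, the ACPI power-management timer rate, that only the decoder counters are used, and that the result is truncated to an integer. Under those assumptions it reproduces the four figures above.

```python
# Assumed tick rate of the counter (ACPI PM timer); not confirmed by the author.
ASSUMED_TICKS_PER_SECOND = 3_579_545


def fps_from_counters(frames, decoder_counters, ticks_per_second=ASSUMED_TICKS_PER_SECOND):
    """Frames per second, truncated to an integer like the logged values."""
    # frames / (decoder_counters / ticks_per_second), done in integer math
    return frames * ticks_per_second // decoder_counters


# Decoder counter values from the four runs in the log above.
runs = [3928067, 4263451, 3918364, 4124521]
print([fps_from_counters(99, c) for c in runs])  # [90, 83, 90, 85]
```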
Old 8th January 2010, 16:43   #454  |  Link
ForceX
Registered User
 
Join Date: Oct 2006
Posts: 150
Single thread Athlon64
Code:
nothrd.exe   basketball720x576.264
99 frames decoded totally.
3684944 counters used by decoder and 12321 counters used by others.

Decoding speed: 96 fps

sglthrd.exe   basketball720x576.264
99 frames decoded totally.
3545769 counters used by decoder and 12152 counters used by others.

Decoding speed: 99 fps

nothrd.exe   basketball720x576.264
99 frames decoded totally.
3406643 counters used by decoder and 12143 counters used by others.

Decoding speed: 104 fps

sglthrd.exe   basketball720x576.264
99 frames decoded totally.
3523578 counters used by decoder and 12225 counters used by others.

Decoding speed: 100 fps

pause
Old 8th January 2010, 17:50   #455  |  Link
clsid
*****
 
Join Date: Feb 2005
Posts: 5,647
The test fails (with an Illegal Instruction error) on an old AMD Athlon Thunderbird (1.33 GHz). That CPU only has MMX and MMXext, but no SSE1.
Old 8th January 2010, 18:38   #456  |  Link
schweinsz
Registered User
 
Join Date: Nov 2005
Posts: 497
Quote:
Originally Posted by clsid View Post
The test fails (with an Illegal Instruction error) on an old AMD Athlon Thunderbird (1.33 GHz). That CPU only has MMX and MMXext, but no SSE1.
The DiAVC requires SSE2.
Old 8th January 2010, 19:04   #457  |  Link
horvathd
Registered User
 
Join Date: Dec 2009
Posts: 25
Quote:
d:\Letöltések\testd>nothrd.exe basketball720x576.264
99 frames decoded totally.
5409485 counters used by decoder and 18457 counters used by others.

Decoding speed: 65 fps

d:\Letöltések\testd>sglthrd.exe basketball720x576.264
99 frames decoded totally.
5520247 counters used by decoder and 16373 counters used by others.

Decoding speed: 64 fps

d:\Letöltések\testd>nothrd.exe basketball720x576.264
99 frames decoded totally.
5454162 counters used by decoder and 16909 counters used by others.

Decoding speed: 64 fps

d:\Letöltések\testd>sglthrd.exe basketball720x576.264
99 frames decoded totally.
5669338 counters used by decoder and 16629 counters used by others.

Decoding speed: 62 fps
On Intel Celeron M 330
Old 9th January 2010, 03:07   #458  |  Link
CruNcher
Registered User
 
Join Date: Apr 2002
Location: Germany
Posts: 4,926
Athlon 64 X2 (Toledo) SSE (1,2,3) 3DNOW(+) MMX(+)

Quote:
C:\testd\testd>nothrd.exe basketball720x576.264
99 frames decoded totally.
3066867 counters used by decoder and 8285 counters used by others.

Decoding speed: 115 fps

C:\testd\testd>sglthrd.exe basketball720x576.264
99 frames decoded totally.
2034767 counters used by decoder and 7709 counters used by others.

Decoding speed: 174 fps

C:\testd\testd>nothrd.exe basketball720x576.264
99 frames decoded totally.
3203470 counters used by decoder and 7944 counters used by others.

Decoding speed: 110 fps

C:\testd\testd>sglthrd.exe basketball720x576.264
99 frames decoded totally.
1995043 counters used by decoder and 7543 counters used by others.

Decoding speed: 177 fps
Old 9th January 2010, 10:23   #459  |  Link
dZeus
Registered User
 
Join Date: Oct 2001
Posts: 33
Code:
nothrd.exe   basketball720x576.264
99 frames decoded totally.
3754152 counters used by decoder and 17488 counters used by others.

Decoding speed: 94 fps

sglthrd.exe   basketball720x576.264
99 frames decoded totally.
3986075 counters used by decoder and 17523 counters used by others.

Decoding speed: 88 fps

nothrd.exe   basketball720x576.264
99 frames decoded totally.
3927613 counters used by decoder and 17756 counters used by others.

Decoding speed: 90 fps

sglthrd.exe   basketball720x576.264
99 frames decoded totally.
3995219 counters used by decoder and 17091 counters used by others.

Decoding speed: 88 fps
Pentium M dothan 2GHz (single core)
Old 12th January 2010, 19:26   #460  |  Link
schweinsz
Registered User
 
Join Date: Nov 2005
Posts: 497
A new beta version is available.
Changes:
20100112
Fixed two timestamp bugs.
Only the main thread is used on old Intel CPUs.

The scaling and YUV-to-RGB functions have been finished for some days, but I must first add a property page before I integrate them.

I heard that reusing the same threads for frame-parallel decoding, instead of creating a new thread for every frame, can improve performance. I will try it.
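
The thread-reuse idea could look roughly like the following Python sketch: a fixed set of worker threads is started once and pulls frame jobs from a queue, so no thread is created per frame. It is only an illustration of the pattern, not DiAVC's code; decode_frame is a hypothetical stand-in for the real per-frame work, and run_with_persistent_workers is an invented name.

```python
import queue
import threading


def decode_frame(frame_no):
    """Hypothetical stand-in for the real per-frame decoding work."""
    return frame_no * 2


def run_with_persistent_workers(frame_numbers, num_workers=2):
    """Decode all frames on a fixed set of long-lived worker threads."""
    jobs = queue.Queue()
    results = {}
    used_threads = set()
    lock = threading.Lock()

    def worker():
        while True:
            frame_no = jobs.get()
            if frame_no is None:  # sentinel: no more frames
                break
            out = decode_frame(frame_no)
            with lock:
                results[frame_no] = out
                used_threads.add(threading.get_ident())

    # Threads are created once, up front, and reused for every frame.
    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for n in frame_numbers:
        jobs.put(n)
    for _ in workers:
        jobs.put(None)  # one sentinel per worker
    for w in workers:
        w.join()
    return results, len(used_threads)


results, thread_count = run_with_persistent_workers(range(99), num_workers=2)
print(len(results), "frames decoded on at most 2 threads")
```

The saving comes from paying the thread creation cost once instead of once per frame; how much that matters in practice would have to be measured.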

As for compatibility with the Elecard splitter, I found that the Elecard splitter sends incomplete NAL units to the decoder. Fixing it is easy, but a buffer is needed to hold the NAL units until they are complete. As the bug affects raw H.264 bitstreams only, I am not fixing it in this release.
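
A buffer of the kind described could be sketched in Python as follows. It assumes the splitter delivers an Annex B byte stream (NAL units separated by 00 00 01 start codes) in arbitrary chunks, and it releases a NAL unit only once the next start code has been seen. NalAssembler, feed and flush are invented names for this sketch, not part of DiAVC or the Elecard splitter.

```python
START_CODE = b"\x00\x00\x01"


class NalAssembler:
    """Collects Annex B bytes and releases only complete NAL units."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, chunk):
        """Append a chunk and return the NAL units completed by it."""
        self._buf += chunk
        units = []
        start = self._buf.find(START_CODE)
        if start < 0:
            return units  # no start code yet, keep buffering
        del self._buf[:start]  # drop bytes before the first start code
        while True:
            nxt = self._buf.find(START_CODE, 3)
            if nxt < 0:
                break  # current unit may still be incomplete
            # Trailing zero bytes belong to the next (4-byte) start code;
            # a NAL unit itself never ends with 0x00.
            unit = bytes(self._buf[3:nxt]).rstrip(b"\x00")
            if unit:
                units.append(unit)
            del self._buf[:nxt]
        return units

    def flush(self):
        """At end of stream, return the last buffered unit, if any."""
        unit = b""
        if self._buf.startswith(START_CODE):
            unit = bytes(self._buf[3:]).rstrip(b"\x00")
        self._buf.clear()
        return [unit] if unit else []


# Three NAL units, delivered in chunks that cut through start codes.
stream = (b"\x00\x00\x00\x01\x67\x42" + b"\x00\x00\x01\x68\xce"
          + b"\x00\x00\x00\x01\x65\x88\x84")
asm = NalAssembler()
out = asm.feed(stream[:8]) + asm.feed(stream[8:13]) + asm.feed(stream[13:])
out += asm.flush()
print(out)
```

The price is one NAL unit of latency, since a unit is only known to be complete when the following start code (or end of stream) arrives.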

Tags
avc, diavc, fastest decoder, h.264, software
