CoreCodec/H.264 Codec "CoreAVC"
Dark Shikari
22nd December 2009, 00:39
already coreavc 1.9.5 is the fastest decoder on the dual core opteron cpu i have. but the new coreavc 2.0 has left me well and truly disappointed.
That's not bad at all, though I suspect there may be some small improvements to be had.
Do remember: all the new assembly optimizations were SSE2 through SSSE3, which don't exist or are near-useless on Athlon 64s. Additionally, the weightp fix cost some speed.
I'm actually surprised it's still marginally faster.
the_corona
22nd December 2009, 00:50
AMD opteron users have seen the great decrease in overall speed with CoreAVC 2.0 from the reports so far.
the_corona... take your continued attacks/flames elsewhere, it's really getting old.
My question was regarding how you measure the speed of CoreAVC if timecodec cannot be trusted and can thus declare it the "world's fastest H264 Codec".
How is that flaming/attacking you? Do you not have an answer?
Cyber-Mav
22nd December 2009, 00:57
My question was regarding how you measure the speed of CoreAVC if timecodec cannot be trusted and can thus declare it the "world's fastest H264 Codec".
How is that flaming/attacking you? Do you not have an answer?
timecodec does the job, i use longer video clips to try and get more accurate/consistent results.
Cyber-Mav
22nd December 2009, 00:59
That's not bad at all, though I suspect there may be some small improvements to be had.
Do remember: all the new assembly optimizations were SSE2 through SSSE3, which don't exist or are near-useless on Athlon 64s. Additionally, the weightp fix cost some speed.
I'm actually surprised it's still marginally faster.
my opteron supports sse2 and sse3, supplemental sse3 is not supported on my opteron 170.
seeing as how the q9650 cpu shows bigger speed difference with coreavc 2.0 it supports your statement that ssse3 and sse4 optimisations are what boost speed.
Dark Shikari
22nd December 2009, 01:01
my opteron supports sse2 and sse3, supplemental sse3 is not supported on my opteron 170.
"Support" doesn't mean it's useful. The Athlon 64's SSE unit is so slow that it's generally worse than MMX. Most operations are done by splitting the instruction in half and sending them off to the MMX unit, making the whole thing a complete waste of time.
seeing as how the q9650 cpu shows bigger speed difference with coreavc 2.0 it supports your statement that ssse3 and sse4 optimisations are what boost speed.
Not at all. I tested myself; the SSSE3 optimizations gave relatively little compared to the benefit of SSE2 over MMX on a Core 2.
Cyber-Mav
22nd December 2009, 01:06
odd, how come my laptop, the pentium M740 cpu that has sse2 saw no speed increase, if anything it was a 1fps decrease over coreavc 1.9.5?
Dark Shikari
22nd December 2009, 01:10
odd, how come my laptop, the pentium M740 cpu that has sse2 saw no speed increase, if anything it was a 1fps decrease over coreavc 1.9.5?
The Pentium-M has the worst SSE unit ever created, far worse than the Athlon 64. It is so bad that in x264, we simply had the CPU detection routines pretend that it didn't even support it at all.
In the next CoreAVC version, we should probably make an equivalent to the x264 routines for this.
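As a rough illustration of the x264-style approach described above, the sketch below masks out SSE2/SSE3 for CPUs where those instructions are known to be slow. It is a minimal Python model, not x264 or CoreAVC code: the flag constants and the function name `effective_flags` are made up for this example, and the family/model numbers (Intel family 6, models 9, 13 and 14 for Pentium-M "Banias"/"Dothan" and Core 1 "Yonah") are quoted from memory and should be treated as an assumption.

```python
# Hypothetical feature bits, for illustration only.
MMX, SSE, SSE2, SSE3, SSSE3 = 1, 2, 4, 8, 16

# Assumption: Intel family 6 models whose SSE2 unit is slower than MMX
# (Pentium-M Banias/Dothan and Core 1 Yonah).
SLOW_SSE2_INTEL_MODELS = {9, 13, 14}

def effective_flags(vendor, family, model, raw_flags):
    """Return the feature flags a decoder should act on.

    raw_flags is what the CPU reports; the result may hide features
    that exist but are not worth using on this CPU.
    """
    flags = raw_flags
    if (vendor == "GenuineIntel" and family == 6
            and model in SLOW_SSE2_INTEL_MODELS):
        # Pretend SSE2/SSE3 are absent so the MMX paths get selected.
        flags &= ~(SSE2 | SSE3)
    return flags

# A Pentium-M reporting MMX, SSE and SSE2 ends up with MMX and SSE only.
pentium_m = effective_flags("GenuineIntel", 6, 13, MMX | SSE | SSE2)
print(bool(pentium_m & SSE2))
```

The point of doing this at detection time is that the rest of the decoder does not need to know about the special case; it just sees a CPU without SSE2.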
Cyber-Mav
22nd December 2009, 01:15
so in essence coreavc is just optimised for the latest intel cpus, which are the processors that least need speed increases. i would have thought that the focus would have been to get more performance out of older hardware, and energy efficient hardware such as the intel atom cpu. odd.
Dark Shikari
22nd December 2009, 01:24
so in essence coreavc is just optimised for the latest intel cpus, which are the processors that least need speed increases. i would have thought that the focus would have been to get more performance out of older hardware, and energy efficient hardware such as the intel atom cpu. odd.
Not true either. CoreAVC currently follows the more naive strategy of simply loading whatever functions the CPU supports. It would benefit, as I mentioned, from a more complicated strategy like x264's.
The Atom supports all the way through SSSE3, so that point is moot. The assembly is not fully optimized for Atom though; there is currently work on making it better, though note coding for Atom is much more difficult than any out-of-order CPU.
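To make the difference between the two strategies concrete, here is a small Python sketch. It is only a model of the idea: the function names (`idct8x8`, `mc_luma`), the registry layout and the "slow on this CPU" list are hypothetical and do not describe CoreAVC's or x264's actual function tables.

```python
# Hypothetical feature bits, for illustration only.
MMX, SSE2, SSSE3 = 1, 2, 4

# Hypothetical registry: each function lists its implementations from
# slowest-expected to fastest-expected, with the CPU flag each one needs.
IMPLEMENTATIONS = {
    "idct8x8": [(0, "c"), (MMX, "mmx"), (SSE2, "sse2")],
    "mc_luma": [(0, "c"), (MMX, "mmx"), (SSE2, "sse2"), (SSSE3, "ssse3")],
}

def tuned_dispatch(cpu_flags, slow_on_this_cpu):
    """Pick the most advanced supported implementation per function,
    skipping (function, implementation) pairs known to be slower on
    this particular CPU."""
    skip = set(slow_on_this_cpu)
    table = {}
    for name, candidates in IMPLEMENTATIONS.items():
        for required, label in candidates:
            supported = (cpu_flags & required) == required
            if supported and (name, label) not in skip:
                table[name] = label  # later entries override earlier ones
    return table

def naive_dispatch(cpu_flags):
    """The naive strategy: load whatever the CPU reports support for."""
    return tuned_dispatch(cpu_flags, slow_on_this_cpu=())

# Made-up example: a CPU with MMX and SSE2 where SSE2 motion compensation
# has been measured to be slower than MMX, but the SSE2 IDCT is fine.
flags = MMX | SSE2
print(naive_dispatch(flags))
print(tuned_dispatch(flags, [("mc_luma", "sse2")]))
```

The tuned version costs nothing at run time; the hard part is the benchmarking needed to fill in the per-CPU exception list.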
lnatan25
22nd December 2009, 01:26
so in essence coreavc is just optimised for the latest intel cpus, which are the processors that least need speed increases. i would have thought that the focus would have been to get more performance out of older hardware, and energy efficient hardware such as the intel atom cpu. odd.
Shush, you are talking "off-topic", and might hurt CoreAVC sales!!! :rolleyes:
Cyber-Mav
22nd December 2009, 01:27
ahh that's understandable, do you believe it is possible to extract any more decoding speed from cpus like athlon64 / athlonxp / opteron / pentium-m / pentium 4? or do you believe coreavc has done all it can for those old timers?
Dark Shikari
22nd December 2009, 01:33
ahh that's understandable, do you believe it is possible to extract any more decoding speed from cpus like athlon64 / athlonxp / opteron / pentium-m / pentium 4? or do you believe coreavc has done all it can for those old timers?
Do note there is still some assembly code that CoreAVC is missing: there is no SSE version of the 8x8 idct or the deblocking code. Adding these will open the door for significant speed boosts with all SSE-supporting CPUs except for the P-M/Core1 (x264's numbers show that these functions are faster than MMX on Athlon 64). All the notes below are about existing asm.
Pentium-M/Core 1 would benefit significantly from simply turning off all the SSE code.
Pentium 4 probably can't gain any more; in theory some of the assembly code could be rearranged to work around the Pentium 4's architectural quirks, but it's not worth bothering and the benefit would be very slight.
Athlon 64 would benefit a bit from disabling whatever SSE code is slower on A64. Maybe a few percent.
Atom could benefit from both more manual asm reordering, and a special Atom version of the motion compensation functions. Specifically, SSSE3 motion compensation uses two tricks to improve performance: pmaddubsw and palignr. The former hurts a lot on Atom, but the latter helps a lot. Making a version that only used the latter trick would improve performance there.
Of course, if CoreCodec implemented all of the above ideas, the release would be 6 months later and everyone would be complaining ;)
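For readers who haven't met the two instructions named above, the following Python sketch emulates what `palignr` and `pmaddubsw` compute and shows one way they can be combined into a 6-tap horizontal filter. This is not CoreAVC's or x264's actual assembly. The helper `punpcklbw` (byte interleave) is something extra brought in to make the example work, the weights are the H.264 luma half-pel taps (1, -5, 20, 20, -5, 1), and the output is left as raw sums with no rounding, shifting or centering.

```python
TAPS = [1, -5, 20, 20, -5, 1]  # H.264 luma half-pel filter weights

def palignr(hi, lo, shift):
    """Emulate palignr: view hi:lo as one 32-byte value (lo in the low
    bytes), shift right by `shift` bytes, keep the low 16 bytes."""
    assert len(hi) == len(lo) == 16 and 0 <= shift <= 16
    return (lo + hi)[shift:shift + 16]

def punpcklbw(a, b):
    """Interleave the low 8 bytes of a and b: a0, b0, a1, b1, ..."""
    out = []
    for x, y in zip(a[:8], b[:8]):
        out += [x, y]
    return out

def pmaddubsw(unsigned_bytes, signed_bytes):
    """Emulate pmaddubsw: multiply unsigned by signed bytes, add each
    adjacent pair, saturate to signed 16 bits (8 results)."""
    out = []
    for i in range(0, 16, 2):
        s = (unsigned_bytes[i] * signed_bytes[i]
             + unsigned_bytes[i + 1] * signed_bytes[i + 1])
        out.append(max(-32768, min(32767, s)))
    return out

def sixtap_simd_style(row):
    """Filter 8 outputs from a 32-pixel row using the emulated ops."""
    lo, hi = row[:16], row[16:32]
    shifted = [palignr(hi, lo, s) for s in range(6)]  # row[s:s+16]
    acc = [0] * 8
    for k in range(0, 6, 2):
        pairs = punpcklbw(shifted[k], shifted[k + 1])
        coeffs = [TAPS[k], TAPS[k + 1]] * 8
        acc = [a + p for a, p in zip(acc, pmaddubsw(pairs, coeffs))]
    return acc

def sixtap_reference(row):
    """Plain scalar version of the same filter, for comparison."""
    return [sum(TAPS[k] * row[x + k] for k in range(6)) for x in range(8)]

row = [(i * 37) % 256 for i in range(32)]
print(sixtap_simd_style(row) == sixtap_reference(row))
```

An Atom-specific version along the lines suggested above would keep the `palignr` step for producing the shifted inputs and replace the `pmaddubsw` step with ordinary multiplies and adds.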
Cyber-Mav
22nd December 2009, 01:39
how come they didn't turn off the sse code for pentium-m if it would benefit speed? or is this something you have pointed out just recently to them shikari?
Dark Shikari
22nd December 2009, 01:41
how come they didn't turn off the sse code for pentium-m if it would benefit speed? or is this something you have pointed out just recently to them shikari?
I didn't get around to it. CoreAVC 2 was released during my school's final exam week. Blame me. ;)
Cyber-Mav
22nd December 2009, 01:45
will coreavc remove sse support for pentium-m in the future? or is it something we will have to wait and see.
Disabled
22nd December 2009, 02:08
What exactly is your relationship with CoreAVC, Dark Shikari? How long will you stay on the team and what ASM optimizations will you implement in that time? (I.e., will you implement everything you just told us was possible?)
khat17
22nd December 2009, 02:27
I wanna know how long before ATI will be supported. My wife's machine has my old 8800GTS, and CoreAVC 2.0 looks nice with the whole CUDA support - so I can use that if I need to test - but when will red team get some support?
And could you give an idea of what's the lowest specs that you think CoreAVC would run on and still give decent results? How about say...........a DURON 1.6 with some board and onboard graphics?
Dark Shikari
22nd December 2009, 03:13
What exactly is your relationship with CoreAVC, Dark Shikari? How long will you stay on the team and what ASM optimizations will you implement in that time? (I.e., will you implement everything you just told us was possible?)
I periodically come in and work on things when I have time; I'm not really responsible for much of anything with a hard deadline. I'll try to get that done sometime this winter break though.
Chumbo
22nd December 2009, 03:52
CoreCodec is proud to present ... More info here: http://corecodec.com/products/coreavc
CoreAVC™ 2.0 Professional Edition Decoder
* Supports Windows 7
* 32/64 bit Support
* NVIDIA CUDA GPU support
* Supports up to 16 CPU Cores
* QuadHD Resolution Support
* Uses Directshow for MKV
* Includes the Haali Media Splitter
* Full Interlaced support
...
Would love to see ATI Stream support soon please. Thank you. I'm anxiously waiting for my upgrade email. :)
Mixer73
22nd December 2009, 04:45
I wanna know how long before ATI will be supported. My wife's machine has my old 8800GTS, and CoreAVC 2.0 looks nice with the whole CUDA support - so I can use that if I need to test - but when will red team get some support?
Would love to see ATI Stream support soon please. Thank you. I'm anxiously waiting for my upgrade email. :)
The answer to your question is perhaps never. Read this post for more information:
http://forum.doom9.org/showthread.php?p=1347303&highlight=ati#post1347303
dimitrik
22nd December 2009, 10:26
"Support" doesn't mean it's useful. The Athlon 64's SSE unit is so slow that it's generally worse than MMX. Most operations are done by splitting the instruction in half and sending them off to the MMX unit, making the whole thing a complete waste of time.
Not at all. I tested myself; the SSSE3 optimizations gave relatively little compared to the benefit of SSE2 over MMX on a Core 2.
Pardon my ignorant question but what about SSE4?
I read recently that Intel's & AMD's SSE implementations were similar until SSE4, where they branched off to SSE4 (Intel) & SSE4a (AMD). It was my understanding the differences were such that it was necessary to optimize for each one separately.
I recently tested some x264 decoders (CoreAVC 1.95, DivX, ffmpeg-mt) on an Intel Quad @ 2.26, 2.33, 2.5 & 3GHz and an AMD Phenom-II 945@3GHz and noted that the AMD@3GHz was slower than the Intel at all speeds above 2.33GHz.
My tests were not scientific but I wonder if this could be due to different SSE4 versions?
roozhou
22nd December 2009, 10:56
Pardon my ignorant question but what about SSE4?
I read recently that Intel's & AMD's SSE implementations were similar until SSE4, where they branched off to SSE4 (Intel) & SSE4a (AMD). It was my understanding the differences were such that it was necessary to optimize for each one separately.
I recently tested some x264 decoders (CoreAVC 1.95, DivX, ffmpeg-mt) on an Intel Quad @ 2.26, 2.33, 2.5 & 3GHz and an AMD Phenom-II 945@3GHz and noted that the AMD@3GHz was slower than the Intel at all speeds above 2.33GHz.
My tests were not scientific but I wonder if this could be due to different SSE4 versions?
More instruction sets do not mean faster decoding. A good example is the Intel Atom.
IMO the speed boost brought by SSE4 is trivial. It is the performance of SSE2 instructions that has a noticeable impact on decoders.
dimitrik
22nd December 2009, 11:44
More instruction sets do not mean faster decoding. A good example is the Intel Atom.
IMO the speed boost brought by SSE4 is trivial. It is the performance of SSE2 instructions that has a noticeable impact on decoders.
So the difference I noted was due to different SSE2 units? That's very interesting, thanks.
BetaBoy
22nd December 2009, 15:39
I know it was mentioned in this thread.... but I wanted to clarify it further.
If you want to disable both CoreAVC and Haali's splitter for AVC content in DirectShow, in favor of Media Foundation in Windows 7 (not that we want you to do that ;-), you will need to deactivate use of the "CCV1" media type that they both share.
To do this: Open Haali's Media Splitter properties: Go to Options-> Output-> Use custom media type for H.264, and set it to "No".
This then reverts WMP/MCE to use MF for AVC. To re-enable it, simply set it back to 'yes'.
I have just added this to our KB as well.
Chumbo
22nd December 2009, 21:58
The answer to your question is perhaps never. Read this post for more information:
http://forum.doom9.org/showthread.php?p=1347303&highlight=ati#post1347303
Thanks for the info Mixer73. That really blows chunks. ATI needs to get their heads straight, but that's been said for a long time now unfortunately. They make a great product but shouldn't "lock it down" like that. Sigh...
BetaBoy
22nd December 2009, 23:05
All, on a side note.... I am looking for a small group of ppl to test our upcoming CoreAAC and CoreASP directshow filters (CoreMVC is not till later next year). Email: betagroup@corecodec.com with your system specs, including specifics on the video and audio cards you are using. We will email acceptances in January. Thank you ahead of time!
Tom-Cat
23rd December 2009, 06:46
Hi.
I have tested quite a few mkv's with the new 2.0 version and have found that some have problems with 2.0 that played perfectly fine on 1.9.5. These have problems only when CUDA is enabled, they play fine on 2.0 without CUDA (by "fine" I mean there aren't any artifacts, but the system needs HW accel for smooth playback)! And even when they have problems they are pretty much random, some parts of the movie would play perfectly fine one time, but have huge artifacts the next time.
System: Asrock ION 330 (Nvidia ION, 2 GB memory, Atom 330 @ 1.6 GHz, Windows 7 32-bit, MPC-HC - also tested with Mplayer2 and BSPlay).
MKV Files (88 MB zip) : http://pc.sux.org/tomcat/mkv.zip
ALL Screenshots: http://pc.sux.org/tomcat/jpg.zip
Just two smaller ones as attachments.
squid_80
23rd December 2009, 07:12
I tried playing the clips multiple times with CUDA active and couldn't get any artifacts to happen. Which driver version do you have installed?
Tom-Cat
23rd December 2009, 07:34
I tried playing the clips multiple times with CUDA active and couldn't get any artifacts to happen. Which driver version do you have installed?
Unless they have been "silently" installed in the background by Microsoft Update, they are about 2 months old or thereabouts. Will check in 10 hours or so.
Thanx for testing them !
Grmpf
23rd December 2009, 08:42
Hello Betaboy,
i mailed you at the address you gave a few pages back on monday for the 2.0 upgrade notification, but it's still missing (i have a coreavc licence from the beginning...). I would love to buy 2.0 with all the discounts while the x-mas offer is still running, but without the new password i can not log into your new site...
THX-UltraII
23rd December 2009, 11:28
Just tried the new Haali Renderer with a .m2ts file. Only the secondary audio is played (director's commentary) and the first audio cannot be selected when I right-click in MPC-HC and look at the filter.
What am I doing wrong here?
JeffBDVS
23rd December 2009, 13:48
CoreAVC 2.0 (CUDA or not) crashes VirtualDub when the source video is an AviSynth script that uses the MT filter for multithreading. 1.9.5 did not.
Vista 64
GTX 280 with the 191.07 drivers
-Jeff
Tom-Cat
23rd December 2009, 15:54
I tried playing the clips multiple times with CUDA active and couldn't get any artifacts to happen. Which driver version do you have installed?
Thank you very much! I had 190.xx and after installing the newest driver (195.xx) it works without any problems. :)
BetaBoy
23rd December 2009, 17:26
Hello Betaboy,
i mailed you at the address you gave a few pages back on monday for the 2.0 upgrade notification, but it's still missing (i have a coreavc licence from the beginning...). I would love to buy 2.0 with all the discounts while the x-mas offer is still running, but without the new password i can not log into your new site...
Send me a PM and I'll take care of it for you.
JeffBDVS
23rd December 2009, 20:35
Updated to the 195 drivers and the crash is still an issue with AviSynth and MT.
Disabling MT had some interesting results. I have dual quad-core Xeons. I used CoreAVC to decode an HD MTS video clip for downscaling to SD. Lagarith was used as the output codec from VirtualDub. It turns out that CoreAVC runs faster (about 29 fps) without CUDA enabled. Using CUDA maxed out at about 24 fps during the conversion. Is this a reasonable result?
-Jeff
Blue_MiSfit
23rd December 2009, 20:45
Yep! CUDA is going to always be capped to a reasonable speed, since the ASIC on the video card is handling it, and can only process so fast.
~MiSfit
Asmodian
23rd December 2009, 20:52
Updated to the 195 drivers and the crash is still an issue with AviSynth and MT.
Disabling MT had some interesting results. I have dual quad-core Xeons. I used CoreAVC to decode an HD MTS video clip for downscaling to SD. Lagarith was used as the output codec from VirtualDub. It turns out that CoreAVC runs faster (about 29 fps) without CUDA enabled. Using CUDA maxed out at about 24 fps during the conversion. Is this a reasonable result?
-Jeff
That sounds reasonable to me, CUDA uses a chip on the video card to decode the video and with that fast of a computer the CPU decode could be faster. However I can get >45fps using the DGNV tools (which uses the same chip on the video card) doing blu-ray to 480p with any preset fast to slower for x264 but I was doing the resize on the video card in that case.
What is your CPU usage during the encode? Maybe there is a bandwidth issue to/from the video card on the dual Xeon motherboard (I assume it isn't optimized for graphics the way a single socket consumer board is?). Are they i7 or Core 2 Xeons, and is it a NUMA-aware system?
JeffBDVS
23rd December 2009, 21:25
I can get >45fps using the DGNV tools (which uses the same chip on the video card) doing blu-ray to 480p with any preset fast to slower for x264 but I was doing the resize on the video card in that case.
I tried DGNV tools b7, but it maxed out at 24 fps also. How did you set up the resize on the card? Is it an option when using x264?
What is your CPU usage during the encode? Maybe there is a bandwidth issue to/from the video card on the dual Xeon motherboard
35% to 40%, which I interpret as not quite maxing out 4 cores out of 8.
-Jeff
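A quick way to sanity-check a whole-system CPU figure like the one above is to convert it into "fully busy cores". The snippet is just that arithmetic in Python; the function name is made up, and it assumes the percentage is averaged over all 8 cores.

```python
def busy_core_equivalent(total_cpu_percent, logical_cores):
    """Convert a whole-system CPU percentage into 'fully busy cores'."""
    return total_cpu_percent / 100.0 * logical_cores

for percent in (35, 40):
    cores = round(busy_core_equivalent(percent, 8), 1)
    print(percent, "% of 8 cores is about", cores, "busy cores")
```

On those numbers the load works out to roughly three cores' worth of work. The total alone cannot show whether a single thread is pinned at 100% while the others idle, which is why a per-core view is the more useful thing to look at.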
Asmodian
23rd December 2009, 21:37
On-card resize is an option when loading in avisynth (h_resize, v_resize - I think).
It would be interesting to see the %CPU usage per-core. (Ctrl+Shift+Esc opens Task Manager on Windows, which can show this.)
JeffBDVS
23rd December 2009, 22:14
Here's a screenshot of the Performance graphs near the end of the operation.
CPU Usage (http://bellunevideo.com/client1/CPU_Use.jpg)
-Jeff
potatochobit
24th December 2009, 05:28
hi guys
I have a quick question, didn't see anyone talking about it
but is coreavc 2.0 worth upgrading to?
I have 1.9.5 but i had to disable it in windows 7 due to the corruption issue with newer h264
I did get the email that I could upgrade for 5$, but is the new coreavc performance worth having at all over CCCP and MT enabled?
Dark Shikari
24th December 2009, 07:15
hi guys
I have a quick question, didn't see anyone talking about it
but is coreavc 2.0 worth upgrading to?
I have 1.9.5 but i had to disable it in windows 7 due to the corruption issue with newer h264
I did get the email that I could upgrade for 5$, but is the new coreavc performance worth having at all over CCCP and MT enabled?
Is it fast enough for you to play the videos you want? Then CCCP+MT is fine. Is it not fast enough? Then CCCP+MT probably isn't fine ;)
THX-UltraII
24th December 2009, 09:10
Just tried the new Haali Renderer with a .m2ts file. Only the secondary audio is played (director's commentary) and the first audio cannot be selected when I right-click in MPC-HC and look at the filter.
What am I doing wrong here?
talen9
24th December 2009, 10:48
I suppose you tried Haali's Splitter, not the renderer, else you wouldn't report the "problem" with the audio streams ... which, btw, is NOT a problem :)
Actually, you're not using MPC-HC correctly: when using an external splitter, you cannot select the audio stream anymore from the "Audio" submenu: you have to use the "Navigate" menu, from there you can select audio/subtitle stream and jump to a different chapter, if the feature is present in the file you're playing.
THX-UltraII
24th December 2009, 12:49
I suppose you tried Haali's Splitter, not the renderer, else you wouldn't report the "problem" with the audio streams ... which, btw, is NOT a problem :)
Actually, you're not using MPC-HC correctly: when using an external splitter, you cannot select the audio stream anymore from the "Audio" submenu: you have to use the "Navigate" menu, from there you can select audio/subtitle stream and jump to a different chapter, if the feature is present in the file you're playing.
I meant the Splitter of course, sorry.
I'll try your tip tonight. Is the Haali Splitter preferable to the internal MPEG PS/TS/PVA filter? If so, why?
talen9
24th December 2009, 17:09
Haali's splitter used to be less CPU-intensive than MPC-HC's internal one; I don't know if this is still true, though.
dZeus
24th December 2009, 17:53
Do note there is still some assembly code that CoreAVC is missing: there is no SSE version of the 8x8 idct or the deblocking code. This will open the door for significant speed boosts with all SSE-supporting CPUs except for the P-M/Core1 (x264's numbers show that these functions are faster than MMX on Athlon 64). All the notes below are about existing asm.
-cut-
ok, so that might explain the results I get when I compare CoreAVC to DivX on my Pentium 3. CoreAVC shows a far greater difference in speed when enabling deblocking compared to the DivX AVC decoder (see the large speed increase when going from CoreAVC deblock to no deblock).
Other than the fact that CoreAVC 2.0 seems to be slightly slower than v1.8.5 on a P3, I wonder why there is such a difference in speed between rendering to 'null' or to 'overlay' with CoreAVC, where DivX doesn't seem to give much difference (CoreAVC performs about 10% less on Overlay/Null compared to DivX AVC).
Why would there be a difference at all between different decoders when they're all set to output the same colourspace to the renderer (in this case YUY2)?
Nevertheless, on a P3 CoreAVC 2.0 still is faster than DivX AVC decoder, though it's a bit slower than v1.8.5.
blubberbirne
25th December 2009, 23:01
New Haali sucks here like hell.
F:\test.mkv::Video
Media Type 0:
--------------------------
Video: CCV1 1920x1080
AM_MEDIA_TYPE:
majortype: MEDIATYPE_Video {73646976-0000-0010-8000-00AA00389B71}
subtype: Unknown GUID Name {31564343-0000-0010-8000-00AA00389B71}
formattype: FORMAT_MPEG2_VIDEO {E06D80E3-DB46-11CF-B4D1-00805F6CBBEA}
bFixedSizeSamples: 0
bTemporalCompression: 0
lSampleSize: 1
cbFormat: 158
Video: CCV1 1920x1080 <- what the hell, it's AVC and not CCV1.
If i switch to the internal MKV splitter in MPC-HC, all works fine.
MediaInfo says:
General
Complete name : F:\test_.mkv
Format : Matroska
File size : 300 MiB
Duration : 3mn 42s
Overall bit rate : 11.3 Mbps
Encoded date : UTC 2009-12-25 21:58:14
Writing application : mkvmerge v3.0.0 ('Hang up your Hang-Ups') built on Dec 12 2009 15:20:35
Writing library : libebml v0.7.9 + libmatroska v0.8.1
Video
ID : 1
Format : AVC
Format/Info : Advanced Video Codec
Format profile : High@L4.0
Format settings, CABAC : Yes
Format settings, ReFrames : 4 frames
Muxing mode : Container profile=Unknown@5.1
Codec ID : V_MPEG4/ISO/AVC
Duration : 3mn 42s
Bit rate : 10.7 Mbps
Width : 1 920 pixels
Height : 1 080 pixels
Display aspect ratio : 16:9
Frame rate : 25.000 fps
Standard : Component
Resolution : 24 bits
Colorimetry : 4:2:0
Scan type : Interlaced
Bits/(Pixel*Frame) : 0.206
Stream size : 284 MiB (95%)
Title : PID 4113
Language : German
Color primaries : BT.709-5, BT.1361, IEC 61966-2-4, SMPTE RP177
Transfer characteristics : BT.709-5, BT.1361
Matrix coefficients : BT.709-5, BT.1361, IEC 61966-2-4 709, SMPTE RP177
Audio
ID : 2
Format : AC-3
Format/Info : Audio Coding 3
Codec ID : A_AC3
Duration : 3mn 42s
Bit rate mode : Constant
Bit rate : 384 Kbps
Channel(s) : 2 channels
Channel positions : L R
Sampling rate : 48.0 KHz
Video delay : 15ms
Stream size : 10.2 MiB (3%)
Title : PID 4352
Language : German
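The derived fields in the MediaInfo report above can be cross-checked from the listed bit rate, resolution, frame rate and duration. The Python below is only that arithmetic; the function names are made up, and since the 10.7 Mbps figure is itself rounded, the results will only approximately match the report.

```python
def bits_per_pixel_frame(bitrate_bps, width, height, fps):
    """Average number of bits spent per pixel per frame."""
    return bitrate_bps / (width * height * fps)

def stream_size_mib(bitrate_bps, duration_seconds):
    """Approximate stream size in MiB from bit rate and duration."""
    return bitrate_bps * duration_seconds / 8 / 2**20

video_bitrate = 10.7e6          # 10.7 Mbps, as listed
duration = 3 * 60 + 42          # 3mn 42s

print(round(bits_per_pixel_frame(video_bitrate, 1920, 1080, 25), 3))
print(round(stream_size_mib(video_bitrate, duration)))
```

Both values land close to the 0.206 and 284 MiB that MediaInfo lists, so the report is internally consistent.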
LoRd_MuldeR
25th December 2009, 23:05
New Haali sucks here like hell.
F:\test.mkv::Video
Media Type 0:
--------------------------
Video: CCV1 1920x1080
AM_MEDIA_TYPE:
majortype: MEDIATYPE_Video {73646976-0000-0010-8000-00AA00389B71}
subtype: Unknown GUID Name {31564343-0000-0010-8000-00AA00389B71}
formattype: FORMAT_MPEG2_VIDEO {E06D80E3-DB46-11CF-B4D1-00805F6CBBEA}
bFixedSizeSamples: 0
bTemporalCompression: 0
lSampleSize: 1
cbFormat: 158
Video: CCV1 1920x1080 <- what the hell, it's AVC and not CCV1.
I think that's their "hack" to avoid the Microsoft decoders and allow CoreAVC 2.0 to be used inside WMP/WMC ;)
And you can turn it off in Haali's options :p
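For what it's worth, the "Unknown GUID Name" in the dump above is not random: DirectShow subtypes derived from a FourCC put the four characters, little-endian, into the first 32-bit field of a GUID whose remaining fields are fixed at 0000-0010-8000-00AA00389B71. The Python sketch below decodes that; the function name is made up, and it only handles this FourCC-style GUID family.

```python
import struct
import uuid

# Tail shared by DirectShow GUIDs that are derived from a FourCC.
FOURCC_GUID_TAIL = "-0000-0010-8000-00aa00389b71"

def fourcc_from_guid(guid_text):
    """Return the FourCC embedded in a FourCC-style GUID, or None."""
    guid = uuid.UUID(guid_text)            # accepts the {braced} form
    if not str(guid).endswith(FOURCC_GUID_TAIL):
        return None
    data1 = guid.fields[0]                 # first 32-bit field of the GUID
    # The FourCC characters are stored least-significant byte first.
    return struct.pack("<I", data1).decode("ascii", errors="replace")

print(fourcc_from_guid("{31564343-0000-0010-8000-00AA00389B71}"))  # CCV1
print(fourcc_from_guid("{73646976-0000-0010-8000-00AA00389B71}"))  # vids
print(fourcc_from_guid("{E06D80E3-DB46-11CF-B4D1-00805F6CBBEA}"))  # None
```

So the subtype in the dump is simply the FourCC "CCV1", which matches the custom media type that the Haali option toggles.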
BetaBoy
25th December 2009, 23:09
blubberbirne.... How about reading this thread instead of posting in a manner that does nothing to help you.
Now... Go to Haali's settings and select 'no' for custom mediatype.