CoreCodec/H.264 Codec "CoreAVC"

Stephen R. Savage
7th September 2011, 22:08
Useless ad hominem and fabricated numbers.

I purchased CoreAVC 3.0 because my work requires evaluating decoding software. I am certainly not going to download gigabytes of special-case content (3840x2160? seriously?) to test as I have neither the CPU cycles nor the bandwidth. Moreover, I am only interested in scenarios that have a basis in reality. Your defensiveness betrays the weakness of your claims.

You don't even know what you're talking about, and you expect everyone to take you seriously!

CoreAVC is NOT a GPU decoder. It uses CUDA only to access the built-in video decoder (PureVideo). It works almost like DXVA.

Perhaps I should have put "GPU decoding" in quotes. Rest assured that I know full well how the "GPU" decoding is implemented. Note that this entire thread was in an uproar when this was announced, as it was certainly not what CoreCodec had been advertising with their "GPU decoding" marketing claim. DXVA is useless here because it doesn't copy decoded frames back to memory, and if you just wanted to see a picture, there are dozens of free applications that do that.

And why exactly should that matter?

It's typical usage: you're watching a 10-bit encoded video. Surely that's something quality control should catch... It's not some obscure bug that takes a really weird set of steps to reproduce. From what I'm hearing, basically every 10-bit video is broken, so it shouldn't be all that hard to reproduce.

If this is true, that puts CoreAVC's already low 10-bit performance under even more scrutiny. Not only is it slow, it's also wrong.

BetaBoy
7th September 2011, 22:25
I purchased CoreAVC 3.0 because my work requires evaluating decoding software.
:eek: I am truly speechless given your analyses and continued ignorance in posting biased opinions rather than well-rounded evaluations like most d9's do.

Good or bad, we want to hear what's going on so we can address it.... Oh yeah, we get the OSS argument; in fact, most of our developers here at CoreCodec are the ones working on libav and x264... there is room for all of us here.

BetaBoy
7th September 2011, 22:44
From what I'm hearing basically every 10bit video is broken so it shouldn't really be all that hard to reproduce.
We are adding more regression videos to cover a wider range of 10bit content... but overflow errors are harder to catch naturally.

End result: each release, as it relates to 10-bit, should be better than the last.... especially the next 2-3 releases, as we are focusing on performance (on top of bugs).
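Regression suites for decoders are commonly built by hashing every decoded frame and comparing the hashes against those from a known-good reference decoder. The sketch below shows that idea only; the `decode(path)` callable is a hypothetical stand-in for whatever harness drives the decoder, and nothing here describes CoreCodec's actual test setup.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Mapping, Optional

# Hypothetical decoder interface: takes a clip path, yields decoded frames as raw bytes.
DecodeFn = Callable[[str], Iterable[bytes]]


def frame_digests(decode: DecodeFn, path: str) -> List[str]:
    """MD5 of every decoded frame, in output order."""
    return [hashlib.md5(frame).hexdigest() for frame in decode(path)]


def first_mismatch(actual: List[str], expected: List[str]) -> Optional[int]:
    """Index of the first differing frame, or None if the sequences match exactly."""
    for i, (a, e) in enumerate(zip(actual, expected)):
        if a != e:
            return i
    if len(actual) != len(expected):
        return min(len(actual), len(expected))  # one side ran out of frames early
    return None


def run_regression(decode: DecodeFn,
                   references: Mapping[str, List[str]]) -> Dict[str, Optional[int]]:
    """Decode every reference clip and report where (if anywhere) it diverges."""
    return {path: first_mismatch(frame_digests(decode, path), expected)
            for path, expected in references.items()}
```

Because the comparison is bit-exact, any wrong pixel in a frame flags that frame; the catch, as noted above, is that a clip only exposes an overflow bug if its content actually triggers it.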

TheRyuu
7th September 2011, 22:48
We are adding more regression videos to cover a wider range of 10bit content... but overflow errors are harder to catch naturally.

End result: each release, as it relates to 10-bit, should be better than the last.... especially the next 2-3 releases, as we are focusing on performance (on top of bugs).

No one thought to just watch a video?

Pretty much any 10bit video will break it (from what I hear) so I don't really understand your first statement.

benus
7th September 2011, 22:59
May I ask, since some voices are claiming poor performance from the new CoreAVC 3.0:

Is there any chance of implementing CUDA support for decoding 9- and 10-bit files, to offload the CPU?

If yes, will it still be part of the 3.x generation, so that I can safely purchase it now, or are we talking about further development?

Please let me know. Thanks.

Stephen R. Savage
7th September 2011, 23:26
:eek: I am truly speechless given your analyses and continued ignorance in posting biased opinions rather than well-rounded evaluations like most d9's do.

Good or bad, we want to hear what's going on so we can address it.... Oh yeah, we get the OSS argument; in fact, most of our developers here at CoreCodec are the ones working on libav and x264... there is room for all of us here.

It's kind of hard to take you seriously when you complain about numbers being biased. It's also unclear what agenda you're trying to accuse me of promoting.

Edit: Actually, you are half right. My analyses are misleading and I should retract them. Just now, I opened a 10-bit file that I didn't use for my benchmark and found that CoreAVC shifted the image right by 16 pixels, with the right-most pixels reappearing on the left. If I had spent more time verifying the functionality of this alpha-quality software, I would not even have bothered to include it.

BetaBoy
7th September 2011, 23:57
It's...
Please reference my post to you:
http://forum.doom9.org/showpost.php?p=1524492&postcount=6678

Stephen R. Savage
8th September 2011, 00:04
Please reference my post to you:
http://forum.doom9.org/showpost.php?p=1524492&postcount=6678

Impersonating moderators is a bannable offense, BetaBoy. This isn't the first time someone has warned you about this behavior.

TheRyuu
8th September 2011, 00:07
Please reference my post to you:
http://forum.doom9.org/showpost.php?p=1524492&postcount=6678

Please forgive us for talking about broken software being broken in a thread specifically about said software.

Are you trying to pass blame to the user?
"Please avoid playing these files since it doesn't work with them."

TheFluff
8th September 2011, 00:14
:eek: I am truly speechless given your analyses and continued ignorance in posting biased opinions rather than well-rounded evaluations like most d9's do.

Well, regardless of how much you complain about drama, the fact (or well rounded evaluation, if you prefer) is that until a bugfix release comes out, upgrading from 2.6 to 3.0 is definitely a bad idea. The killer feature most people were waiting for (10-bit decoding) is very broken, and for many testers I've spoken with the decoding performance is slower than in 2.6 (not to mention that in some cases it's actually slower than libavcodec as well).

Oh yeah, we get the OSS argument; in fact, most of our developers here at CoreCodec are the ones working on libav and x264... there is room for all of us here.

I'm not sure if you've realized the difference between you and libavcodec. People expect libavcodec to be broken. They expect libavcodec developers to be rude, hard to reach and slow to react to bug reports. That's just how open source is, after all.
However, when the same people encounter you, they tend to have different expectations, because you're a representative of a for-profit corporation. As such, you're expected to have certain QA standards, as well as have a customer support that consists of something more than whining about forum drama or blaming broken software on anime watchers.

pirlouy
8th September 2011, 00:23
Sorry to be such an ass-kisser, but even if you don't like this new version, you don't have to insult developers like that. It's not as if it were spyware... :/

AzraelNewtype
8th September 2011, 00:31
Sorry to be such an ass-kisser, but even if you don't like this new version, you don't have to insult developers like that. It's not as if it were spyware... :/

You're right, it's not spyware. It is, however, a product that they took our money for that does not work.

Oh, and I know BetaBoy has since backpedaled on his weird assertion that the bugs are anime related, but for reference this is not anime (http://i26.lulzimg.com/a864e7.png). If the developer is going to be this defensive of his product when it does not work (it should look like this (http://imgur.com/wcovB.png)), and is in fact shifting blame back onto the paying customers, there is little reason to be polite.

mandarinka
8th September 2011, 00:38
To be honest, people sure seem to enjoy that ("paid-for"?) right to complain a bit too much. It could have been expressed in a less internet-like way, imho. True, if CC had tested a bit more thoroughly, they wouldn't have given the crowd the chance, but mistakes happen.

BetaBoy
8th September 2011, 00:46
Oh, and I know BetaBoy has since backpedaled on his weird assertion that the bugs are anime related
??? I was referring to the anime community as being early adopters and reporting the issues before anyone else.

and for many testers I've spoken with the decoding performance is slower than in 2.6

Hearsay... If you can provide us with content that shows this speed difference, it would be appreciated.

Please forgive us..
Not at all.... I've said this 100x in this thread: we value everyone's support and awesome feedback. Having someone like Stephen, however, jump in and continue to post biased FUD brings nothing of value to the discussion.

That being said, we are reacting faster than ever before to our customers, as the past few releases show, and we will continue to do so with 3.x (even as we are about to release CoreMVC 3D as well).

SEt
8th September 2011, 01:06
BetaBoy, if you've started fixing bugs, please correct your wrong media type on DXVA2 connections: it should be subtype/compression 'NV12'/'dxva', not 'NV12'/'NV12'. Sure, I can detect it as DXVA2 later, but it's still wrong and produces strange effects like the following one:

Set DXVA mode in CoreAVC, allow only YV12 output format. Now connect it to filter that rejects DXVA but accepts NV12 - result is connection with NV12.

PS: After writing for DirectShow for some time, you realize that practically every single filter has obvious bugs, haha (sad)... For example, CyberLink, which has been in the DXVA field forever, won't work with you if you follow only the standards.
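For readers unfamiliar with the values in this report: a FourCC packs four ASCII characters into a little-endian 32-bit value, and DirectShow derives FourCC-based media subtype GUIDs from it using the fixed suffix `-0000-0010-8000-00AA00389B71`. The sketch below only illustrates that encoding; the 'NV12'/'dxva' pairing comes from SEt's report, and "compression" is assumed to mean the format block's `biCompression` field.

```python
import struct


def make_fourcc(code: str) -> int:
    """Pack four ASCII characters into a little-endian 32-bit value (like Win32 MAKEFOURCC)."""
    raw = code.encode("ascii")
    if len(raw) != 4:
        raise ValueError("a FourCC is exactly four characters")
    return struct.unpack("<I", raw)[0]


def fourcc_to_str(value: int) -> str:
    """Inverse of make_fourcc; latin-1 keeps every byte value decodable."""
    return struct.pack("<I", value).decode("latin-1")


def fourcc_subtype_guid(code: str) -> str:
    """FourCC-derived DirectShow media subtype GUID (without braces)."""
    return f"{make_fourcc(code):08X}-0000-0010-8000-00AA00389B71"


# (subtype, compression) pairs as described in the report above.
expected_pair = (make_fourcc("NV12"), make_fourcc("dxva"))
reported_pair = (make_fourcc("NV12"), make_fourcc("NV12"))
print(fourcc_subtype_guid("NV12"))     # 3231564E-0000-0010-8000-00AA00389B71
print(expected_pair == reported_pair)  # False
```

A downstream filter that checks only the subtype sees plain NV12 in both cases, which would be consistent with the NV12 fallback connection SEt describes.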

BetaBoy
8th September 2011, 01:09
SEt... thx for the report I posted it for the devs to look into.

TheShadowRunner
8th September 2011, 01:10
A couple extremely simple questions as the 3.0 "what's new" doesn't make it clear ("CUDA" doesn't even appear anywhere!):

1. Is hi10 decoding in CoreAVC 3.0 strictly (=100%) done by the CPU?

2. If so, is it because CUDA acceleration CANNOT technically be implemented for hi10 sources or is it a limitation of the current 3.0 build?

3. If limitation of current build, can we imagine a scenario in the future where hi10 decoding gets hybrid: CPU + partial CUDA offloading?

Thanks for the info.

BetaBoy
8th September 2011, 01:10
We just fixed the 'high depth bug' re: blockiness, for the next release.

Keiyakusha
8th September 2011, 01:21
If limitation of current build, can we imagine a scenario in the future where hi10 decoding gets hybrid: CPU + partial CUDA offloading?

Maybe a little OT, but... do you know of any decoders that do "partial CUDA offloading", which would make you expect something like that from CoreAVC? And why are you asking only about 10-bit and not 8-bit?

TheShadowRunner
8th September 2011, 01:27
Maybe a little OT, but... do you know of any decoders that do "partial CUDA offloading", which would make you expect something like that from CoreAVC? And why are you asking only about 10-bit and not 8-bit?

No, I certainly don't, just wondering.
And I don't ask about 8-bit because I know CUDA acceleration takes care of it already :P

Betaboy, please at least answer question 1/2!

A couple extremely simple questions as the 3.0 "what's new" doesn't make it clear ("CUDA" doesn't even appear anywhere!):

1. Is hi10 decoding in CoreAVC 3.0 strictly (=100%) done by the CPU?

2. If so, is it because CUDA acceleration CANNOT technically be implemented for hi10 sources or is it a limitation of the current 3.0 build?
Thanks,

TSR

cyberbeing
8th September 2011, 01:31
and for many testers I've spoken with the decoding performance is slower than in 2.6

Hearsay... If you can provide us with content that shows this speed difference, it would be appreciated.

This had me a bit concerned, so I tested a few random 8bit videos. Thankfully I can say that CoreAVC 3.0.0 x86 is consistently about 1% faster than 2.6.1 on my old AMD X2. Nothing to write home about, but a speed-up is a speed-up. So if there is an issue, it doesn't seem to affect old AMD processors.

68.5356 CoreAVC 3.0.0
67.9791 CoreAVC 2.6.1

65.3254 CoreAVC 3.0.0
64.5945 CoreAVC 2.6.1

62.0784 CoreAVC 3.0.0
61.3808 CoreAVC 2.6.1

64.3769 CoreAVC 3.0.0
63.8261 CoreAVC 2.6.1

183.4490 CoreAVC 3.0.0
182.8061 CoreAVC 2.6.1

157.7797 CoreAVC 3.0.0
157.0646 CoreAVC 2.6.1
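To check the "about 1% faster" summary, the relative speed-up of each pair can be computed directly. This assumes the figures are frames per second (higher is better), which the post implies but does not label.

```python
# (CoreAVC 3.0.0, CoreAVC 2.6.1) pairs from the results above; assumed to be fps.
runs = [
    (68.5356, 67.9791),
    (65.3254, 64.5945),
    (62.0784, 61.3808),
    (64.3769, 63.8261),
    (183.4490, 182.8061),
    (157.7797, 157.0646),
]

# Percentage change of 3.0.0 relative to 2.6.1 for each clip.
speedups = [(new / old - 1.0) * 100.0 for new, old in runs]
for pct in speedups:
    print(f"{pct:+.2f}%")

mean = sum(speedups) / len(speedups)
print(f"mean {mean:+.2f}%")
```

All six differences are positive, ranging from roughly +0.35% (the fastest clip) to about +1.14%, so "consistently faster, by up to about 1%" is a fair reading of these numbers.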

LoRd_MuldeR
8th September 2011, 01:32
The term "CUDA decoding" or "CUDA offloading" is misleading. Actually I am not aware of any H.264 decoder that is implemented as a CUDA kernel, i.e. running on the GPU. Instead all those "CUDA" enabled/accelerated decoders simply use the hardwired H.264 decoder chip that is integrated on the graphics card. That decoder chip is a dedicated piece of hardware. It may be accessed via the "CUDA Video API" (CUVID), but that is very different from a "real" CUDA kernel. The latter would be running on the programmable GPU (shader cores). Also the decoder chip is the very same one that would be used by a "DXVA" enabled/accelerated decoder. This also means: What can (or can not) be decoded "in hardware" does not depend on the individual decoder software (may it be CoreAVC or something else), but on the decoder chip on the graphics card! If, for example, the decoder chip on the graphics card does not support 10-Bit H.264, then there is nothing that the decoder software could do about that - except for falling back to pure software decoding, of course. Hardware decoder chips can not be updated. If you want new hardware decoder features, then you have to wait for the next generation of graphics cards...
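The decision described above, where hardware decoding is used only if the fixed-function chip supports the stream and pure software decoding is used otherwise, can be sketched as a small selection function. The capability record and its fields are hypothetical; real players query this through driver APIs such as DXVA or CUVID, not through a structure like this.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class HwDecoderCaps:
    """Hypothetical summary of what a fixed-function decoder block accepts."""
    codecs: FrozenSet[str]
    max_bit_depth: int


def choose_decode_path(codec: str, bit_depth: int,
                       hw: Optional[HwDecoderCaps]) -> str:
    """Return "hardware" only if the decoder chip can handle the stream;
    otherwise fall back to pure software decoding. The caps are treated as
    fixed, since a software update cannot change what the silicon supports."""
    if hw is not None and codec in hw.codecs and bit_depth <= hw.max_bit_depth:
        return "hardware"
    return "software"


# Example: a card whose decoder handles 8-bit H.264 only (illustrative values).
caps_8bit = HwDecoderCaps(codecs=frozenset({"h264"}), max_bit_depth=8)
print(choose_decode_path("h264", 8, caps_8bit))   # hardware
print(choose_decode_path("h264", 10, caps_8bit))  # software
```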

Keiyakusha
8th September 2011, 01:35
And I don't ask for 8bit, because i know CUDA acceleration takes care of it already :P
Actually no... not really. What I mean is, you asked about partial acceleration (for weaker hardware, I guess?), which means some real CUDA-based decoder would have to be created. The current implementation only passes the video to the chip located on the video card, and NVIDIA's hardware does the rest, so it's nowhere near any partial stuff.
EDIT: oops, I'm so slow... so much good stuff was written before me...

SEt
8th September 2011, 01:35
If you are interested, I can also confirm a DXVA2 CPU usage regression. GraphStudio speed test to Null Renderer, some random 720p anime (fps/CPU):

DXVA2:
86 13.5% CoreAVC 3.0.0
103 1% CoreAVC 2.5.5
103 1% FFDS

CUDA:
87 1.5% CoreAVC 3.0.0 and 2.5.5
102 2% LAV

Software:
1640 CoreAVC 3.0.0
1573 LAV

As you can see from software speed it's a lot of wasted CPU resources.
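The "wasted CPU" argument can be made explicit with a rough calculation. The sketch assumes the software run (1640 fps) used the whole CPU, which the post does not state, and that software cost scales linearly with frame rate. If the software run used less than 100%, the estimate below only gets smaller.

```python
def software_cpu_share(target_fps: float, sw_fps: float, sw_cpu_pct: float = 100.0) -> float:
    """CPU share pure software decoding would need to sustain target_fps,
    assuming cost scales linearly with frame rate and the software run used sw_cpu_pct."""
    return target_fps / sw_fps * sw_cpu_pct


# Figures from the post above (CoreAVC 3.0.0).
dxva2_fps, dxva2_cpu = 86, 13.5
sw_fps = 1640

needed = software_cpu_share(dxva2_fps, sw_fps)
print(f"software would need ~{needed:.1f}% CPU for {dxva2_fps} fps; "
      f"the DXVA2 path used {dxva2_cpu}%")
```

Under these assumptions, decoding the same 86 fps entirely in software would cost about 5.2% CPU, less than the 13.5% the DXVA2 path consumed.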

ajp_anton
8th September 2011, 01:37
But what if this 8-bit hardware decoder in fact does output *something* from a 10-bit clip that is not pure garbage, with enough real data that it's worth patching the bad spots in software.

STaRGaZeR
8th September 2011, 01:38
This thread is always so fun to read, especially after a release. :)

TheShadowRunner
8th September 2011, 01:46
This also means: What can (or can not) be decoded "in hardware" does not depend on the individual decoder software (may it be CoreAVC or something else), but on the decoder chip on the graphics card! If, for example, the decoder chip on the graphics card does not support 10-Bit H.264, then there is nothing that the decoder software could do about that - except for falling back to pure software decoding, of course.

Thank you very much, finally.
So CoreAVC 3.0's hi10 decoding cannot and never will be "CUDA accelerated" with the current nVidia cards in the market.
CoreAVC 3.0's hi10 decoding is 100% CPU.

Actually no... not really. What I mean is, you asked about partial acceleration (for weaker hardware, I guess?), which means some real CUDA-based decoder would have to be created. The current implementation only passes the video to the chip located on the video card, and NVIDIA's hardware does the rest, so it's nowhere near any partial stuff.
EDIT: oops, I'm so slow... so much good stuff was written before me...

Roger that, I didn't know. I thought CoreAVC's CUDA implementation just used the horsepower of the chip on the NVIDIA card but instructed it HOW to decode, which made it different from DXVA.

But what if this 8-bit hardware decoder in fact does output *something* from a 10-bit clip that is not pure garbage, with enough real data that it's worth patching the bad spots in software.

That was pretty much my question too, maybe I formulated it badly ^^;

SEt
8th September 2011, 01:53
Nothing useful can be done with a wrong H.264 decoding result. But you have a good chance of finding something in hardware that can correctly do part of the algorithm (like stream decoding, IDCT, ...) and using that hardware part together with the rest in software. That would be partial hardware acceleration. How much can it help to free the CPU? Who knows...

LoRd_MuldeR
8th September 2011, 01:54
But what if this 8-bit hardware decoder in fact does output *something* from a 10-bit clip that is not pure garbage, with enough real data that it's worth patching the bad spots in software.

A hardware decoder chip either supports 10-Bit or it does not. In the latter case it would probably reject 10-Bit streams and not output anything at all. And, if it does not reject the unsupported stream for some reason, the output will simply be "undefined" (or in other words: random garbage). This cannot be "patched" in software, except by throwing away the hardware decoder's broken output and properly decoding the source in software! Also: even if it were possible to recover the broken output somehow, why would anybody want to use a hardware decoder for something it was never designed for (and therefore cannot handle correctly), just to spend a whole lot of CPU cycles "repairing" the broken output afterwards? Sounds like a rather bizarre idea to me...

But you have a good chance of finding something in hardware that can correctly do part of the algorithm (like stream decoding, IDCT, ...) and using that hardware part together with the rest in software. That would be partial hardware acceleration. How much can it help to free the CPU? Who knows...

Doesn't the whole decoding pipeline have to support 10-Bit precision to properly decode 10-Bit streams?

SEt
8th September 2011, 02:01
I'm not an expert in H.264 internals, but likely not. A decoder has to use more than 8-bit internal precision for decoding 8-bit content anyway, and that likely means 16-bit precision, which could be OK for 10-bit too.

LoRd_MuldeR
8th September 2011, 02:07
Decoder has to use >8 bit internal precision for decoding 8 bit content anyway.

I don't think so. H.264 uses integer math. And when decoding 8-Bit streams the decoder has to truncate everything to exactly 8-Bit internally for correct output.

(It is the other way around: Using more than 8-Bit internal precision is beneficial, even for 8-Bit sources. But it requires a "high bit-depth" capable H.264 encoder + decoder).

TheShadowRunner
8th September 2011, 02:12
BetaBoy, will decoding Hi10P content benefit from the current acceleration methods (CUDA, DXVA...) in CoreAVC 3.0, or will it be 100% software only?
I can imagine DXVA is a no go, but I wonder about CUDA..
Thanks for the info.

I'll comment once 3.0 is out, but you're not far off.

So, I guess I was REALLY far off after all...

SEt
8th September 2011, 02:30
An encoder or decoder, be it software or hardware, 'truncates everything' as rarely as it can, likely only at the very last stage. And 8 bits are not enough for processing 8-bit data with anything more complex than an inversion or the average of two values.
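One concrete place to look at intermediate precision is H.264's luma half-sample interpolation: a six-tap filter with taps (1, -5, 20, 20, -5, 1), followed by adding 16, shifting right by 5, and clipping to the sample bit depth. The sketch covers a single half-sample position only (the centre position filters these unclipped sums a second time, with even wider intermediates) and is an illustration, not a decoder.

```python
# H.264 luma half-sample interpolation: six-tap filter (1, -5, 20, 20, -5, 1),
# followed by rounding, a shift by 5, and clipping to the sample bit depth.
TAPS = (1, -5, 20, 20, -5, 1)


def clip1(x: int, bit_depth: int) -> int:
    """Clip to the valid sample range [0, 2**bit_depth - 1]."""
    return max(0, min(x, (1 << bit_depth) - 1))


def half_sample(six_samples, bit_depth: int = 8):
    """Return (intermediate, final) for one half-sample position.

    The intermediate sum is kept at full integer precision; only the final
    value is clipped back to bit_depth bits.
    """
    if len(six_samples) != 6:
        raise ValueError("need exactly six neighbouring samples")
    intermediate = sum(t * s for t, s in zip(TAPS, six_samples))
    return intermediate, clip1((intermediate + 16) >> 5, bit_depth)


# For 8-bit input the intermediate can range from -2550 to 10710.
lo = sum(t * (255 if t < 0 else 0) for t in TAPS)
hi = sum(t * (255 if t > 0 else 0) for t in TAPS)
print(lo, hi)  # -2550 10710
```

An intermediate range of -2550 to 10710 does not fit in 8 bits, so even 8-bit decoding needs wider arithmetic here; the same code with `bit_depth=10` just widens the final clip.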

BetaBoy
8th September 2011, 02:41
This had me a bit concerned, so I tested a few random 8bit videos. Thankfully I can say that CoreAVC 3.0.0 x86 is consistently about 1% faster than 2.6.1 on my old AMD X2.

You should see some additional 8-bit speed increases soon enough, as we had been working on some more (8-bit) optimizations, but because of time constraints they did not make it into 3.0.

LoRd_MuldeR
8th September 2011, 02:45
This is getting off-topic, but for proper output all decoders have to truncate all intermediate results in the same way. Otherwise it couldn't be guaranteed that all decoders deliver the identical (correct) output for a given input stream. So when decoding 8-bit streams, all intermediate results are truncated to 8-bit internally. That is also the reason why using a higher internal precision (e.g. 9-bit or 10-bit) gives better encoding efficiency, even if the source was "only" 8-bit: there are fewer rounding errors in the encoding/decoding pipeline. But, as encoder and decoder have to agree on the internal precision (and must truncate accordingly), 9-bit or 10-bit encoding also requires a 9-bit or 10-bit decoder. Moreover, 8-bit internal precision is mainly a shortcut for speed improvement. So if the stream was encoded with 8-bit precision, you want to decode with 8-bit precision (and not some higher precision, even if you could), simply because it is faster. That's also the reason why hardware decoders that don't officially support more than 8-bit precision will probably not be prepared for 9-bit or 10-bit precision internally. Why would you waste silicon on something that you don't expose/sell as a feature?
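The "fewer rounding errors" point can be illustrated in isolation by measuring the worst-case rounding error when a normalized value is stored at different bit depths. This only shows the size of a single rounding step; it is not a model of any encoder or decoder pipeline.

```python
def quantize(x: float, bits: int) -> int:
    """Map x in [0, 1] to the nearest integer code at the given bit depth."""
    levels = (1 << bits) - 1
    return round(x * levels)


def worst_rounding_error(bits: int, steps: int = 10_000) -> float:
    """Largest |dequantized - original| over an evenly spaced test ramp."""
    levels = (1 << bits) - 1
    return max(abs(quantize(i / steps, bits) / levels - i / steps)
               for i in range(steps + 1))


for bits in (8, 9, 10):
    print(bits, worst_rounding_error(bits))
```

The worst case is bounded by half a code step, 0.5 / (2^bits - 1), so each extra bit roughly halves it and 10-bit storage cuts it to about a quarter of the 8-bit figure.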

Fadeout
8th September 2011, 03:23
Can we also have DXVA fixed since it was working fine in 2.5 and it's not involved in hi10?

Current problems: high CPU usage and non-smooth performance on some complex videos.

BetaBoy
8th September 2011, 10:34
We just fixed the DXVA memory resources bug.... We are onto the reports of high bitrate Hi10p videos next, but given that it might take some time to work on the assembly code (and address the Haali MP4 bug), we will likely release an update ASAP. CoreAVC 3.0.1 incoming ;-)

eddman
8th September 2011, 10:56
People, how about providing detailed bug reports? Simply saying that it's not working or it's slow for me won't really help. It should be something like this:

1. Describe the problem thoroughly and provide a video sample if possible; if not, at least provide screenshots (if applicable)
2. OS - Win XP SP3
3. CPU - Core 2 Duo E6700
4. GPU - Geforce 9600 GT
5. Video Driver version - 260.89
6. Player - MPC-HC 1.5.0.2827
7. Renderer - VMR-9 windowed
8. Splitter - Haali 1.10.262.12
9. Codec - CoreAVC 2.6.1
10. Output - YV12
11. Acceleration - DXVA
12. Other specific things that might help.
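The checklist above maps naturally onto a small record type, which makes it harder to forget a field. This is only an illustrative sketch; the class and field names are made up for the example, and the values in the test mirror the sample entries in the list.

```python
from dataclasses import dataclass, fields

# Display labels for fields whose names don't capitalize nicely.
LABELS = {"os": "OS", "cpu": "CPU", "gpu": "GPU", "video_driver": "Video driver version"}


@dataclass
class BugReport:
    description: str   # the problem, plus a sample or screenshots where possible
    os: str
    cpu: str
    gpu: str
    video_driver: str
    player: str
    renderer: str
    splitter: str
    codec: str
    output: str
    acceleration: str
    other: str = ""

    def render(self) -> str:
        """Format the report as a numbered list, one field per line."""
        lines = []
        for n, f in enumerate(fields(self), start=1):
            label = LABELS.get(f.name, f.name.replace("_", " ").capitalize())
            lines.append(f"{n}. {label} - {getattr(self, f.name)}")
        return "\n".join(lines)
```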

CruNcher
8th September 2011, 11:17
@Dan
there are several bitstreams where the full performance (on 2nd-generation Intel CPUs, also in terms of power draw) of CoreAVC 3.0 8-bit, at least the 64-bit build, is not that good in direct comparison :(

Here is one of those examples (Yv12):

http://img35.imageshack.us/img35/5793/crysis2gb.png

and overall it loses to DiAVC 64 practically all the time (except on some rare 4K bitstreams), and sometimes LAV Video (libav) can beat it too (though LAV Video never manages to reach DiAVC's performance)

hajj_3
8th September 2011, 12:13
I'm quite surprised that CoreAVC hasn't bought out DiAVC, to be honest. Its developer said he'd be willing to sell it for $200k, with the source code to be open-sourced. I would have thought CoreAVC must be making quite a bit of money from commercial usage of their products, enough to buy DiAVC. That would be a seriously fast decoder, and there would be no real competition, as people don't want to install all the junk DivX bundles; they just want a codec to play Xvid/x264 videos with low CPU utilisation.

With HD decoding on modern Intel/AMD/ARM processors, the uses of CoreAVC will shrink more and more. DivX could buy DiAVC too; that would be very lucrative for them, seeing as DivX is in so many devices. It's a no-brainer for them.

CruNcher
8th September 2011, 12:36
The decision to buy the DiAVC IP would be up to Rovi ;) Though the MainConcept/Elecard engineers (whose work the core of the whole DivX H.264 framework is based on, plus Matroska (CoreCodec)) aren't bad either: they are among the oldest MPEG-standard digital video codec implementers in the world (they had 10-bit support much earlier than you might imagine) and will most probably reach that performance soon (or may have reached it already). So I'm not sure they have a real interest in buying his code; also, buying his code would not be very efficient, employing him directly would be much better ;)

And to your other question: that's GraphStudio from RadScorpion (RadLight). If you want to give it a try, note that you have to know what you're doing, as it's based on DirectShow: http://blog.monogram.sk/janos/tools/monogram-graphstudio/

BetaBoy
8th September 2011, 15:58
We have fixed the Haali Splitter Bug MP4 recognition and it will be included in our next CoreAVC 3.0.1 release.

hajj_3
8th September 2011, 16:01
I presume 2.6.2 will follow that?

betaking
8th September 2011, 16:14
We have fixed the Haali Splitter Bug MP4 recognition and it will be included in our next CoreAVC 3.0.1 release.
Next week or tomorrow?:thanks:

BetaBoy
8th September 2011, 16:16
I presume 2.6.2 will follow that?

Correct; however, out of respect for past purchasers, we might release it before 3.0.1.

BetaBoy
8th September 2011, 16:16
Next week or tomorrow?:thanks:

Likely later today or tomorrow, after we QA.

betaking
8th September 2011, 16:33
Likely later today or tomorrow after we QA.

:thanks:

cyberbeing
8th September 2011, 17:09
Likely later today or tomorrow after we QA.

To avoid huge memory leaks like this in CoreAVC, i would however recommend to free the memory buffers when the input pin is disconnected, and not wait until the filter is destructed. (Which is why it doesn't show up on other filters, they do exactly that)

Did what nevcairiel suggested make it into 3.0.1 to fix the memory leak issue with madVR 0.74? Would doing so break something else or be undesirable for some reason?
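The suggestion quoted above is essentially a lifetime rule: tie the frame-buffer pool to the input pin connection rather than to the filter object. The toy model below illustrates only that rule; it is not the DirectShow API, every name in it is hypothetical, and BetaBoy's reply further down lists reasons CoreCodec is wary of changing this.

```python
from typing import List, Optional


class HypotheticalDecoderFilter:
    """Toy model of a decoder filter's buffer lifetime (not the DirectShow API)."""

    def __init__(self) -> None:
        self.buffers: Optional[List[bytearray]] = None

    def on_input_connected(self, frame_size: int, count: int) -> None:
        # Allocate the frame pool when the input pin is connected.
        self.buffers = [bytearray(frame_size) for _ in range(count)]

    def on_input_disconnected(self) -> None:
        # Suggested behaviour: drop the pool here, so a filter instance that a
        # renderer keeps alive no longer holds the memory until destruction.
        self.buffers = None

    def allocated_bytes(self) -> int:
        return 0 if self.buffers is None else sum(len(b) for b in self.buffers)
```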

And then there is also the 10-bit to P010/P016 conversion being slower than it should be, at least on my AMD X2, to keep in mind.
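For context on what that conversion involves: P010 stores 4:2:0 video as a Y plane followed by an interleaved UV plane, with each sample in a 16-bit little-endian word and the 10 significant bits in the high bits (the value shifted left by 6); P016 uses all 16 bits. The plain-Python sketch below shows only the packing step and ignores row pitch; it says nothing about how CoreAVC implements it, and a real converter would process whole rows with SIMD.

```python
import struct
from typing import Sequence


def pack_p010(y: Sequence[int], u: Sequence[int], v: Sequence[int]) -> bytes:
    """Pack 10-bit 4:2:0 samples into a P010-style buffer.

    Layout: the full Y plane, then one interleaved UV plane. Every sample is a
    16-bit little-endian word with the 10 significant bits in the high bits
    (value << 6); the low 6 bits are zero. Row pitch/padding is ignored here.
    """
    if len(u) != len(v) or 4 * len(u) != len(y):
        raise ValueError("expected 4:2:0 sample counts (len(y) == 4 * len(u) == 4 * len(v))")
    words = [(s & 0x3FF) << 6 for s in y]   # keep only the low 10 bits, MSB-align
    for cb, cr in zip(u, v):                # interleave chroma as U, V, U, V, ...
        words += [(cb & 0x3FF) << 6, (cr & 0x3FF) << 6]
    return struct.pack(f"<{len(words)}H", *words)
```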

dead_screem
8th September 2011, 17:37
A couple of bugs. Haali will cause the GraphStudio video decoder benchmark to crash on the last pass or when I press stop (when the graph is torn down, I assume). I have to switch to the MPC-HC standalone filters.

When DXVA1 is used, CoreAVC uses a weird output 4cc: H¾ followed by two characters that can't be rendered here on the forum, a middle dot and then a left-pointing arrow, instead of "dxva" like MPC-HC. I have no idea if this is really a bug or if it actually affects anything.
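A quick way to tell whether a 4cc like the one reported is meaningful is to check whether all four bytes are printable ASCII and fall back to hex otherwise. This is a generic sketch; the exact value CoreAVC emits is not known from the post, so the non-printable test value is arbitrary.

```python
import struct


def describe_fourcc(value: int) -> str:
    """Render a 32-bit FourCC as text, or as hex if any byte is not printable ASCII."""
    raw = struct.pack("<I", value)  # FourCCs are stored little-endian: first char in the low byte
    if all(0x20 <= b < 0x7F for b in raw):
        return raw.decode("ascii")
    return f"0x{value:08X} (not a printable FourCC)"
```

Logging unknown 4ccs in hex like this avoids the forum-rendering problem entirely and makes reports comparable between users.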

BetaBoy
8th September 2011, 17:58
Did what nevcairiel suggested make it into 3.0.1 to fix the memory leak issue with madVR 0.74? Would doing so break something else or be undesirable for some reason?

And then there is also the 10-bit to P010/P016 conversion being slower than it should be, at least on my AMD X2, to keep in mind.
The issue with madVR is that it should release the CoreAVC instance. We can add memory releases, but the point is we shouldn't, as that has the side effect of causing more bad than good, like:
- Having multiple monitors hang
- Causing multiple tray icons
- Possible setting conflicts

Was madshi doing a fix for this?

ok... on the noted conversion slowdown.