
CoreCodec/H.264 Codec "CoreAVC"



STaRGaZeR
7th October 2008, 11:41
What GPU was used in that comparison?

Disabled
7th October 2008, 12:23
And please tell us whether you're using the VP2 decoder engine, or whether G80 GPUs (or every GPU supporting CUDA) are supported.

madshi
7th October 2008, 12:25
@BetaBoy, those are interesting numbers. I'm wondering: Are you planning to offer a way to combine both CPU + GPU power for decoding? Wouldn't that result in 136.6 + 69.1 = ca. 200 fps? Also: Will image quality be 100% identical with CUDA compared to the software based CoreAVC? Thanks!

Dark Shikari
7th October 2008, 12:53
Dark Shikari... we looked at this more, and it looks like I was wrong about it being a bug: switching between lossy and predictive lossless _is_ slow atm in CoreAVC, and we will add it to the todo list for 2.0.

BTW... do you have a sample(s) we can test?

From the Incredibly Obnoxious Conformance Vectors department, I've created a test clip (http://www.mediafire.com/?l5zn5yijg5e). I cannot guarantee it is absolutely correct; in fact, my current ffmpeg patch for predictive lossless fails to correctly decode it (I suspect the issue might be clipping or similar)! CoreAVC appears to work correctly, however.

This has the following:

1. Deblocking
2. CABAC
3. B-frames (pyramidal, with weights)
4. All macroblock types, including 8x8dct and PCM.
5. Random distribution of QPs, with 50% lossless and 50% ranging from 1 to 31.

Have fun :cool:
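The QP mix in item 5 can be sketched as follows. This is a hypothetical illustration of the distribution, not how the clip was actually generated; it assumes lossless macroblocks are the ones coded at QP 0 with transform bypass enabled (as in H.264 predictive lossless at 8 bits), and the function name `random_mb_qps` is made up.

```python
import random


def random_mb_qps(num_mbs, seed=0):
    """Hypothetical sketch: one QP per macroblock, 50% lossless (QP 0),
    50% drawn uniformly from 1..31."""
    rng = random.Random(seed)  # seeded so the same clip can be regenerated
    qps = []
    for _ in range(num_mbs):
        if rng.random() < 0.5:
            qps.append(0)  # assumed: QP 0 + transform bypass = lossless MB
        else:
            qps.append(rng.randint(1, 31))  # lossy MB; randint is inclusive
    return qps


# A 1920x1088 frame holds (1920 / 16) * (1088 / 16) = 8160 macroblocks.
qps = random_mb_qps(8160)
print(f"lossless share: {qps.count(0) / len(qps):.2f}")
```

Because the two modes are interleaved within every frame, a decoder has to switch between the bypass and the normal dequantization/transform path from one macroblock to the next, which is exactly the lossy/lossless switching BetaBoy mentions as slow.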

Cyber-Mav
7th October 2008, 15:39
Define tolerant, as I think if we handle it, it should parse just fine..... Also CUDA only has one option atm... that's on or off.... We have not optimized it in this first phase for 1.9.x, but take a look at some numbers:


As you can see, CoreAVC scales with the number of CPUs, while CUDA is limited by the gfx card. Note that we are not showing it above, but the offload from the CPU to the GPU in general makes this more than worth the effort.

Ahh, by tolerance I meant videos encoded with a high number of B-frames, or with other options like more than 6 or 7 reference frames, which usually break DXVA acceleration in other hardware-based decoders.

Is CoreAVC's CUDA implementation based around the PureVideo VPU, or is it going in the direction of using the stream processors for custom data processing?

I'm assuming it's the latter, since if CoreAVC were to use the VPU then CUDA would not be the best method, as it would mean more work implementing a separate method for graphics cards that don't support CUDA.

But if you use CUDA for general-purpose processing, then I can see a lot more control being gained in the decoding stages. Either way it shows that hardware acceleration is beneficial....... hold on a second.. (lol I'm typing as I'm thinking.)

You mentioned that speed will vary depending on the graphics card. A lot of the newer NVIDIA cards have the same PureVideo 2 engine or so in them, so even an 8400 GS would give the same decode acceleration as a 9800 GTX. But if you say speed will depend on the graphics card, then it seems like you will be using the latter approach I mentioned above, which is to have the stream processors on the graphics card do the general-purpose processing you need. Hence a card with more stream processors would perform better.

Cyber-Mav
7th October 2008, 15:44
@BetaBoy, those are interesting numbers. I'm wondering: Are you planning to offer a way to combine both CPU + GPU power for decoding? Wouldn't that result in 136.6 + 69.1 = ca. 200 fps? Also: Will image quality be 100% identical with CUDA compared to the software based CoreAVC? Thanks!

From my work with CUDA, there is always some CPU usage going on when offloading to the GPU. So I'm going to assume that CoreAVC on its own could use e.g. 70% CPU, but with CUDA the CPU could use something like 40%, with the rest offloaded to the graphics card.
I'm just guessing here, and those percentages are not real, just made-up examples; BetaBoy will be able to answer that question with more accuracy and detail.

me7
7th October 2008, 18:29
Do mobile NVIDIA cards support PureVideo (particularly the 8400M GS)?

BetaBoy
7th October 2008, 19:08
From the Incredibly Obnoxious Conformance Vectors department, I've created a test clip (http://www.mediafire.com/?l5zn5yijg5e). I cannot guarantee it is absolutely correct; in fact, my current ffmpeg patch for predictive lossless fails to correctly decode it (I suspect the issue might be clipping or similar)! CoreAVC appears to work correctly, however.

This has the following:

1. Deblocking
2. CABAC
3. B-frames (pyramidal, with weights)
4. All macroblock types, including 8x8dct and PCM.
5. Random distribution of QPs, with 50% lossless and 50% ranging from 1 to 31.

Have fun :cool:

Yummy, a challenge... ;-)

Cyber-Mav
7th October 2008, 19:09
Do mobile NVIDIA cards support PureVideo (particularly the 8400M GS)?

Yes, it does.

BetaBoy
7th October 2008, 19:17
What GPU was used in that comparison?

Intel Core 2 Quad Q6600 2.4GHz running a 9600 GT with VP2.

Cyber-Mav
8th October 2008, 02:47
I'm guessing GPU acceleration would be more beneficial to those who use slower single-core CPUs?

BetaBoy
8th October 2008, 02:53
Well, we are far from done.... this is first round work. There are some long-term (lower level) CUDA goals we are planning for both CoreAVC and CorePlayer. Let's get this VP2 release out the door first..... but no matter what, CUDA does as advertised and offloads CPU cycles, allowing for more CPU-intensive AVC features and aggressive bitrates.

lucassp
8th October 2008, 07:13
So, in the end, it's still VP2 support brought to CoreAVC.

BetaBoy
8th October 2008, 07:25
So, in the end, it's still VP2 support brought to CoreAVC.

What? Please elaborate.

Disabled
8th October 2008, 10:29
What he meant was that you're only using the VP2 decoder, so only G90+ based GPUs are supported.

lucassp
8th October 2008, 10:30
Well, you said CoreAVC is going to use CUDA but not VP2 because of its limitations. And I thought you were going to write a CUDA-based decoder to do things faster. Please correct me if I understood you wrong. And please give us more details on it :)

PS: I'm not a native English speaker and maybe sometimes I understand things wrong :)

CiNcH
8th October 2008, 11:56
You got it right. They use the CUDA Video API to access PV2, not the shaders to offload some complex computation. Still, CUDA is far less restrictive than DXVA given the possibility to read back the decoded frames into main memory, but it is of course restricted concerning AVC/H.264 levels and profiles...

Inventive Software
8th October 2008, 13:22
CUDA supports (or will with a coming driver update) L5.1. Check the DGAVCDecodeNV thread.

ashlar42
8th October 2008, 14:33
@BetaBoy, those are interesting numbers. I'm wondering: Are you planning to offer a way to combine both CPU + GPU power for decoding? Wouldn't that result in 136.6 + 69.1 = ca. 200 fps? Also: Will image quality be 100% identical with CUDA compared to the software based CoreAVC? Thanks!

I'm interested in answers to these questions too. Is CoreAVC going to combine CPU+GPU, or will you have to choose which one to use? I hope for the former, as it would free up resources to do heavy post processing.

BlackSun
8th October 2008, 21:48
We're looking into a way to combine both, but that is doubtful, so possibly you would have to choose. The CPU still handles some operations such as parsing and blitting (color conversion/deinterlacing).

lucassp
9th October 2008, 06:09
The CPU still handles some operations such as parsing and blitting (color conversion/deinterlacing).

Maybe you could do the color conversion on CUDA, and you could also enable the VP2 deinterlacer as an alternative to the VMR Deinterlacer.

squid_80
9th October 2008, 11:12
Using the CPU to do the color conversion/deinterlacing is better, as it can be done while the GPU is decoding more frames. Also, VP2 deinterlacer = VMR9 deinterlacer (exact same thing).

CUDA still has limitations on the number of possible reference frames, similar to DXVA... The technical limit is 15 references, since 15 (ref frames) + 1 (output frame) = 16, which is the maximum number of surfaces allocatable under DirectX. In practice this limit may be lower due to out-of-order frames waiting to be displayed. AFAIK this is not a limitation of the hardware but of DirectX, to which CUDA is unfortunately still tied at the moment.
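The surface arithmetic in that post can be written out as a small sketch. It uses a simplified model in which every reference frame, the output frame and every reordered frame still waiting for display each hold one surface; the 16-surface ceiling is the figure given in the post, and the function name is hypothetical.

```python
MAX_SURFACES = 16  # DirectX surface ceiling quoted in the post


def max_reference_frames(pending_display=0, output_surfaces=1,
                         total_surfaces=MAX_SURFACES):
    """Reference frames left once the output surface(s) and any frames
    waiting to be displayed out of order have claimed their surfaces."""
    return max(0, total_surfaces - output_surfaces - pending_display)


print(max_reference_frames())                   # 15
print(max_reference_frames(pending_display=2))  # 13
```

Under this model a stream using 16 reference frames (the H.264 maximum) can never fit, which is consistent with high-ref encodes breaking hardware decoding as described earlier in the thread.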

laserfan
10th October 2008, 21:35
How do I make PowerDVD Ultra from CyberLink use the CoreAVC codec for H.264 instead of its own CL264Dec.ax??? I've checked "Preferred decoder" in the CoreAVC config, which didn't work. Next I unregistered CL's codec, and PDVD still doesn't pick up the CoreAVC decoder; in fact it won't play video at all.

Is there a way to make PDVD use CoreAVC?

Jay Bee
11th October 2008, 00:13
Is there a way to make PDVD use CoreAVC?

No...

laserfan
11th October 2008, 14:57
Is there a PC-based BD player that WILL work with CoreAVC? I mean, one which recognizes the disc structure and from which I can see & play the menus and extras etc.

I know it works with MPC, and CorePlayer looks like it will play individual files, but how to play a disc "as programmed"?

shon3i
11th October 2008, 15:24
@laserfan, why do you need that? I think using CoreAVC as a BD decoder is not a good idea right now, because CyberLink and ArcSoft have DXVA (for both ATI and NVIDIA, and all streams: MPEG-2, AVC, VC-1), and in software mode they are fast too, not as fast as CoreAVC but fast enough for any dual core :)

laserfan
11th October 2008, 16:50
@laserfan, why do you need that? I think using CoreAVC as a BD decoder is not a good idea right now, because CyberLink and ArcSoft have DXVA (for both ATI and NVIDIA, and all streams: MPEG-2, AVC, VC-1), and in software mode they are fast too, not as fast as CoreAVC but fast enough for any dual core :)
For my setup I have found that NOT ONLY is CoreAVC faster, but it is also (oddly) much BETTER for playback from a quality POV. I get blockiness & posterization with Cyberlink and ffdshow decoders, and perfection w/CoreAVC (using Media Player Classic).

I would like to achieve the same good results with a full-featured disc playback program.

Yes, I need to upgrade my P4/AGP video card platform, but given current economics I can't rationalize it (yet). I can, however, easily rationalize the <$20 for CoreAVC.

Quark.Fusion
11th October 2008, 21:48
This stream decodes correctly with ffdshow, but CoreAVC produces some garbage on screen. Is that a decoder bug or an error in the stream?

http://www.savefile.com/files/1833602
http://www.fileqube.com/shared/vIRND131162

rahzel
13th October 2008, 06:16
Can someone help me with some settings?
First of all, I have an HTPC connected via HDMI to my LCD TV, and I have an ATI 780G HD3200 IGP. MPC HC is my player of choice, and I have MPC set so that it uses its internal DXVA filter for DXVA-compliant videos and CoreAVC (using 1.8.0.0) for everything else. I'm using the Overlay Mixer renderer, as I see tearing with VMR9 or Haali's.

1) Should I uncheck all output formats except YUY2, or should I leave them all checked?

2) Should I set deinterlacing to Hardware deinterlacing?

3) Are there any negative effects of deblocking, like softening the image, or is it best left on? If so, which is the best setting? Standard?

4) To play it safe, should I set output levels to TV (16-235)? I found that before the driver I'm using now, it output the incorrect levels at Auto, but it seems OK with the latest video drivers.

TIA.
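Some background on question 4: with 8-bit video, TV (limited) range puts black at 16 and white at 235, while PC (full) range uses 0 to 255. If one stage outputs TV levels and the next assumes PC levels without converting, blacks look grey and whites dull; converting twice crushes shadows and highlights. A minimal sketch of the usual luma mapping (chroma uses 16 to 240 and is not covered here); the function names are made up for illustration.

```python
def tv_to_pc(y):
    """Expand 8-bit limited-range luma (16-235) to full range (0-255)."""
    v = round((y - 16) * 255 / 219)
    return min(255, max(0, v))  # clip below-black/above-white values


def pc_to_tv(y):
    """Compress 8-bit full-range luma (0-255) to limited range (16-235)."""
    return round(16 + y * 219 / 255)


print(tv_to_pc(16), tv_to_pc(235))  # 0 255
print(pc_to_tv(0), pc_to_tv(255))   # 16 235
```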

Jay Bee
13th October 2008, 21:10
2) If you watch interlaced content, then HW deint should produce the highest quality. It isn't as good as when using CyberLink, though. I'm hoping that the upcoming addition of the NV12 colorspace may change this.

3) Deblocking is only performed if the encoder intended it to be used. Disabling it in this case saves some CPU time but causes ugly artifacts.

BetaBoy
17th October 2008, 13:52
We are about to push the publish button.... here is the v1.8.5 Changelog for CoreAVC Professional and Standard Editions.

CoreAVC H.264 Video Codec - Version 1.8.5.0 (20081017)
- Add: NV12 output
- Add: Option to disable/enable system tray icon
- Add: Filter is registered with preferred priority
- Fix: Proper seeking for streams with one IDR frame
- Fix: Decoder priority adjustable by limited user accounts
- Fix: Fix weighted prediction with MBAFF
- Fix: Options dialog wrong size with large fonts
- Fix: Fixed output format priority saving
- Fix: Improved dynamic reconnection
- Fix: Explicitly reject streams with unsupported resolutions
- Fix: Tray Icon stability
- Fix: Fix Force VMR AR correction function
- Fix: Help tab text highlight bug

Cyber-Mav
17th October 2008, 15:29
What's the benefit of NV12 output compared to the other output methods?

LigH
17th October 2008, 15:39
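Some NVIDIA graphics cards prefer this FourCC for YUV 4:2:0 output (the usual chroma subsampling type used in MPEG-1/2/4 Main profiles), and may not support the FourCC "YV12" correctly. Unlike the fully planar YV12, NV12 is semi-planar: a full-resolution Y plane followed by a single plane of interleaved U/V samples.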
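To make the layout difference concrete: YV12 stores a full-size Y plane, then a quarter-size V plane, then a quarter-size U plane, while NV12 keeps the same Y plane but follows it with one half-height plane of interleaved U,V byte pairs. A minimal sketch of repacking one frame, assuming tightly packed rows (stride equal to width) and even dimensions; real DirectShow buffers often have padded strides, and the function name is hypothetical.

```python
def yv12_to_nv12(buf: bytes, width: int, height: int) -> bytes:
    """Repack a YV12 frame (Y plane, then V, then U) into NV12
    (Y plane, then one plane of interleaved U/V byte pairs).
    Assumes even width/height and no row padding (stride == width)."""
    y_size = width * height
    c_size = (width // 2) * (height // 2)  # samples per chroma plane
    if len(buf) != y_size + 2 * c_size:
        raise ValueError("buffer size does not match a YV12 frame")
    y = buf[:y_size]
    v = buf[y_size:y_size + c_size]  # YV12 stores V before U
    u = buf[y_size + c_size:]
    uv = bytearray(2 * c_size)
    uv[0::2] = u  # NV12: U in the even bytes...
    uv[1::2] = v  # ...and V in the odd bytes
    return bytes(y) + bytes(uv)


nv12 = yv12_to_nv12(bytes([1, 2, 3, 4, 10, 11]), 2, 2)
print(list(nv12))  # [1, 2, 3, 4, 11, 10]
```

The luma plane is copied unchanged; only the chroma bytes move, which is why YV12 and NV12 carry exactly the same picture information.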

BlackSun
17th October 2008, 17:52
We are releasing 1.8.5!

CoreAVC H.264 Video Codec - Version 1.8.5.0 (20081017)
- Add: NV12 output
- Add: Option to disable/enable system tray icon
- Add: Filter is registered with preferred priority
- Fix: Proper seeking for streams with one IDR frame
- Fix: Decoder priority adjustable by limited user accounts
- Fix: Fix weighted prediction with MBAFF
- Fix: Options dialog wrong size with large fonts
- Fix: Fixed output format priority saving
- Fix: Improved dynamic reconnection
- Fix: Explicitly reject streams with unsupported resolutions
- Fix: Tray Icon stability
- Fix: Fix Force VMR AR correction function
- Fix: Help tab text highlight bug

megalith6
17th October 2008, 21:01
hi

warning - newbie - warning :))

I noticed the deblocker. Does this mean I can pass some of my blocky MPEGs through CoreCodec and clean them up, please? I have not seen deblocking filters mentioned in other decoder software.

thanks

Ric

clsid
17th October 2008, 21:07
CoreAVC only decodes H.264 video.

If you want to deblock your videos, use ffdshow.

bmnot
17th October 2008, 21:09
CUDA support not included this time?

dead_screem
17th October 2008, 21:29
A bug in 1.8.5: thumbnails no longer extract for AVI files... They still extract fine when Haali is used for MKV and MP4, for instance. I haven't tried using Haali for AVI to see if that would fix it, but I'm not doing that anyway.

hajj_3
17th October 2008, 23:37
This fixes my delay when playing 720p .mkv files with x264 video and AC3 audio; it used to take 8 secs or so on my fast PC for some .mkv files. HOWEVER, it doesn't fix the delay for .mkv files with AAC audio that I create from the MPEG-2 soccer broadcasts I record and archive as x264 + AAC. Hope that can be fixed in the next version.

My brother's old laptop now uses 10% less CPU when playing 720p TV shows compared to the previous version.

Keep up the good work guys!

qyqgpower
18th October 2008, 04:12
The NV12 output of 1.8.5 still presents the wrong field order when decoding TFF MBAFF/PAFF while connected to the Enhanced Video Renderer and VMR9.
Sample clips:
http://www.mediafire.com/file/zwm5nndnmig/MBAFF-TFF.mkv (may have seeking issues since it is cut from my encode, but it's enough to reproduce the field order issue)
http://www.mediafire.com/file/nnjdjmxddkm/PAFF-TFF-Sample.mp4

More strangely, ffdshow-rev2099_20080903_clsid_sse_icl10 produces the correct field order, like the CyberLink H.264/AVC Decoder, while ffdshow-rev2202_20081010_clsid_sse_icl10 and ffdshow-rev2210_20081012_clsid produce inverted field order, just like CoreAVC.
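Why a flipped field-order flag is so visible: a deinterlacer that outputs one picture per field has to show the two fields of each frame in the order they were captured. The toy sketch below (rows as list items; the names are made up) shows that the TFF/BFF flag alone decides that order, so if the decoder signals, or the renderer assumes, the wrong one, every field pair is shown backwards in time and motion judders.

```python
def fields_in_display_order(frame_rows, top_field_first=True):
    """Split a woven interlaced frame into its two fields and return them
    in presentation order according to the field-order flag."""
    top = frame_rows[0::2]     # even lines (0, 2, ...) form the top field
    bottom = frame_rows[1::2]  # odd lines (1, 3, ...) form the bottom field
    return (top, bottom) if top_field_first else (bottom, top)


frame = ["t0", "b0", "t1", "b1"]  # rows labelled by the field they belong to
print(fields_in_display_order(frame, top_field_first=True))
# (['t0', 't1'], ['b0', 'b1'])
print(fields_in_display_order(frame, top_field_first=False))
# (['b0', 'b1'], ['t0', 't1'])
```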

squid_80
18th October 2008, 04:24
@squid_80
If VMR9 is a complete gamble at field order, why can other decoders work with it (and EVR) correctly?

(Emphasis added by me to make my own point stand out.)
More strangely, ffdshow-rev2099_20080903_clsid_sse_icl10 produces the correct field order, like the CyberLink H.264/AVC Decoder, while ffdshow-rev2202_20081010_clsid_sse_icl10 and ffdshow-rev2210_20081012_clsid produce inverted field order, just like CoreAVC.

:rolleyes: Why is this a CoreAVC problem when other codecs randomly show the same issue? Wouldn't that indicate the renderer is at fault?

qyqgpower
18th October 2008, 05:02
Not RANDOMLY indeed, because
CyberLink H.264/AVC Decoder(PDVD8)
Mainconcept AVC/H.264 Video Decoder
ArcSoft Video Decoder
ffdshow-rev2099_20080903_clsid_sse_icl10

always communicate correctly between renderer and decoder.

Recent ffdshow builds can only be considered a degradation.

Jay Bee
18th October 2008, 10:08
2) If you watch interlaced content, then HW deint should produce the highest quality. It isn't as good as when using CyberLink, though. I'm hoping that the upcoming addition of the NV12 colorspace may change this.


Nice. The new NV12 output did indeed fix some HW deinterlacing quality problems. Screens attached. Have a look at the sawteeth on the white lines.

YUY2:
http://img186.imageshack.us/img186/8255/yuy2km8.jpg

NV12:
http://img186.imageshack.us/img186/1352/nv12hk1.jpg


One problem that still hasn't gone away, though, is small stutters while watching true interlaced content (football) live with DVBViewer. Is there any way I can help troubleshoot/debug this?


VMR9 is a complete gamble at field order

It's not random; it worked fine in CoreAVC 1.5. The random thing you're talking about is a separate issue that has been fixed by this Microsoft hotfix: http://support.microsoft.com/kb/919071

CiNcH
18th October 2008, 11:17
Think I am slowly running out of mail addresses for the trials +g+

Tried 1.8.5 and am still experiencing high jitter and sync-offset with 1080i and hardware deinterlacing, now also with NV12 (and vector-adaptive deinterlacing). All other deinterlacing methods do not suffer from that problem.

BetaBoy
18th October 2008, 17:10
Based on the early feedback (thx everyone) we are probably going to do a 1.8.6 release after the weekend.... stay tuned.

STaRGaZeR
18th October 2008, 19:20
It seems that when using NV12 and hardware deinterlacing (ATI) the field order is reversed. It's exactly the same thing you see when you use MPC's internal DXVA decoder with interlaced content, but obviously this is not a DXVA problem. Can anybody confirm if this is wrong field order or if it's something else?

Also, when NV12 is selected as the only output colorspace, Single field or Bob deinterlacing don't work.

All this is playing interlaced Blu-ray content muxed into Matroska and split with Haali. ffdshow doesn't have this problem, maybe because when NV12 output is selected it uses "NV21,VU" instead.

BTW thanks for the "Use tray icon" button ;)

BetaBoy
18th October 2008, 23:08
CUDA support not included this time?

CUDA is for our 1.9.x releases that then leads to 2.0. It also allows us to work with the NVIDIA engineers now to address some of the issues that we are finding (like Donald here @ D9) before we go public with a release.

squid_80
19th October 2008, 01:42
It seems that when using NV12 and hardware deinterlacing (ATI) the field order is reversed. It's exactly the same thing you see when you use MPC's internal DXVA decoder with interlaced content, but obviously this is not a DXVA problem. Can anybody confirm if this is wrong field order or if it's something else?

Does it still happen with VMR7 or Haali's renderer? If not, it's probably the same old VMR9 issue as above.
Also, when NV12 is selected as the only output colorspace, Single field or Bob deinterlacing don't work.

The internal deinterlacing methods don't support target formats that use vertical subsampling (NV12, YV12 and I420). The same goes for hardware deinterlacing on most graphics cards, except that most cards nowadays do support NV12.

STaRGaZeR
19th October 2008, 03:00
Does it still happen with VMR7 or Haali's renderer? If not, it's probably the same old VMR9 issue as above.



Interesting. Results with NV12+hardware deinterlacing, Aero enabled:

System Default, Old Renderer, VMR7 (windowed), VMR7 (Renderless) and Haali's Renderer result in no deinterlacing whatsoever.

VMR9 (windowed), VMR9 (Renderless), EVR and EVR Custom have the field order reversed.

Overlay Mixer N/A.

Let me say again that ffdshow has no problems with any of them except System Default, Old Renderer and VMR7 (windowed), which result in no deinterlacing.



Results with NV12+hardware deinterlacing, Aero disabled:

System Default, Old Renderer, Overlay Mixer, VMR7 (windowed), VMR7 (Renderless) and Haali's Renderer work perfect.

VMR9 (windowed), VMR9 (Renderless), EVR and EVR Custom have the field order reversed.

And here ffdshow has no problems with any of them except Old Renderer and VMR7 (renderless), which result in no deinterlacing.


The internal deinterlacing methods don't support target formats that use vertical subsampling (NV12, YV12 and I420). The same goes for hardware deinterlacing on most graphics cards, except that most cards nowadays do support NV12.

I've tested every colorspace, and with EVR/EVR Custom the only one that will deinterlace, apply any of ATI's postprocessing options (edge enhancement and such) or allow DXVA is NV12. Also, NV12 is so far the only vertically subsampled format supported by EVR/EVR Custom. Can't speak for NVIDIA here.

dead_screem
19th October 2008, 04:13
A small feature request for the next version: before you rewrote the properties dialog/DirectShow stuff, the properties dialog would show which input FourCC and which output colorspace were active. Can this be re-added?