Old 14th December 2008, 15:00   #41  |  Link
Guest
Guest
 
Join Date: Jan 2002
Posts: 21,924
Let's draw the line right here and confine further discussion to the technical evaluation of the decoder.
Old 18th December 2008, 14:11   #42  |  Link
schweinsz
Registered User
 
Join Date: Nov 2005
Posts: 496
Support for Direct_8x8_inference_flag = 0 is finished.
I will finish weighted_bipred_idc in the next few days.
After that, I will work on multi-threaded decoding, and DiAVC will be much faster than it is now.
Old 24th December 2008, 20:52   #43  |  Link
schweinsz
Registered User
 
Join Date: Nov 2005
Posts: 496
weighted_bipred_idc = 0, 1, 2 is supported now. Next I will work on improving the CABAC decoding and on multi-threaded decoding.
Old 25th December 2008, 16:48   #44  |  Link
popper
Registered User
 
Join Date: Mar 2006
Posts: 272
Quote:
Originally Posted by schweinsz View Post
I am happy to announce that DiAVC, another High Profile decoder, has been released.

http://sites.google.com/site/diavcdecoder/

The supported features:
I Slice, P Slice, B Slice
Custom Quantization Matrix
CAVLC and CABAC
Deblocking Filter
Multi-reference frames
Direct_8x8_inference_flag = 1 or 0
IPCM
Constrained_intra_pred_flag = 1 or 0
transform 8x8 and 4x4
weighted_bipred_idc = 0, 1, 2
Need CPU with SSE2
Windows XP, Windows Vista and Windows 2000 are supported

Upcoming features:
weighted_pred_flag
interlaced support
multi-core support

Because DiAVC's development is at an early stage, I only provide the DLL and the testbed. I'll add support for Direct_8x8_inference_flag = 0, weighted_bipred_idc, and multi-core decoding, and then interlaced support.

I have tested DiAVC on my laptop with a 1.86 GHz Core Duo CPU using a CABAC High Profile bitstream. The throughput is about 19 Mbps at 50% CPU usage (a single thread on a dual-core CPU). Because development is at an early stage, there is still a lot of room for optimization, for example in the unoptimized MC, 8x8 transform, and CABAC decoding.

Also, where can I find some DirectShow samples to use as a reference?
I want to write a DirectShow transform filter for DiAVC.
It's very odd that you say you want to release your app as commercial property today, yet you don't even see fit to put MBAFF and PAFF at the top of your must-have list. That IS the most needed option for many people around the world who have access to DVB-* MBAFF- and PAFF-encoded content (the UK, EU, HD cams, etc.) right now and want to decode it in (better than) real time for processing.

Will you in fact be including these essential MBAFF and PAFF decoder options in your base code/app?

http://en.wikipedia.org/wiki/H.264
"Flexible interlaced-scan video coding features, including:
Macroblock-adaptive frame-field (MBAFF) coding, using a macroblock pair structure for pictures coded as frames, allowing 16×16 macroblocks in field mode (compared with 16×8 half-macroblocks in MPEG-2).

Picture-adaptive frame-field coding (PAFF or PicAFF) allowing a freely-selected mixture of pictures coded as MBAFF frames with pictures coded as individual single fields (half frames) of interlaced video.
"

Last edited by popper; 25th December 2008 at 16:58.
Old 25th December 2008, 20:39   #45  |  Link
Mr VacBob
Registered User
 
Join Date: Feb 2005
Posts: 141
I'm not surprised you can make a fast and simple decoder by not supporting interlacing, but it gets harder afterwards.

(didn't you post complaining about "redundant operations" in libavcodec? what part of the decoder was that referring to?)
Old 27th December 2008, 07:57   #46  |  Link
schweinsz
Registered User
 
Join Date: Nov 2005
Posts: 496
Quote:
Originally Posted by popper View Post
It's very odd that you say you want to release your app as commercial property today, yet you don't even see fit to put MBAFF and PAFF at the top of your must-have list. That IS the most needed option for many people around the world who have access to DVB-* MBAFF- and PAFF-encoded content (the UK, EU, HD cams, etc.) right now and want to decode it in (better than) real time for processing.

Will you in fact be including these essential MBAFF and PAFF decoder options in your base code/app?

http://en.wikipedia.org/wiki/H.264
"Flexible interlaced-scan video coding features, including:
Macroblock-adaptive frame-field (MBAFF) coding, using a macroblock pair structure for pictures coded as frames, allowing 16×16 macroblocks in field mode (compared with 16×8 half-macroblocks in MPEG-2).

Picture-adaptive frame-field coding (PAFF or PicAFF) allowing a freely-selected mixture of pictures coded as MBAFF frames with pictures coded as individual single fields (half frames) of interlaced video.
"
I will start work on interlaced support (including MBAFF and PAFF) after I finish the multi-threaded decoding.
Old 27th December 2008, 08:08   #47  |  Link
schweinsz
Registered User
 
Join Date: Nov 2005
Posts: 496
Quote:
Originally Posted by Mr VacBob View Post
I'm not surprised you can make a fast and simple decoder by not supporting interlacing, but it gets harder afterwards.

(didn't you post complaining about "redundant operations" in libavcodec? what part of the decoder was that referring to?)
Adding interlacing will not affect the speed of frame decoding, because interlaced content can use a completely separate slice-decoding path.

Regarding the "redundant operations" in libavcodec: for example, the zerosleft decoding can be merged with the dequantization instead of using two loops. As another example, the CABAC decoding and context computation algorithms are very inefficient.
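The "one loop instead of two" idea can be illustrated with a minimal sketch. It is not code from libavcodec or DiAVC. The names `place_then_dequant`, `place_and_dequant`, `level`, `run_before` and `scale` are hypothetical, the runs are read from an array instead of a bitstream, the zigzag reorder is left out, and the dequantization is reduced to one flat multiplier per scan position.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical inputs, standing in for values a CAVLC parser would read:
 *   level[i]      - nonzero coefficients, highest frequency first
 *   run_before[i] - zeros between level[i] and level[i + 1] in scan order
 *   scale[p]      - flat per-position factor (an assumption; real H.264
 *                   scaling also depends on QP)
 * Output is indexed by scan position; the zigzag reorder is omitted. */

/* Two-loop variant: place the coefficients first, dequantize afterwards. */
void place_then_dequant(const int *level, const int *run_before,
                        int total_coeff, int total_zeros,
                        const int *scale, int out[16])
{
    int coeff[16] = {0};
    int pos = total_coeff + total_zeros - 1;   /* highest nonzero position */
    int zeros_left = total_zeros;

    assert(total_coeff >= 1 && pos < 16);
    for (int i = 0; i < total_coeff; i++) {
        assert(pos >= 0);
        coeff[pos] = level[i];
        if (i + 1 < total_coeff) {
            /* No run is coded once all zeros are used up. */
            int run = zeros_left > 0 ? run_before[i] : 0;
            zeros_left -= run;
            pos -= run + 1;
        }
    }
    for (int p = 0; p < 16; p++)               /* second pass over the block */
        out[p] = coeff[p] * scale[p];
}

/* Merged variant: scale each coefficient at the moment it is placed. */
void place_and_dequant(const int *level, const int *run_before,
                       int total_coeff, int total_zeros,
                       const int *scale, int out[16])
{
    int pos = total_coeff + total_zeros - 1;
    int zeros_left = total_zeros;

    assert(total_coeff >= 1 && pos < 16);
    memset(out, 0, 16 * sizeof out[0]);
    for (int i = 0; i < total_coeff; i++) {
        assert(pos >= 0);
        out[pos] = level[i] * scale[pos];      /* place and scale together */
        if (i + 1 < total_coeff) {
            int run = zeros_left > 0 ? run_before[i] : 0;
            zeros_left -= run;
            pos -= run + 1;
        }
    }
}
```

Both functions should fill `out` identically; the merged one touches only the nonzero positions after the initial clear.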
Old 27th December 2008, 13:33   #48  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,690
Quote:
Originally Posted by schweinsz View Post
Adding interlacing will not affect the speed of frame decoding, because interlaced content can use a completely separate slice-decoding path.

Regarding the "redundant operations" in libavcodec: for example, the zerosleft decoding can be merged with the dequantization instead of using two loops.
What are you talking about with the zerosleft decoding? It is merged with dequantization, and is not two loops (and I don't think this was done recently either).
Quote:
Originally Posted by schweinsz View Post
As another example, the CABAC decoding and context computation algorithms are very inefficient.
Would you like to point me to a specific case so I can fix it rather than complaining?

Any suggestion for improved context calculation performance will probably allow me to make x264's RDO faster as well.

Same with the actual CABAC decoding--are you suggesting that there is a better way to organize LUTs/similar that results in better faster arithmetic decoding than what LAVC already has?

Last edited by Dark Shikari; 27th December 2008 at 13:47.
Old 27th December 2008, 14:49   #49  |  Link
schweinsz
Registered User
 
Join Date: Nov 2005
Posts: 496
Quote:
Originally Posted by Dark Shikari View Post
What are you talking about with the zerosleft decoding? It is merged with dequantization, and is not two loops (and I don't think this was done recently either). Would you like to point me to a specific case so I can fix it rather than complaining?

Any suggestion for improved context calculation performance will probably allow me to make x264's RDO faster as well.

Same with the actual CABAC decoding--are you suggesting that there is a better way to organize LUTs/similar that results in better faster arithmetic decoding than what LAVC already has?
Yes, I have read the latest libavcodec in ffdshow tryouts; it has been merged.

Regarding the context calculation, take coded block flag decoding as an example:

const unsigned int iCabac4x4CnxtAdd[8] = {0x9, 0x18, 0x801, 0x810, 0x9, 0x100008, 0x801, 0x100800};

unsigned int contxt = contxt1 + contxt2;
for (int i = 0; i < 16; i++)
{
    int contextcurr = contxt & 3;   // 2-bit context increment for block i
    contxt >>= 2;

    if (cbp[i >> 2])
    {
        cbf = cabac_dec_symbol(..., contextcurr);
        if (cbf)
        {
            // record this coded block in the fields of its right and lower neighbours
            contxt += iCabac4x4CnxtAdd[i & 7];
            ......
        }
    }
}
// update contxt1 and contxt2 using contxt

Yes, I have a faster arithmetic decoder, and I have written part of it into DiAVC.
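The packed register can be checked against a plain neighbour lookup. The sketch below is an illustration only, not DiAVC code. The function names are hypothetical, the flags come from a bit mask instead of a CABAC decoder, the `cbp` test is omitted, and neighbours outside the macroblock are assumed to be 0. Only the table values are taken from the post.

```c
#include <assert.h>
#include <stdint.h>

/* Increment table copied from the post above. */
static const uint32_t kCtxAdd[8] = {
    0x9, 0x18, 0x801, 0x810, 0x9, 0x100008, 0x801, 0x100800
};

/* Column and row of each 4x4 luma block in H.264 block order. */
static const int kBlkX[16] = {0, 1, 0, 1, 2, 3, 2, 3, 0, 1, 0, 1, 2, 3, 2, 3};
static const int kBlkY[16] = {0, 0, 1, 1, 0, 0, 1, 1, 2, 2, 3, 3, 2, 2, 3, 3};

/* Packed variant: bit i of cbf_mask is the coded block flag of block i.
 * Returns the leftover register, which carries neighbour bits forward. */
uint32_t ctx_inc_packed(uint32_t init, unsigned cbf_mask, int inc[16])
{
    uint32_t ctx = init;
    for (int i = 0; i < 16; i++) {
        inc[i] = (int)(ctx & 3);   /* 2-bit field for block i */
        ctx >>= 2;                 /* field 0 now belongs to block i + 1 */
        if ((cbf_mask >> i) & 1)
            ctx += kCtxAdd[i & 7]; /* +1 for right neighbour, +2 for lower */
    }
    return ctx;
}

/* Reference variant: look up the left (A) and top (B) neighbours directly.
 * Neighbours outside the macroblock count as 0 here (a simplification). */
void ctx_inc_reference(unsigned cbf_mask, int inc[16])
{
    int grid[4][4];
    for (int i = 0; i < 16; i++)
        grid[kBlkY[i]][kBlkX[i]] = (int)((cbf_mask >> i) & 1);
    for (int i = 0; i < 16; i++) {
        int x = kBlkX[i], y = kBlkY[i];
        int left = x > 0 ? grid[y][x - 1] : 0;
        int top  = y > 0 ? grid[y - 1][x] : 0;
        inc[i] = left + 2 * top;
    }
}

/* Returns 1 when both variants agree for every block of the pattern. */
int ctx_variants_agree(unsigned cbf_mask)
{
    int a[16], b[16];
    ctx_inc_packed(0, cbf_mask, a);
    ctx_inc_reference(cbf_mask, b);
    for (int i = 0; i < 16; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

The value returned by `ctx_inc_packed` holds the bits destined for the macroblocks to the right and below, which is presumably what the final "update contxt1 and contxt2" step extracts.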
Old 27th December 2008, 14:54   #50  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,690
Quote:
Originally Posted by schweinsz View Post
Yes, I have read the latest libavcodec in ffdshow tryouts; it has been merged.

Regarding the context calculation, take coded block flag decoding as an example:

const unsigned int iCabac4x4CnxtAdd[8] = {0x9, 0x18, 0x801, 0x810, 0x9, 0x100008, 0x801, 0x100800};

unsigned int contxt = contxt1 + contxt2;
for (int i = 0; i < 16; i++)
{
    int contextcurr = contxt & 3;   // 2-bit context increment for block i
    contxt >>= 2;

    if (cbp[i >> 2])
    {
        cbf = cabac_dec_symbol(..., contextcurr);
        if (cbf)
        {
            // record this coded block in the fields of its right and lower neighbours
            contxt += iCabac4x4CnxtAdd[i & 7];
            ......
        }
    }
}
// update contxt1 and contxt2 using contxt
You know there is no way in hell that'll work with interlacing, right?
Quote:
Originally Posted by schweinsz View Post
Yes, I have a faster arithmetic decoding and I have written part of it into the DiAVC.
OK, I'll see what I can take from it then.
Old 27th December 2008, 14:56   #51  |  Link
schweinsz
Registered User
 
Join Date: Nov 2005
Posts: 496
Quote:
Originally Posted by Dark Shikari View Post
What are you talking about with the zerosleft decoding? It is merged with dequantization, and is not two loops (and I don't think this was done recently either). Would you like to point me to a specific case so I can fix it rather than complaining?

Any suggestion for improved context calculation performance will probably allow me to make x264's RDO faster as well.

Same with the actual CABAC decoding--are you suggesting that there is a better way to organize LUTs/similar that results in better faster arithmetic decoding than what LAVC already has?
After I finish DiAVC, I will improve x264, and it is open source. I can at least improve the computational efficiency of the ME and CABAC.

Last edited by schweinsz; 27th December 2008 at 15:02.
Old 27th December 2008, 15:00   #52  |  Link
schweinsz
Registered User
 
Join Date: Nov 2005
Posts: 496
Quote:
Originally Posted by Dark Shikari View Post
You know there is no way in hell that'll work with interlacing, right? OK, I'll see what I can take from it then.
Why do you say that the cbf decoding doesn't work with interlacing? I think it can. It should be possible to adapt it to MBAFF with a small change, although I am not familiar with MBAFF.
Old 27th December 2008, 15:12   #53  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,690
Quote:
Originally Posted by schweinsz View Post
After I finish DiAVC, I will improve x264, and it is open source. I can at least improve the computational efficiency of the ME and CABAC.
Come to #x264dev on Freenode. We welcome optimizations--even just ideas for optimizations that you don't want/don't have time to implement--plus, you could also help with the upcoming MBAFF patch (currently the code has far too many calls to getNeighborAff() )
Quote:
Originally Posted by schweinsz View Post
Why do you say that the cbf decoding doesn't work with interlacing? I think it can. It should be possible to adapt it to MBAFF with a small change, although I am not familiar with MBAFF.
MBAFF rules for CBF selection are AFAIK rather messy...

Last edited by Dark Shikari; 27th December 2008 at 15:28.
Old 28th December 2008, 07:17   #54  |  Link
akupenguin
x264 developer
 
Join Date: Sep 2004
Posts: 2,393
Quote:
Originally Posted by schweinsz View Post
Yes, I have read the latest libavcodec in ffdshow tryouts; it has been merged.
Latest?! It was merged in r4617 in 2005.

Last edited by akupenguin; 28th December 2008 at 07:19.
Old 28th December 2008, 08:33   #55  |  Link
schweinsz
Registered User
 
Join Date: Nov 2005
Posts: 496
Quote:
Originally Posted by akupenguin View Post
Latest?! It was merged in r4617 in 2005.
I also read a very old libavcodec this summer, perhaps a version from 2004.
Old 29th December 2008, 17:23   #56  |  Link
schweinsz
Registered User
 
Join Date: Nov 2005
Posts: 496
Quote:
Originally Posted by Dark Shikari View Post
Come to #x264dev on Freenode. We welcome optimizations--even just ideas for optimizations that you don't want/don't have time to implement--plus, you could also help with the upcoming MBAFF patch (currently the code has far too many calls to getNeighborAff() ) MBAFF rules for CBF selection are AFAIK rather messy...
I have also read x264. It seems that the SSD in RDO is computed after reconstruction. To the best of my knowledge, it can be obtained exactly, directly in the transform domain. The H.264 transform needs a scale for the coefficients, so the SSD can be computed right after dequantization, bypassing the IDCT and the addition to the MC signal. I have no time to write it now, though.
Old 29th December 2008, 17:29   #57  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,690
Quote:
Originally Posted by schweinsz View Post
I have also read x264. It seems that the SSD in RDO is computed after reconstruction. To the best of my knowledge, it can be obtained exactly, directly in the transform domain. The H.264 transform needs a scale for the coefficients, so the SSD can be computed right after dequantization, bypassing the IDCT and the addition to the MC signal. I have no time to write it now, though.
We actually just happened to discuss this in #x264dev recently, and we concluded that it isn't very useful.

The reason it isn't useful is:

1. iDCT is really fast.
2. Psy-RD requires iDCT to be done anyways. It isn't really worth optimizing too much for the case of no psy-opts.
3. It would make the code much messier.
4. SSD in transform domain is not *exactly* the same because of rounding.
5. SSD in transform domain would require a new asm function to multiply by the correct DCT weighting values for each coefficient.

(By the way, this is the proper approach: suggest ideas before implementing them so that you don't waste time on something that has been concluded to be a bad idea, or so that we can suggest even better ways of doing it.)
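Point 5 in the list above can be made concrete with a minimal sketch. It is not x264 code. It assumes only the forward 4x4 core transform, with no quantization and none of the rounding shifts of the real inverse transform, so it says nothing about point 4. The function names are hypothetical. Because the transform rows have squared lengths 4, 10, 4 and 10, each squared coefficient needs a weight of 25, 10 or 4 (out of 400) before the sum matches the pixel-domain SSD.

```c
#include <assert.h>

/* H.264 4x4 forward core transform; rows are orthogonal, not unit length. */
static const int kC[4][4] = {
    {1,  1,  1,  1},
    {2,  1, -1, -2},
    {1, -1, -1,  1},
    {1, -2,  2, -1},
};

/* Squared length of each row of kC. */
static const int kNorm[4] = {4, 10, 4, 10};

/* y = C * x * C^T for a 4x4 block stored row-major in 16 ints. */
void forward4x4(const int x[16], int y[16])
{
    int t[16];
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            int s = 0;
            for (int k = 0; k < 4; k++)
                s += kC[i][k] * x[4 * k + j];
            t[4 * i + j] = s;
        }
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            int s = 0;
            for (int k = 0; k < 4; k++)
                s += t[4 * i + k] * kC[j][k];
            y[4 * i + j] = s;
        }
}

/* Plain sum of squared differences over the 16 samples. */
long ssd_pixel(const int d[16])
{
    long s = 0;
    for (int i = 0; i < 16; i++)
        s += (long)d[i] * d[i];
    return s;
}

/* 400 * SSD, computed from the transform coefficients alone. */
long ssd_transform_x400(const int y[16])
{
    long s = 0;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            long w = 400 / (kNorm[i] * kNorm[j]);   /* 25, 10 or 4 */
            s += w * y[4 * i + j] * y[4 * i + j];
        }
    return s;
}
```

For a difference block `d`, `ssd_transform_x400` applied to `forward4x4(d)` should equal `400 * ssd_pixel(d)`. The extra multiply per coefficient is the cost that point 5 refers to.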

Last edited by Dark Shikari; 29th December 2008 at 17:33.
Old 5th October 2009, 23:29   #58  |  Link
schweinsz
Registered User
 
Join Date: Nov 2005
Posts: 496
I am happy to announce that the DiAVC alpha version is released. It includes diavc.ax and a settings tool.
It supports interlaced coding and frame-level parallelism.
I have tested DiAVC against other decoders and found it faster than CoreAVC and the DivX H.264 decoder on my computer (Intel Core Duo T2350 1.86 GHz, 1 GB RAM, 533 MHz FSB, 80 GB hard disk, ATI Mobility Radeon X1450).
Old 6th October 2009, 01:21   #59  |  Link
Sagekilla
x264aholic
 
Join Date: Jul 2007
Location: New York
Posts: 1,752
@Dark: Regarding the change that would bork interlacing: Would it really be that bad if you created two code paths, one designed solely for progressive video, and another for field based?

I'm sure there's a very good reason not to, and the only reasons I could think of were massive code duplication and messy code.
__________________
You can't call your encoding speed slow until you start measuring in seconds per frame.
Old 6th October 2009, 01:23   #60  |  Link
Dark Shikari
x264 developer
 
Join Date: Sep 2005
Posts: 8,690
Quote:
Originally Posted by Sagekilla View Post
@Dark: Regarding the change that would bork interlacing: Would it really be that bad if you created two code paths, one designed solely for progressive video, and another for field based?

I'm sure there's a very good reason not to, and the only reasons I could think of were massive code duplication and messy code.
CoreAVC already does this. I don't really feel like doing it with x264. It'd be ugly, and hardly worth it for ~1% speed or whatever we'd get.
Tags
avc, diavc, fastest decoder, h.264, software
