Old 18th January 2010, 13:42   #10341  |  Link
tal.aloni
Registered User
 
Join Date: Sep 2008
Posts: 496
ok, I'll check some low-res files when I get home,
Thanks,
Tal
Old 18th January 2010, 13:48   #10342  |  Link
HeadlessCow
Registered User
 
Join Date: Nov 2002
Posts: 131
Core i7 920
Radeon 4670

Code:
Format                           : AVC
Format/Info                      : Advanced Video Codec
Format profile                   : High@L4.1
Format settings, CABAC           : Yes
Format settings, ReFrames        : 4 frames
Codec ID                         : avc1
Codec ID/Info                    : Advanced Video Coding
Duration                         : 45mn 29s
Bit rate mode                    : Variable
Bit rate                         : 2 569 Kbps
Maximum bit rate                 : 11.1 Mbps
Width                            : 1 280 pixels
Height                           : 720 pixels
Display aspect ratio             : 16:9
Frame rate mode                  : Constant
Frame rate                       : 23.976 fps
Resolution                       : 24 bits
Colorimetry                      : 4:2:0
Scan type                        : Progressive
Bits/(Pixel*Frame)               : 0.116
Stream size                      : 836 MiB (100%)
Writing library                  : x264 core 68 r1183M f21daff
Encoding settings                : cabac=1 / ref=4 / deblock=1:-1:-1 / analyse=0x3:0x113 / me=umh / subme=9 / psy_rd=1.0:0.0 / mixed_ref=1 / me_range=32 / chroma_me=1 / trellis=2 / 8x8dct=1 / cqm=0 / deadzone=21,11 / chroma_qp_offset=-2 / threads=12 / nr=0 / decimate=1 / mbaff=0 / bframes=3 / b_pyramid=0 / b_adapt=2 / b_bias=0 / direct=3 / wpredb=1 / keyint=250 / keyint_min=25 / scenecut=40 / rc=2pass / bitrate=2569 / ratetol=1.0 / qcomp=0.60 / qpmin=10 / qpmax=51 / qpstep=4 / cplxblur=20.0 / qblur=0.5 / vbv_maxrate=50000 / vbv_bufsize=50000 / ip_ratio=1.40 / pb_ratio=1.30 / aq=1:1.00
Overlay works fine at 24fps.
Full postprocessing drops it to ~18fps and the color is off.

Skipping around in the file at all crashes MPC-HC.
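For reference, the Bits/(Pixel*Frame) figure in the MediaInfo dump can be reproduced from the bit rate, frame size and frame rate. A small sketch, assuming "Kbps" means 1000 bits per second; the helper name is made up for illustration:

```cpp
#include <cassert>

// Average number of bits spent per pixel per frame.
// Assumption: the bit rate is given in kilobits per second, 1 kbit = 1000 bits.
inline double bits_per_pixel_frame(double kbps, int width, int height, double fps)
{
    assert(width > 0 && height > 0 && fps > 0.0);
    return (kbps * 1000.0) / (static_cast<double>(width) * height * fps);
}
```

With the values above, bits_per_pixel_frame(2569, 1280, 720, 23.976) comes out at roughly 0.116, which matches the report.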
Old 18th January 2010, 13:55   #10343  |  Link
HeadlessCow
Registered User
 
Join Date: Nov 2002
Posts: 131
Core i7 920
Radeon 4670

Code:
Format                           : AVC
Format/Info                      : Advanced Video Codec
Format profile                   : High@L3.1
Format settings, CABAC           : Yes
Format settings, ReFrames        : 4 frames
Codec ID                         : avc1
Codec ID/Info                    : Advanced Video Coding
Duration                         : 24mn 10s
Bit rate mode                    : Variable
Bit rate                         : 1 151 Kbps
Maximum bit rate                 : 3 059 Kbps
Width                            : 640 pixels
Height                           : 480 pixels
Display aspect ratio             : 4:3
Frame rate mode                  : Constant
Frame rate                       : 23.976 fps
Resolution                       : 24 bits
Colorimetry                      : 4:2:0
Scan type                        : Progressive
Bits/(Pixel*Frame)               : 0.156
Stream size                      : 199 MiB (100%)
Writing library                  : x264 core 80 r1376M 3feaec2
Encoding settings                : cabac=1 / ref=4 / deblock=1:-1:-1 / analyse=0x3:0x113 / me=umh / subme=7 / psy=1 / psy_rd=0.4:0.0 / mixed_ref=1 / me_range=16 / chroma_me=1 / trellis=2 / 8x8dct=1 / cqm=0 / deadzone=21,11 / fast_pskip=1 / chroma_qp_offset=-2 / threads=12 / sliced_threads=0 / nr=0 / decimate=1 / mbaff=0 / constrained_intra=0 / bframes=5 / b_pyramid=0 / b_adapt=2 / b_bias=0 / direct=3 / wpredb=1 / wpredp=2 / keyint=250 / keyint_min=25 / scenecut=40 / rc_lookahead=40 / rc=2pass / mbtree=1 / bitrate=1151 / ratetol=1.0 / qcomp=0.60 / qpmin=10 / qpmax=51 / qpstep=4 / cplxblur=20.0 / qblur=0.5 / vbv_maxrate=17500 / vbv_bufsize=14000 / ip_ratio=1.40 / aq=1:0.60
Overlay is perfect.
Full postprocessing keeps up with the framerate but looks like this:
Old 18th January 2010, 14:05   #10344  |  Link
tal.aloni
Registered User
 
Join Date: Sep 2008
Posts: 496
Thanks!
I guess I should focus on "merge" for the time being.
It would be perfect for subtitles / OSD / bitmap overlay.

I'm still wondering why we couldn't get more speed out of the Core i7... the Intel docs have demonstrated sufficient theoretical speed.
Maybe albain could figure it out / optimize.
(Perhaps it's not related to speed at all... maybe the last example is tied to colorspace issues.)

Tal
Old 18th January 2010, 14:32   #10345  |  Link
albain
Media Control author
 
Join Date: Dec 2006
Location: Paris
Posts: 1,014
It may be due to the colorspace.

I can't test it right now, but in my (non-working) patch I added two conversion functions in src/imgFilters/ffImgfmt.h.

I called csp_planar2packed(FF_CSP_NV12), which gives FF_CSP_NV12|FF_CSP_FLAGS_YUV_ADJ|FF_CSP_FLAGS_YUV_ORDER. This is (I think) the value that we should use when initializing TffPict.

Code:
// Map a colorspace to the flagged value used for packed (DXVA-style) input.
static __inline int csp_planar2packed(int csp)
{
 switch (csp)
 {
   case FF_CSP_YUY2:return FF_CSP_YUY2;
   case FF_CSP_YVYU:return FF_CSP_YVYU;
   case FF_CSP_UYVY:return FF_CSP_UYVY;
   case FF_CSP_VYUY:return FF_CSP_VYUY;
   case FF_CSP_CLJR:return FF_CSP_CLJR;
   case FF_CSP_444P:return FF_CSP_444P;
   case FF_CSP_422P:return FF_CSP_422P;
   case FF_CSP_411P:return FF_CSP_411P;
   case FF_CSP_410P:return FF_CSP_410P;
   case FF_CSP_Y800:return FF_CSP_Y800;
   case FF_CSP_NV12:return FF_CSP_NV12|FF_CSP_FLAGS_YUV_ADJ|FF_CSP_FLAGS_YUV_ORDER;
   case FF_CSP_PAL8:return FF_CSP_PAL8|FF_CSP_FLAGS_VFLIP;
   // Assumption: pass unknown values through unchanged. Without a default the
   // function fell off the end with no return value for unlisted colorspaces.
   default:return csp;
 }
}

// Inverse mapping: strip the flags added by csp_planar2packed.
static __inline int csp_packed2planar(int csp)
{
 switch (csp)
 {
   case FF_CSP_YUY2:return FF_CSP_YUY2;
   case FF_CSP_YVYU:return FF_CSP_YVYU;
   case FF_CSP_UYVY:return FF_CSP_UYVY;
   case FF_CSP_VYUY:return FF_CSP_VYUY;
   case FF_CSP_CLJR:return FF_CSP_CLJR;
   case FF_CSP_444P:return FF_CSP_444P;
   case FF_CSP_422P:return FF_CSP_422P;
   case FF_CSP_411P:return FF_CSP_411P;
   case FF_CSP_410P:return FF_CSP_410P;
   case FF_CSP_Y800:return FF_CSP_Y800;
   case FF_CSP_NV12|FF_CSP_FLAGS_YUV_ADJ|FF_CSP_FLAGS_YUV_ORDER:return FF_CSP_NV12;
   case FF_CSP_PAL8|FF_CSP_FLAGS_VFLIP:return FF_CSP_PAL8;
   // Assumption: same pass-through as above for unlisted values.
   default:return csp;
 }
}
EDIT: this code was inspired by TvideoCodecUncompressed::beginDecompress, where ffdshow receives packed colorspaces as in DXVA.
In that method the packed colorspace is obtained from the input format, including:
Code:
case CODEC_ID_NV12:csp=FF_CSP_NV12|FF_CSP_FLAGS_YUV_ADJ|FF_CSP_FLAGS_YUV_ORDER;break;
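To illustrate how such flagged colorspace values behave, here is a minimal sketch of the round trip. Everything in the sketch namespace is a stand-in: the numeric values are invented (the real constants are defined in ffdshow's ffImgfmt.h), and to_planar strips the flags with a mask instead of the explicit switch shown above.

```cpp
#include <cassert>

namespace sketch {
// Placeholder values for illustration only, not ffdshow's real constants.
constexpr int CSP_NV12       = 1 << 4;
constexpr int CSP_PAL8       = 1 << 5;
constexpr int FLAG_VFLIP     = 1 << 28;
constexpr int FLAG_YUV_ADJ   = 1 << 29;
constexpr int FLAG_YUV_ORDER = 1 << 30;
constexpr int FLAGS_MASK     = FLAG_VFLIP | FLAG_YUV_ADJ | FLAG_YUV_ORDER;

// Add the layout flags that mark a colorspace as delivered in packed form.
inline int to_packed(int csp)
{
    switch (csp)
    {
        case CSP_NV12: return CSP_NV12 | FLAG_YUV_ADJ | FLAG_YUV_ORDER;
        case CSP_PAL8: return CSP_PAL8 | FLAG_VFLIP;
        default:       return csp; // other colorspaces pass through unchanged
    }
}

// Recover the base colorspace by clearing every flag bit.
inline int to_planar(int csp)
{
    return csp & ~FLAGS_MASK;
}
} // namespace sketch
```

Because the flags live in separate bits, to_planar(to_packed(x)) gives back x for any base colorspace, and a single flag can be tested with a bitwise AND.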
Old 18th January 2010, 15:46   #10346  |  Link
tal.aloni
Registered User
 
Join Date: Sep 2008
Posts: 496
I found a bug in full processing mode.
Code:
pStore += pitch - width;
(should be divided by 16)
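A sketch of why the division matters, assuming pStore points to 16-byte SSE vectors (the post does not show its type) and that pitch and width are byte counts. Pointer arithmetic on such a pointer moves in 16-byte steps, so a byte offset has to be converted to a vector count first. Vec16 is a hypothetical stand-in for __m128i so the example stays portable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for a 16-byte SSE vector type.
struct alignas(16) Vec16 { std::uint8_t bytes[16]; };
static_assert(sizeof(Vec16) == 16, "Vec16 must be exactly 16 bytes");

// Number of Vec16 elements of padding between the end of one row's pixels
// and the start of the next row. pitch and width are in bytes.
inline std::ptrdiff_t row_padding_in_vectors(std::ptrdiff_t pitch, std::ptrdiff_t width)
{
    assert(pitch >= width && (pitch - width) % 16 == 0);
    return (pitch - width) / 16;
}

// Advance a store pointer from the end of one row's pixels to the next row.
inline Vec16* next_row(Vec16* rowEnd, std::ptrdiff_t pitch, std::ptrdiff_t width)
{
    return rowEnd + row_padding_in_vectors(pitch, width);
}
```

With pitch = 64 and width = 32, the pointer has to skip 2 vectors, not 32; adding the raw byte difference would jump 512 bytes ahead.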

I'll keep working on it in a few hours.
Old 18th January 2010, 18:32   #10347  |  Link
albain
Media Control author
 
Join Date: Dec 2006
Location: Paris
Posts: 1,014
Hi Tal,

Back home. I am building it with the divide-by-16 fix.

I'll post results in a moment
Old 18th January 2010, 19:03   #10348  |  Link
tal.aloni
Registered User
 
Join Date: Sep 2008
Posts: 496
Another bug in full postprocessing (SSE2 and SSE4.1):
some random crashes result from addresses that are not aligned to 16 bytes; _mm_stream_si128 will crash on them.
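_mm_stream_si128 requires a 16-byte aligned destination. A minimal sketch of a guard, assuming an x86 target with SSE2; on other targets it falls back to memcpy so that it still compiles. The function names are illustrative and not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

// True when p is aligned to a 16-byte boundary.
inline bool is_aligned16(const void* p)
{
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}

// Copy 16 bytes from src to dst. The non-temporal store is used only when
// dst is 16-byte aligned; otherwise a regular unaligned store is used.
inline void store16(void* dst, const void* src)
{
    assert(dst != nullptr && src != nullptr);
#ifdef SKETCH_HAVE_SSE2
    const __m128i v = _mm_loadu_si128(static_cast<const __m128i*>(src));
    if (is_aligned16(dst))
        _mm_stream_si128(static_cast<__m128i*>(dst), v);
    else
        _mm_storeu_si128(static_cast<__m128i*>(dst), v);
#else
    std::memcpy(dst, src, 16);
#endif
}
```

After a batch of streaming stores, a call to _mm_sfence() is normally needed before other threads read the destination.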

Last edited by tal.aloni; 18th January 2010 at 19:09.
Old 18th January 2010, 19:15   #10349  |  Link
albain
Media Control author
 
Join Date: Dec 2006
Location: Paris
Posts: 1,014
Okay, the video is very slow, with compression artifacts (not green lines), and I have 50% CPU load on my Q9450 (2.66 GHz).

In overlay mode I have 22% CPU.

I saw that you did some colorspace conversions: there is a converter (with optimized CPU instructions). I don't know if it can handle the source and destination colorspaces that you want.
Old 18th January 2010, 19:15   #10350  |  Link
clsid
*****
 
Join Date: Feb 2005
Posts: 5,647
The FFmpeg devs are working on further improving their H.264 code. Their latest patches are not compatible with the slice based multi-threaded decoding patch that we use in ffdshow's libavcodec.

Here is a patch with the latest changes:
http://pastebin.com/m11865032

Should I remove the slice based multi-threading patch? Or is there anyone willing/able to fix it?

Edit: new patch
__________________
MPC-HC 2.2.1

Last edited by clsid; 18th January 2010 at 22:38.
Old 18th January 2010, 19:33   #10351  |  Link
albain
Media Control author
 
Join Date: Dec 2006
Location: Paris
Posts: 1,014
You are already using optimized code. I don't know what else could be done to improve performance. The big problem comes from the GPU-to-system-memory copy.


@clsid : this is okay, we are not affected by the patch in DXVA mode (if your question was addressed to us)
Old 18th January 2010, 19:35   #10352  |  Link
tal.aloni
Registered User
 
Join Date: Sep 2008
Posts: 496
Quote:
Originally Posted by albain View Post
Okay, the video is very slow, with compression artifacts (not green lines), and I have 50% CPU load on my Q9450 (2.66 GHz).
Also, there are still a few issues with the full processing methods (support is needed for input buffer pointers that are not aligned to 64 bytes, and for inputs whose width is not a multiple of 64).

I wonder if it's worth the effort (full processing). (I'm also wondering if executing the copy at a different stage of the decoding would give us more time.)
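One common way to cover both cases is to split each row into an unaligned head, a run of whole 64-byte blocks, and a tail. A sketch of just the bookkeeping, assuming 64 bytes is the block size meant above; the struct and function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct RowSplit {
    std::size_t head;   // bytes before the first 64-byte boundary
    std::size_t blocks; // number of whole 64-byte blocks
    std::size_t tail;   // leftover bytes after the last whole block
};

// Split a row of `width` bytes starting at address `addr` into the three parts.
inline RowSplit split_row(std::uintptr_t addr, std::size_t width)
{
    const std::size_t kBlock = 64;
    // Distance from addr up to the next 64-byte boundary (0 if already aligned).
    std::size_t head = (kBlock - static_cast<std::size_t>(addr % kBlock)) % kBlock;
    if (head > width)
        head = width; // the row ends before reaching a boundary
    const std::size_t rest = width - head;
    return RowSplit{ head, rest / kBlock, rest % kBlock };
}
```

The head and tail can then be copied with ordinary byte copies, and only the whole blocks go through the fast path.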

Last edited by tal.aloni; 18th January 2010 at 19:38.
Old 18th January 2010, 19:59   #10353  |  Link
THX-UltraII
Registered User
 
Join Date: Aug 2008
Location: the Netherlands
Posts: 851
The ATI HD5670 was released a few days ago and is already in stock here in the Netherlands. Will this card give any problems with Bitstream support?
Old 18th January 2010, 20:18   #10354  |  Link
rsd78
Registered User
 
Join Date: Jan 2009
Posts: 73
Quote:
Originally Posted by albain View Post
Okay, the video is very slow, with compression artifacts (not green lines), and I have 50% CPU load on my Q9450 (2.66 GHz).

In overlay mode I have 22% CPU.

I saw that you did some colorspace conversions: there is a converter (with optimized CPU instructions). I don't know if it can handle the source and destination colorspaces that you want.
My hunch is that the full processing method may not be worthwhile performance-wise.

I'm a little surprised that overlay mode still took 22% CPU, albain. How does that compare to software decoding of the same file?
Old 18th January 2010, 20:48   #10355  |  Link
albain
Media Control author
 
Join Date: Dec 2006
Location: Paris
Posts: 1,014
Quote:
Originally Posted by tal.aloni View Post
Also, there are still a few issues with the full processing methods (support is needed for input buffer pointers that are not aligned to 64 bytes, and for inputs whose width is not a multiple of 64).

I wonder if it's worth the effort (full processing). (I'm also wondering if executing the copy at a different stage of the decoding would give us more time.)
This could be multithreaded with Boost, as it seems that several frames are already in the buffer by the time we have to display them, but this is a huge amount of work.

Last edited by albain; 18th January 2010 at 20:54.
Old 18th January 2010, 21:09   #10356  |  Link
albain
Media Control author
 
Join Date: Dec 2006
Location: Paris
Posts: 1,014
In software-only mode (no DXVA) I have 30-40% CPU load.
Old 18th January 2010, 22:10   #10357  |  Link
rsd78
Registered User
 
Join Date: Jan 2009
Posts: 73
Quote:
Originally Posted by albain View Post
In software-only mode (no DXVA) I have 30-40% CPU load.
Thanks Albain. What is the rough cpu usage using ffdshow's dxva but without any overlay? Just to get an idea of the comparative differences we are talking about here. Thanks again for all you and the others have done. Between bitstreaming, and dxva + subtitles I'm in heaven
Old 18th January 2010, 22:48   #10358  |  Link
albain
Media Control author
 
Join Date: Dec 2006
Location: Paris
Posts: 1,014
Quote:
Originally Posted by rsd78 View Post
Thanks Albain. What is the rough cpu usage using ffdshow's dxva but without any overlay? Just to get an idea of the comparative differences we are talking about here. Thanks again for all you and the others have done. Between bitstreaming, and dxva + subtitles I'm in heaven
7%

But this is normal, as there is no round trip from the GPU to system memory and back.

Also, overlay mode involves colorspace conversions.
Old 18th January 2010, 23:21   #10359  |  Link
mark0077
Registered User
 
Join Date: Apr 2008
Posts: 1,106
Can anyone confirm that de-interlacing with yadif + double frame rate no longer doubles the frame rate? I have forced de-interlacing with many types of content, even non-interlaced. I cannot get ffdshow to double the frame rate anymore. It does de-interlace content fine, so the filter is being enabled; it just leaves the frame rate as is.
Old 19th January 2010, 07:40   #10360  |  Link
Mr VacBob
Registered User
 
Join Date: Feb 2005
Posts: 140
Quote:
Originally Posted by clsid View Post
The FFmpeg devs are working on further improving their H.264 code. Their latest patches are not compatible with the slice based multi-threaded decoding patch that we use in ffdshow's libavcodec.

Here is a patch with the latest changes:
http://pastebin.com/m11865032

Should I remove the slice based multi-threading patch? Or is there anyone willing/able to fix it?

Edit: new patch
If you mean frame, I'll fix it tomorrow - slice threading was broken but fixed again in ffmpeg. Splitting the decoder files completely confused git, so I'll have to do a quite large merge by hand. Meant to do it over the weekend but people kept asking me to do other stuff…