Old 2nd September 2005, 11:00   #21  |  Link
Sergey A. Sablin
Registered User
 
Join Date: Dec 2004
Location: Tomsk, Russia
Posts: 366
Quote:
Originally Posted by bond
well the mplayer devs recommend when wanting to do benchmarks to use the -vo null and -benchmark options, so i assume the output values are useable!?
Sure, it is usable. But if mplayer doesn't require an output frame copy, then these values are only usable for comparing different decoders inside mplayer, not with decoders in the DShow environment, do you agree?
But if mplayer also requires this copy, then the results are also comparable to DS filters.

(I just mean that the devs recommending this way of measuring performance doesn't by itself tell us that it measures the same thing as other tools do.)
Old 2nd September 2005, 11:01   #22  |  Link
akupenguin
x264 developer
 
Join Date: Sep 2004
Posts: 2,392
Quote:
Originally Posted by Sergey A. Sablin
Does anybody know how mplayer works with this option: with or without the copy?
libavcodec returns a pointer into the same decoded picture buffer used for inter prediction. Then real vos copy it to video memory, and -vo null doesn't. (In some formats (including ASP but not yet implemented for H.264), frames that don't need to be kept (B-frames or Intra-only) can be decoded directly into video memory, or incrementally into a video filter if some filtering is performed.)

You're saying that in DShow, instead the vo passes a pointer to the decoder, and real vos pass a pointer to video memory while chegepuga passes a pointer to some dummy buffer that's never read?

Last edited by akupenguin; 2nd September 2005 at 11:05.
Old 2nd September 2005, 11:16   #23  |  Link
Sergey A. Sablin
Registered User
 
Join Date: Dec 2004
Location: Tomsk, Russia
Posts: 366
Quote:
Originally Posted by akupenguin
libavcodec returns a pointer into the same decoded picture buffer used for inter prediction. Then real vos copy it to video memory, and -vo null doesn't. (In some formats (including ASP but not yet implemented for H.264), frames that don't need to be kept (B-frames or Intra-only) can be decoded directly into video memory, or incrementally into a video filter if some filtering is performed.)

You're saying that in DShow, instead the vo passes a pointer to the decoder, and real vos pass a pointer to video memory while chegepuga passes a pointer to some dummy buffer that's never read?
I mean that in DShow the decoder, after decoding a frame, should copy the output frame to another location, indicated by a pointer passed to the decoder by the downstream filter (which in this case is chegepuga, and in real playback is the video renderer), but I don't know whether mplayer requires this too.
Old 2nd September 2005, 11:20   #24  |  Link
akupenguin
x264 developer
 
Join Date: Sep 2004
Posts: 2,392
MPlayer does not require that. Neither -vo null nor the real playback do that copy.
Old 2nd September 2005, 14:01   #25  |  Link
alexcyn
Registered User
 
Join Date: Jan 2003
Location: St.Petersburg, Russia
Posts: 11
To: Bond

> i used indeed an old version, and that was mainly done because i
> did the test to find out what decoder i could use for my encodes,
> so i used the last version you released which was unlimited
An unlimited free version may violate H.264 patents; that's why we had to limit it to 30 days :-(.

> i simply wasnt really interested in testing a decoder which becomes
> useless for me after 30 days
We will think about it; maybe it makes sense to give you an unlimited version for testing.

> hm i think your decoder works with the VSSH and H264 fourcc.
> actually i dont know a splitter which outputs this (eg from .mp4 or .mpg)
> your decoder could connect to, so theoretically your decoder can of
> course work with any format, but practically its not so easy till now
Yes, version 2.0 was limited to 2 fourccs, but version 2.2 accepts any fourcc (type=video, subtype=[any]).

> so the 20$ baseline version includes a main profile decoder?
YES
=Alexei, VSS
Old 2nd September 2005, 14:06   #26  |  Link
bond
Registered User
 
Join Date: Nov 2001
Posts: 9,770
thx for the info!
__________________
Between the weak and the strong one it is the freedom which oppresses and the law that liberates (Jean Jacques Rousseau)
I know, that I know nothing (Socrates)

MPEG-4 ASP FAQ | AVC/H.264 FAQ | AAC FAQ | MP4 FAQ | MP4Menu stores DVD Menus in MP4 (guide)
Ogg Theora | Ogg Vorbis
use WM9 today and get Micro$oft controlling the A/V market tomorrow for free
Old 2nd September 2005, 14:28   #27  |  Link
Haali
Registered User
 
Join Date: Jul 2003
Posts: 282
Quote:
Originally Posted by Sergey A. Sablin
I mean that in DShow the decoder, after decoding a frame, should copy the output frame to another location, indicated by a pointer passed to the decoder by the downstream filter (which in this case is chegepuga, and in real playback is the video renderer), but I don't know whether mplayer requires this too.
DShow can also act in the same way as mplayer. In dshow, buffer management is done in a somewhat nontrivial way. First you negotiate an allocator with the downstream filter; obviously in the case of a video renderer you want to use the renderer's allocator, since it knows how to work with video memory. Second, you set the allocator parameters like number of buffers, alignment and buffer size. Then during playback you call the allocator's GetBuffer() and, after you are done processing, the filter's Receive(). If you specify AM_GBF_NOTASYNCPOINT in the GetBuffer() call, then it will return the buffer with unchanged contents from the previous frame. So to avoid extra copying overhead you can request one buffer from the allocator and use AM_GBF_NOTASYNCPOINT. In this case renderers like the overlay mixer will return a pointer to the overlay's video memory. Most decoders do it that way, but ffdshow still performs an extra copy internally afaik.
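The buffer flow described above can be modelled in a few lines. This is only a toy Python sketch, not DirectShow code: `ToyAllocator`, `decode_into` and the flag value are hypothetical stand-ins for the renderer's allocator, the decoder and AM_GBF_NOTASYNCPOINT, and the "video memory" is just a bytearray. It assumes, as stated above, that passing the flag makes GetBuffer() hand back the previous frame's contents untouched.

```python
# Toy model of the one-buffer scheme; none of these names are real DirectShow APIs.
NOTASYNCPOINT = 0x1  # placeholder value standing in for AM_GBF_NOTASYNCPOINT


class ToyAllocator:
    """Stands in for the renderer's allocator; owns the 'video memory'."""

    def __init__(self, buffer_count, buffer_size):
        self.buffers = [bytearray(buffer_size) for _ in range(buffer_count)]
        self.next_index = 0

    def get_buffer(self, flags=0):
        buf = self.buffers[self.next_index]
        self.next_index = (self.next_index + 1) % len(self.buffers)
        if not flags & NOTASYNCPOINT:
            # Without the flag the caller may not rely on the old contents;
            # the model makes that visible by wiping the buffer.
            buf[:] = bytes(len(buf))
        return buf


def decode_into(buf, changed):
    """Pretend decoder: writes only the bytes that changed since the last frame."""
    for offset, value in changed.items():
        buf[offset] = value


allocator = ToyAllocator(buffer_count=1, buffer_size=8)

# Frame 1 is written straight into the renderer's buffer.
frame = allocator.get_buffer()
decode_into(frame, {0: 10, 1: 20, 2: 30})

# Frame 2 asks for the same buffer with the flag set, so bytes 0 and 2
# from frame 1 are still there and only byte 1 has to be rewritten.
frame = allocator.get_buffer(NOTASYNCPOINT)
decode_into(frame, {1: 99})
result = list(frame)
```

With a single buffer the decoder always writes into the same renderer-owned memory, which is what removes the extra copy in this model.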
Old 3rd September 2005, 03:35   #28  |  Link
TheBashar
Registered User
 
Join Date: Jan 2002
Posts: 112
Quote:
Originally Posted by Nil Einne
I could be wrong but I was under the impression not all decoders are equal quality-wise.
I concur on this point. After reading your comparison, I tried out the Moonlight-Elecard MPEG Player that Bond linked to. In testing with some of my AVC high-profile encodes, I found it periodically produced dark macroblocks which did not appear when using nero's decoder.

Being fast is great, but not at the cost of decoding artifacts.

Here's a sample of what I'm talking about:


Last edited by TheBashar; 3rd September 2005 at 03:43.
Old 3rd September 2005, 06:53   #29  |  Link
stephanV
gone
 
Join Date: Apr 2004
Posts: 1,706
This is a bug, not a difference in quality.
Old 4th September 2005, 01:19   #30  |  Link
bobololo
Registered User
 
Join Date: May 2003
Posts: 328
Haali timeCodec

For those who are interested in decoding filter benchmarking, Haali was kind enough to write a little dshow application extremely convenient for this purpose.

It's available at the url: http://haali.cs.msu.ru/mkv/timeCodec.exe

It requires the latest version of Haali Media Splitter available here.

With some decoding filters (like ateme's one), it may require an additional filter, which you can find here (you have to register it manually): http://haali.cs.msu.ru/mkv/m2r.dll

I did a quick test using a clip posted during the ateme hp beta (batman-ateme-3500k-hp.mp4) and I got these figures:

nero (nve 3.1.0.16): 46.4 fps
moonlight (0.9.0 build 50208 beta): 47.6 fps
ffdshow (20050822 - cd build): 48.1 fps
ateme (2.2.1.0): 56.5 fps

The tests were done on a Pentium 4 @ 3.0 GHz (with HT).
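To make the four figures easier to compare, here is a small Python sketch that converts them to average decoding time per frame and to a speed relative to the slowest decoder. The numbers are copied from the list above; the helper names are made up for this example.

```python
# Figures copied from the post above (frames per second, higher is faster).
fps = {
    "nero": 46.4,
    "moonlight": 47.6,
    "ffdshow": 48.1,
    "ateme": 56.5,
}

slowest = min(fps.values())


def frame_time_ms(frames_per_second):
    """Average decoding time per frame in milliseconds."""
    return 1000.0 / frames_per_second


def relative_speed(frames_per_second, baseline=slowest):
    """Speed as a multiple of the slowest decoder in the list."""
    return frames_per_second / baseline


for name, value in sorted(fps.items(), key=lambda item: item[1]):
    print(f"{name:10s} {value:5.1f} fps  {frame_time_ms(value):5.2f} ms/frame  "
          f"{relative_speed(value):.2f}x")
```

On this single clip the first three decoders are within a few percent of each other, while ateme is roughly 22% ahead of nero.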

For all decoders, timeCodec uses Haali Splitter to parse the file and to feed the decoder filter.

This test used an updated version of the ateme decoder filter that will be provided in the next beta release.

Many thanks to Haali again for his great work !

ps: I've tried moonlight decoder on other clips, and I had some decoding issues like with this one. Also it doesn't seem to decode mbaff correctly ?

EDIT: I found my problem with ffdshow: the postprocessing was enabled. It's much better now, and I updated all the results with measurements from a 3.0 GHz CPU.

Last edited by bobololo; 4th September 2005 at 08:24.
Old 4th September 2005, 09:16   #31  |  Link
Sergey A. Sablin
Registered User
 
Join Date: Dec 2004
Location: Tomsk, Russia
Posts: 366
Quote:
Originally Posted by akupenguin
MPlayer does not require that. Neither -vo null nor the real playback do that copy.
Well, just two questions:
1. Did you mean that decoders use video memory for storing reference pictures? How about reading from video memory?
2. Did you mean that all decoders use YV12 colorspace for rendering? YV12 is slower for rendering than YUY2 and UYVY on all video cards I know.

Quote:
Originally Posted by Haali
DShow can also act in the same way as mplayer. In dshow, buffer management is done in a somewhat nontrivial way. First you negotiate an allocator with the downstream filter; obviously in the case of a video renderer you want to use the renderer's allocator, since it knows how to work with video memory. Second, you set the allocator parameters like number of buffers, alignment and buffer size. Then during playback you call the allocator's GetBuffer() and, after you are done processing, the filter's Receive(). If you specify AM_GBF_NOTASYNCPOINT in the GetBuffer() call, then it will return the buffer with unchanged contents from the previous frame. So to avoid extra copying overhead you can request one buffer from the allocator and use AM_GBF_NOTASYNCPOINT. In this case renderers like the overlay mixer will return a pointer to the overlay's video memory. Most decoders do it that way, but ffdshow still performs an extra copy internally afaik.
Did you mean decoding directly to video memory? If yes, then try running at least two decoders with this technique. It'll be very interesting.
Reference pictures also can't be decoded into video memory, so they need a copy anyway.
Old 4th September 2005, 10:10   #32  |  Link
Haali
Registered User
 
Join Date: Jul 2003
Posts: 282
What I meant is decoders usually request only one buffer from the allocator, I don't know if they do an extra copy or not.
Old 5th September 2005, 12:36   #33  |  Link
akupenguin
x264 developer
 
Join Date: Sep 2004
Posts: 2,392
Quote:
Originally Posted by Sergey A. Sablin
Well, just two questions:
1. Did you mean that decoders use video memory for storing reference pictures? How about reading from video memory?
libavcodec supports:
  1. Store picture in application specified pointer, usually video memory. Used for non-referenced pictures when no filtering is needed. If you use this mode for a referenced picture, it will try to read it back from the buffer.
  2. Callback after each row of macroblocks with a pointer to the decoded slice. Used for referenced pictures or simple filters.
  3. Return a pointer to the decoded frame. Used for MEncoder, -vo null, or complex filters.
The decoder does no copies in any of those. The -vo (other than null) does a copy in (2) and (3), or filters/encoders may use the (read-only) buffer as is.
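The three paths can be mimicked with a toy model to show where the copies happen. This is a Python sketch with invented names (`decode_picture`, `ToyVideoOut`), not the libavcodec API, and the "video memory" is an ordinary bytearray. The band height of 16 lines is an assumption standing in for one row of macroblocks.

```python
# Toy model of the three output paths listed above. All names are hypothetical;
# this is not the libavcodec API, and "video memory" is just a bytearray.
WIDTH, HEIGHT, BAND = 4, 32, 16  # BAND = one row of 16-pixel-high macroblocks


def decode_picture(seed, target=None, on_band=None):
    """Fake decoder: fills a picture band by band and never copies it.

    target  -- buffer supplied by the application (mode 1), else an internal one
    on_band -- callback invoked after each macroblock row (mode 2)
    Returns the buffer the picture ended up in (what mode 3 hands back).
    """
    picture = target if target is not None else bytearray(WIDTH * HEIGHT)
    for top in range(0, HEIGHT, BAND):
        start, end = top * WIDTH, (top + BAND) * WIDTH
        picture[start:end] = bytes([seed]) * (end - start)
        if on_band is not None:
            on_band(picture, start, end)
    return picture


class ToyVideoOut:
    def __init__(self):
        self.video_memory = bytearray(WIDTH * HEIGHT)
        self.copied_bytes = 0

    def copy(self, picture, start, end):
        self.video_memory[start:end] = picture[start:end]
        self.copied_bytes += end - start


# Mode 1: decode straight into the vo's memory, no copy anywhere.
vo1 = ToyVideoOut()
decode_picture(1, target=vo1.video_memory)

# Mode 2: the vo copies each band as soon as it is decoded.
vo2 = ToyVideoOut()
decode_picture(2, on_band=vo2.copy)

# Mode 3: the decoder returns its own buffer; a real vo copies it afterwards,
# while something like -vo null would simply not do so.
vo3 = ToyVideoOut()
frame = decode_picture(3)
vo3.copy(frame, 0, len(frame))
```

In all three cases the fake decoder itself performs no copy; only the video output in modes 2 and 3 does, which matches the summary above.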

Quote:
2. Did you mean that all decoders use YV12 colorspace for rendering? YV12 is slower for rendering than YUY2 and UYVY on all video cards I know.
Yes, all decoders output the same pixel format as the video actually stores (so usually YV12). All video cards I know of are fast enough to do scaling + YV12->RGB for any video resolution they support at all, so you can free a little CPU time by not doing software YV12->YUY2 conversion. If for some reason you or the -vo need another pixel format, then MPlayer will insert a conversion filter (or maybe it's not always automatic; anyway, it's equivalent to "-vf scale").
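For reference, this is roughly the per-frame work a software YV12->YUY2 conversion involves. It is a plain, unoptimized Python sketch and not MPlayer's code: the function name is made up, chroma rows are simply repeated rather than interpolated, and even dimensions are assumed. YV12 takes 12 bits per pixel (planar 4:2:0), YUY2 takes 16 (packed 4:2:2).

```python
def yv12_to_yuy2(frame, width, height):
    """Convert one YV12 frame (planar 4:2:0) to YUY2 (packed 4:2:2).

    YV12 stores a full-size Y plane, then a quarter-size V plane, then a
    quarter-size U plane. YUY2 stores Y0 U Y1 V for every pair of pixels.
    Each chroma row is reused for both luma rows it covers.
    Width and height are assumed to be even.
    """
    y_size = width * height
    c_width = width // 2
    c_size = c_width * (height // 2)
    y_plane = frame[:y_size]
    v_plane = frame[y_size:y_size + c_size]
    u_plane = frame[y_size + c_size:y_size + 2 * c_size]

    out = bytearray(width * height * 2)
    pos = 0
    for row in range(height):
        c_row = (row // 2) * c_width
        for pair in range(c_width):
            y_index = row * width + 2 * pair
            out[pos] = y_plane[y_index]
            out[pos + 1] = u_plane[c_row + pair]
            out[pos + 2] = y_plane[y_index + 1]
            out[pos + 3] = v_plane[c_row + pair]
            pos += 4
    return bytes(out)


# A 2x2 frame: four luma samples, then one V sample, then one U sample.
tiny = bytes([10, 20, 30, 40, 200, 100])
packed = yv12_to_yuy2(tiny, 2, 2)
```

Every output byte is touched once per frame, so skipping this step and handing YV12 to the card saves a full pass over the picture.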

Last edited by akupenguin; 5th September 2005 at 12:44.
Old 5th September 2005, 14:02   #34  |  Link
Sergey A. Sablin
Registered User
 
Join Date: Dec 2004
Location: Tomsk, Russia
Posts: 366
Quote:
Originally Posted by akupenguin
libavcodec supports:
  1. Store picture in application specified pointer, usually video memory. Used for non-referenced pictures when no filtering is needed. If you use this mode for a referenced picture, it will try to read it back from the buffer.
Reading from video memory is very slow, so it is very inefficient to use it when postprocessing is used. That means it is very inefficient for H.264 (at least), because deblocking is used in most cases.

Quote:
Originally Posted by akupenguin
Yes, all decoders output the same pixel format as the video actually stores (so usually YV12). All video cards I know of are fast enough to do scaling + YV12->RGB for any video resolution they support at all, so you can free a little CPU time by not doing software YV12->YUY2 conversion. If for some reason you or the -vo need another pixel format, then MPlayer will insert a conversion filter (or maybe it's not always automatic; anyway, it's equivalent to "-vf scale").
For modern video cards, yes. But on old ones YV12 is much slower.

Also, decoding directly to video memory (i.e. in YV12) produces many errors on some video cards: try using two instances of ffdshow with YV12 output enabled to decode some stream and you will see these artifacts. (I've tried an MPEG-2 sequence with a Matrox Parhelia via libavcodec and libmpeg2. You can use the two instances either in one process or in two different processes, it doesn't matter.)

I don't want to say that decoding directly to video memory or rendering in YV12 format is slower everywhere, but in most cases it is still either slower or buggy, and comparing decoding methods that don't work correctly in all cases is not really valid.
Old 5th September 2005, 23:16   #35  |  Link
bond
Registered User
 
Join Date: Nov 2001
Posts: 9,770
updated some values with a new version of ateme:
ateme-new: ateme mp4 parser 1.2.5.3 / ateme decoder 2.2.1.0

HIGH PROFILE

Code:
x264_hp_2pass_720x288_B0_Ref5_p4x4-i8x8_loop-5_WBP_cabac.mp4
moonlight: 	63.09
ateme-new:	60.23
libav-mplayer:	57.59
nero:		56.17
libav-ffdshow:	50.20


x264_hp_2pass_720x288_B3-Ref_Ref5_p4x4-i8x8_loop-5_WBP.mp4
libav-mplayer:	59.45
ateme-new:	56.42
libav-ffdshow:	52.35
moonlight: 	50.24
nero:		44.35


x264_hp_2pass_720x288_B3-Ref_Ref5_p4x4-i8x8_loop-5_WBP_cabac_cqm-qmatrix.mp4
ateme-new:	48.91
moonlight: 	44.89
nero:		39.36
libav-ffdshow:	cqm not supported
libav-mplayer:	cqm not supported


x264_hp_2pass_720x288_B3-Ref_Ref5_p4x4-i8x8_loop+6_WBP_cabac.mp4
libav-mplayer:	47.24
ateme-new:	43.17
libav-ffdshow:	42.03
moonlight: 	40.12
nero:		38.93


x264_hp_2pass_720x288_B3-Ref_Ref5_p4x4-i8x8_WBP_cabac.mp4
libav-mplayer:	61.96
ateme-new:	53.99
libav-ffdshow:	53.76
nero:		50.37
moonlight: 	48.36
MAIN PROFILE

Code:
x264_mp_2pass_720x288_B2_Ref3_p8x8-i4x4_cabac.mp4
libav-mplayer:	67.33
nero:		65.08
moonlight:	62.67
ateme-new:	60.95
libav-ffdshow:	58.15
ateme: 		52.99
mainconcept:	47.49


x264_mp_2pass_720x288_B2-Ref_Ref3_p4x4-i4x4_cabac.mp4
libav-mplayer:	65.36
moonlight:	62.09
libav-ffdshow:	57.33
ateme-new:	57.29
nero:		55.63
mainconcept:	49.27
ateme: 		b-ref not supported


nero_2pass-777kbps_Cabac_Deblock-5-adapt_B2_Ref3_WPred+wbp_Qpel_p8x8_cartoon_psy2_extra.mp4
ateme-new:	65.86
libav-mplayer:	58.02
libav-ffdshow:	54.87
ateme: 		52.30
moonlight:	52.25
nero:		51.33
mainconcept:	39.97
BASELINE PROFILE

Code:
x264-r285_bp_720x288_B0_Ref5_p8x8-i4x4_loop-5_wbp_cabac_mp4box.mp4
moonlight:	75.34
libav-mplayer:	72.88
ateme-new:	72.28
ateme:		70.63
libav-ffdshow:	63.47
nero:		61.80
mainconcept:	51.16
videosoft:	33.46
the new ateme version gives very good, but also varying results: often very fast, but also often just on par with the other good decoders
Old 6th September 2005, 07:53   #36  |  Link
Haali
Registered User
 
Join Date: Jul 2003
Posts: 282
Quote:
Originally Posted by Sergey A. Sablin
Also, decoding directly to video memory (i.e. in YV12) produces many errors on some video cards
That's because they support only one YV12 overlay, so two instances conflict when using the same hardware resource. AFAIK overlay is the only place where planar YV12 is supported by video hardware, even modern cards don't support YV12 textures.
Old 6th September 2005, 13:40   #37  |  Link
bond
Registered User
 
Join Date: Nov 2001
Posts: 9,770
and some more findings with a new videosoft decoder:
videosoft-new: m$ avi parser 6.5.1.902 / videosoft decoder 2.3.1.5

MAIN PROFILE

Code:
x264_mp_2pass_720x288_B2_Ref3_p4x4-i4x4_cabac.mp4
libav-mplayer:	66.33
moonlight:	61.99
libav-ffdshow:	57.31
videosoft-new:	54.05
nero:		53.08
ateme: 		47.86
mainconcept:	47.86


x264_mp_2pass_720x288_B2_Ref3_p8x8-i4x4_cabac.mp4
libav-mplayer:	67.33
nero:		65.08
moonlight:	62.67
ateme-new:	60.95
libav-ffdshow:	58.15
videosoft-new:	57.29
ateme: 		52.99
mainconcept:	47.49


x264_mp_2pass_720x288_B2_Ref3_p8x8-i4x4_loop-5_cabac.mp4
moonlight:	58.33
libav-mplayer:	53.17
nero:		52.19
ateme: 		50.37
videosoft-new:	49.11
libav-ffdshow:	47.30
mainconcept:	40.01


x264_mp_2pass_640x256_B0_Ref3_p8x8-i4x4_loop_cabac.mp4
moonlight:	75.64
ateme: 		71.90
libav-mplayer:	68.44
nero:		67.89
libav-ffdshow:	64.37
videosoft-new:	60.95
mainconcept:	52.72
videosoft:	31.87


x264_mp_2pass_640x256_B2_Ref5_p4x4-i4x4_loop_cabac.mp4
libav-mplayer:	55.70
moonlight:	54.29
libav-ffdshow:	53.07
ateme: 		49.60
nero:		47.79
mainconcept:	41.60
videosoft-new:	38.97
videosoft:	24.25
BASELINE PROFILE

Code:
x264-r285_bp_720x288_B0_Ref5_p8x8-i4x4_loop-5_wbp_cabac_mp4box.mp4
moonlight:	75.34
libav-mplayer:	72.88
ateme-new:	72.28
ateme:		70.63
videosoft-new:	64.01
libav-ffdshow:	63.47
nero:		61.80
mainconcept:	51.16
videosoft:	33.46
the new vss decoder can definitely keep up with the others in some cases, but is also worse than the others in other cases
it's definitely much better than the first version i tested (the last one freely available)

i also found some things:
- it can connect to the nero and haali parser, but doesnt show anything when playing (outputs MPEG2Video)
- it works fine with the avi parser (outputs H264 and VSSH)
- it also works with the ateme mp4 parser (outputs H264)
- b-refs crash the decoder
- high profile is not supported, although videosoft already has a hp decoder, which i wasn't able to test
Old 21st September 2005, 20:07   #38  |  Link
saratoga
Registered User
 
Join Date: Nov 2003
Posts: 34
Quote:
a good old pentium3 866mhz
That CPU lacks SSE2, which is the new standard for floating point calculations on present x86 hardware. It's reasonable to think the SSE2 code is probably better developed and supported, as x87 is about to be deprecated. It's possible x87 fp is provided purely as legacy support in some codecs, particularly ones like Nero which are aimed at commercial use.

It would be interesting to see if the relative results change any when you run the same test on an SSE2 capable processor.
Old 21st September 2005, 21:16   #39  |  Link
akupenguin
x264 developer
 
Join Date: Sep 2004
Posts: 2,392
Codecs have nothing to do with floating-point. There aren't any x87 or SSE2 instructions at all in libav's H.264 decoder.
Old 21st September 2005, 21:55   #40  |  Link
Manao
Registered User
 
Join Date: Jan 2002
Location: France
Posts: 2,856
Indeed.

And, furthermore, only amd64 and P4 have SSE2, and, for most amd64 ( if not all ), SSE2 ops are as slow as their MMX counterparts.

So, basically, SSE2 helps only for P4 and very recent amd64.