ffdshow tryouts project: Discussion & Development - Page 523

clsid · 23rd January 2010, 19:28

I had a similar crash on exit problem yesterday when updating the ffmpeg-mt code. After a lot of trial and error I discovered it was related to some members of the struct H264Context (in h264.h), namely list_count/list_counts/ref_count. Just their order mattered for whether it crashed or not. So there may be some kind of alignment issue.

tal.aloni · 23rd January 2010, 22:08

Hey guys,
It turns out I didn't account for situations where frames are removed from the source (courtesy of the AviSynth filter).
To support this, I made another small change to ffdshow: frames won't be delivered unless the image filter eventually returns S_OK.

However, for various reasons, some filters might not return the proper value.
I'm posting a beta so we can identify the problematic filters/settings before I commit. If you're getting black screens or missing frames, please let me know which filter/config is the cause.
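A minimal sketch of the rule described above: a frame reaches the output only if the image filter chain ends with S_OK. The status codes, the filter signature and both example filters are hypothetical stand-ins written in plain C, not ffdshow's actual interfaces.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the Windows HRESULT codes (S_OK is 0, S_FALSE is 1). */
typedef long hresult_t;
#define SK_S_OK    0L
#define SK_S_FALSE 1L

/* Hypothetical image-filter signature: SK_S_OK means "a frame came out",
   SK_S_FALSE means "frame consumed, nothing to pass on" (e.g. decimation). */
typedef hresult_t (*image_filter_fn)(int frame_no);

/* Example filter that always produces a frame. */
static hresult_t filter_pass_through(int frame_no) { (void)frame_no; return SK_S_OK; }

/* Example decimating filter: swallows every odd frame. */
static hresult_t filter_drop_odd(int frame_no) { return (frame_no % 2) ? SK_S_FALSE : SK_S_OK; }

/* Runs one frame through the chain and delivers it only if every stage
   returned SK_S_OK. Returns 1 if the frame was delivered, 0 if it was dropped. */
static int run_chain_and_deliver(image_filter_fn *chain, size_t n, int frame_no,
                                 int *delivered)
{
    hresult_t hr = SK_S_OK;
    for (size_t i = 0; i < n && hr == SK_S_OK; i++)
        hr = chain[i](frame_no);
    if (hr != SK_S_OK)
        return 0;        /* a filter swallowed the frame: do not deliver */
    (*delivered)++;      /* stand-in for handing the frame downstream */
    return 1;
}
```

A filter that returns something other than S_OK for a frame it actually produced would make that frame vanish, which is the kind of black screen or missing frame the beta is meant to catch.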

Thanks!
Tal

build:
http://iknowu.net/files/public/ffdsh...eturnValue.exe

patch:
http://iknowu.net/files/public/ffdsh...urnValue.patch

Edit:
found the first problem: DGBob.
(Last edited by tal.aloni; 23rd January 2010 at 22:27.)

BatKnight · 23rd January 2010, 23:10

I ran into the same situation, and rather than paraphrase others, I'd prefer to quote someone who has already tested it and posted the results:

Quote:

Originally Posted by el Filou at the SlySoft Forum - ReClock 1.8.6.0 - page 14

- Build number: 20100112 (updated to the latest one just to be sure; previously I used 20091209, and before that an older one I've forgotten, but I can confirm the behaviour is identical).
- Sources: MP2, MP3, AC3, DTS, some TrueHD test files (marketing material from Dolby), 24-bit FLAC (I normally use madFlac but forced ffdshow for this test) and a 24-bit LPCM track (Blu-ray).
- Config: I use ffdshow only for decoding. All processing plugins are unchecked.
For the test I only changed the various int and float output formats in the "Output" section. I tried changing the allowed sample formats in the "Processing" section and it didn't change the results, so leeperry is right: this only affects processing.

Results:

1. everything checked:
- AC3 and DTS: libavcodec 16 int, liba52/libdts 32 float
Everything's fine here: the decoders that can output float do, and libavcodec, which is limited to 16 int, outputs 16 int.
- MP2 and MP3: libavcodec and mp3lib 16 int, libmad 32 int
libmad doesn't want to output float. I think I've read here that 32 int has higher precision than 32 float; however, the decoder surely computes float values as the algorithm's output even if it converts to int afterwards, right? So libmad shouldn't gain anything by outputting 32 int here, right? (I presume it doesn't work in 64-bit float internally.) Weird.
EDIT: I've found on the official website that MAD natively outputs 24 int, so it falls under the same case as TrueHD, FLAC and LPCM: it outputs 24 int and then ffdshow does something with it. Is it padding or converting?
- TrueHD: 32 int. This doesn't make sense: it's a lossless codec and the source is 24-bit at best, so it gains nothing by outputting at a higher integer precision, on the contrary.
- 24-bit LPCM: 32 int. This makes even less sense than TrueHD, as there is nothing to decode, so it's just a useless conversion by ffdshow.

2. everything checked except 32 int:
TrueHD, LPCM, FLAC and libmad now output 16 int, even though 24 int and 32 float (which should in theory be the best choice for libmad) (EDIT: MAD actually outputs 24 int) are enabled. Go figure...

3. only 16 int and 24 int checked:
This is to test liba52 and libdts when they can't output float, and they prefer to output 16 int rather than 24 int.
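On the "32 int vs 32 float" question in result 1: a 32-bit IEEE-754 float has a 24-bit significand, so it holds every 24-bit integer sample exactly, but not every 32-bit one. This small C check illustrates that; it says nothing about what libmad or ffdshow actually do internally.

```c
#include <stdint.h>

/* Returns 1 if the integer sample v comes back unchanged after being stored
   in a 32-bit IEEE-754 float (24-bit significand), 0 otherwise.
   The comparison is done in double to avoid an out-of-range float->int conversion. */
static int survives_float32(int32_t v)
{
    float f = (float)v;
    return (double)f == (double)v;
}
```

So 32 float is lossless for 24-bit sources, while 32 int has more integer resolution than 32 float for full-scale 32-bit values.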

Now I have different possible conclusions:

1. There is absolutely no logic in ffdshow's decoders. Some inside libavcodec (AC3 and DTS) output 16 int even if something higher is checked, others always output the best (liba52 and libdts), and others always output the highest precision available even if the source is lossless and of lower precision. And none will output 24 int unless it's the only option available, not even the lossy codecs, which could in theory gain something from 24 int since it would mean smaller rounding errors than 16 int.

2. ffdshow does a transparent internal conversion from what the decoder really outputs to the output format it considers best, without informing the user.
This would explain why everything is output at 32 int, but wouldn't explain why ffdshow considers 16 int better than 24.
And obviously, some decoders (e.g. AC3 and DTS in libavcodec) would be able to bypass this conversion.

3. ffdshow pads 24 int to 32 int.
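To illustrate conclusion 3: "padding" a 24-bit sample into a 32-bit container just moves it up by 8 bits, which is exactly reversible, so by itself it neither adds nor loses information. Whether ffdshow actually pads or converts is the open question here, and this hypothetical C sketch doesn't answer it. It also shows why packed 24-bit is awkward: each sample spans 3 unaligned bytes and must be sign-extended by hand. A decoder doing this could report the real depth through AVCodecContext.bits_per_raw_sample, as Reimar suggests in this thread.

```c
#include <stdint.h>

/* Reads one signed 24-bit little-endian sample from a packed 3-byte buffer
   (3-byte stride, no natural alignment). */
static int32_t read_s24le(const uint8_t *p)
{
    int32_t v = (int32_t)p[0] | ((int32_t)p[1] << 8) | ((int32_t)p[2] << 16);
    if (v & 0x800000)
        v -= 0x1000000;   /* sign-extend from bit 23 */
    return v;
}

/* Padding 24 -> 32: sample moves to the top 24 bits, low 8 bits are zero.
   Multiplication instead of << avoids undefined behaviour on negative values. */
static int32_t pad_s24_to_s32(int32_t s24) { return s24 * 256; }

/* Exact inverse: the low 8 bits are zero, so the division has no remainder. */
static int32_t unpad_s32_to_s24(int32_t s32) { return s32 / 256; }
```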

Now my part:
Would it be possible for ffdshow to output the same format as the input for lossless codecs, and to try to output 32-bit float (falling back to 32-bit int, 24-bit int, etc.) for lossy codecs?
Because, as you can see in el Filou's points 1 and 2, ffdshow isn't making the best choices.

Bat

Mr VacBob · 23rd January 2010, 23:34

Quote:

Originally Posted by clsid

I had a similar crash on exit problem yesterday when updating the ffmpeg-mt code. After a lot of trial and error I discovered it was related to some members of the struct H264Context (in h264.h), namely list_count/list_counts/ref_count. Just their order mattered for whether it crashed or not. So there may be some kind of alignment issue.

It needs to be list_counts, ref_counts, list_count, like it already is, or mt will crash much earlier than when exiting. Or is this something else?

clsid · 23rd January 2010, 23:51

Yes, it works OK with the order that is in your git tree. That order is also used in ffdshow's copy of ffmpeg-mt. I initially used the order that is used in regular ffmpeg. That was when I got the crashes.

Is my assumption correct that this is an alignment issue? Or is there some other hidden bug?

Mr VacBob · 23rd January 2010, 23:56

It's because mt does a memcpy() of some fields between threads, and that one was added in the middle of one of them upstream. But it's not appropriate to copy that one (it's a pointer to a per-frame table) so I moved it up in the context rather than splitting the memcpy up.
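Why field order matters here rather than alignment: when a contiguous block of a context struct is memcpy'd between threads, whatever field sits inside that block gets shared. The sketch below uses two heavily simplified, hypothetical layouts (not the real H264Context, and not ffmpeg-mt's actual copy macro) to show a thread-local pointer surviving the copy in one order and being clobbered in the other.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layouts: NOT the real H264Context. */
typedef struct {
    int *per_frame_table;   /* thread-local pointer: must not be copied */
    int list_count;         /* first field of the copied block */
    int ref_count[2];
    int thread_private;     /* first field after the copied block */
} ctx_ok;

typedef struct {
    int list_count;         /* first field of the copied block */
    int *per_frame_table;   /* accidentally inside the copied block */
    int ref_count[2];
    int thread_private;
} ctx_broken;

/* Copy every byte from field `first` up to (not including) field `end`. */
#define COPY_FIELD_RANGE(type, dst, src, first, end)                \
    memcpy((char *)(dst) + offsetof(type, first),                   \
           (const char *)(src) + offsetof(type, first),             \
           offsetof(type, end) - offsetof(type, first))

static void sync_ok(ctx_ok *dst, const ctx_ok *src)
{
    COPY_FIELD_RANGE(ctx_ok, dst, src, list_count, thread_private);
}

static void sync_broken(ctx_broken *dst, const ctx_broken *src)
{
    COPY_FIELD_RANGE(ctx_broken, dst, src, list_count, thread_private);
}
```

In the broken layout the destination thread ends up pointing at the source thread's table, so the failure can show up far from the copy itself, for example on exit.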

CruNcher · 24th January 2010, 07:01

Quote:

Originally Posted by tal.aloni

http://www.mediafire.com/?sharekey=3...e79d7d0540e1e1

old builds > generic > page 2

let's narrow it down to a build, ok?

the crash started with 3217

yep 3223 still crashing

You do File->Close in MPC-HC and, bye bye, libavcodec.dll crashes. It's not that bad since playback is over, but it didn't happen before 3217.

A bigger problem, though, is that this 60 FPS sample shows artifacts with ffdshow DXVA, while with MPC-HC DXVA it's fine.

tal.aloni · 24th January 2010, 08:05

CruNcher, I'll review the modifications again.

Also, there is a problem with the AviSynth filter again; I'm working on it.

Tal

Jeremy Duncan · 24th January 2010, 09:47

Quote:

Originally Posted by tal.aloni

I suggest you update the patch to the latest revision,
post a beta build so people can test.
Also, if the patch includes multiple functions (like x64 support), it's best to test and commit them one by one.

3223 build and patch

With these newer ffdshow trunk builds you also need to do this to get the benefit of the patch, at least for the mvtools2 frame doubler:
Decoder options tab
Uncheck "Detect soft telecine and average frame durations"
So if somebody says the patch doesn't fix their problem, I would ask them whether this is unchecked. It may only need to be unchecked for the mvtools2 frame doubler, but I haven't checked every plugin and option, so I don't know.

I updated the ffdshow wiki: link

albain · 24th January 2010, 19:44

Quote:

Originally Posted by BatKnight

I ran into the same situation, and rather than paraphrase others, I'd prefer to quote someone who has already tested it and posted the results:

Now my part:
Would it be possible for ffdshow to output the same format as the input for lossless codecs, and to try to output 32-bit float (falling back to 32-bit int, 24-bit int, etc.) for lossy codecs?
Because, as you can see in el Filou's points 1 and 2, ffdshow isn't making the best choices.

Bat

It also depends on the codecs' capabilities: for example, the libavcodec AC3 decoder can only output 16 bits, so you'll get 16 bits whatever you check. Same thing for DTS.
This is why liba52 and libdts are better for AC3/DTS decoding, and indeed why we keep them inside ffdshow.

As for libavcodec TrueHD, I think it can output 32 bits.

As for ffdshow's output logic, I'm not very familiar with it, but I know that, for example, if you leave 16 bits checked, you will get only 16 bits.
Try testing with only 24 or 32 bits checked.

Also, the Windows mixer should be set to output this sample format.

Reimar · 24th January 2010, 20:08

Quote:

Originally Posted by albain

Try to test with 24 or 32 bits only

libavcodec won't do 24 bit since that is a pain to work with - alignment issues, can't easily use SIMD instructions on it etc.
Ideally the decoders should set AVCodecContext.bits_per_raw_sample to indicate how many bits actually are relevant.
The AC3 decoder in libavcodec should be "trivial" to change to output either float or 32 bit integer (basically just replace the float_to_int16_interleave at the end of ac3_decode_frame). I'm not sure there's actually a point in doing so though.
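Roughly what replacing the final float_to_int16_interleave step means: the decoder holds planar float channels, and the last step packs them into one interleaved buffer. Below is a hypothetical C sketch of a 16-bit variant next to a 32-bit one. It assumes floats normalized to [-1.0, 1.0] (the actual scaling inside the AC3 decoder may differ) and is plain scalar code, not libavcodec's optimized DSP routines.

```c
#include <stddef.h>
#include <stdint.h>

/* Round-half-away-from-zero conversion with clipping; input assumed in [-1.0, 1.0]. */
static int16_t flt_to_s16(float f)
{
    float v = f * 32768.0f;
    if (v > 32767.0f)  return INT16_MAX;
    if (v < -32768.0f) return INT16_MIN;
    return (int16_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}

static int32_t flt_to_s32(float f)
{
    double v = (double)f * 2147483648.0;   /* double: float can't hold 2^31 - 1 exactly */
    if (v >= 2147483647.0)  return INT32_MAX;
    if (v <= -2147483648.0) return INT32_MIN;
    return (int32_t)(v >= 0.0 ? v + 0.5 : v - 0.5);
}

/* Planar float (src[channel][frame]) -> interleaved int16 (L R L R ...). */
static void interleave_s16(int16_t *dst, const float *const *src,
                           size_t channels, size_t frames)
{
    for (size_t i = 0; i < frames; i++)
        for (size_t c = 0; c < channels; c++)
            dst[i * channels + c] = flt_to_s16(src[c][i]);
}

/* Same layout, 32-bit integer output. */
static void interleave_s32(int32_t *dst, const float *const *src,
                           size_t channels, size_t frames)
{
    for (size_t i = 0; i < frames; i++)
        for (size_t c = 0; c < channels; c++)
            dst[i * channels + c] = flt_to_s32(src[c][i]);
}
```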

jruggle · 24th January 2010, 20:59

Quote:

Originally Posted by Reimar

libavcodec won't do 24 bit since that is a pain to work with - alignment issues, can't easily use SIMD instructions on it etc.
Ideally the decoders should set AVCodecContext.bits_per_raw_sample to indicate how many bits actually are relevant.
The AC3 decoder in libavcodec should be "trivial" to change to output either float or 32 bit integer (basically just replace the float_to_int16_interleave at the end of ac3_decode_frame). I'm not sure there's actually a point in doing so though.

It is very trivial, and I have a working patch for it, but it is slower at decoding because the current float-to-int16 conversion combined with channel interleaving is faster than just plain float interleaving.

albain · 24th January 2010, 21:10

Quote:

Originally Posted by jruggle

It is very trivial, and I have a working patch for it, but it is slower at decoding because the current float-to-int16 conversion combined with channel interleaving is faster than just plain float interleaving.

Maybe, but audio decoding, and AC3 especially, is not demanding for a CPU, even an older one.

I think you should propose your patch to the FFmpeg team.

That way we would be able to drop liba52.

Same thing for DCA (DTS)

CruNcher · 24th January 2010, 22:20

Thx tal.aloni for fixing the crashing problem with libavcodec.dll in ffdshow DXVA on close

(that bug is history now)

albain · 24th January 2010, 22:42

I have bad news: I spent a few hours multithreading the copy of DXVA buffers into system memory to speed up full post-processing, with no luck.

Things are a little faster, but that's all.

Either we are doing something wrong (but I'm beginning to doubt it), or GPU=>CPU transfers are simply slow by design.

I hope someone will be able to get in touch with the Intel guy who wrote this article (but I guess he only tested with low-res videos).

Otherwise there is the DXVA HD feature that allows non-linear resizing, but that is too much work (at least for now).
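A minimal sketch of the idea being tried here, splitting a frame copy into row bands across threads, written with POSIX threads and ordinary memcpy. It is hypothetical, not ffdshow's code, and the source buffer stands in for a locked DXVA surface. Reading from real USWC video memory would normally use SSE4.1 streaming loads (MOVNTDQA), which this does not show; and if the bus or the uncached read path is the bottleneck rather than the CPU, more threads won't help much, which would fit the modest gain reported above.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_COPY_THREADS 8

/* One horizontal band of rows [row_begin, row_end) to copy. */
typedef struct {
    uint8_t *dst;
    const uint8_t *src;
    size_t dst_pitch, src_pitch, row_bytes;
    size_t row_begin, row_end;
} copy_band;

static void *copy_band_worker(void *arg)
{
    const copy_band *b = arg;
    for (size_t y = b->row_begin; y < b->row_end; y++)
        memcpy(b->dst + y * b->dst_pitch, b->src + y * b->src_pitch, b->row_bytes);
    return NULL;
}

/* Copies `rows` rows of `row_bytes` bytes each, honouring separate source and
   destination pitches, split into `nthreads` bands (clamped to 1..8). */
static void copy_plane_mt(uint8_t *dst, size_t dst_pitch,
                          const uint8_t *src, size_t src_pitch,
                          size_t row_bytes, size_t rows, size_t nthreads)
{
    pthread_t tid[MAX_COPY_THREADS];
    copy_band band[MAX_COPY_THREADS];
    int started[MAX_COPY_THREADS] = {0};

    if (nthreads < 1) nthreads = 1;
    if (nthreads > MAX_COPY_THREADS) nthreads = MAX_COPY_THREADS;

    for (size_t t = 0; t < nthreads; t++) {
        band[t] = (copy_band){ dst, src, dst_pitch, src_pitch, row_bytes,
                               rows * t / nthreads, rows * (t + 1) / nthreads };
        if (pthread_create(&tid[t], NULL, copy_band_worker, &band[t]) == 0)
            started[t] = 1;
        else
            copy_band_worker(&band[t]);  /* fall back: copy this band on the calling thread */
    }
    for (size_t t = 0; t < nthreads; t++)
        if (started[t])
            pthread_join(tid[t], NULL);
}
```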

dann23 · 24th January 2010, 23:18

Quote:

Originally Posted by albain

I have bad news: I spent a few hours multithreading the copy of DXVA buffers into system memory to speed up full post-processing, with no luck.

Things are a little faster, but that's all.

Either we are doing something wrong (but I'm beginning to doubt it), or GPU=>CPU transfers are simply slow by design.

I hope someone will be able to get in touch with the Intel guy who wrote this article (but I guess he only tested with low-res videos).

Otherwise there is the DXVA HD feature that allows non-linear resizing, but that is too much work (at least for now).

I have some questions. Maybe I'm missing something. You need to copy DXVA buffers into memory to enable some filters, and this is only for DXVA 1; as far as I know, with DXVA 2 this is not needed. So are you trying to do this because you want to implement DXVA 1 in ffdshow? If so, why not implement just subtitles for DXVA 1 and use DXVA 2 for the other cases? I know Windows XP only has DXVA 1, but people are starting to use Windows Vista/7, so in my opinion it's enough to support DXVA 1 with subtitles for Windows XP.

tal.aloni · 24th January 2010, 23:33

dann23,
AFAIK, copying the frame buffer from the USWC ("GPU memory") is a prerequisite for custom (not part of the API) post-processing in both DXVA 1.0 and 2.0.

BatKnight · 24th January 2010, 23:34

Quote:

Originally Posted by albain

As for libavcodec TrueHD, I think it can output 32 bits.

As for ffdshow's output logic, I'm not very familiar with it, but I know that, for example, if you leave 16 bits checked, you will get only 16 bits..

Yes, TrueHD does output 32 bits.
The thing is, for lossless codecs it is always better to output the same format as the source; for example, it is better to output TrueHD as 24 bits if the source is also 24 bits. But if 32 bits is ticked, it now outputs 32 bits. It should somehow recognize TrueHD and not output more than the source format. This should be the case for all other lossless codecs too.
One may think it's better to decode a lossless codec at a higher precision, but it's not. It could sometimes even be worse.

On the other hand, lossy codecs like DTS, AC3, MP3, etc. should always be decoded at the highest precision possible (32-bit float when possible), no matter what the source is. leeperry could help me elaborate here if needed...

When one ticks 16-bit, 24-bit, 32-bit and 32-bit float, the goal isn't to always output the highest possible format, but to identify the source codec and output the best format for that codec.

My question is: what could be changed in ffdshow so that it identifies the type of codec and then chooses the appropriate format as I just explained? The goal of ffdshow is to achieve the best decoding quality possible, isn't it?
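The selection rule proposed in this post could be sketched in C as follows. The format flags and the function are hypothetical; this encodes only the proposal (lossless: smallest ticked integer format that holds the source depth; lossy: float first, then the widest ticked integer), not how ffdshow currently picks its output, and it assumes the decoder knows whether the codec is lossless and what the source bit depth is.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical format flags mirroring the output checkboxes. */
enum sample_fmt { FMT_NONE = 0, FMT_S16 = 1, FMT_S24 = 2, FMT_S32 = 4, FMT_FLT = 8 };

/* enabled: bitmask of the formats the user ticked.
   lossless: whether the source codec is lossless (TrueHD, FLAC, LPCM, ...).
   source_bits: bit depth of the source, only used for lossless codecs. */
static enum sample_fmt choose_output_fmt(bool lossless, int source_bits, unsigned enabled)
{
    /* Lossy preference order from the proposal: float first, then widest int. */
    static const enum sample_fmt lossy_order[] = { FMT_FLT, FMT_S32, FMT_S24, FMT_S16 };

    if (lossless) {
        /* Smallest ticked integer format that still holds every source bit. */
        if (source_bits <= 16 && (enabled & FMT_S16)) return FMT_S16;
        if (source_bits <= 24 && (enabled & FMT_S24)) return FMT_S24;
        if (source_bits <= 32 && (enabled & FMT_S32)) return FMT_S32;
        /* Nothing fits exactly: fall back to the lossy order below. */
    }
    for (size_t i = 0; i < sizeof lossy_order / sizeof lossy_order[0]; i++)
        if (enabled & lossy_order[i])
            return lossy_order[i];
    return FMT_NONE;
}
```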

Bat

PS: Another thing I've noticed is that checking LFE Crossover changes the output from 32-bit integer to 32-bit float, on a TrueHD track for example. Why?

Jeremy Duncan · 25th January 2010, 02:14

Quote:

Originally Posted by albain

I have bad news: I spent a few hours multithreading the copy of DXVA buffers into system memory to speed up full post-processing, with no luck.

Things are a little faster, but that's all.

Either we are doing something wrong (but I'm beginning to doubt it), or GPU=>CPU transfers are simply slow by design.

I have an opinion on this problem.
- The CPU reads data from RAM in bits.
- The GPU has its own hardware that acts as a CPU, and its own RAM on the video card. But the GPU also reads that RAM in bits.

With these two facts agreed on, integrating them (one for post-processing, the other for DXVA) would require harmonizing the way they read RAM and interact with each other.
I think the problem is that they don't mesh together, and they are trying to make it primarily CPU or GPU.
To fix this problem, let the CPU and GPU sense each other so they don't try to make it either CPU-only or GPU-only.

Jeremy Duncan · 25th January 2010, 02:17

Quote:

Originally Posted by CruNcher

Thx tal.aloni for fixing the crashing problem with libavcodec.dll in ffdshow DXVA on close

(that bug is history now)

Do you have any other bugs?
