AviSynth audio handling
sh0dan
14th August 2002, 11:54
As some of you may know, we are working on some changes in how AVS handles audio, and I would like some developer opinions before implementing and committing anything.
Float implementation:
AviSynth should have internal float samples - it is widely used by developers, since many more advanced features will benefit from the increased precision.
First question: How should range be defined?
Instinctively I would set range to go from -1 to 1, where 1 would translate into 32767 and -1 would translate into -32767, when converted to 16 bits.
This is also how ssrc for instance does it.
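A minimal sketch of the symmetric mapping described above, assuming out-of-range input is simply clamped. The function names FloatToInt16 and Int16ToFloat are hypothetical and are not part of AviSynth or ssrc; SFLOAT is the typedef name suggested further down in this post.

```cpp
#include <cmath>
#include <cstdint>

typedef float SFLOAT;  // typedef name proposed in this thread

// Hypothetical helper: symmetric mapping, +1.0 -> 32767 and -1.0 -> -32767.
// Values outside [-1, 1] are clamped (an assumption, not a decided behaviour).
inline int16_t FloatToInt16(SFLOAT s) {
    if (s > 1.0f) s = 1.0f;
    if (s < -1.0f) s = -1.0f;
    return static_cast<int16_t>(std::lround(s * 32767.0f));
}

// Hypothetical inverse: 32767 -> +1.0. Note that -32768 maps slightly below -1.0.
inline SFLOAT Int16ToFloat(int16_t s) {
    return static_cast<SFLOAT>(s) / 32767.0f;
}
```

With this scaling the 16-bit value -32768 is never produced on output, and on input it lands just outside the nominal range.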
Second: Should ALL internal samples be float?
This would simplify matters a lot - so if any filters are applied, the samples are always converted to float, and automatically converted back, before being delivered to the application
Third: Should float type be typedef'd?
IMO a typedef (typedef float SFLOAT;) would be nice - it could make a port to I64 easier.
Channels
Q:Should number of channels be free?
It could be very useful for an AC3Source() for instance, but will make filters and usage of them a bit more complex. But external plugins could be used for extracting the sound (saving out to wav-files, compress to mp3/ogg - It would be GREAT to have a 'BeSynth' export plugin).
Number of samples limitation:
Would it make sense to change the number of samples to an int64 (unsigned __int64) instead of a (signed) int, as it is now? Again, if sample counts above 31 bits aren't supported by the AVI format, this will only be usable if external export plugins are written.
GetAudio(..) return actual number of samples?
Should GetAudio(...) be able to return fewer samples than requested?
I don't know if there are cases where a filter would want to return fewer samples than the number requested. I think (but don't know for sure) that some filters (like FFT) prefer to process only windows whose size is a power of two. Does anyone know where such restrictions apply?
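To make the question concrete, here is a rough sketch of what a caller would have to do if GetAudio could return short. Everything here is hypothetical: BlockSource, its int return value and ReadFully are illustration only and are not the current or proposed AviSynth interface. int64_t stands in for __int64.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical source that delivers at most `block` samples per call.
struct BlockSource {
    std::vector<float> data;
    int block;

    // Returns the number of samples actually written to buf (may be < count).
    int GetAudio(float* buf, int64_t start, int count) const {
        if (count <= 0 || start < 0 ||
            start >= static_cast<int64_t>(data.size())) return 0;
        int64_t avail = static_cast<int64_t>(data.size()) - start;
        int n = count;
        if (n > block) n = block;
        if (n > avail) n = static_cast<int>(avail);
        std::memcpy(buf, data.data() + start, n * sizeof(float));
        return n;
    }
};

// Caller-side loop: keep asking until `count` samples arrive or the source runs dry.
inline int ReadFully(const BlockSource& src, float* buf, int64_t start, int count) {
    int done = 0;
    while (done < count) {
        int got = src.GetAudio(buf + done, start + done, count - done);
        if (got <= 0) break;  // end of stream
        done += got;
    }
    return done;
}
```

The cost of allowing short reads is that every consumer needs a loop like ReadFully, which is one argument for keeping "always return exactly count samples" as the contract.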
Avisynth.h modifications
This is my suggestion for a full-feature avisynth.h modification:
// Modifications only
struct VideoInfo {
  enum {SAMPLE_INT8 = 0, SAMPLE_INT16 = 1, SAMPLE_FLOAT = 2};
  int audio_samples_per_second; // 0 means no audio
  BYTE sample_type;
  __int64 num_audio_samples;
  int nchannels;
  // Most audio functions would have to be recoded, but their functionality will remain the same.
};
class IClip {
  virtual void __stdcall GetAudio(void* buf, __int64 start, __int64 count, IScriptEnvironment* env) = 0; // start and count are in samples
};
Comments/ suggestions?
trbarry
14th August 2002, 17:58
sh0dan -
Very good. :)
I strongly support having lots of AC3 channels (or ANY AC3).
It probably isn't important but I think VS6 does most __int64 math via subroutine call. I don't have any idea how many samples you would have. Is it close to overflowing in any reasonable circumstance?
- Tom
sh0dan
14th August 2002, 18:21
It currently overflows after about 12 hours at 48000 samples/sec - so there is still headroom for most people, but I've seen one person reaching this limit, and that's enough for me to consider changing it.
Yes - int64 is probably slower than int32, but if we keep "count" as an int32 (which is reasonable), there will not be much code involving the int64's (all loops would be 32 bits). Good point!
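The 12-hour figure can be checked with a few lines of arithmetic. This is only a sketch; HoursUntilOverflow is a hypothetical helper, and `bits` is the number of usable magnitude bits (31 for a signed 32-bit int, 63 for a signed 64-bit one).

```cpp
#include <cstdint>

// Hypothetical helper: hours of audio a counter with `bits` magnitude bits
// can index at the given sample rate. Requires 0 <= bits < 64.
inline double HoursUntilOverflow(int bits, int samples_per_second) {
    const double max_samples = static_cast<double>(UINT64_C(1) << bits);
    return max_samples / samples_per_second / 3600.0;
}
```

For a signed 32-bit counter at 48000 samples/sec this gives 2^31 / 48000 / 3600, or roughly 12.4 hours, while a 64-bit counter is out of reach for any practical clip length.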
sh0dan
14th August 2002, 18:38
Oh - I forgot another important question:
Compatibility
Is it ok to break compatibility for external audio plugins?
This will break compatibility with external plugins using audio. I didn't think there were any, but now that I think of it, we'll most probably break Edwin's compressed audio support plugin, and maybe even his Link2 (I have no use for it, so I won't pay to have it). I would convert it myself, if the code was free.
Does anyone know if changing the videoinfo struct will break anything in video plugins, or will they adapt?
DSPguru
14th August 2002, 20:22
Originally posted by sh0dan
How should range be defined?
Instinctively I would set range to go from -1 to 1, where 1 would translate into 32767 and -1 would translate into -32767, when converted to 16 bits.
This is also how ssrc for instance does it.
[-32768,32767]^N is very popular, but mostly in integer ("short") code. i prefer [-1,1) when the mode is floating-point..
this is also more logical since the input could be derived from byte-long PCM samples [0,255]^n, or long samples..
[-1,1) is always [-1,1), the only thing that changes is the quantization error step.
Second: Should ALL internal samples be float?
This would simplify matters a lot - so if any filters are applied, the samples are always converted to float, and automatically converted back, before being delivered to the application
yes.
Third: Should float type be typedef'd?
IMO a typedef (typedef float SFLOAT;) would be nice - it could make a port to I64 easier.
yes.
Channels
Q:Should number of channels be free?
It could be very useful for an AC3Source() for instance, but will make filters and usage of them a bit more complex. But external plugins could be used for extracting the sound (saving out to wav-files, compress to mp3/ogg - It would be GREAT to have a 'BeSynth' export plugin).
number of channels should be free, yes.
the easiest way to handle this is by working on a non-interleaved buffer of samples. meaning :
SFLOAT** PCM;
and then :
PCM=(SFLOAT**)malloc(sizeof(SFLOAT*)*Number_Of_Channels);
for (i=0;i<Number_Of_Channels;i++)
  PCM[i]=(SFLOAT*)malloc(sizeof(SFLOAT)*Samples_Limit);
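As an illustration of the non-interleaved layout, the sketch below splits an interleaved buffer (L R L R ...) into one buffer per channel. It uses std::vector instead of malloc so the memory is released automatically; Deinterleave is a hypothetical name, not an AviSynth or BeSweet function.

```cpp
#include <cstddef>
#include <vector>

typedef float SFLOAT;  // typedef name proposed in this thread

// Hypothetical helper: copy `frames` frames of `channels` interleaved samples
// into one contiguous buffer per channel.
inline std::vector<std::vector<SFLOAT> > Deinterleave(const SFLOAT* interleaved,
                                                      std::size_t frames,
                                                      std::size_t channels) {
    std::vector<std::vector<SFLOAT> > planar(channels, std::vector<SFLOAT>(frames));
    for (std::size_t f = 0; f < frames; ++f)
        for (std::size_t c = 0; c < channels; ++c)
            planar[c][f] = interleaved[f * channels + c];
    return planar;
}
```

A filter that works on one channel at a time can then walk planar[c] linearly, whatever the channel count is.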
Number of samples limitation:
Would it make sense to modify the number of samples to be int64 (unsigned __int64) instead of (signed) int, as it is now? Again, if the number of samples above 31bits isn't supported by the AVI format, it'll only be useable if there are external export plugins coded.
nope.
the elegant solution to overcome this uncommon problem, is to have another 'overflow' variable, to act as the exponent of a semi floating-point samples counter.
most of the time, this overflow exponent variable would be equal to 0, and this leads us to where we originally came from.
GetAudio(..) return actual number of samples?
Should GetAudio(...) be able to return fewer samples than requested?
I don't know if there are some cases where a filter would like to return fewer samples than the number requested. I think (but don't know for sure) that some filters (like FFT) prefers to process only windows that are power of two. Does anyone know where such restrictions apply?
yes, there are different types of algos - (a) sample_by_sample, (b) block_by_block, (c) Nblock_by_Kblock.
asynchronous support is very important, but could be implemented as an add-on layer (which would help to avoid the time-consuming overhead when it is redundant).
IMHO
Dg.
sh0dan
15th August 2002, 17:39
Dg, thanks for your input - appreciated!
Great to hear that my ramblings weren't completely off.
What are the disadvantages of using __int64 - it just seems so much easier?
For example:
DelayAudio::DelayAudio(double delay, PClip _child)
: GenericVideoFilter(_child), delay_samples(int(delay * vi.audio_samples_per_second + 0.5))
{
vi.num_audio_samples += delay_samples;
}
void DelayAudio::GetAudio(void* buf, int start, int count, IScriptEnvironment* env)
{
child->GetAudio(buf, start-delay_samples, count, env);
}
would become (using __int64):
DelayAudio::DelayAudio(double delay, PClip _child)
: GenericVideoFilter(_child), delay_samples(int(delay * vi.audio_samples_per_second + 0.5))
{
vi.num_audio_samples += (__int64)delay_samples;
}
void DelayAudio::GetAudio(void* buf, __int64 start, int count, IScriptEnvironment* env)
{
child->GetAudio(buf, start-(__int64)delay_samples, count, env);
}
instead of: (using two ints)
DelayAudio::DelayAudio(double delay, PClip _child)
: GenericVideoFilter(_child), delay_samples(int(delay * vi.audio_samples_per_second + 0.5))
{
if ((__int64)vi.num_audio_samples + (__int64)delay_samples > INT_MAX) {
vi.num_audio_samples += delay_samples; // Count on overrun - could also be exactly calculated
vi.num_audio_samples_major++;
} else {
vi.num_audio_samples += delay_samples;
}
}
void DelayAudio::GetAudio(void* buf, int start, int major_start, int count, IScriptEnvironment* env)
{
// This would also be very difficult - you see why
child->GetAudio(buf, start-delay_samples, count, env);
}
In my world, using an __int64 is much easier.
dividee
15th August 2002, 21:54
Does anyone know if changing the videoinfo struct will break anything in video plugins, or will they adapt?
If the plugin doesn't touch any audio variable, it should work, as long as you keep all changed struct members after the video related data members (as it is now), since C++ accesses struct members with an offset from the start of the struct. I don't really know how it accesses member functions (maybe some load-time linking when the dll is attached?).
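The offset argument can be checked at compile time. The two structs below are hypothetical, heavily simplified stand-ins for the old and new VideoInfo (the real struct has more members); long long stands in for __int64.

```cpp
#include <cstddef>

// Hypothetical "old" layout: video members first, audio members last.
struct VideoInfoOld {
    int width, height;
    int audio_samples_per_second;
    int num_audio_samples;
};

// Hypothetical "new" layout: only members after the video data are changed.
struct VideoInfoNew {
    int width, height;
    int audio_samples_per_second;
    long long num_audio_samples;  // widened member
    int nchannels;                // new member appended at the end
};

// True if an old plugin reading only the leading members still sees them
// at the same offsets in the new struct.
inline bool VideoOffsetsMatch() {
    return offsetof(VideoInfoOld, width) == offsetof(VideoInfoNew, width)
        && offsetof(VideoInfoOld, height) == offsetof(VideoInfoNew, height)
        && offsetof(VideoInfoOld, audio_samples_per_second)
               == offsetof(VideoInfoNew, audio_samples_per_second);
}
```

Anything at or after the first changed member (here num_audio_samples) may move, and the struct size changes too, so a plugin that copies the whole struct or reads audio members would still break.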
IMHO, having public data members in an interface is a bad idea. If we only had public functions it would be easier to change internal representation while providing backward compatibility, even video related variables. Unfortunately, doing so now would break all plugins.
To me, __int64 seems easier too for num_audio_samples.
sh0dan
15th August 2002, 22:32
Completely agree - internal data structures should be completely hidden from filters. I'll take that into consideration, and make all variables private.
Hopefully I'll get some time to try a complete conversion this coming weekend, but I think a 2.0.x stable release should be done before implementing any changes as radical as this.
DSPguru
16th August 2002, 08:45
forget it :)
i guess my suggestion is only relevant for DSP machines, where the machine's accumulator has an extension of overflow registers, and has a built-in scaling mode with flags like sticky & carry that update automatically, without the need to execute extra code to know if you're in a 'floating-point' mode or in a 'fixed-point' mode.
anyway,
i'm not sure about execution speed, but working with __int64 could take longer than working with two separate ints.
on the other hand, my knowledge of pentium's architecture is limited to one 4-day course i took 3 years ago. so i could be talking BS..
about a BeSynth plugin,
what would you need from that ?
currently, BeSweet.dll has an input function for mpa & ac3 inputs. i can add an interface for pcm inputs, and i can replace my fwrite() function with AviSynth's callback function.
would that suffice.. ?
WarpEnterprises
16th August 2002, 08:54
@DSPguru, BeSynth plugin: would what you outlined make AviSynth capable of directly reading e.g. an mp2 audio stream (e.g. demuxed from an SVCD)?
This would be really great, together with Nic's new mpegsource we could import mpeg2 without any external programs.
DSPguru
16th August 2002, 08:57
Originally posted by WarpEnterprises
This would be really great, together with Nic's new mpegsource we could import mpeg2 without any external programs.
Nic is my primary BeSweet.dll beta-tester. he already integrated BeSweet.dll inside his version of dvd2avi.
i promised him to release an updated version of BeSweet.dll after i release BeSweet v1.4 .
maybe i should release a new dll sooner.. ?