Fast AUDIO RETIMING
Honeyko
7th February 2008, 00:21
Goal of this thread: To find the FASTEST, EASIEST, least system-intensive (CPU and drive space needed) means of performing one simple task: Retiming a piece of audio of X duration so that it matches a piece of video of Y duration. Several applications contain presets (such as PAL 25.000 to NTSC 23.976), but the results are still often off by several seconds over the course of a film.
(Request sticky, because audio retiming is such a kludge in so many applications, if implemented at all.)
==//==
Current project: Retime "tinny" audio streams from "lousy" 25.000 PAL DVDs of older films to mux into proper 23.976fps NTSC films.
Example Problem: 25.000>23.976 results in audio that is still fifteen seconds too short over the course of the film. (Delaying isn't the issue.)
Desired Solution: A tool which will let me simply enter hh:mm:ss:mmm for both source and output, and which has everything it needs in its installation file (or at least gives me a series of pop-ups during installation to inform me that I need to, say, go to Nero to get this widget).
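The calculation such a tool would need is small. Here is a minimal Python sketch, assuming the hh:mm:ss:mmm notation above. The function names `parse_timecode` and `stretch_factor` and the two sample durations are made up for illustration and do not come from any of the tools discussed in this thread.

```python
def parse_timecode(text):
    """Convert an 'hh:mm:ss:mmm' string into seconds."""
    hours, minutes, seconds, millis = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds + millis / 1000.0


def stretch_factor(source, target):
    """Factor by which the source duration must be multiplied to reach the target."""
    return parse_timecode(target) / parse_timecode(source)


# Hypothetical durations: a PAL audio track and the NTSC video it has to match.
source_audio = "01:35:50:000"
target_video = "01:40:00:000"

factor = stretch_factor(source_audio, target_video)
print(f"stretch the audio by a factor of {factor:.6f}")
```

A factor above 1 means the audio has to be slowed down (made longer); below 1 means it has to be sped up.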
==//==
Tools examined so far:
BeSweet GUI .07b8 (with BeSweet 1.5b31)
....no ability to simply enter time durations. :mad:
....doesn't warn you if your output is about to overwrite your source. :mad:
....didn't require unspooling audio to bloated .wav files. :cool:
....preset PAL 25.000 to NTSC 23.976 worked fine. :cool:
...."[x] Change Frame Rate" from [25.000] to [23.927] would not produce an output file (it stopped at 0k). :mad:
Rating: Kludgy. Broke. :mad:
WavePad Masters Edition
....audio editing features. :cool:
....can't input AC3. :mad:
....unspools to .WAV; slow; requires 3gb space for 2hr audio. :mad:
....speed changing doesn't support decimal percentages (i.e., it's useless) :mad:
Rating: :mad:
BeHappy GUI v0.1.9.40818
....no ability to simply enter time durations :mad:
....neat "Preview" button :cool:
....didn't require unspooling audio to bloated .wav files. :cool:
....doesn't come with every widget it needs. :mad: Cryptic error result:
Error: System.ApplicationException: Can't start encoder: The system cannot find the file specified ---> System.ComponentModel.Win32Exception: The system cannot find the file specified
at System.Diagnostics.Process.StartWithCreateProcess(ProcessStartInfo startInfo)
at System.Diagnostics.Process.Start()
at BeHappy.Encoder.createEncoderProcess(AviSynthClip x)
--- End of inner exception stack trace ---
at BeHappy.Encoder.createEncoderProcess(AviSynthClip x)
at BeHappy.Encoder.encode()
Rating: Kludgy. Broke. :mad:
==//==
So, what's better/new/fixed out there?
Let's turn the frowns upside down!
mahsah
8th February 2008, 21:42
Have you tried BeHappy? It has a function on the main screen that can do just that.
Haven't tried it though, as I am an NTSC user and prefer to spend my time slamming my head against the table over IVTC than over audio speed :)
Atak_Snajpera
8th February 2008, 23:17
Use RipBot264
1) Select Output Speed to AssumeFPS -> 23.976
2) In A/V same length section select STRETCH
That's all :)
Honeyko
9th February 2008, 10:59
Guys? If I do a 25.000 to 23.976 stretch, the resultant output is useless. As is sometimes the case when "cheap" PAL DVDs are made of old 24fps Hollywood films, they don't play the master at the correct speed when they sample it. That means that if you've found the one DVD in Europe which contains a film's rare original monaural soundtrack (which I'll take any day, out of a nice set of JBL IIs, over distracting surround-sound) and want to mate it to a restored 23.976 print, you're SOL unless you can specifically dictate a to-the-millisecond retiming of the audio track.
I tried BeHappy, and it didn't come with everything it needed, and didn't tell me what it needed -- so I've shoved that one into a dark corner of my drive for now. I got way too many irons in the fire now to blow half a day on widget-hunts.
RipBot264: Can't open stand-alone audio files.
Mug Funky
9th February 2008, 13:15
there's only 2 reasons why the sound would drift out at rates other than 24->25, or 23.976->25.
either it's a different cut of the film, or it was transferred on a MK1 telecine (these are very very old - '60s i think, wikipedia doesn't help there though. certainly pre 1975), and thus the film speed was not coupled with the video speed.
the solution to either problem is the same - conform both the NTSC and PAL versions to the same time base (i'd say conform the PAL version to 23.976), then actually cut the NTSC version to line up with the PAL version (i assume you're using the PAL audio?). this is slow and painstaking work, but is the only way really, especially considering the drift is probably due to frames lost on reel changes - if you were to change the audio rate by a constant amount, it'd still be out of sync in this case.
when cutting down, the best place to add/remove frames is at scenechanges. that way it'll not be noticed.
the end result will be clean video that fits perfectly with the audio from the dirty version.
note: i had to do this for 105 episodes of Astroboy '60s, and 52 episodes of Kimba the White Lion. it's a pain, but achievable if you make it a labour of love. you will need to get hold of an editing program that doesn't suck too much.
Honeyko
10th February 2008, 05:04
the solution to either problem is the same - conform both the NTSC and PAL versions to the same time base (i'd say conform the PAL version to 23.976), then actually cut the NTSC version to line up with the PAL version (i assume you're using the PAL audio?).
That will not work in the example I describe because after 25.000 to 23.976, I get audio that is *compressed* (not merely cut) fifteen seconds too short over the duration of the film.
I find it difficult to believe that there's not a stretcher/compressor out there that accepts hh.mm.ss.mmm timing and will read in a variety of audio formats.
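To put numbers on that kind of residual error, here is a rough Python sketch. All figures are hypothetical: a 2-hour video and audio that is still 15 seconds short after the standard preset. It only shows the arithmetic for working out what stretch would actually be needed, and what speed the transfer effectively ran at if the drift really is constant.

```python
PAL_FPS = 25.0
NTSC_FPS = 24000 / 1001  # the rate usually written as 23.976

# Hypothetical figures, not measurements from a real disc.
video_seconds = 2 * 3600.0                   # length of the NTSC video
audio_after_preset = video_seconds - 15.0    # audio is still 15 s too short

# Undo the preset to recover the length of the untouched PAL audio.
pal_audio_seconds = audio_after_preset * NTSC_FPS / PAL_FPS

# Stretch that takes the PAL audio straight to the video length.
needed_factor = video_seconds / pal_audio_seconds

# Frame rate the PAL transfer effectively ran at, if the video is 23.976.
effective_fps = NTSC_FPS * video_seconds / pal_audio_seconds

print(f"PAL audio length : {pal_audio_seconds:.3f} s")
print(f"needed stretch   : {needed_factor:.6f}")
print(f"effective PAL fps: {effective_fps:.4f}")
```

If the effective rate comes out noticeably different from 25, a fixed 25.000 to 23.976 preset cannot line the track up, and a duration-based stretch is the only constant-rate fix.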
Honeyko
13th February 2008, 22:29
VirtualDubMod
...changing FPS only changes the rate at which video plays, not audio.
Rating: :mad:
Atak_Snajpera
13th February 2008, 22:37
I find it difficult to believe that there's not a stretcher/compressor out there that accepts hh.mm.ss.mmm timing and will read in a variety of audio formats.
give me some time amigo :)
Honeyko
13th February 2008, 22:41
I'll alpha-test anytime you're ready with the module! :D
Atak_Snajpera
14th February 2008, 13:17
http://www.mediafire.com/?7mm5ylbbovo
Honeyko
18th February 2008, 08:36
http://www.mediafire.com/?7mm5ylbbovo
Man, you are da man. Fantastic little app....
Minor quibbles: Saved-file's time is one second less than requested time. (Whether it's a counting flub, or actually shorter I can't yet tell.)
Hmmm..... How about a version which defaults saves to AC3 192kbps 2-channel stereo? (While I specified above that it didn't matter, it turns out that those would be the source types about 95% of the time.)
madshi
18th February 2008, 09:20
Goal of this thread: To find the FASTEST, EASIEST, least system intensive (CPU and drive space needed) means of performing one simple task: Retiming a piece of audio of X duration so that it matches a piece of video of Y duration. Several applications contain presets (such as PAL 25.000 to NTSC 23.976), but they are still often off by several seconds over the course of film.
I'm wondering: Don't you care about audio quality? Where is "good quality" in your list of goals? There are good and bad ways to "retime" audio tracks. Sometimes the faster solutions also sound worse.
tebasuna51
18th February 2008, 17:26
Instead of AssumeSampleRate().ResampleAudio(), BeHappy uses:
TimeStretch(rate=(100*Actual_time/Desired_time))
In AviSynth docs:
"Adjusting rate is equivalent to using AssumeSampleRate and ResampleAudio"
but the float value (100*Actual_time/Desired_time) can give a more precise output than AssumeSampleRate(), which is limited to an int value.
Also TimeStretch(tempo=(100*Actual_time/Desired_time))
can preserve the pitch (slow).
This AviSynth function uses the SoundTouch library, Copyright (c) Olli Parviainen 2002-2006.
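As a rough illustration of the precision point, the Python sketch below computes the rate value from the formula above and estimates how much drift is left when the sample rate has to be rounded to a whole number. The 48000 Hz sample rate, the helper names and the two durations are assumptions picked for the example.

```python
def timestretch_rate(actual_seconds, desired_seconds):
    """Percentage for TimeStretch(rate=...), i.e. 100 * Actual_time / Desired_time."""
    return 100.0 * actual_seconds / desired_seconds


def integer_rate_drift(actual_seconds, desired_seconds, sample_rate=48000):
    """Seconds of error left when the assumed sample rate is rounded to an int."""
    exact_rate = sample_rate * actual_seconds / desired_seconds
    rounded_rate = round(exact_rate)
    # Playing the original samples back at the rounded rate gives this duration.
    achieved_seconds = actual_seconds * sample_rate / rounded_rate
    return achieved_seconds - desired_seconds


# Hypothetical durations in seconds (assumed values, not from a real track).
actual, desired = 6905.0, 7200.0

rate = timestretch_rate(actual, desired)
drift = integer_rate_drift(actual, desired)
print(f"TimeStretch(rate={rate:.6f})")
print(f"drift with an integer sample rate: {drift * 1000:.1f} ms")
```

With these made-up numbers the integer route ends up tens of milliseconds off by the end of the track, while the float rate has no such rounding step.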
Another benefit of using BeHappy is that you can select the decoder. Here the use of DirectShowSource() is problematic because each PC can have different decoders (ffdshow, Ac3Filter, ...) with different settings (DRC, DialNorm, Downmix, SPDIF_out, ...) in the DirectShow system.
And with BeHappy we can obtain the output in many formats:
multichannel_wav, mono_wav's, ac3, aac, mp3, mp2, ogg, ...
BTW, if you want ac3 output from ReTimeAudio, that's easy. Add Aften.exe to your package and use this as the final command line:
wavi audio.avs - | aften - output.ac3
The default bitrate for stereo is 192 Kb/s and 448 Kb/s for 5.1
Honeyko
18th February 2008, 19:58
If you have a download link for a BeHappy zip that comes with everything it needs to run properly (or redirects, or at least a pop-up warning of a missing component), instead of just sitting there with a vacant stare or crashing if a widget is absent, I'd love to try it.
Otherwise, like I said above, I have way too many things to do all lined up to sacrifice huge amounts of time trying to run missing parts down. (That's what "fast" referred to, not render time.) If you have such a link, I'll change ratings in the first post. I might even change "fast" to "easy" so quality-minded readers don't feel slighted.
-- Most tasks I'll be retiming are 192kbps AC3 crappy PAL transfer language dub tracks that weren't properly sequenced in the first place. The amount of stretching, therefore, is not going to be more than 5%, and audio loss shouldn't be noticeable with either process.
....or not?
madshi
18th February 2008, 20:57
The amount of stretching, therefore, is not going to be more than 5%, and audio loss shouldn't be noticeable with either process.
....or not?
Audio quality loss does not have that much to do with the amount of stretching. It doesn't matter much if you stretch by 5% or by 30%. The quality loss depends on which algorithm you're using for stretching and also on the quality of the AC3 encoder you're using. Remember, in order to do all this you need to decode the original AC3 track and then reencode it to AC3, which is a lossy process. Double lossy compression is never good. Of course there's nothing you can do about it. But you do have the choice of which algorithm to use for stretching.
Honeyko
18th February 2008, 21:17
Well, most of these PAL dub tracks are so horribly mutilated that they couldn't be made to sound worse anyway....
I'm assuming that unspooling AC3 2.0 to wav, generating a 10x-sized file, is lossless. Going back to AC3 2.0, if it follows the same parameters which created it in the first place, will simply attempt to remove parts of the waveform that are already gone anyway (meaning that not much, if anything, is additionally removed). -- or is that complete bollocks on my part? (I'm just trying to reason my way through this.)
The ideal solution, of course, is one which simply stream-copies. (Analogy: Changing video FPS in VirtualDub.) ...or does the structure of contemporary audio formats prohibit that?
tebasuna51
18th February 2008, 21:22
@Honeyko
Use the latest Shon3i BeHappy package (http://www.box.net/shared/nkihizx1dh) (2007-03-24) with installer and many plugins, DSP functions and encoders.
@madshi
Do you know if the quality of the r8brain library, used with eac3to, is better than the soundtouch library?
Is it also a free tool for multichannel?
Can we preserve the pitch with r8brain?
madshi
18th February 2008, 22:05
I'm assuming that unspooling AC3 2.0 to wav, generating a 10x-sized file, is lossless. Going back to AC3 2.0, if it follows the same parameters which created it in the first place, will simply attempt to remove parts of the waveform that are already gone anyway (meaning that not much, if anything, is additionally removed).
I don't think it works that way. But then I'm not really an expert in lossy compression. So I don't really know...
The ideal solution, of course, is one which simply stream-copies. (Analogy: Changing video FPS in VirtualDub.)
That's not possible for audio.
madshi
18th February 2008, 22:10
@madshi
Do you know if quality in r8brain library, used with eac3to, is better than soundtouch library?
We can preserve the pitch with r8brain?
soundtouch tries to keep pitch the same, right? r8brain is a "simple" resampler. So pitch changes when using r8brain. That means it doesn't make much sense to compare it to soundtouch cause they do totally different things.
Is a free tool for multichannel also?
I've talked to the r8brain author and told him that I'm using it for multichannel stuff and he's ok with that. I have to do ugly tricks to make r8brain work that way, though. The dll can only do mono or stereo and is not thread safe. So for 5.1 tracks I'm copying the r8brain.dll 6 times, so I have 6 copies of the dll in my process. That's the only way I was able to stream full 5.1 through it.
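The general idea of running a mono-only processor once per channel can be sketched in Python as below. This says nothing about how r8brain or eac3to work internally. The helper names are invented, and the "processor" is a trivial stand-in that only halves the level.

```python
def deinterleave(samples, channels):
    """Split interleaved samples [c0, c1, ..., c0, c1, ...] into one list per channel."""
    return [samples[ch::channels] for ch in range(channels)]


def interleave(per_channel):
    """Merge per-channel lists of equal length back into interleaved order."""
    return [sample for frame in zip(*per_channel) for sample in frame]


def process_multichannel(samples, channels, make_mono_processor):
    """Run one independent mono processor instance per channel."""
    processors = [make_mono_processor() for _ in range(channels)]
    split = deinterleave(samples, channels)
    processed = [proc(data) for proc, data in zip(processors, split)]
    return interleave(processed)


def make_halver():
    """Hypothetical stand-in for a mono-only resampler instance."""
    return lambda mono: [s / 2 for s in mono]


frames = list(range(12))  # two 5.1 frames, six samples each
print(process_multichannel(frames, 6, make_halver))
```

Giving each channel its own processor instance means no state is shared between channels, which is the same reason separate copies are needed when the underlying component is not thread safe.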
Honeyko
18th February 2008, 22:24
soundtouch tries to keep pitch the same, right? r8brain is a "simple" resampler. So pitch changes when using r8brain. That means it doesn't make much sense to compare it to soundtouch cause they do totally different things.
Actually, if you're attempting to fix a "tinny" PAL language dub track, then you don't want to keep the pitch. You want it to change (so everybody talks in their natural deeper voice, as per the 23.976fps original).
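For a sense of scale, the pitch drop from a plain 25.000 to 23.976 resample can be worked out with the usual 12 semitones per octave relation. A small Python sketch follows; the function name is made up for the example.

```python
import math


def pitch_shift_semitones(source_fps, target_fps):
    """Pitch change when audio made for source_fps is slowed or sped up to target_fps."""
    return 12.0 * math.log2(target_fps / source_fps)


shift = pitch_shift_semitones(25.0, 23.976)
print(f"{shift:+.3f} semitones")
```

That works out to a drop of roughly three quarters of a semitone, which matches the size of the PAL speed-up being undone.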
@Atak_Snajpera....
Make that previous preset request for 192kbps lame MP3 stereo rather than AC3. (Thanks.)
tebasuna51
19th February 2008, 01:20
soundtouch tries to keep pitch the same, right? r8brain is a "simple" resampler. So pitch changes when using r8brain. That means it doesn't make much sense to compare it to soundtouch cause they do totally different things.
From AviSynth docs:
TimeStretch (clip, float "tempo", float "rate", float "pitch", int "sequence", int "seekwindow", int "overlap", bool "quickseek", int "aa")
TimeStretch allows changing the sound tempo, pitch and playback rate parameters independently from each other, i.e.:
* Sound tempo can be increased or decreased while maintaining the original pitch.
* Sound pitch can be increased or decreased while maintaining the original tempo.
* Change playback rate that affects both tempo and pitch at the same time.
* Choose any combination of tempo/pitch/rate.
Using "rate" is equivalent to r8brain: the pitch changes with the tempo.
SoundTouch Library (http://www.surina.net/soundtouch/)
madshi
19th February 2008, 08:10
Actually, if you're attempting to fix a "tinny" PAL language dub track, then you don't want to keep the pitch. You want it to change (so everybody talks in their natural deeper voice, as per the 23.976fps original).
Yes, that's absolutely right. Of course first you need to check whether the PAL track does have the mickey mouse effect or not.
madshi
19th February 2008, 08:12
Using "rate" is equivalent to r8brain, the pitch change with the tempo.
In that case I'd expect r8brain to sound slightly better than SoundTouch's "rate" but I've not really compared both myself and I guess that the difference will be rather small.
Honeyko
19th February 2008, 19:55
Yes, that's absolutely right. Of course first you need to check whether the PAL track does have the mickey mouse effect or not.
Certainly. (But virtually all older Hollywood films pressed to PAL are simply sped-up. E.g., Jerry Reed sings like he has his balls in a vice in "Smokey and the Bandit", and Burt Reynolds haw-haws like a helium-breathing chipmunk.)
madshi
19th February 2008, 20:18
Certainly. (But virtually all older Hollywood films pressed to PAL are simply sped-up. E.g., Jerry Reed sings like he has his balls in a vice in "Smokey and the Bandit", and Burt Reynolds haw-haws like a helium-breathing chipmunk.)
It's the same way here in Germany (don't know where you are). However, according to tebasuna51 in his country (Spain?) most tracks are pitch corrected.
Honeyko
19th February 2008, 22:32
It's the same way here in Germany (don't know where you are). However, according to tebasuna51 in his country (Spain?) most tracks are pitch corrected.
....at the cost of permanently marring the language dub track. (It's better, from the stand-point of a purist archiving 23.976fps films, to be able to just stretch a PAL audio track, and get "de-chipmunking" free in the bargain.)
If a 24fps film's DVD publisher would just implement 2:2:2:2:2:2:2:2:2:2:2:3 pulldown (http://en.wikipedia.org/wiki/Telecine#2:2:2:2:2:2:2:2:2:2:2:3_pulldown) for PAL releases, they could use the original audio without any problems since the resultant 25fps video will be of the same duration.
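The arithmetic behind that pattern is easy to check. Each number is how many video fields one film frame is held for, and one video frame is two fields. A small Python sketch, with variable and function names chosen for the example:

```python
# Fields shown for each of 12 consecutive film frames: eleven 2s and one 3.
PATTERN = [2] * 11 + [3]


def fields_per_second(film_fps, pattern):
    """Video fields produced per second when the pattern repeats over the film frames."""
    cycles, remainder = divmod(film_fps, len(pattern))
    if remainder:
        raise ValueError("film_fps must be a whole number of pattern cycles")
    return cycles * sum(pattern)


fields = fields_per_second(24, PATTERN)
video_fps = fields / 2  # two fields per video frame
print(f"{fields} fields per second = {video_fps} video frames per second")
```

Twenty-four film frames fill exactly one second of 25fps video, so the running time, and with it the audio, stays unchanged.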